Variable prediction accuracy of polygenic scores within an ancestry group
is yet another in the long line of studies showing that polygenic scores are replete with problems like population stratification and assortative mating. It buries the lede a bit, in that it takes another swing at so-called "educational attainment" genetics, probably because the authors, I sense, don't want to give up the ship completely. But let me make a few points about the study.
They looked at the UK Biobank, which is of course a large and rather homogeneous dataset. They want to point out some obstacles to using polygenic scores derived from this population on other populations. One is linkage disequilibrium: a GWAS pinpoints regions of the DNA rather than specific genetic variants, and the cluster of variants in a region might be distributed differently in different populations, leading to differently weighted "causal" SNPs. They also note that SNP allele frequencies might differ significantly between populations, leading to results that differ from those in the original dataset. My focus for this review, though, is population stratification and assortative mating and how they affect a polygenic score.
The study creates its own deliberate population stratifications by age, sex and socioeconomic status, and notes significant differences in PGS prediction accuracy when the population is divided up in this way. Interestingly, they also compare GWAS and PGS performance in unrelated individuals with performance in individuals who have siblings in the Biobank. I am going to cut to the chase here:
"We applied the approach to 22 traits, focusing on traits with relatively high heritability estimates as well as social and behavioral traits that have been the focus of recent attention in social sciences. For the majority of the traits, such as diastolic blood pressure, BMI, and hair color, the prediction accuracies of standard and sib-based PGS were similar, as expected under standard GWAS assumptions and as observed for two traits simulated under these assumptions (Fig. 3B). However, for a range of social and behavioral traits, such as years of schooling completed, pack years of smoking and age at first sexual intercourse, the prediction accuracy of the sib-based PGS was substantially lower than that of the standard PGS (Fig. 3B). It was also significantly lower for two morphological traits, height and whole body water mass."
This points to the likelihood that the PGS's predictive accuracy was largely a product of population stratification and assortative mating, especially since the drop largely applies to "social and behavioral traits" but not to more obvious physical traits. I can't say much about whole body water mass and why its result would differ from that of BMI, but height certainly has a strong assortative mating component behind it as well. In other words, what we are probably seeing is social stratification, with GWAS markers flagging that stratification in the same way that they flag someone's ethnic or geographic background, rather than acting as causal variants for the trait (i.e., having Asian genetic markers increases your likelihood of chopstick dexterity, but none of the genes identifying you as Asian actually makes you a better chopstick user).
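To make the chopstick point concrete, here is a toy simulation of my own (it is not from the paper, and every number in it is an assumption chosen for illustration). Two hypothetical groups differ both in the frequency of a marker and in their average environment, and the marker has no causal effect on the trait at all. Across the pooled population the marker still "predicts" the trait, but when siblings are compared with each other, which holds group and family environment fixed, the signal should vanish.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n_families=4000, seed=0):
    """Return (pooled correlation, sibling-difference correlation)."""
    rng = random.Random(seed)
    # Hypothetical groups: (marker allele frequency, mean environment).
    # These values are illustrative assumptions, not estimates of anything.
    groups = [(0.2, 0.0), (0.8, 1.0)]
    pooled_g, pooled_y, diff_g, diff_y = [], [], [], []
    for i in range(n_families):
        freq, env_mean = groups[i % 2]
        # Each parent carries two alleles drawn at the group's frequency.
        parents = [[int(rng.random() < freq) for _ in range(2)] for _ in range(2)]
        family_env = rng.gauss(env_mean, 0.5)  # environment shared by siblings
        sibs = []
        for _ in range(2):
            # A child inherits one randomly chosen allele from each parent.
            genotype = rng.choice(parents[0]) + rng.choice(parents[1])
            # The trait depends only on environment: the marker is NOT causal.
            trait = family_env + rng.gauss(0.0, 1.0)
            sibs.append((genotype, trait))
            pooled_g.append(genotype)
            pooled_y.append(trait)
        diff_g.append(sibs[0][0] - sibs[1][0])
        diff_y.append(sibs[0][1] - sibs[1][1])
    return pearson(pooled_g, pooled_y), pearson(diff_g, diff_y)


if __name__ == "__main__":
    pooled_r, sib_r = simulate()
    print(f"pooled marker-trait correlation: {pooled_r:.3f}")
    print(f"sibling-difference correlation:  {sib_r:.3f}")
```

With these settings the pooled correlation should come out clearly positive while the sibling-difference correlation hovers near zero. That is the same pattern as a standard PGS outperforming a sib-based PGS, produced here with no causal genetics whatsoever.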
Once again, we have a study showing serious deficits in the ability of a GWAS/PGS to recognize population stratification issues. As they point out in the "Implications" section:
"But as we have shown, differences in the degree of environmental variance are not the primary explanation for the patterns we report (Fig. 2), and other factors, including differences in the magnitude of genetic effects among groups, indirect effects and assortative mating, also lead to differences in the prediction accuracy of PGS, in ways that may make applications of phenotypic prediction problematic, even within a single ancestry group."