Thursday, July 5, 2018

Are Polygenic Risk Scores Just a Measure of Population Stratification?

This preprint article came out recently regarding GWAS results for height. It created a bit of a stir amongst scientists involved in this kind of research, as it questioned the validity of previous, significant genetic associations for height. This got me thinking about the current Holy Grail of GWAS researchers: the polygenic risk score. I believe that the preprint's conclusions throw the whole concept into question. Let me explain. First, let's look at some of the conclusions from the study that created such a stir:
Because multiple prior lines of evidence provided independent support for directional selection on height, there is no single simple explanation for all the discrepancies. Nonetheless, our current view is that previous analyses were likely confounded by population stratification and so the conclusion of strong polygenic adaptation in Europe now lacks clear support. Moreover, these discrepancies highlight (1) that current methods for correcting for population structure in GWAS may not always be sufficient for polygenic trait analyses, and (2) that claims of polygenic differences between populations should be treated with caution until these issues are better understood.
To be honest, when I started writing critiques of these GWAS studies, I had assumed that the issue of population stratification had largely been "tamed" by researchers. My primary premise has been that the significant associations noted in these studies were simply random false positives. Clearly, though, there are also significant population stratification issues that have yet to be resolved and are, effectively, invisible to our analysis. Presumably, there are genetic associations in these populations, unrelated to the traits in question, that find their way into these studies and give hopeful results that are generally never replicated. In the case of height, the associations thrown into question were some of the most significant findings in previous studies. This is where I think that polygenic risk scores are thrown into question as well.
If you are not familiar with polygenic risk scores, here is the quick Wikipedia explanation, which should suffice for my point, but feel free to research further:
A polygenic score, also called a polygenic risk score, genetic risk score, or genome-wide score, is a number based on variation in multiple genetic loci and their associated weights (see regression analysis). It serves as the best prediction for the trait that can be made when taking into account variation in multiple genetic variants.
Because of the abject failure of GWAS studies to find and replicate genetic associations for traits, other methods of confirming the validity of the loci found in these studies have been proffered, along with vague, unproven explanations of why associations for complex, polygenic traits don't directly replicate. So the polygenic score has served as a sort of collective "replication". The idea is that, by adding up the weights of these various associations, we can tell whether an individual carries enough of the associated genetic loci (presumably SNPs) to have a "high score"; if so, they are presumed more likely to have the trait.
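For readers who want the arithmetic behind that idea: a polygenic score is commonly computed as a weighted sum, S = w1·g1 + w2·g2 + ..., where g is the number of effect alleles (0, 1 or 2) a person carries at a given SNP and w is that SNP's effect size from a GWAS. Below is a minimal sketch under that assumption; the SNP names and weights are hypothetical, made up purely for illustration, and real scoring tools handle details such as missing genotypes in their own ways.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages over the SNPs listed in `weights`.

    dosages: SNP id -> number of effect alleles the person carries (0, 1 or 2)
    weights: SNP id -> per-allele effect size taken from a GWAS
    SNPs the person was not genotyped for are simply skipped in this sketch.
    """
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)


# Hypothetical weights for three made-up SNPs (not from any real study).
weights = {"snp1": 0.10, "snp2": -0.05, "snp3": 0.02}
person = {"snp1": 2, "snp2": 1, "snp3": 0}
print(round(polygenic_score(person, weights), 4))  # → 0.15
```

Note that nothing in the calculation asks *why* a SNP got its weight; whatever the discovery GWAS picked up, causal or not, is carried straight into the score.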
Moreover, by building a polygenic score from the significant loci of one or more previous GWAS studies, one can test it against an independent study to see whether the scores hold up for the trait in the new study, even if none of the loci replicate individually. I've heard this offered as proof of the validity of GWAS studies and passed off as replication. Researchers then quantify the results, for example by stating that the score accounts for 10% of the variance in the trait. Such calculations have always struck me as dubious.
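A figure like "10% of the variance" is, in its simplest form, the squared correlation (R²) between people's scores and their trait values in the new sample. The sketch below assumes that plain, unadjusted R²; published analyses often report the incremental R² over a baseline model with covariates such as age, sex and ancestry principal components, which is left out here for brevity.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def variance_explained(scores, traits):
    """Unadjusted R^2: share of trait variance tracked by the score."""
    return pearson_r(scores, traits) ** 2


# Toy numbers only: four people's scores and trait values.
print(round(variance_explained([1, 2, 3, 4], [1, 3, 2, 4]), 2))  # → 0.64
```

The key point for the argument here is that R² only measures how well the score *tracks* the trait; it says nothing about why.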
Here's where the study related to height throws all this into question, in my view. Most of the associations used in polygenic risk scores are not as strong as the height associations that were tossed. And these height associations had held up in more than one previous study, perhaps suggesting replication. So if even strong associations were likely false positives arising from unrecognized population stratification, then weaker associations are certainly thrown into question, as they too might reflect population stratification.
If that's the case, and I assume it is unless someone can make a valid argument to the contrary, then what we really have is a series of false positive associations, many due to population stratification. Since, as we have seen, these issues often carry over from one study to the next, what we might really have here is a population stratification risk score. Thus, instead of saying that a polygenic risk score accounts for 10% of the variance, perhaps we are really calculating a "10%" rate of population stratification.
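To make the suspected mechanism concrete, here is a toy simulation. Every setting in it is an invented assumption for illustration (two subpopulations, 50 SNPs whose allele frequencies differ between them by chance, and a one-unit non-genetic difference in the trait between the groups); it is not a model of any real study. No SNP has any effect on the trait, yet weights estimated in a "discovery" sample without any ancestry correction produce a score that still tracks the trait in a fresh, independent sample drawn from the same two populations. Real GWAS pipelines do correct for ancestry (for example with principal components); the sketch deliberately omits that step, which corresponds to the case where such correction is incomplete.

```python
import math
import random


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sample(rng, freqs, env_shift, n_per_group):
    """Dosages (0/1/2) and trait values for two subpopulations.
    The trait depends only on group membership (a non-genetic shift)
    plus noise; no SNP affects it."""
    genos, traits = [], []
    for group in (0, 1):
        for _ in range(n_per_group):
            genos.append([(rng.random() < p) + (rng.random() < p) for p in freqs[group]])
            traits.append(env_shift * group + rng.gauss(0, 1))
    return genos, traits


def simulate(stratified, n_per_group=2000, n_snps=50, seed=1):
    """Return the R^2 of a score built from non-causal SNPs (hypothetical setup)."""
    rng = random.Random(seed)
    base = [rng.uniform(0.2, 0.8) for _ in range(n_snps)]
    drifted = [min(0.95, max(0.05, p + rng.gauss(0, 0.1))) for p in base]
    freqs = (base, drifted)
    env_shift = 1.0 if stratified else 0.0

    # "Discovery GWAS": per-SNP regression slope of trait on dosage, no ancestry correction.
    genos, traits = sample(rng, freqs, env_shift, n_per_group)
    mt = sum(traits) / len(traits)
    weights = []
    for j in range(n_snps):
        g = [row[j] for row in genos]
        mg = sum(g) / len(g)
        cov = sum((a - mg) * (t - mt) for a, t in zip(g, traits))
        var = sum((a - mg) ** 2 for a in g)
        weights.append(cov / var)

    # "Independent replication": new people from the same two subpopulations.
    genos2, traits2 = sample(rng, freqs, env_shift, n_per_group)
    scores = [sum(w * d for w, d in zip(weights, row)) for row in genos2]
    return pearson_r(scores, traits2) ** 2


print(f"with stratification:    R^2 = {simulate(True):.3f}")
print(f"without stratification: R^2 = {simulate(False):.3f}")
```

With these made-up settings the stratified run should give a clearly non-zero R² and the unstratified run an R² near zero; the exact values depend on the seed and parameters, so none are quoted here.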
I realize this might not sit well with researchers, but I think it's hard to dismiss out of hand. The genetic associations from GWASs continue to fail to replicate, even with the massive datasets now available through entities like the UK Biobank and 23andMe. When independent replications are not happening, unresolved stratification seems a much more likely explanation for the apparent correlations between studies: researchers simply have not worked out the population stratification issues. High polygenic scores across studies, then, are little more than a flag that we have not adequately addressed population stratification.