Wednesday, January 23, 2019

The UK Biobank: The Beast of Pop/Strat

Here is yet another study looking at population stratification issues in GWAS and polygenic score results: "Apparent latent structure within the UK Biobank sample has implications for epidemiological analysis."
They looked at geographic structure and found that the UK Biobank is subject to a lot of stratification in that regard. They looked at BMI (body mass index), household income, and educational attainment and found all of them to be subject to geographic population stratification, even after correcting with principal component analysis. First they looked at a smaller subset of genetic data from a previous study (ALSPAC):
...we anticipate that the educational attainment of people who migrate for economic reasons differs from people who do not. Educational attainment is therefore aligned to subtle genetic differences even in this apparently geographically and ethnically homogenous population and this is co-incident with axes of ancestry.
They move on to the beast, the UK Biobank:
GWAS for birth location identified that single variants are associated with geography within UK Biobank. An unadjusted model produced distorted and inflated plots with evidence for association at variants across the autosome.
They also point out that they might not even have captured the full extent of the geographic stratification, noting that "lack of association between a PS and birth location may be insufficient to assert that the PS is free from stratifying bias."
This all leads them to point out the limitations of GWAS and polygenic scores that follow from the population stratification issue:
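To make the idea of an "inflated" unadjusted GWAS concrete, here is a toy simulation. It is only a sketch under my own assumptions, not the authors' method or data: two hypothetical subpopulations with slightly different allele frequencies, a trait that differs between them for purely non-genetic reasons, and no causal variants at all. The sample sizes, effect size, and the names `residualize` and `inflation` are made up for illustration, and the chi-square statistic is an approximation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_group, n_snps = 500, 1000

# Two hypothetical subpopulations with slightly different allele frequencies.
base = rng.uniform(0.2, 0.8, n_snps)
shift = rng.normal(0, 0.05, n_snps)
freq_a = np.clip(base + shift, 0.05, 0.95)
freq_b = np.clip(base - shift, 0.05, 0.95)
geno = np.vstack([
    rng.binomial(2, freq_a, (n_per_group, n_snps)),
    rng.binomial(2, freq_b, (n_per_group, n_snps)),
]).astype(float)
group = np.repeat([0.0, 1.0], n_per_group)

# The trait depends on group membership only (assumed non-genetic);
# no SNP has any causal effect.
pheno = 1.0 * group + rng.normal(0, 1, 2 * n_per_group)


def residualize(x, covars):
    """Remove an intercept and the linear effect of covars from x."""
    design = np.column_stack([np.ones(len(covars)), covars])
    beta, *_ = np.linalg.lstsq(design, x, rcond=None)
    return x - design @ beta


def inflation(geno, pheno, covars=None):
    """Genomic inflation factor: median test statistic / null median."""
    n = len(pheno)
    if covars is None:
        covars = np.empty((n, 0))  # intercept-only adjustment
    g = residualize(geno, covars)
    y = residualize(pheno, covars)
    r = (g * y[:, None]).sum(0) / np.sqrt((g ** 2).sum(0) * (y ** 2).sum())
    chi2 = n * r ** 2  # approximate 1-df chi-square per SNP
    return np.median(chi2) / 0.4549  # 0.4549 = median of chi-square(1)


# Top principal component of the centered genotype matrix.
centered = geno - geno.mean(axis=0)
u, _, _ = np.linalg.svd(centered, full_matrices=False)
pc1 = u[:, 0]

lam_raw = inflation(geno, pheno)
lam_pc = inflation(geno, pheno, covars=pc1)
print(f"inflation factor, unadjusted:   {lam_raw:.2f}")
print(f"inflation factor, PC1-adjusted: {lam_pc:.2f}")
```

In a setup like this, the unadjusted inflation factor should come out well above 1 even though nothing is causal, and adjusting for the first principal component should pull it back toward 1. That one component fixes everything here is a feature of the toy example, with its two cleanly separated groups. The study's point is that real structure is subtler and can survive this kind of correction.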

Now manifest, this property should be added to the growing list of limitations to naive use of PS, including horizontal pleiotropy [12], high false discovery rate [40], association with coarse ancestral groups [41] and prediction of inter-generational phenotypes which complicates interpretation [42]. The ability of very large studies to detect effects indistinguishable from artefactual biases or ancestral differences demands reworked approaches to exploit [43], or at least account for, structure.
The problem here for anyone conducting these studies is that there might always be some population stratification around the corner that is not being accounted for, skewing their results and producing false positives. It would therefore be very hard to EVER say definitively that you have a valid result, particularly when you have a homogeneous (read: White European) data set, as evidenced by the fact that more diverse populations invariably water down the results.
I think it is valid to ask whether, if we were able to identify all population stratification issues, we would be left with anything at all.
