Tuesday, March 24, 2020

More Bias in DNA databanks.

This study, "Genetic analyses identify widespread sex differential participation bias," is yet another example of the bias problems in these large consumer and other databases. It looked at several, including 23andMe and the UK Biobank.
With 23andMe, a GWAS of nothing but "male vs. female" produced 150 "significant" loci, and many of these loci had previously been correlated with complex traits in other GWAS that used the database. This is a problem, because it suggests that many of the previously discovered loci for particular traits might just be an indication of bias in the databank and have no causal relationship to the trait. As the authors point out:
Finally, we demonstrate how these biases can potentially lead to incorrect inferences in downstream analyses and propose a conceptual framework for addressing such biases. Our findings highlight a new challenge that genetic studies may face as sample sizes continue to grow.
A broader problem is that this affects every GWAS performed to date using the biased databank, since this form of bias was not recognized when those studies were performed. I don't expect it to happen, of course, but this should lead to a reevaluation of any GWAS previously performed using the database, with a correction that will further shrink the results. Sex differences are an easy-to-recognize bias to test for, but there are no doubt many more that remain unrecognized, and you will never be completely sure you have eliminated them all. So you can never say for sure whether you are finding anything but noise in these studies (for the record, I think noise is all there is with behavioral genetic phenotypes). In addition to the population stratification issues in these studies, which also never seem to be fully recognized, the databases themselves have their own stratification issues.
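To make the mechanism concrete, here is a toy calculation of how participation bias alone can make an autosomal variant look "associated" with sex. This is only a sketch with made-up numbers, not the study's method: the carrier frequency, the trait probabilities in `P_TRAIT`, and the participation rates in `P_JOIN` are all hypothetical. The variant nudges some trait, and the trait makes men (but not women) more likely to sign up.

```python
# Toy calculation with made-up numbers: how sex-differential participation
# can make an autosomal variant look associated with sex among participants.

CARRIER_FREQ = 0.5  # identical in men and women in the full population (assumed)

# Hypothetical probability of having some trait, given carrier status.
P_TRAIT = {"carrier": 0.6, "noncarrier": 0.4}

# Hypothetical probability of joining the databank, by sex and trait status.
# Only men's participation depends on the trait here.
P_JOIN = {
    "male": {True: 0.20, False: 0.10},
    "female": {True: 0.10, False: 0.10},
}


def join_prob(sex, genotype):
    """P(joins databank | sex, genotype), averaging over trait status."""
    p_trait = P_TRAIT[genotype]
    return p_trait * P_JOIN[sex][True] + (1 - p_trait) * P_JOIN[sex][False]


def carrier_freq_in_databank(sex):
    """Carrier frequency among participants of one sex (Bayes' rule)."""
    carriers = CARRIER_FREQ * join_prob(sex, "carrier")
    noncarriers = (1 - CARRIER_FREQ) * join_prob(sex, "noncarrier")
    return carriers / (carriers + noncarriers)


if __name__ == "__main__":
    for sex in ("male", "female"):
        print(sex, round(carrier_freq_in_databank(sex), 4))
```

With these numbers the carrier frequency among male participants comes out at about 0.533 versus 0.5 among female participants, even though it is exactly 0.5 in both sexes in the underlying population. With a large enough sample, a "male vs. female" GWAS would flag that difference as significant, and the same distortion would leak into any GWAS of a trait that correlates with participation.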

Interesting Update: Another study just came out that incidentally looked at the same thing in the UK Biobank. This one found NO hits. I think this is likely a good demonstration of how participation bias created a very large number of false positives in 23andMe, whereas the UK Biobank perhaps didn't have the same participation bias. It shows that a large number of "significant" hits can be produced simply with noise. Again, we are left with the question of whether any of the findings from these studies are true genetic correlations.



3 comments:

  1. Great article, you're doing important work. I'm a big fan of your writing and was wondering if you could do more long-form written pieces.

    Replies
    1. Thanks. I will be working on one shortly. Do you mean directly related to GWAS or more general topics?

  2. Here's a crap, politically biased study I think you'll have fun tearing to shreds: https://www.researchgate.net/publication/338600567_Educational_attainment_polygenic_scores_in_Hungary_evidence_for_validity_and_a_historical_gene-environment_interaction
