Tuesday, March 24, 2020

More Bias in DNA databanks.

This study, "Genetic analyses identify widespread sex differential participation bias," is yet another example of the bias problems in these large consumer and other databases. This one looked at several, including 23andMe and the UK Biobank.
With 23andMe, a GWAS run on nothing more than "male vs. female" turned up 150 "significant" loci, and many of these loci had previously been correlated with complex traits in other GWAS that used the database. This is a problem, because it suggests that many of the previously discovered loci for particular traits might just be an indication of bias in the databank and have no causal relationship to the trait. As the authors point out:
Finally, we demonstrate how these biases can potentially lead to incorrect inferences in downstream analyses and propose a conceptual framework for addressing such biases. Our findings highlight a new challenge that genetic studies may face as sample sizes continue to grow.
A broader problem related to this is every GWAS performed to date using the biased databank, since this form of bias was not recognized when those studies were performed. I don't expect it to happen, of course, but this should lead to a reevaluation of any GWAS previously performed using the database, with a correction that will further shrink the results. Sex difference is an easy bias to recognize and test for, but there are no doubt many more that remain unrecognized. The fact of the matter is that you will never be completely sure you have eliminated them all, so you can never say for sure whether you are finding anything but noise in these studies (for the record, I think these studies are finding only noise when it comes to behavioral genetic phenotypes). So in addition to the population stratification issues in these studies, which also never seem to be fully recognized, the databases themselves have their own stratification issues.
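To make the mechanism concrete, here is a toy simulation of my own; it is not the method used in the paper, and every number in it (allele frequency, effect sizes, sample size) is invented for illustration. It assumes one autosomal variant that nudges some trait upward, and it assumes the trait makes men more likely and women less likely to sign up for a databank. Sex and genotype are independent in the full population, yet a "male vs. female" comparison inside the databank lights up.

```python
import math
import random


def allele_freq_z(males, females):
    """Two-proportion z statistic comparing allele frequency in males vs. females."""
    n_m, n_f = 2 * len(males), 2 * len(females)  # two alleles per person
    p_m, p_f = sum(males) / n_m, sum(females) / n_f
    pooled = (sum(males) + sum(females)) / (n_m + n_f)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_m + 1 / n_f))
    return (p_m - p_f) / se


def simulate(n=100_000, freq=0.3, trait_effect=0.3, selection=1.0, seed=1):
    """Return (z in the whole population, z among databank participants).

    All parameter values are hypothetical choices for this sketch.
    """
    rng = random.Random(seed)
    everyone = {"M": [], "F": []}
    joined = {"M": [], "F": []}
    for _ in range(n):
        sex = rng.choice("MF")  # sex is drawn independently of genotype
        g = (rng.random() < freq) + (rng.random() < freq)  # allele count 0, 1 or 2
        trait = trait_effect * g + rng.gauss(0, 1)
        # Assumed sex-differential participation: the trait pushes men toward
        # joining and women away from joining.
        sign = 1 if sex == "M" else -1
        p_join = 1 / (1 + math.exp(-sign * selection * trait))
        everyone[sex].append(g)
        if rng.random() < p_join:
            joined[sex].append(g)
    return (
        allele_freq_z(everyone["M"], everyone["F"]),
        allele_freq_z(joined["M"], joined["F"]),
    )


z_population, z_databank = simulate()
print(f"z, whole population: {z_population:.2f}")
print(f"z, databank only:    {z_databank:.2f}")
```

In the whole population the statistic should hover near zero, while among participants it should be far beyond any usual significance threshold, even though the variant has nothing to do with sex. The same selection would also distort any GWAS of the trait itself run on that databank.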

Interesting Update: Another study just came out that incidentally looked at the same thing in the UK Biobank. This one found NO hits. I think this is likely a good demonstration of how participation bias created a very large number of false positives in 23andMe, while the UK Biobank, which perhaps didn't have the same participation bias, produced none. It shows that a large number of "significant" hits can be produced simply from noise. Again, we are left with the question of whether anything from these studies is a true genetic correlation.



Friday, March 13, 2020

The Trickle down of GWAS to Race Science

I like to point out that many of the genetic studies related to "IQ," "g" and "Educational Attainment," whether or not their intentions were good, tend to attract racists of varying degrees, from the smooth-talking race scientists down to white nationalists and overt racists, all trying to misread the study as saying that whites are smarter than blacks because of their genes (leaving aside the fact that most of the studies cannot be replicated). This study, which examines which people tend to pick up particular studies on social media sites like Twitter, quantifies this and notes:
Our study provides conclusive quantitative evidence that white nationalists and adjacent communities are engaging with the scientific literature on Twitter. Not only are these communities a ubiquitous presence in the social media audience for certain research topics, but they can dominate the discourse around a particular preprint and inflate altmetric indicators.
Often, once this process begins, the scientists involved in the study and other experts in the field attempt to debunk this misappropriation of the science. Unfortunately, in my view, this does little more than amplify the debate into a "both sides" dichotomy, effectively giving credence, or at least attention, to the racist views. Scientists will try to defend such studies or find a use for them to justify their existence, but these justifications are often a reach and fall flat, leaving one to ask what purpose the studies serve other than to energize racists.