Wednesday, April 11, 2018

Genome-wide Association Studies and why they are so prone to false positives

I thought that Genome-wide Association Studies (GWAS) were going out of favor, but it looks like they are most of what genetic researchers are cranking out, which is puzzling. I was always surprised that they were given any credence at all as anything other than a screening study. It always seemed likely to me that they were producing nothing but false positives, and it was frustrating when I wrote letters to journals and the authors responded in a way that made it clear they did not understand basic concepts of statistics. (When I get a chance, I'll try to round up all my previous letters to the editor that survived the peri-internet age of the early 2000s.) Let me explain why these studies produced so many false positives:

Generally, these studies would look at a couple hundred genetic loci (a locus is not necessarily a gene, but a location on a chromosome identified using a genetic marker). They would split the participants into those with the disorder and those without it, and see which of the genetic markers showed up more often in the group with the disorder than in the control group. Then they would use one of a few different algorithms to determine whether that difference exceeded what chance would produce. This was where they missed the boat. If you were studying a single genetic locus and found a correspondence that exceeded chance, you could say you had something.

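To make the single-locus case concrete, here is a minimal sketch of one textbook way to ask whether a marker is more common in cases than in controls: a Pearson chi-square test (1 degree of freedom, no continuity correction) on a 2×2 table of carrier counts. The studies used various algorithms, so this is an illustration rather than their exact method; the function name `chi_square_2x2` and the counts in the usage lines are hypothetical.

```python
import math

def chi_square_2x2(case_carriers, case_total, control_carriers, control_total):
    """Pearson chi-square test (1 df, no continuity correction) comparing how
    often a marker is carried by cases versus controls.
    Returns (chi2_statistic, p_value)."""
    a = case_carriers                   # cases carrying the marker
    b = case_total - case_carriers      # cases without it
    c = control_carriers                # controls carrying the marker
    d = control_total - control_carriers  # controls without it
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # a row or column total is zero; no evidence either way
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: 70 of 200 cases vs 40 of 200 controls carry the marker.
chi2, p = chi_square_2x2(70, 200, 40, 200)
print(f"chi2 = {chi2:.2f}, p = {p:.5f}")
```

For one locus tested once, a small p-value like this one is meaningful evidence. The trouble described next starts when the same test is repeated across many loci.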
What they didn't seem to understand is that when you are scanning a few hundred loci, your odds change. You can't just use the same algorithm to calculate the likelihood that a correlation is stronger than chance, because the more loci you include, the more outliers you should expect to get. And it turns out that outliers are all they ever were. Generally, they would find two or three loci that they claimed corresponded to the disorder at a greater-than-chance level. Sometimes this would even become a front-page story in major newspapers: "Gene Found for Schizophrenia" or some such. This made life difficult for me, as I was treated like a quack even by people with no science background, because "it was in the newspaper."

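The arithmetic behind "your odds change" is short. If m loci are each tested at a threshold α and none of them is truly linked to the disorder, you expect about m·α of them to pass by chance, and the chance that at least one passes is 1 − (1 − α)^m (treating the tests as independent, which nearby markers are not exactly). The sketch below simulates that situation under stated assumptions: 300 hypothetical loci, 200 cases and 200 controls, carrier frequencies drawn between 0.1 and 0.5 and identical in both groups, and a pooled two-proportion z-test per locus. It also applies a Bonferroni correction (dividing α by the number of loci), one standard way to adjust for scanning many loci; the function names are made up for this example.

```python
import math
import random

def two_proportion_p_value(case_carriers, n_cases, control_carriers, n_controls):
    """Two-sided p-value for a difference in carrier frequency between
    cases and controls (pooled two-proportion z-test)."""
    p1 = case_carriers / n_cases
    p2 = control_carriers / n_controls
    pooled = (case_carriers + control_carriers) / (n_cases + n_controls)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_cases + 1 / n_controls))
    if se == 0:
        return 1.0  # everyone or no one carries the marker; nothing to compare
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))

def count_hits(n_loci, n_cases, n_controls, alpha, seed=0):
    """Simulate loci with NO real link to the disorder and count how many
    still pass the per-locus threshold `alpha`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_loci):
        freq = rng.uniform(0.1, 0.5)  # same carrier frequency in both groups
        cases = sum(rng.random() < freq for _ in range(n_cases))
        controls = sum(rng.random() < freq for _ in range(n_controls))
        if two_proportion_p_value(cases, n_cases, controls, n_controls) < alpha:
            hits += 1
    return hits

n_loci, alpha = 300, 0.05
print("expected chance hits:", n_loci * alpha)
print("P(at least one chance hit):", 1 - (1 - alpha) ** n_loci)
print("simulated hits:", count_hits(n_loci, 200, 200, alpha))
# Same seed, so this call re-tests the identical simulated data.
print("hits after Bonferroni:", count_hits(n_loci, 200, 200, alpha / n_loci))
```

Because every simulated locus is pure noise, whatever passes the unadjusted threshold is, by construction, an outlier of exactly the kind described above.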
What the authors of these studies never did (despite my suggesting it on more than one occasion) was run a control. If you took the results of all the participants and randomized them into two groups without regard to the disorder, you would likely get a similar number of outliers. That would have tipped them off that they were really dealing with random data. Instead, they pressed on, and when they couldn't consistently replicate their findings, they turned to the next ploy: meta-analysis. I'll discuss that in my next post.
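The control described above is essentially a label-permutation check: keep everyone's genotypes, shuffle who is labelled as having the disorder, and rerun the identical scan. Here is a minimal sketch, assuming genotypes are stored as one list of 0/1 carrier flags per locus and labels as True (has the disorder) or False (control); the data in the usage lines are randomly generated placeholders, not results from any study, and the function names are hypothetical.

```python
import math
import random

def locus_p_value(carrier_flags, labels):
    """Two-sided pooled two-proportion z-test: is the marker carried more
    (or less) often by people labelled True (cases) than False (controls)?"""
    n1 = sum(labels)
    n0 = len(labels) - n1
    x1 = sum(c for c, is_case in zip(carrier_flags, labels) if is_case)
    x0 = sum(carrier_flags) - x1
    pooled = (x1 + x0) / (n1 + n0)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n0))
    if se == 0:
        return 1.0  # marker is carried by everyone or by no one
    z = (x1 / n1 - x0 / n0) / se
    return math.erfc(abs(z) / math.sqrt(2))

def count_hits(genotypes, labels, alpha):
    """Number of loci (rows of `genotypes`) whose p-value falls below alpha."""
    return sum(locus_p_value(row, labels) < alpha for row in genotypes)

def shuffled_label_control(genotypes, labels, alpha, n_shuffles=20, seed=0):
    """Rerun the same scan after randomly reassigning the disorder labels,
    which breaks any real link between the markers and the disorder."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_shuffles):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        counts.append(count_hits(genotypes, shuffled, alpha))
    return counts

# Placeholder data: 300 loci, 200 cases, 200 controls, no real association.
rng = random.Random(42)
labels = [True] * 200 + [False] * 200
genotypes = [[int(rng.random() < 0.3) for _ in labels] for _ in range(300)]

observed = count_hits(genotypes, labels, 0.05)
control = shuffled_label_control(genotypes, labels, 0.05)
print("hits with real labels:", observed)
print("hits with shuffled labels:", control)
```

If the number of "significant" loci found with the real labels sits comfortably inside the range produced by the shuffled labels, the scan has found nothing that random grouping would not also produce.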
