Monday, April 16, 2018

GWAS/Meta-Analysis 78,308 (Sniekers): A critique of this study claiming numerous genetic correlations to intelligence

Genome-wide association meta-analysis of 78,308 individuals identifies new loci and genes influencing human intelligence

I had someone cite this study as strong evidence for specific genetic linkages to intelligence. I'm getting a lot of "A lot has changed in the field lately" types of comments, which are maybe a bit condescending, but not entirely off base. To be honest, I'm surprised at how little has changed since I was critiquing these studies 15 to 20 years ago. They still jump to the same conclusions. They still have no sense that their studies are producing false positives. Some of the terminology has changed and, really, the papers are a bit overly technical and harder to read (or I'm just getting older). Although I disagree with the conclusions of this study, as I will lay out shortly, there is something to be said for dumbing down your points a bit, unless your goal is a bit of technical obfuscation (not accusing, just saying...).

For starters, it should be noted that this study is not itself a genome-wide association study (GWAS). It is a meta-analysis of several GWAS for intelligence, which is then compared to another meta-analysis, this one for educational attainment, apparently on the grounds that it is "close enough." They then try to pass off common findings as a replication, which they refer to as "proxy-replication," an interesting term if I've ever heard one. It doesn't appear that any actual lab work was performed in this study. I should also mention that there are 78,308 individuals in the meta-analysis, because this was of such importance to the authors that they seemed to go out of their way to repeat it several times. I've mentioned previously that both GWAS and meta-analyses are prone to false positives and it is worth reading those posts here and here before getting into this specific study, since I will be making some similar points.
The first point, before I really even get started, is that this is another example of a study in which a simple control could be used, something I pushed for with these types of studies many years ago. In this case, that would involve taking the GWAS data sets that feed this meta-analysis and, instead of dividing the subjects up by IQ, dividing them into two random groups and running the same analysis. This would give us a sense of the false positive rate against which to compare the real results. Otherwise, we have no idea whether the analysis is just cranking out random hits (which is my assumption).
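To make the control concrete, here is a minimal sketch of the random-split idea in Python. It is not the pipeline used in the study: the genotypes are simulated, the per-SNP test is a plain two-group z-test on allele counts, and the names `simulate_genotypes`, `snp_p_value` and `random_split_hits` are invented for illustration. The point is only to show how one would count "significant" SNPs when the two groups are known to differ by nothing but chance.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a z-score under a standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))


def snp_p_value(group_a, group_b):
    """Compare mean allele counts (0, 1 or 2) between two groups for one SNP."""
    na, nb = len(group_a), len(group_b)
    mean_a, mean_b = sum(group_a) / na, sum(group_b) / nb
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (nb - 1)
    se = math.sqrt(var_a / na + var_b / nb)
    if se == 0:  # no variation at this SNP, so nothing to test
        return 1.0
    return two_sided_p((mean_a - mean_b) / se)


def random_split_hits(genotypes, alpha, n_splits=5, seed=0):
    """For each random half/half split, count SNPs with p < alpha.

    genotypes: one list per person, each holding that person's allele counts.
    Because the split is random, every hit counted here is a false positive.
    """
    rng = random.Random(seed)
    n_people, n_snps = len(genotypes), len(genotypes[0])
    counts = []
    for _ in range(n_splits):
        order = list(range(n_people))
        rng.shuffle(order)
        half = n_people // 2
        a_idx, b_idx = order[:half], order[half:]
        hits = 0
        for s in range(n_snps):
            a = [genotypes[i][s] for i in a_idx]
            b = [genotypes[i][s] for i in b_idx]
            if snp_p_value(a, b) < alpha:
                hits += 1
        counts.append(hits)
    return counts


def simulate_genotypes(n_people, n_snps, seed=0):
    """Simulated stand-in data: independent SNPs with random allele frequencies."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]
    return [[(rng.random() < f) + (rng.random() < f) for f in freqs]
            for _ in range(n_people)]


if __name__ == "__main__":
    data = simulate_genotypes(n_people=200, n_snps=300)
    print(random_split_hits(data, alpha=0.05))
```

With real data, the same count would be taken at the study's own significance threshold and set next to the number of loci the real (IQ-based) analysis reports.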
Another interesting aspect of this study is that it does not appear to compare the original results of the different studies to its own findings. In other words, they took raw data from previous studies, combined them and looked for significance independently of the individual study findings (unless I'm reading it wrong). I don't know why they wouldn't show a direct comparison, but perhaps they have it in supplemental material that is not available to me. As yet, I have not been able to track down specific links to the studies they use, but will add an addendum to this if I do and can compare results. I don't even know whether the previous studies (there was also one that hasn't been published) are consistent with each other. Assuming that different loci are flagged in the current study, you have a double problem: 1) Why didn't the initial studies pick up these associations? 2) If the associations in the previous studies were not picked up in the combined study, that proves that they were false positives, correct? Therefore, it is likely that this study eliminated many of the claimed positive associations from the initial studies it looked at. So why wouldn't we assume that any new positive correlations are also false positives?
For the record, I don't really take issue with the fact that some of the studies were for children and some for adults. I do, however, take issue with the "proxy-replication" for three reasons. The first is the obvious fact that, while there is certainly some correlation between IQ and educational attainment, using one as a stand-in for the other waters down the findings you are attempting to "replicate." Secondly, even if it were another GWAS based directly on intelligence, I object to the term "replicate". No two GWAS that I have ever heard of replicate completely. Not even close. Thirdly, even if you are going to use the term "replicate" in such an instance, I don't understand why, when they had an entire unpublished study specifically related to intelligence, they didn't use that as their point of "replication" for the other GWAS in the meta-analysis.
So, now let's look at some of the specific results.  I'm going to start with what I think is an interesting quote from this study:
"The meta-analysis identified 18 independent genome-wide significant loci (Figs. 1 and 2a, and Table 1), including 336 top SNPs (below the genome-wide threshold of significance; Supplementary Table 4). Of the 18 identified loci, 3 have been implicated in intelligence previously: "
Another way to say this (my way): "Most of the loci identified in this study as having genome-wide significance had not previously been linked to intelligence in other studies (apparently including the ones being looked at). Only 3 of the 18 loci have been linked to intelligence in the past."

What the hell? This should make us a little bit wary of any data we gather from here. Is it a stretch to consider the possibility that the 15 new loci are false positives? Is it a stretch to say that the small number of loci that have again been implicated (3) suggests that they too are random? Again, if one used a control meta-analysis as I have suggested above, how many "new" loci would have significance and how many would have been previously implicated? I'm guessing the 15 and 3 numbers would not be that far off. In which case, you might have a pretty good indication that all 18 of these significant loci are false positives. (Addendum: I have now suggested a more specific version of this test for random false positives, which I refer to as The Pittelli Test).

Next they say this:
"The top SNPs implicated 22 genes, of which 11 were new."  
So, half the genes they found were new and half were found previously. Why would so many "new" genes pop up? To be fair, 11 out of 22 isn't bad and might even sound significant, except for the fact that they are likely looking at many of the same studies that garnered the 11 previously implicated genes. Again, you might just have random data here, bolstered by the fact that probably the same studies that produced the initial genes are used in this study. Granted, the meta-analysis increased the data set, but the studies being used weren't exactly small.

The next part of the study is not something I am overly familiar with, but I will quote it here:
"Apart from a SNP-by-SNP GWAS, we conducted a genome-wide gene association analysis (GWGAS) as implemented in MAGMA17 (Online Methods). GWGAS relies on converging evidence from multiple genetic variants in the same gene and can yield novel genomewide significant signals on a gene-based level that are not necessarily picked up by a standard GWAS. The GWGAS identified 47 associated genes (Fig. 3a and Supplementary Table 8). The GWGAS and GWAS identified 17 overlapping genes; thus, the total number of genes implicated either by a SNP hit or by GWGAS was 22 + 47 – 17 = 52. Twelve of the 52 genes have been associated with intelligence previously."
It looks like they just pulled another 30 genes out of thin air, but I am willing to be educated on this.  In any case, 40 out of 52 of these genes have not previously been implicated for intelligence.  Again, even 12 out of 52 previously identified would not have been bad, but we still have the problem that "implicated previously" might very well mean implicated in some of the same studies we are looking at now.
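For anyone who wants to check the bookkeeping, the counts in the quoted passage can be tallied in a few lines of Python. The numbers are taken straight from the quote; the function name `gene_tally` is just a label for this sketch.

```python
def gene_tally(gwas_genes, gwgas_genes, overlap, previously_reported):
    """Inclusion-exclusion tally of genes flagged by either method."""
    total = gwas_genes + gwgas_genes - overlap
    return {
        "total": total,                      # genes flagged by GWAS or GWGAS
        "gwgas_only": gwgas_genes - overlap,  # added by GWGAS alone
        "gwas_only": gwas_genes - overlap,    # flagged by a SNP hit alone
        "not_previously_reported": total - previously_reported,
    }


tally = gene_tally(gwas_genes=22, gwgas_genes=47, overlap=17,
                   previously_reported=12)
print(tally)
# total = 52, gwgas_only = 30, gwas_only = 5, not_previously_reported = 40
```

So the "another 30 genes" are the ones found by GWGAS alone, and 40 of the 52 (roughly 77%) had not been reported before.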

At this point in the paper, if you're still buying these implicated genes as something other than false positives, they try to bolster their case by pointing out that the specific genes in question are more favorably connected to brain function:
"Tissue expression analyses (Online Methods) of the 52 genes using the GTEx data resource showed that 14 of 44 genes for which GTEx data were available were more strongly expressed in the brain than in other tissues "
It's worth noting that they say "more strongly expressed." So even these 14 genes are expressed in areas other than brain tissue. Is 14 of 44 a particularly high percentage? I don't really know, but would be happy to get input from others and will include it here. From my end at this point, though, I will just say that 14 of the 44 genes with GTEx data (out of what I consider to be 52 false positive genes) are noted to be more strongly expressed in the brain.
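Whether 14 of 44 is "high" depends entirely on the baseline: what fraction of all genes would count as more strongly expressed in the brain by the same criterion. That figure is not given here, so the sketch below only shows how the comparison could be made once it is known. The baselines of 10%, 20% and 30% are placeholders, not real GTEx numbers, and the binomial model assumes the genes are independent draws.

```python
import math


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more hits out of n."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


observed, tested = 14, 44
print(f"observed fraction: {observed / tested:.3f}")  # 0.318

# Hypothetical baselines only; substitute the real genome-wide fraction.
for baseline in (0.10, 0.20, 0.30):
    tail = binomial_tail(observed, tested, baseline)
    print(f"baseline {baseline:.2f}: P(>= {observed} of {tested}) = {tail:.4g}")
```

If the real baseline turned out to be close to 30%, 14 of 44 would be unremarkable; if it were closer to 10%, it would be harder to wave away.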

The next part of the study, which I have already expressed reservations about above, is the "proxy-replication."  Ironically, I think it is statistically the strongest part of the study.   I'll move right to the meat of it.
"Of the 47 genes that were significantly associated with intelligence in the GWGAS, 15 were also significantly associated with educational attainment (P < 0.05/47; Supplementary Table 15). Given the high (0.70) but not perfect genetic correlation between educational attainment and intelligence, these results strongly support the involvement of the proxy-replicated SNPs and genes in intelligence."
This is still less than a third of the genes being correlated in both. I would hardly call that a "replication", or a proxy-replication as the case may be, but I will admit that, on its face, that seems a bit better than random.
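The "better than random" impression can be put in rough numbers. The sketch below uses only the figures in the quote (47 genes, a cutoff of P < 0.05/47, 15 genes passing) and assumes the simplest possible null: every gene is unrelated to educational attainment and the 47 tests are independent. Real data may well violate both assumptions, so treat this as a back-of-the-envelope check rather than a verdict.

```python
def proxy_replication_summary(n_genes, alpha, n_passing):
    """Bonferroni cutoff, hits expected by chance, and the observed fraction."""
    threshold = alpha / n_genes              # per-gene p-value cutoff
    expected_by_chance = n_genes * threshold  # equals alpha under the null
    return {
        "threshold": threshold,
        "expected_by_chance": expected_by_chance,
        "observed_fraction": n_passing / n_genes,
    }


summary = proxy_replication_summary(n_genes=47, alpha=0.05, n_passing=15)
for name, value in summary.items():
    print(f"{name}: {value:.4g}")
```

Under that simple null you would expect about 0.05 genes to pass, against 15 observed, which is why this part of the paper looks stronger than the rest even though 15 of 47 is only about 32%.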

In conclusion, if you take the study as a whole, it makes for a decent screening study that works as a platform for studying individual genes or loci for a correlation. My contention is that the correlations we see here are little better than what you would see at random (again, any researcher reading this, please consider my control suggestion above). The fact that you continue to find MOSTLY "new" genes or loci each time a genome-wide association is performed (or in this case, re-performed) should be a tip-off that you are probably generating false positives.




