Thursday, June 7, 2018

Another GWAS meta-analysis that suggests replications when the opposite is the case

"Since the discovery of general cognitive ability (or ‘g’) in 1904..."  (When I read a sentence like this, I am already a bit leery of what will come next.)

This critique is for the following study: 

Study of 300,486 individuals identifies 148 independent genetic loci influencing general cognitive function.  Gail Davies, et al.

This study is a meta-analysis of several studies I have already critiqued, here, here and here, as well as some new cohorts. If I understand correctly, the new ones are the CHARGE and COGENT cohorts, but in any case, it does not appear that any of the new datasets were ever analyzed independently for cognitive ability; they were simply added to the N of the meta-analysis. I have a problem with this, which hopefully will become obvious as you read this critique.
Briefly, the problem is that this methodology makes it impossible to replicate findings from the previous studies used in the meta-analysis. I understand that the authors wish to increase their sample size, but I have to question why you would use an approach like this. You could do an independent analysis of the new dataset first, assess whether you have replicated anything, and then do your meta-analysis of all the cohorts. It feels like they skipped a step, and I would like one of the authors to explain why it was done this way, because one might argue that it was to avoid dealing with a lack of replication.
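
To make the skipped step concrete, here is a minimal sketch in Python of what an independent replication check might look like. Everything in it is hypothetical: the locus names, the p-values, and the function `replication_summary` are made up for illustration, and the Bonferroni threshold (0.05 divided by the number of loci tested) is one common convention, not something taken from the paper.

```python
def replication_summary(prior_loci, new_cohort_pvalues, alpha=0.05):
    """Count how many previously reported loci replicate in an independent cohort.

    prior_loci: identifiers of loci reported as significant by earlier studies.
    new_cohort_pvalues: dict mapping locus identifier -> p-value from a GWAS
        run on the new cohort alone (no overlap with the earlier samples).
    alpha: family-wise error rate, Bonferroni-corrected for the number of
        prior loci that could actually be tested in the new cohort.
    """
    tested = [locus for locus in prior_loci if locus in new_cohort_pvalues]
    if not tested:
        return {"tested": 0, "replicated": [], "threshold": None}
    threshold = alpha / len(tested)
    replicated = [locus for locus in tested if new_cohort_pvalues[locus] < threshold]
    return {"tested": len(tested), "replicated": replicated, "threshold": threshold}


# Made-up identifiers and p-values, for illustration only.
prior = ["locus_A", "locus_B", "locus_C", "locus_D"]
new_p = {"locus_A": 0.001, "locus_B": 0.30, "locus_C": 0.012, "locus_D": 0.74}

summary = replication_summary(prior, new_p)
print(summary["tested"], summary["replicated"], summary["threshold"])
```

The point of running this before any pooling is that the new cohort has had no chance to borrow significance from the samples that produced the original hits.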

The last study I looked at used a technique called MTAG, which I mocked in my critique of the study.  This one does not.  Here's what they say about MTAG in this study:

The MTAG (multi-trait analysis of genome-wide association studies) method has been used to corral cognitive function and associated traits to expand the number of loci associated with general cognitive function. However, the present study uses only cognitive function phenotypes, and amasses a total sample size of over 300,000.
So a technique that was used to "corral" different traits for the express purpose of expanding the number of loci has already been abandoned after its first use.  Is that because the authors have an issue with MTAG?  If so, does that invalidate the previous study that used this technique?  Again, we are at a point where each GWAS uses different methodology for some rather important aspects.  Where does replication come into play in such a situation?  Can you replicate an entirely different kind of study method that you appear to be expressing doubts about?

This study also looks at "reaction time" as they point out here:
The present study also tests for genetic contributions to reaction time, and examines its genetic relationship with general cognitive function. Reaction time is both phenotypically and genetically correlated with general cognitive function, and accounts for some of its association with health. By making these comparisons between general cognitive function and reaction time, we identify regions of the genome that have a shared correlation with general cognitive function and more elementary cognitive tasks.
Here's a simple question:  Why would you add this into your study?  It hasn't been used in previous studies.  It seems like an entirely different basis for a separate study.  Why not stick to cognitive ability? 

Let's take a look at the study. It is a meta-analysis in which the various cohorts used differing tests to assess cognitive function, so the authors make a case that the tests are highly correlated with one another. I'm not going to address that aspect of the study and will assume that the correlation is high enough that it is not going to be a problem (I reserve the right to reassess at a later date...).

Now, let's look at the stated results:
A comparison of these 148 loci with results from the largest previous GWASs of cognitive function, and educational attainment, and an MTAG analysis of cognitive function—all of which included a subsample of individuals contributing to the present study—confirmed that 11 of 18, 24 of 74, and 89 of 187 of these were, respectively, genome-wide significant in the present study 
If you glossed over this, you might think that they have replicated somewhere around half the significant loci from previous studies. That is, unless you notice the clause "all of which included a subsample of individuals contributing to the present study". Of course you would expect many of the significant loci from previous studies to be seen in the current study, because the samples that produced them are included in this study. How many of the loci might you expect to retain significance when you combine them with a new dataset looking at the same trait? Well, that's a good question, isn't it? In fact, if you assume that these hits are not random to begin with, wouldn't you expect almost all of them to be significant in the expanded study? The new dataset should also point to significance, or at least close to it, for these same loci, and would be bolstered further by the previous results. Instead, we are getting almost the opposite effect. Adding new data cuts the number of previously significant loci by about 39% (7 of 18 lost), 68% (50 of 74 lost), and 52% (98 of 187 lost), respectively. What this suggests to me is that the previous results were, in fact, false positives. They lose their significance and go towards random just as more flips of a coin will get us closer to 50-50. There is really no excuse for not evaluating the new samples independently and comparing the significant loci to those found in previous studies.
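
The arithmetic behind this argument can be sketched with a toy calculation. The snippet below uses a sample-size-weighted (Stouffer) combination of z-scores, which is one standard way to pool cohorts; I am assuming it here for illustration and have not checked that it matches the paper's own meta-analysis method. The cohort sizes and z-scores are invented, and the two cohorts are treated as independent.

```python
import math

GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold


def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


def combine_z(z_scores, sample_sizes):
    """Sample-size-weighted (Stouffer) combination of per-cohort z-scores."""
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


# Hypothetical locus with z = 5.5 in the original cohort, pooled with a
# new cohort of equal size under two scenarios.
z_old = 5.5
z_if_no_signal = combine_z([z_old, 0.0], [100_000, 100_000])    # new cohort shows nothing
z_if_same_signal = combine_z([z_old, 5.5], [100_000, 100_000])  # new cohort agrees

for label, z in [("old cohort alone", z_old),
                 ("plus new cohort, no signal", z_if_no_signal),
                 ("plus new cohort, same signal", z_if_same_signal)]:
    significant = two_sided_p(z) < GENOME_WIDE_P
    print(f"{label}: z = {z:.2f}, genome-wide significant = {significant}")
```

Under these assumptions, a hit that the new cohort supports gets stronger when pooled, while a hit that the new cohort does not support at all drops below the threshold. Real cases fall somewhere in between, which is exactly why the new cohort needs to be looked at on its own.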

Now, let's parse the next sentence:
Of the 148 loci found in the present study, 58 have not been reported previously in other GWA studies of cognitive function or educational attainment
So, in addition to the fact that adding more data watered down the number of significant loci, it also added 58 new significant loci.  Again, we have a significant problem here.  Why would we suddenly find 58 more loci that haven't been found before?  I assume the authors would argue that it is because of the larger data set being used.  However, we seem to have lost more significant loci than we have gained in total by using a larger N.   That doesn't really make sense.  If these results were mostly valid, you would expect our previous results to be mostly confirmed with perhaps a few more loci found from the new dataset.  Instead, we keep finding more and more new loci and fail to replicate the old ones.
I will posit another explanation for this puzzling result:  The significant loci from the previous studies were largely random false positives that lost their significance as we added more data, while simultaneously adding new false positives (which one might expect will also be watered down by the next study in this shell game).  Now, come on.  You have to admit what I'm saying is plausible, whether or not it goes against what you want the data to show.

As with most GWAS like this, there is then an attempt to correlate the SNPs within these loci to other related traits and specific functionality. Let's start with this:
For the 434 independent significant SNPs and tagged SNPs, a summary of previous SNP associations is listed in Supplementary Data 5. They have been associated with many physical (e.g., BMI, height, weight), medical (e.g., lung cancer, Crohn’s disease, blood pressure), and psychiatric (e.g., bipolar disorder, schizophrenia, autism) traits. Of the 58 new loci, we highlight previous associations with schizophrenia (2 loci), Alzheimer’s disease (1 locus), and Parkinson’s disease (1 locus).
Whenever I see something like this, my first question is what these SNPs DIDN'T correlate with, for comparison's sake. Does anyone really believe that most of the correlations above have any real relevance to cognitive ability? Not much new seems to have been found from the 58 new loci. Not a lot to see here. It looks mostly random in my view. If you can't replicate your cognitive ability SNPs from study to study, why on earth would you think that you could correlate them to Crohn's disease or lung cancer?

Next, there is an attempt to find gene-based associations to cognitive function.  They lay this out like this:
A gene-based association analysis identified 709 genes as significantly associated with general cognitive function. These 709 genes were compared to gene-based associations from previous studies of general cognitive function and educational attainment; 418 were replicated in the present study, and 291 were novel. 
For starters, I really wish they would stop using the term replication inappropriately. If you peruse this quickly, you might think that 418 of the 709 genes were replicated, which would be somewhat impressive. However, we have the same problem that we did with significant loci. They are reusing many of the same subsamples from the previous studies and only adding more. Of course you should have some replication, since these subsamples bolster your new data at the outset. So another way to say this is that adding a new subsample to the previous studies and doing a meta-analysis has watered down 291 of the previous genes below significance, while providing (I swear I'm not making this up) 291 new gene-based associations. In my view, this is very suggestive once again of random false positives. But let's say you still want to believe that you are getting something better than false positives. I have a few questions for you:
1. Can we say that the 291 genes that fell below significance have now been ruled out as related to cognitive ability?  If not, why?
2. Do you find it surprising that 291 new genes were found, replacing the 291 that fell below significance?
3. Does it concern you that so many genes that were as recently as a few months ago lauded as part of the function of cognitive ability, have already been refuted?  Do you think you could be sending well-meaning scientists down some wasted pathways by lauding these studies before they are truly replicated?
4. Some of these refuted gene associations had brain-related functions, and this was used as evidence for their validity. Do you think that perhaps many genes have brain-related functions, so that pointing to such functions is not a valid way to make a case for validity and could just reflect chance?
5. Do you have any concern that your next round of studies will refute the 291 new genes you just found and whose functions you are lauding in this paper?
6. I will ask again: why don't you first do an actual independent study of cognitive ability for the new dataset, which you could then compare to the previous studies (whether through a meta-analysis or some other method of comparison)?



Really, I could go on with more questions, but I'll save them for any of the authors (or anyone) who wants to make a case for the validity of this result.

The next part of the study was basically a new GWAS related to reaction time.  At this point, I am not going to address this aspect of this study, because I believe it is of little relevance to the study above and, in my view, does not bolster the cognitive ability portion of the study in any meaningful way.  I might critique it independently later.

Next, I will address a couple of things from the discussion section.  First this:
Both the overall size of the present study’s meta-analysis of GWASs and the inclusion of a single large sample, UK Biobank, are strengths, which contributed to the abundance of new findings. When compared to an analysis of only UK Biobank herein, the current meta-analysis adds 92 independent significant loci, 51 of which are novel. Yet, as genome-wide studies of other complex traits continue to increase up to and beyond a million individuals, an even larger sample size will be required in order to seek replication of these findings, identify new associations, and generate stronger polygenic predictions
Let me point out again that increasing the sample size did not produce a meaningful net gain in significant loci, since significance didn't hold for a similar number of previously significant loci. The authors, in my view, are describing a shell game, in which there is a continued drive to increase the sample size and find new loci, which are then replaced in the next study. Moreover, I object to the idea that there is an accumulation of significant loci when many of the previous ones have lost their significance. Remove the refuted loci from the GWAS Catalog!

On a personal note, I would like to address one other thing noted in this study before concluding:
Finally, it is also possible that, although specific loci reached genome-wide significance in particular studies, there are false positives, highlighting the importance of well-powered replication studies.
I have been saying this for some time, and while I certainly have a higher estimate of the number of false positives, I was called a "crank" by one of the authors of this study for making a claim that is not ruled out as a possibility here and which he certainly knew would be stated in the discussion section of his own study.

To conclude:
1. My previous objection to the MTAG methodology appears to be confirmed.
2. This meta-analysis should not have been performed before assessing the new samples independently for cognitive ability.
3. Even with the results being bolstered by previous studies with significant p values, many of these loci lost their significance with the addition of other cohorts.  This suggests that they were false positives.
4. The same could be said for the gene-based associations noted.
5. The alleged correlations to other traits assume that the current loci are not false positives and that those for the other traits are not false positives either. This study gives me no reason to make that assumption and, for the reasons stated above, leads me to believe the opposite regarding the cognitive ability genetic associations.
6. The new loci reaching significance have not been established as anything more than false positives.
7. This is an endless shell game. Larger and larger studies will find new loci as they fail to replicate old ones.
8. I ask the authors to check all of these individual studies against a randomized control, as I have suggested on many occasions (my personal suggestion here).
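
One way such a randomized control could look is a permutation test: shuffle the phenotype values across individuals, which breaks any real link to the genotypes, rerun the same association scan, and compare the number of hits to the number found with the real phenotype. The Python below is only a toy sketch of that idea, and I am not claiming it matches the suggestion linked above. The genotypes and phenotype are simulated at random, the threshold of 0.01 is chosen so that the small example produces some hits, and the function names are made up for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def count_hits(genotypes, phenotype, p_threshold):
    """Number of SNPs whose association p-value falls below p_threshold."""
    n = len(phenotype)
    hits = 0
    for snp in genotypes:
        z = pearson_r(snp, phenotype) * math.sqrt(n)  # large-sample approximation
        p = math.erfc(abs(z) / math.sqrt(2))          # two-sided normal p-value
        if p < p_threshold:
            hits += 1
    return hits


def permutation_control(genotypes, phenotype, p_threshold, n_permutations, rng):
    """Hit counts obtained after shuffling the phenotype across individuals."""
    shuffled = list(phenotype)
    counts = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        counts.append(count_hits(genotypes, shuffled, p_threshold))
    return counts


# Simulated data with no built-in genetic effect (an assumption of this toy).
rng = random.Random(0)
n_people, n_snps = 100, 200
genotypes = [[rng.choice((0, 1, 2)) for _ in range(n_people)] for _ in range(n_snps)]
phenotype = [rng.gauss(0, 1) for _ in range(n_people)]

observed = count_hits(genotypes, phenotype, p_threshold=0.01)
null_counts = permutation_control(genotypes, phenotype, 0.01, n_permutations=10, rng=rng)
print("hits with the real phenotype:", observed)
print("hits with shuffled phenotypes:", null_counts)
```

If the count from the real phenotype sits comfortably inside the range of the shuffled counts, the scan has not shown anything beyond what chance produces with that pipeline and threshold.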





