Comparative genetic architectures of schizophrenia in East Asian and European populations
I tried to ask one of the authors, who was promoting the paper on Twitter, a few questions, but he did not respond, so if I have any fact wrong, leave a comment here and I will update. Let's start with the abstract, which is reproduced in full below:
Schizophrenia is a debilitating psychiatric disorder with approximately 1% lifetime risk globally. Large-scale schizophrenia genetic studies have reported primarily on European ancestry samples, potentially missing important biological insights. Here, we report the largest study to date of East Asian participants (22,778 schizophrenia cases and 35,362 controls), identifying 21 genome-wide-significant associations in 19 genetic loci. Common genetic variants that confer risk for schizophrenia have highly similar effects between East Asian and European ancestries (genetic correlation = 0.98 ± 0.03), indicating that the genetic basis of schizophrenia and its biology are broadly shared across populations. A fixed-effect meta-analysis including individuals from East Asian and European ancestries identified 208 significant associations in 176 genetic loci (53 novel). Trans-ancestry fine-mapping reduced the sets of candidate causal variants in 44 loci. Polygenic risk scores had reduced performance when transferred across ancestries, highlighting the importance of including sufficient samples of major ancestral groups to ensure their generalizability across populations.
The reason I am showing the entire abstract is to point out what it doesn't say: That the study apparently failed to replicate any of the previous significant loci for schizophrenia (as far as I can tell). The authors simply ignore this, almost as if it is expected, yet expend a lot of time trying to make lemonade out of a lemon without telling us it was a lemon, trying to justify why schizophrenia would present in the same way in different cultures, when it is presumably due to entirely different gene sets.
In my view, you would not expect any of the loci to match between the two studies because the loci are generally false positives, probably enhanced by population stratification (pop/strat) issues that are going to be different in these two different populations. Let me go over some of the findings, and why I believe they are consistent with pop/strat and false positives, after the fold:
As I've already pointed out, none of the loci from the European ancestry study match those of the East Asian ancestry study. The authors inadvertently drive this failure home:
Most associations were characterized by marked differences in allele frequencies between the EAS and EUR samples: for 15 of 21 loci, the index variants had higher minor allele frequencies (MAFs) in EAS than EUR. The higher allele frequency potentially confers better power to detect associations in EAS.
The argument they are trying to make here is that the loci didn't match because the variants they found in the East Asian ancestry study were often much more common in that sample than in the European ancestry study. That doesn't explain the other 6 loci, and it ignores the fact that the European ancestry study was nearly three times the size, so you have more power to work with even if some MAFs are lower. In fact, you have two different ancestry groups, so even if you are dealing with false positives, you expect the "significant" loci to have higher MAFs and a difference between the two ancestry groups.
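The trade-off between allele frequency and sample size can be sketched numerically. The snippet below is only a rough illustration under stated assumptions, not the paper's power calculation: it uses a common large-sample approximation for a 1-df association test, the EAS case and control counts from the abstract, and a European effective sample size simply set to three times the East Asian one. The odds ratio and the two MAF values are hypothetical numbers chosen for illustration.

```python
from math import log
from statistics import NormalDist


def effective_n(cases: int, controls: int) -> float:
    """Effective sample size of a case-control study (standard approximation)."""
    return 4.0 / (1.0 / cases + 1.0 / controls)


def gwas_power(n_eff: float, maf: float, log_or: float, alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided 1-df association test.

    Assumes the large-sample non-centrality approximation
    NCP = n_eff * maf * (1 - maf) * log_or**2 / 2.
    """
    ncp = n_eff * maf * (1.0 - maf) * log_or ** 2 / 2.0
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)  # genome-wide threshold as a z-score
    shift = ncp ** 0.5                     # expected z-score of the variant
    return (1.0 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# EAS counts are taken from the abstract.
EAS_N_EFF = effective_n(22_778, 35_362)
# ASSUMPTION: "nearly three times the size" is modeled as exactly 3x.
EUR_N_EFF_ASSUMED = 3.0 * EAS_N_EFF
# HYPOTHETICAL effect size (odds ratio 1.06) and allele frequencies.
LOG_OR = log(1.06)

if __name__ == "__main__":
    print("EAS, MAF 0.45:", round(gwas_power(EAS_N_EFF, 0.45, LOG_OR), 3))
    print("EUR, MAF 0.10:", round(gwas_power(EUR_N_EFF_ASSUMED, 0.10, LOG_OR), 3))
```

Changing the hypothetical MAF and odds ratio shows how far a larger sample can offset a lower allele frequency at the genome-wide threshold, which is the comparison at issue here.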
Let's next look at the findings in the East Asian Ancestry study, specifically. For reasons that aren't quite clear to me, they divided the study into two groups (Stage 1 and Stage 2), with Stage 1 being about twice the size of Stage 2, then combined them with a p value given for the combined set as well as the 2 stages, as seen below:
A couple of interesting points here: First, only 1 of the 21 loci reached significance in Stage 2 alone, while 10 of them reached significance in Stage 1. This is consistent with false positives being more prevalent in a larger dataset. For some reason, they make no hay about possibly the only interesting finding of this paper, which is that the one locus that reached significance in Stage 2 also reached significance in Stage 1. To my knowledge, this is as close to an independent replication of even a single locus/SNP as has been seen in behavioral genetics. Without knowing the specific breakdown of the Stage 1 and Stage 2 data sets, and considering the fact that the p-values tend to trend in a specific direction, which I believe is a good indication of pop/strat, I have some skepticism about even this. Nonetheless, I would say they missed their chance to tout an actual possible replication.
Can we stop for a moment and consider whether it makes any sense at all that an unspecified collection of SNPs, entirely different between one population and the next, is going to somehow combine to create a schizophrenia phenotype that presents in a relatively similar way and at a similar frequency in the two different ancestry groups? The authors claim that it must be so, since they found different genetic variants and the phenotype is basically the same, but this ignores the obvious possibility that these are false positives and we are barking up the wrong tree.
If that is the case, then when they combine the European ancestry study with the East Asian ancestry study, one should expect that we will find a whole bunch of completely different loci, a watering down of the larger (European) sample's significant loci and a significant watering down of the smaller (East Asian) sample's significant loci. Let's roll the tape:
We identified 208 independent variants (both in EAS and EUR) associated with schizophrenia across 176 genetic loci (Fig. 2 and Supplementary Tables 5 and 6), among which 53 loci were novel (not reported in refs. 2,3,7,8). Of the 108 schizophrenia-associated loci reported in the previous EUR study (ref. 2), 89 remained significant in this study (Supplementary Table 4). Using simulations with a correction for winner’s curse (ref. 39), we found that this was consistent with an expected overestimation of the effect sizes due to the winner’s curse in the previous study, rather than implying that the 19 loci no longer significant in this study were false positives (Supplementary Note). In addition, the deCODE samples (n=1,513 cases and n=66,236 controls) were not included in the present study, causing the power for loci that had low MAF in EAS to drop.
I am not able to find a breakdown of the number of EAS loci that replicated in the combined EAS and EUR analysis. I will assume that it is close to or equal to zero (I would ask the authors why they haven't put that information in, and point out that the math, 176 - 89 - 53, doesn't add up). In other words, despite the protestations above, we get what you would expect if these were false positives: more new false positives now that we have a larger data set, a loss of many of the old false positives from the slightly watered-down larger dataset, and a loss of presumably almost all of the false positives from the smaller dataset, overwhelmed by the larger one. In fact, if you look at their Supplementary Table 4, you will see that the trend for those EUR loci losing significance is that their p value was higher. This is just plain consistent with false positives, sorry.
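The locus arithmetic can be spelled out with the three counts given in the quoted passage. This is just a sanity check on those numbers; the variable names are mine, and it makes no claim about what the leftover loci actually are.

```python
# Counts taken directly from the quoted passage.
total_loci = 176        # loci in the combined EAS + EUR meta-analysis
still_significant = 89  # of the 108 loci from the previous EUR study
novel = 53              # loci described as novel

# Loci that are neither counted as novel nor among the 89.
unaccounted = total_loci - still_significant - novel
print(unaccounted)  # 34
```

That leaves 34 loci for which the quoted passage gives no breakdown.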
Now let's go to the polygenic risk scores (PRS). The assumption that these scores will improve with larger datasets does not hold here:
We assessed how much variation in schizophrenia risk can be explained in EAS using both EAS stage 1 and EUR training data. Using a standard clumping approach, we first computed PRS using a leave-one-out meta-analysis approach with EAS summary statistics (Methods), which explained ~3% of schizophrenia risk using genome-wide variants on the liability scale (R2=0.029 at P=0.5). In contrast, when EUR summary statistics were used to calculate PRS in the EAS samples, a maximum (emphasis mine) of only ~2% of schizophrenia risk was explained (R2=0.022 at P=0.1), despite a greater than threefold larger EUR effective sample size.
So the PRS was fairly useless even when sticking with EAS, and was essentially zero when they attempted to use the EUR-derived PRS to check the EAS risk. I think the take-home message here is that the use of different ancestral populations is going to eliminate any usefulness of PRS. This is not, in my view, because the different ancestral groups have different genetic pathways to the same phenotype, as they attempt to imply here, but because the results are false positives based on pop/strat, and different populations will have entirely different pop/strat issues. I think that cross-ancestry analysis will be the death knell for behavioral genetic GWAS and PRS.