Thursday, August 29, 2019

Here Come the Gay Genes

This is a new genome-wide association study (GWAS) of homosexuality, or "same-sex sexual behavior," as the authors describe it. Before I even begin this critique, I'll point out the question that any such claim obviously raises, one the study itself acknowledges:
We observed in the UK Biobank that individuals who reported same-sex sexual behavior had on average fewer offspring than those of individuals who engaged exclusively in heterosexual behavior... This reproductive deficit raises questions about the evolutionary maintenance of the trait, but we do not address these here.
Yes, it certainly does raise that question. As I understand it, a thousand explanations have been thrown out for this, none of them backed by evidence, which raises the possibility that we are working with an absurd premise.
Now let's go back to the description of "same-sex sexual behavior." By what criteria do we group some kid who, say, had one same-sex sexual experience at summer camp with someone who identifies as homosexual? Would you classify such an individual as "non-heterosexual"? That seems a stretch, and it ignores their self-identification. Is there some causal relationship between these two things, genetic or not? I don't think anyone could realistically group these individuals together for a study of this nature. I question doing such a study at all, but if one is done, I think one would first want to identify individuals who are clearly homosexual rather than confound the study with what may largely be a culturally driven phase of experimentation.
I might add, once again, that the UK Biobank has been noted in a few recent studies to be replete with population stratification issues, notably related to age, and that younger participants report higher rates of same-sex experience. I won't go into much detail along these lines other than to suggest that such stratification could easily account for the few significant loci that were found. Now, let's look at some of the numbers and the dubious claims of replication:
This study claims five significant loci (using the standard genome-wide threshold, P < 5 × 10⁻⁸). The first thing you'd generally expect in assessing such a result is an attempt to compare it with previous studies of the same trait. According to this study, there have been three. The authors dismiss them out of hand, stating that their sample sizes were too small. In fairness, two of those studies were from the 1990s, but the third is a small-N study from 2017 (1,000 cases and 1,200 controls). That study found one significant locus (when combined with a previous study) and several "almost" significant ones (in the 10⁻⁷ range). None of these, I assume, were replicated in the new study, so one might conclude that it first and foremost failed to replicate the previous study. As noted, the authors simply dismissed that study, and one might ask whether they would have dismissed it had they replicated its results.
Instead, they use separate datasets to claim replication of three of the five loci from their study. Let me quote the claimed replication:
Overall, three of the SNPs replicated at a nominal P value in the meta-analyzed replication datasets (Wald test P = 0.027 for rs34730029, P = 0.003 for rs28371400, and P = 0.006 for rs11114975) (table S10), despite the much smaller sample size (MGSOSO, Add Health, and CATSS; total sample size = 15,156 individuals, effective sample size = 4887 individuals)
So, although this was a larger dataset than the one they dismissed, they are accepting P = 0.027 as a replication, along with P = 0.003 and P = 0.006. These, of course, fall short of the standard threshold of P < 5 × 10⁻⁸ by several orders of magnitude and are far weaker than the results in the study they discarded. I would ask the authors why these count as replications.
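A quick check on the "larger dataset" point. For case-control data, effective sample size is commonly computed as 4 / (1/N_cases + 1/N_controls); it equals the total N for a balanced design and shrinks as cases become rarer, which is presumably why 15,156 individuals collapse to an effective 4,887. I'm assuming, not confirming, that the authors used this formula, and the function name below is my own. A minimal Python sketch, using the 2017 study's counts as given above:

```python
# Hypothetical helper: assumes the common case-control effective-N formula,
# which may differ from how the authors computed their figure.
def effective_sample_size(n_cases, n_controls):
    """Return 4 / (1/N_cases + 1/N_controls)."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)

# 2017 study, using the case and control counts given in this post.
dismissed_eff = effective_sample_size(1_000, 1_200)

# Effective N reported for the meta-analyzed replication datasets.
replication_eff = 4_887

# Raw total for the replication datasets, also as reported.
replication_total = 15_156

print(f"2017 study effective N: {dismissed_eff:,.0f}")
print(f"Replication effective N: {replication_eff:,}")
print(f"Effective-N ratio: {replication_eff / dismissed_eff:.1f}")
print(f"Raw-N ratio: {replication_total / (1_000 + 1_200):.1f}")
```

On those assumptions, the 2017 study comes out at an effective N of roughly 2,200, so the replication sample is about twice as large in effective terms, well short of the nearly sevenfold gap the raw totals suggest.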
Editor's Note: I am being taken to task on Twitter for comparing these P values to the threshold for a full genome-wide scan. Fair point, but this is nonetheless an easier "replication" standard than running another GWAS and finding the same loci significant at P < 5 × 10⁻⁸. I stand by my point that this is a shell game: a new, independent study will generally not replicate the significant loci from this study, or from most any other.
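To make the Twitter point concrete: the genome-wide threshold of 5 × 10⁻⁸ is conventionally justified as a Bonferroni correction of 0.05 across roughly one million independent tests, while a replication that tests only a handful of SNPs needs a far smaller correction. A minimal Python sketch, assuming all five lead SNPs were carried into the replication test and using the P values quoted above; the names are hypothetical:

```python
# Sketch of Bonferroni-style thresholds. The one-million-test figure is the
# conventional rationale for 5e-8, not something taken from this study.
ALPHA = 0.05  # family-wise error rate

def bonferroni_threshold(n_tests, alpha=ALPHA):
    """Per-test P value cutoff that holds the family-wise error rate at alpha."""
    return alpha / n_tests

# Genome-wide scan: roughly one million independent common-variant tests.
GENOME_WIDE = bonferroni_threshold(1_000_000)

# Replication: assumes all five lead SNPs were carried forward and tested.
REPLICATION = bonferroni_threshold(5)

# Replication P values quoted from the study (table S10).
replication_p = {
    "rs34730029": 0.027,
    "rs28371400": 0.003,
    "rs11114975": 0.006,
}

def classify(p):
    """Report which of the three thresholds a single P value clears."""
    return {
        "nominal": p < ALPHA,
        "bonferroni_5": p < REPLICATION,
        "genome_wide": p < GENOME_WIDE,
    }

results = {snp: classify(p) for snp, p in replication_p.items()}

for snp, flags in results.items():
    print(snp, replication_p[snp], flags)
```

Read this way, 0.003 and 0.006 would clear a correction for five tests while 0.027 would not, and none comes close to 5 × 10⁻⁸.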

Let's be frank here. We have a study that loosely (and dubiously) defines a trait and finds a few significant loci. That happens in studies of just about any trait when you have a large enough N, including such things as ice cream preferences and church attendance. The loci are novel, they have not been replicated, and they do not match those from the previous study. Why does such a study get so much media attention? The homosexuality angle probably draws clicks on social media, but an all-out media blitz from the authors also appears to have driven much of it. In any case, the attention is not due to any actual success of the study, which the general public will, of course, quickly boil down to "They found the gay genes."

It is the same formula now, over and over again: get a big N, find a few significant loci, compare them to another dataset, and claim replication with watered-down P values that don't reach genome-wide significance.





