Friday, October 19, 2018

More on the idea of "Yoking Stratification"

I posted previously about the idea of "Yoking Stratification," and that post can be viewed here.  In short, the idea is that GWAS results can be skewed because individuals with a particular trait often marry/mate with individuals who share that trait. The term for this is assortative mating: tall people tend to marry tall people, people with a high BMI tend to marry other people with a high BMI, and, as one study showed, people with a mental illness more frequently marry others with that same mental illness or, to a lesser extent, other mental illnesses.  This is a potential source of population stratification, and I think it is difficult to control for.  As I noted in the previous blog post linked above, this could artificially inflate the number of SNPs found to be "significant" for the trait, and the polygenic score might just reflect genetic markers that are common among people who pair off by that trait but are unrelated to the trait itself, in the same way that Ancestry.com determines your racial background via genetic markers that likely have nothing to do with being "Irish" or "Chinese."
This point was driven home recently when a study with N = 360,000 came out that failed to find SNPs for "left-handedness."
This is an interesting dilemma, since left-handedness has previously been reported to have a significant heritability of 24%.  Why, then, would such a large study of something that heritable not find a large number of genetic variants for the trait?  I think that yoking stratification, or a lack thereof in this case, is a possible explanation.  The point is that left-handed people (as far as I know) do not tend to choose left-handed partners to any meaningful extent.  Thus, you don't get stratification for left-handedness, and you won't get an artificially inflated number of SNPs that are unrelated to the trait.
This leads me to suggest that, along with the N for any particular GWAS, another factor that might influence the number of SNPs found is yoking stratification.  Obviously, I don't have the ability to easily quantify this for a particular trait, but the study noted above on the "non-random mating habits" of individuals with mental illnesses gives a helpful start.  It provides a correlation coefficient for the likelihood that a person with a particular mental illness will mate with/marry someone with that same mental illness.
With that in mind, I thought it might be interesting to look at a few GWAS results for different mental illnesses, so I created a little spreadsheet:




Generally, when you look at the mental illnesses listed above, the correlation coefficient, in addition to a high N, appears to be a factor in the total number of SNPs.  Take, for example, the schizophrenia study and the depression study, which have a similar number of SNPs despite the fact that the depression study has 5 times the N.  Likewise, ADHD has a similar number of SNPs to substance abuse, but with half the N.  In both cases, the study with the smaller N is for a trait with a larger correlation coefficient.  This, of course, is rough and not perfectly quantifiable (I tried...), but there does seem to be some suggestion of a correlation between the coefficient and the number of SNPs.
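For anyone who wants to try quantifying this more carefully than I managed, here is a minimal sketch of one way to do it: scale each study's SNP count by its N, then compute a Spearman rank correlation between that rate and the spousal correlation coefficient. Everything here is an assumption on my part. The function names are made up, dividing hits by N is a crude adjustment (hits do not really scale linearly with sample size), and the rows in `placeholder_studies` are invented stand-ins, not the values from my spreadsheet.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def hits_per_100k(n_snps, sample_size):
    """Crude adjustment for study size (assumes hits scale linearly with N)."""
    return n_snps / sample_size * 100_000


# PLACEHOLDER rows only: (trait, spousal correlation, significant SNPs, N).
# Replace these with the real figures before drawing any conclusion.
placeholder_studies = [
    ("trait_a", 0.10, 20, 200_000),
    ("trait_b", 0.20, 30, 100_000),
    ("trait_c", 0.30, 25, 50_000),
    ("trait_d", 0.40, 60, 80_000),
]

spousal_r = [row[1] for row in placeholder_studies]
hit_rates = [hits_per_100k(row[2], row[3]) for row in placeholder_studies]
print("Spearman rho (placeholder data):", spearman(spousal_r, hit_rates))
```

With only a handful of studies, a rank correlation like this can only hint at a pattern. It cannot separate yoking stratification from the other differences between studies.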
Looking at the "Educational Attainment" study, for which I do not have a correlation coefficient, we have a very high N and a high number of SNPs.  That suggests to me that we should also expect a high correlation coefficient.  Highly educated people marry/mate with other highly educated people at a high rate, so I think that would be borne out.
You might argue that the true number of SNPs behind a trait varies from trait to trait, and that this is the primary reason the number of significant SNPs differs between studies.  That's why I included results from the height/BMI study above.  It has a high N, although lower than that for educational attainment, yet height has far more significant SNPs than educational attainment.  Does anyone want to argue that height has a more complex, polygenic pathway than educational attainment?  I would argue that height has a higher correlation coefficient, even within a white, European population (Germans are taller than Italians and more likely to marry/mate within their ethnic background, etc.).  The results for BMI and educational attainment seem similar given the lower N for BMI, so perhaps those two traits have a similar correlation coefficient.
Obviously, there are a lot of confounding factors here.  GWAS vary in technique, in the populations studied, in how a trait is determined, in the accuracy of the correlation coefficient, etc., but I feel comfortable arguing that many of the significant SNP results might be false positives related to this phenomenon, and that elevated polygenic scores might also be related to this kind of stratification.  If I'm correct, this would mean that a lot of scientific analysis is being wasted on "discoveries" that are simply not valid.


