Variant frequency, effect size and GWAS limitation

Ghost
Sep 4, 2017

The figure below illustrates two important attributes of a variant: population frequency (we use "population frequency" rather than "allele frequency" to avoid confusion) and effect size. Because of natural selection, a variant's population frequency is generally inversely related to its effect size.

To detect a significant association between a variant and a specific phenotype, the variant has to meet at least one of two prerequisites:

The variant has to have at least modest population frequency.
The variant has to have at least modest effect size.

In the simplest case, a variant has both high population frequency and large effect size (upper-right). One example is APOE4, a high-penetrance variant with modest population frequency that contributes to Alzheimer's disease. However, it is biologically reasonable to assume that only a few variants belong to this category, because purifying selection tends to remove them.

On the other hand, rare variants with small effect size (lower-left) are very hard to detect with GWAS because they meet neither of the prerequisites above. We usually do not pay much attention to them anyway, since they contribute so little to the phenotype.

Now, the challenge is identifying the variants that lie between the parallel dashed lines in the figure above. GWAS works best with common variants (lower-right). However, variants identified by this approach have only modest effect sizes and cannot fully account for susceptibility to the phenotype.
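
To make the trade-off between frequency and effect size concrete, here is a minimal sketch, not part of the original analysis. It assumes a quantitative trait, an additive model, Hardy-Weinberg equilibrium, and an effect size expressed in phenotype standard deviations, under which a single biallelic variant explains roughly 2p(1-p)β² of the phenotypic variance. Note that p here is an allele frequency, and that the function name and all numeric values are hypothetical illustrations.

```python
def variance_explained(allele_freq, beta):
    """Fraction of phenotypic variance explained by one biallelic variant.

    Assumptions (not from the post): additive model, Hardy-Weinberg
    equilibrium, and beta measured in phenotype standard deviations.
    """
    return 2 * allele_freq * (1 - allele_freq) * beta ** 2


# Hypothetical (allele frequency, per-allele effect) pairs for three quadrants.
examples = {
    "common, small effect (lower-right)": (0.30, 0.05),
    "rare, large effect (upper-left)": (0.001, 0.50),
    "rare, small effect (lower-left)": (0.001, 0.05),
}

if __name__ == "__main__":
    for label, (freq, beta) in examples.items():
        print(f"{label}: {variance_explained(freq, beta):.6f}")
```

Under these assumptions, any single variant in the three quadrants explains only a small slice of the variance, which is consistent with the point that common GWAS hits alone do not account for the phenotype.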

Rare variants (upper-left) are hard to detect with this technique because:

Genotyping chips, the most common GWAS platform, rely on linkage disequilibrium, and very rare mutations are usually not included on the chip. BeadChip, for example, offers a comprehensive set of both common and rare SNP content from the 1000 Genomes Project (1kGP; MAF > 2.5%) for diverse world populations. However, that MAF threshold is still too high for many rare variants. NGS can overcome this limitation.
For a variant with very low population frequency, it is difficult to find enough cases to reach a significant association. Identifying a variant with a population frequency of 1% and a penetrance of 1% using a chi-square test would require 20,000 cases and 20,000 controls for 80% power. For variants with a population frequency of 0.5%, it is almost impossible to obtain a sufficient sample size.

To detect such rare variants, a linkage study is probably a better alternative, as long as you have pedigrees and the penetrance is high, for example BRCA in breast cancer.
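
The sample-size argument in the last point can be explored with a rough power calculation. The sketch below is an illustration under stated assumptions, not the calculation behind the figure quoted above. It approximates the 1-df chi-square test with a two-sided two-proportion z-test, uses the conventional genome-wide threshold of 5e-8, and needs a disease risk for non-carriers, which the text does not give. All function names and numeric inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def carrier_freqs(pop_freq, risk_carrier, risk_noncarrier):
    """Carrier frequency among cases and among controls, via Bayes' rule."""
    case_num = pop_freq * risk_carrier
    case_den = case_num + (1 - pop_freq) * risk_noncarrier
    ctrl_num = pop_freq * (1 - risk_carrier)
    ctrl_den = ctrl_num + (1 - pop_freq) * (1 - risk_noncarrier)
    return case_num / case_den, ctrl_num / ctrl_den


def power_two_proportions(p_case, p_ctrl, n_per_group, alpha=5e-8):
    """Approximate power of a two-sided two-proportion z-test with equal
    group sizes (asymptotically equivalent to a 1-df chi-square test)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    p_bar = (p_case + p_ctrl) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = sqrt((p_case * (1 - p_case) + p_ctrl * (1 - p_ctrl)) / n_per_group)
    delta = abs(p_case - p_ctrl)
    # The contribution of the opposite tail is negligible and is ignored.
    return nd.cdf((delta - z_crit * se_null) / se_alt)


def cases_needed(p_case, p_ctrl, target_power=0.8, alpha=5e-8, max_n=10**9):
    """Smallest group size (to within about 10%) reaching the target power."""
    if p_case == p_ctrl:
        raise ValueError("no frequency difference, power never reaches target")
    n = 10
    while power_two_proportions(p_case, p_ctrl, n, alpha) < target_power:
        n = int(n * 1.1) + 1
        if n > max_n:
            raise ValueError("required sample size exceeds max_n")
    return n


if __name__ == "__main__":
    # Hypothetical risks: 2% in carriers versus 1% in non-carriers.
    for freq in (0.05, 0.01, 0.005):
        p_case, p_ctrl = carrier_freqs(freq, risk_carrier=0.02, risk_noncarrier=0.01)
        print(f"population frequency {freq}: ~{cases_needed(p_case, p_ctrl)} per group")
```

The exact numbers depend heavily on the assumed risks and threshold, but the trend is the useful part: as the population frequency falls, the required number of cases and controls climbs quickly.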

Missing heritability issue

For the vast majority of complex traits, less than 10% of the genetic variance has so far been explained by common variants. This missing heritability may be explained by:

multi-gene interactions
gene-environment interactions
structural variation
rare mutations
phenotypic robustness
