How to interpret Tajima's D
Tajima’s D is a widely used statistic in population genetics. In simple terms, it quantifies how much the observed genetic diversity deviates from what is expected under a neutral model of evolution with constant population size. The formula is:
D = (π − θ) / √Var(π − θ)
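π is the nucleotide diversity: the number of nucleotide differences between two sequences, averaged over all pairs of sequences (and over all sites, when it is reported per site).
θ is Watterson’s estimator: the number of segregating sites (positions in the alignment that show variation) divided by a factor that depends on the sample size, a1 = 1 + 1/2 + … + 1/(n − 1) for n sequences. Under neutrality, π and θ estimate the same quantity, so D is expected to be close to 0.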
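To make the two definitions concrete, here is a minimal Python sketch that computes π, the number of segregating sites and Tajima’s D from a small alignment. It assumes the sequences are aligned strings of equal length with no gaps or ambiguous bases, and it works on the whole-alignment scale (not per site). The function names are illustrative and do not come from any particular library; the constants in the denominator are those of Tajima’s 1989 variance formula.

```python
from itertools import combinations
from math import sqrt


def pairwise_diversity(sequences):
    """Average number of nucleotide differences over all pairs of sequences (pi)."""
    pairs = list(combinations(sequences, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def segregating_sites(sequences):
    """Number of alignment columns with more than one distinct nucleotide (S)."""
    return sum(len(set(column)) > 1 for column in zip(*sequences))


def tajimas_d(sequences):
    """Tajima's D for aligned sequences (assumed gap-free and of equal length)."""
    n = len(sequences)
    if n < 4:
        # For n < 4 the variance term is zero, so D is undefined.
        raise ValueError("need at least 4 sequences")
    s = segregating_sites(sequences)
    if s == 0:
        return float("nan")  # no variation: D is undefined

    # Sample-size constants from Tajima's variance formula.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    pi = pairwise_diversity(sequences)
    theta_w = s / a1  # Watterson's estimator
    return (pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignments (made up for illustration).
only_rare = ["AAAA", "AAAT", "AATA", "ATAA"]    # every variant is a singleton
intermediate = ["AAA", "AAA", "TTT", "TTT"]     # every variant is at 50%
print(tajimas_d(only_rare), tajimas_d(intermediate))
```

The first toy alignment contains only singletons and gives a negative D; the second contains only variants at 50% frequency and gives a positive D.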
Tajima’s D variation under population events:
During a population bottleneck, many rare variants are lost by genetic drift. This reduces the number of segregating sites, so Watterson’s θ decreases. The variants that survive tend to be at intermediate frequency, so π decreases less than θ, and Tajima’s D > 0.
During a population expansion, the number of individuals increases. Many recent mutations increase the number of segregating sites (Watterson’s θ). However, these new mutations are at low frequency, so they add little to π. Therefore, π < θ and Tajima’s D < 0.
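The two demographic cases above come down to the shape of the site frequency spectrum. The sketch below is a rough illustration with made-up spectra, not a simulation: it computes π and Watterson’s θ directly from the number of sites at each variant count. Because the denominator of D is always positive, the sign of π − θ is the sign of D. The function name and the example spectra are assumptions chosen for illustration.

```python
from math import comb


def pi_and_theta_w(sfs):
    """Return (pi, theta_W) for a site frequency spectrum.

    sfs[i - 1] is the number of segregating sites at which the variant is
    carried by i of the n sampled sequences, for i = 1 .. n - 1.
    """
    n = len(sfs) + 1
    a1 = sum(1 / i for i in range(1, n))
    # A variant carried by i of n sequences differs in i * (n - i) of the pairs.
    pi = sum(count * i * (n - i) for i, count in enumerate(sfs, start=1)) / comb(n, 2)
    theta_w = sum(sfs) / a1
    return pi, theta_w


# Hypothetical spectra for n = 10 sequences.
spectra = {
    # Neutral expectation: the number of sites with count i is proportional to 1/i.
    "neutral expectation": [5.0 / i for i in range(1, 10)],
    # Excess of rare variants, as after a population expansion.
    "excess of rare variants": [20, 2, 1, 0, 0, 0, 0, 0, 0],
    # Excess of intermediate-frequency variants, as after a bottleneck.
    "excess of intermediate variants": [1, 1, 2, 6, 8, 6, 2, 1, 1],
}

for name, sfs in spectra.items():
    pi, theta_w = pi_and_theta_w(sfs)
    print(f"{name}: pi - theta = {pi - theta_w:+.3f}")
```

With the neutral spectrum the two estimators coincide, the rare-variant spectrum gives π < θ, and the intermediate-frequency spectrum gives π > θ.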
Tajima’s D variation under selection:
Purifying selection eliminates deleterious mutations. Until they are removed, these mutations segregate only at low frequency, so they add to the number of segregating sites (θ) but contribute little to π. As a result, π < θ and Tajima’s D < 0.
Positive selection creates a selective sweep, in which variation is wiped out near a beneficial mutation. The mutations that arise after the sweep start at low frequency, so π recovers more slowly than θ. As a result, π < θ and Tajima’s D < 0.
Balancing selection maintains multiple alleles in a population, enriching variants at intermediate frequency. This increases π while θ stays roughly the same, so π > θ and Tajima’s D > 0.