How to interpret Tajima's D
Tajima’s D is a widely used statistic in population genetics. In simple terms, it quantifies how much the observed genetic diversity deviates from what would be expected under a neutral model of evolution. The formula is:
D = (π − θ) / sqrt(Var(π − θ))
π is the nucleotide diversity. It counts the nucleotide differences between each pair of sequences and averages them over all pairs and all sites.
θ is Watterson’s estimator: the number of segregating sites (positions in the alignment that show variation), normalized by a factor that depends on the sample size. For n sequences, that factor is 1 + 1/2 + ... + 1/(n − 1).
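To make the definitions above concrete, here is a minimal Python sketch that computes π, Watterson’s θ, and Tajima’s D from a small alignment. The function name `tajimas_d` and the toy sequences are made up for illustration. Both π and θ are kept on a per-sequence scale (not divided by the alignment length), and the variance constants are the standard ones from Tajima (1989). Treat it as a teaching aid rather than a replacement for a tested library.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Return (pi, theta_w, D) for a list of equal-length aligned sequences."""
    n = len(seqs)
    length = len(seqs[0])

    # S: alignment columns that show more than one allele.
    S = sum(1 for i in range(length) if len({s[i] for s in seqs}) > 1)

    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Watterson's theta: S divided by a1 = 1 + 1/2 + ... + 1/(n-1).
    a1 = sum(1 / i for i in range(1, n))
    theta_w = S / a1
    if S == 0:
        return pi, theta_w, float("nan")  # D is undefined without variation

    # Constants for the variance of (pi - theta_w), following Tajima (1989).
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    d = (pi - theta_w) / sqrt(e1 * S + e2 * S * (S - 1))
    return pi, theta_w, d


# Toy alignment (hypothetical): every variant is carried by a single sequence.
rare = ["AAAA", "AAAT", "AATA", "ATAA"]
# Toy alignment (hypothetical): every variant is carried by half the sample.
common = ["AA", "AA", "TT", "TT"]

print(tajimas_d(rare))
print(tajimas_d(common))
```

In the first alignment every variant is a singleton, so π falls below θ and D comes out negative. In the second, the variants sit at intermediate frequency, so π exceeds θ and D is positive.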
Tajima’s D variation under population events:
During a population bottleneck, many rare variants are lost to genetic drift. This reduces the number of segregating sites, so Watterson’s θ decreases. The variants that survive tend to be at intermediate frequency, so π does not decrease as much as θ, and Tajima’s D > 0.
During a population expansion, the number of individuals increases. Many recent mutations raise the number of segregating sites, so Watterson’s θ increases. Yet these new mutations are at low frequency and contribute little to π, so π < θ and Tajima’s D < 0.
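The two demographic signatures can be checked with a short Python sketch that works from allele counts instead of sequences. The helper name `tajimas_d_from_counts` and the two frequency patterns are invented for illustration. They are hand-picked extremes, not the output of a demographic simulation. Both examples have the same number of segregating sites, so θ is identical and only π changes.

```python
from math import sqrt


def tajimas_d_from_counts(allele_counts, n):
    """Tajima's D from per-site allele counts.

    allele_counts: for each segregating site, how many of the n sequences
    carry one of the two alleles (any value from 1 to n-1).
    """
    S = len(allele_counts)

    # A site with k copies of one allele differs in k*(n-k) of the pairs.
    n_pairs = n * (n - 1) / 2
    pi = sum(k * (n - k) for k in allele_counts) / n_pairs

    # Watterson's theta and the variance constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = S / a1
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    return (pi - theta_w) / sqrt(e1 * S + e2 * S * (S - 1))


n = 10  # hypothetical sample of 10 sequences

# Expansion-like pattern: 8 segregating sites, all singletons.
expansion_like = [1] * 8
# Bottleneck-like pattern: 8 segregating sites, all at 50% frequency.
bottleneck_like = [5] * 8

print("expansion-like D:", tajimas_d_from_counts(expansion_like, n))
print("bottleneck-like D:", tajimas_d_from_counts(bottleneck_like, n))
```

With only rare variants, π is small relative to θ and D is negative. With only intermediate-frequency variants, π is large relative to θ and D is positive. Real data fall between these extremes.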
Tajima’s D variation under selection:
Purifying selection eliminates deleterious mutations and keeps those that still segregate at low frequency. This reduces π more than θ, so π < θ and Tajima’s D < 0.
Positive selection creates a selective sweep, in which variation is wiped out near a beneficial mutation. The mutations that arise afterwards are still rare, so π recovers more slowly than θ, giving π < θ and Tajima’s D < 0.
Balancing selection maintains multiple alleles in a population, enriching variants at intermediate frequency. This increases π while θ stays roughly the same, so π > θ and Tajima’s D > 0.