Nucleotide diversity - π
Nucleotide diversity (π) is the average number of nucleotide differences per site between two sequences drawn at random from a population. Equivalently, it is the probability that two randomly chosen sequences carry different nucleotides at a given site.
The formula for π is:
$$\pi = \frac{1}{\binom{n}{2}} \sum_{i<j} \frac{d_{ij}}{L}$$
where:
$n$ is the number of sequences (or individuals) sampled from the population,
$d_{ij}$ is the number of nucleotide differences between sequences $i$ and $j$,
$L$ is the alignment length,
$\binom{n}{2} = n(n-1)/2$ is the number of possible pairs of sequences.
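The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the function name `nucleotide_diversity` and the toy alignment are hypothetical, and the sketch assumes equal-length aligned sequences in which every mismatching character (including gaps or ambiguity codes) counts as a difference.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site for aligned sequences."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])  # L, the alignment length
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must share the same non-zero length")

    # Sum d_ij over all unordered pairs (i, j) with i < j.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) // 2  # "n choose 2"
    return total_diffs / (n_pairs * length)


# Toy alignment (assumed data): 4 sequences, 10 sites, 2 variable sites.
alignment = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACGTACGTAC"]
pi = nucleotide_diversity(alignment)
print(pi)  # 0.1
```

Here the six sequence pairs differ at 6 positions in total, so π = 6 / (6 × 10) = 0.1.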
Values of π around 0.01-0.05 are considered high (found in Drosophila, for example), while values around 0.001 are considered low (found in humans).
Relationship between π and natural selection:
1. Neutral regions:
- π remains at its baseline level, determined mainly by the mutation rate (μ) and effective population size (Ne); for a diploid population at mutation-drift equilibrium, π ≈ 4Neμ.
- In the absence of selection, π reflects only the balance between mutation introducing variation and drift removing it.
- Different neutral loci in the genome should show similar π values (apart from stochastic variation).
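As a rough numerical illustration of the neutral baseline, the sketch below uses the standard approximation for the expected π at mutation-drift equilibrium (4Neμ for diploids, 2Neμ for haploids). The function name `expected_neutral_pi` and the parameter values are assumptions chosen for illustration only, not estimates for any real species.

```python
def expected_neutral_pi(ne, mu, ploidy=2):
    """Expected pi under neutrality: 2 * ploidy * Ne * mu.

    ne     -- effective population size (Ne)
    mu     -- mutation rate per site per generation
    ploidy -- 2 gives 4*Ne*mu (diploid), 1 gives 2*Ne*mu (haploid)
    """
    if ne <= 0 or mu < 0:
        raise ValueError("ne must be positive and mu non-negative")
    return 2 * ploidy * ne * mu


# Hypothetical parameter sets: a small and a large effective population.
small_pop = expected_neutral_pi(ne=10_000, mu=2.5e-8)
large_pop = expected_neutral_pi(ne=1_000_000, mu=5e-9)
print(f"{small_pop:.4f}")  # 0.0010
print(f"{large_pop:.4f}")  # 0.0200
```

The approximation only holds when the product is small (few sites mutate more than once), which is the case for the values shown.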
2. Purifying (negative) selection
- Purifying selection removes deleterious alleles from the population.
- As a result, genetic variation is reduced at and near functional or constrained sites.
- π decreases because many new mutations are selected against before they reach high frequency.
3. Positive (directional) selection / selective sweep
- When a beneficial mutation rapidly rises to fixation, linked neutral variation is lost through genetic hitchhiking.
- This causes a sharp local reduction in π around the selected site.
- After fixation, π may gradually recover as new mutations accumulate.
- Genome-wide scans often detect such regions as “valleys” or “troughs” of π.
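A windowed scan of this kind can be sketched as follows. The helper names (`window_pi`, `sliding_pi`, `find_troughs`), the toy alignment, and the threshold of 25% of the mean are all hypothetical choices for illustration; real scans use much longer windows and calibrated thresholds, and a trough alone does not distinguish a sweep from other causes of low π such as purifying selection.

```python
from itertools import combinations


def window_pi(seqs, start, end):
    """Pi for alignment columns start..end-1 (mean pairwise differences per site)."""
    n_pairs = len(seqs) * (len(seqs) - 1) // 2
    diffs = sum(
        sum(a != b for a, b in zip(s1[start:end], s2[start:end]))
        for s1, s2 in combinations(seqs, 2)
    )
    return diffs / (n_pairs * (end - start))


def sliding_pi(seqs, window, step):
    """Return (window_start, pi) for each full window along the alignment."""
    length = len(seqs[0])
    return [
        (start, window_pi(seqs, start, start + window))
        for start in range(0, length - window + 1, step)
    ]


def find_troughs(scan, fraction=0.25):
    """Flag windows whose pi is below `fraction` of the mean over all windows."""
    mean_pi = sum(pi for _, pi in scan) / len(scan)
    return [start for start, pi in scan if pi < fraction * mean_pi]


# Toy alignment (assumed data): variable flanks around an invariant core.
seqs = [
    "ACGTACGTAC" + "GGGGGGGGGG" + "ACGTACGTAC",
    "ACGTACGTAT" + "GGGGGGGGGG" + "ACGAACGTAC",
    "ACGAACGTAC" + "GGGGGGGGGG" + "TCGTACGTAC",
    "ACGTACGTAC" + "GGGGGGGGGG" + "ACGTACGTAC",
]
scan = sliding_pi(seqs, window=10, step=10)
troughs = find_troughs(scan)
print(troughs)  # [10]
```

The middle window has no variable sites, so it is the only one reported as a trough.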
4. Balancing selection
- Balancing selection maintains multiple alleles over long periods.
- This increases the average number of pairwise differences.
- π increases, often forming local peaks of diversity.
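One way to see why alleles held at intermediate frequency raise π is to look at a single biallelic site. The sketch below (the function name `site_pairwise_diff` and the sample size are assumptions for illustration) computes the fraction of sequence pairs that differ at a site where k of n sampled sequences carry one allele; summing this quantity over sites and dividing by the alignment length gives π.

```python
from math import comb


def site_pairwise_diff(n, k):
    """Fraction of sequence pairs that differ at a biallelic site.

    n -- number of sampled sequences
    k -- number of sequences carrying one of the two alleles
    """
    if n < 2 or not 0 <= k <= n:
        raise ValueError("need n >= 2 and 0 <= k <= n")
    # k * (n - k) pairs combine one sequence of each allele.
    return k * (n - k) / comb(n, 2)


# Hypothetical sample of 10 sequences.
rare = site_pairwise_diff(10, 1)      # allele present in a single sequence
balanced = site_pairwise_diff(10, 5)  # both alleles at 50% frequency
print(f"{rare:.3f}")      # 0.200
print(f"{balanced:.3f}")  # 0.556
```

A site with both alleles at equal frequency contributes nearly three times as much to π as a singleton in this sample, which is consistent with the diversity peaks expected where balancing selection keeps alleles at intermediate frequency.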