Nucleotide diversity - π


Nucleotide diversity (π) is the mean number of nucleotide differences per site between two sequences chosen at random from a population. In simpler terms, it is the probability that two randomly chosen sequences carry different nucleotides at a given site.

The formula for π is:

$$\pi = \frac{1}{\binom{n}{2}} \sum_{i<j} \frac{d_{ij}}{L}$$

where:

n is the number of sequences (or individuals) sampled from the population,

$d_{ij}$ is the number of nucleotide differences between sequences i and j,

L is the alignment length (the number of sites compared),

$\binom{n}{2} = n(n-1)/2$ is the number of possible pairs of sequences.
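
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the function names and the toy alignment are made up for the example, and it assumes the sequences are already aligned to equal length and that any mismatching character (including a gap or ambiguity code) counts as a difference.

```python
from itertools import combinations


def pairwise_differences(seq_a, seq_b):
    """Count the positions at which two aligned sequences differ (d_ij)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def nucleotide_diversity(sequences):
    """Mean number of pairwise differences per site (pi)."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])           # L, the alignment length
    n_pairs = n * (n - 1) // 2           # number of unordered pairs
    total = sum(
        pairwise_differences(a, b) for a, b in combinations(sequences, 2)
    )
    return total / (n_pairs * length)


# Hypothetical toy alignment: 4 sequences, 10 sites.
toy_alignment = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACGTACGTAC"]
print(nucleotide_diversity(toy_alignment))
```

In the toy alignment the six pairs differ at 6 positions in total, so π = 6 / (6 × 10) = 0.1. Real data usually need explicit handling of gaps and missing calls, which this sketch leaves out.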


Nucleotide diversity values of 0.01-0.05 are considered high (found in Drosophila, for example), while values around 0.001 are considered low (found in humans).


Relationship between π and natural selection:

1. Neutral regions:

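  • π remains at its baseline level, determined mainly by the mutation rate (μ) and effective population size (Ne). For a diploid population at mutation-drift equilibrium, the expected value is:

    $$E[\pi] = 4 N_e \mu$$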

  • In the absence of selection, π reflects only the balance between mutation introducing variation and drift removing it.

  • Different neutral loci in the genome should show similar π values, apart from stochastic variation and differences in local mutation rate.
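
As a rough numerical illustration of the neutral baseline, the sketch below takes the standard neutral expectation for a diploid population, E[π] = 4·Ne·μ, as an assumption. The function name and the parameter values are hypothetical and chosen only to show the scaling, not taken from any particular species.

```python
def expected_neutral_pi(effective_size, mutation_rate):
    """Expected pi for a diploid population under the neutral model (4 * Ne * mu)."""
    return 4 * effective_size * mutation_rate


# Illustrative values only: Ne = 10,000 and mu = 1e-8 per site per generation.
baseline = expected_neutral_pi(10_000, 1e-8)
print(baseline)

# A tenfold larger effective population size gives a tenfold larger baseline.
print(expected_neutral_pi(100_000, 1e-8) / baseline)
```

With these inputs the baseline is about 0.0004, the same order of magnitude as the low values quoted earlier, while a larger Ne or μ pushes the baseline up proportionally.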


2. Purifying (negative) selection:

  • Purifying selection removes deleterious alleles from the population.

  • As a result, genetic variation is reduced near functional or constrained sites.

  • π decreases because many new mutations are selected against before they reach high frequency.


3. Positive (directional) selection / selective sweep:

  • When a beneficial mutation rapidly rises to fixation, linked neutral variation is lost through genetic hitchhiking.

  • This causes a sharp local reduction in π around the selected site.

  • After fixation, π may gradually recover as new mutations accumulate.

  • Genome-wide scans often detect such regions as “valleys” or “troughs” of π.
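
A windowed scan of this kind can be sketched as follows. The function names, the toy alignment, and the cutoff (windows below 20% of the mean π) are all assumptions made for illustration; real scans typically use much larger windows and calibrated thresholds. A trough in π is also not proof of a sweep on its own, since purifying selection or a low local mutation rate can reduce π as well.

```python
from itertools import combinations


def window_pi(sequences, start, end):
    """pi computed over alignment columns start .. end-1."""
    n = len(sequences)
    n_pairs = n * (n - 1) // 2
    diffs = sum(
        sum(a != b for a, b in zip(s[start:end], t[start:end]))
        for s, t in combinations(sequences, 2)
    )
    return diffs / (n_pairs * (end - start))


def sliding_pi(sequences, window, step):
    """Return (window start, pi) for each full window along the alignment."""
    length = len(sequences[0])
    return [
        (start, window_pi(sequences, start, start + window))
        for start in range(0, length - window + 1, step)
    ]


def find_troughs(scan, fraction=0.2):
    """Window starts whose pi falls below `fraction` of the mean pi (arbitrary cutoff)."""
    mean_pi = sum(pi for _, pi in scan) / len(scan)
    return [start for start, pi in scan if pi < fraction * mean_pi]


# Hypothetical alignment: variable flanks, invariant middle block (columns 4-7).
toy = ["ACGTAAAAACGT", "ATGTAAAAACGA", "ACGAAAAATCGT", "ACGTAAAAACCT"]
scan = sliding_pi(toy, window=4, step=4)
print(scan)
print(find_troughs(scan))
```

Here the middle window has π = 0 while the flanking windows have π = 0.25 and 0.375, so only the window starting at column 4 is reported as a trough.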


4. Balancing selection:

  • Balancing selection maintains multiple alleles over long periods.

  • This increases the average number of pairwise differences.

  • π increases, often forming local peaks of diversity.
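
Why alleles held at intermediate frequency raise π can be seen from a single biallelic site: if k of n sampled sequences carry one allele, k(n - k) of the n(n - 1)/2 pairs differ at that site. The sketch below evaluates this proportion for a few allele counts; the function name and the sample size of 20 are illustrative assumptions.

```python
def site_diversity(allele_count, n):
    """Proportion of sequence pairs that differ at a biallelic site.

    allele_count: number of sampled sequences carrying one of the two alleles
    n: total number of sampled sequences
    """
    return 2 * allele_count * (n - allele_count) / (n * (n - 1))


n = 20  # hypothetical sample size
for count in (1, 5, 10):
    print(count, round(site_diversity(count, n), 3))
# prints:
# 1 0.1
# 5 0.395
# 10 0.526
```

A rare variant (1 of 20) contributes far less to π than a variant kept near 50% frequency, which is why long-maintained balanced polymorphisms, together with the linked variation that accumulates around them, show up as local peaks.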