Nucleotide diversity

Nucleotide diversity (π) is the average number of nucleotide differences per site between two sequences drawn at random from a population. Put simply, it is the probability that two randomly chosen sequences carry different bases at a given nucleotide site.

The formula for π is:

$$\pi = \frac{1}{\binom{n}{2}} \sum_{i<j} \frac{d_{ij}}{L}$$

where:

$n$ is the number of sequences (or individuals) sampled from the population,

$d_{ij}$ is the number of nucleotide differences between sequences $i$ and $j$,

$L$ is the alignment length,

$\binom{n}{2} = n(n-1)/2$ is the number of possible pairs of sequences.
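As a minimal sketch of the definition above, the following Python computes π for a small set of aligned sequences by averaging the pairwise differences over all pairs and dividing by the alignment length. The function names and the toy alignment are illustrative assumptions, not part of any particular library. Gaps and ambiguous bases are simply compared as ordinary characters here, which a real analysis would likely handle differently.

```python
from itertools import combinations


def pairwise_differences(seq_a, seq_b):
    """Count positions at which two aligned sequences differ (d_ij)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def nucleotide_diversity(sequences):
    """Mean number of pairwise differences per site over all sequence pairs."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])  # L, the alignment length
    total = sum(pairwise_differences(a, b) for a, b in combinations(sequences, 2))
    n_pairs = n * (n - 1) // 2  # n choose 2
    return total / (n_pairs * length)


# Toy alignment (made up for illustration): 4 sequences, 10 sites
alignment = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACGTACGTAC"]
pi = nucleotide_diversity(alignment)
print(pi)  # 0.1
```

In this toy case the six pairs differ at 6 positions in total, so π = 6 / (6 × 10) = 0.1.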

Values of π of about 0.01-0.05 are considered high (as found in Drosophila, for example), while values of about 0.001 are considered low (as found in humans).

Relationship between π and natural selection:

1. Neutral regions:

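π remains at its baseline level, determined mainly by the mutation rate (μ) and effective population size (Ne). For a diploid population at mutation-drift equilibrium, the expected value is:

$$\pi \approx 4 N_e \mu$$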

In the absence of selection, π reflects only the balance between mutation introducing variation and drift removing it.

Different neutral loci in the genome should show similar π values (apart from stochastic variation).
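To make the dependence on μ and Ne concrete, here is a small sketch that assumes the standard neutral expectation for a diploid population, π ≈ 4Neμ. The function name and the input values are illustrative assumptions chosen only to show the scaling; they are not estimates for any specific species.

```python
def expected_neutral_pi(effective_size, mutation_rate):
    """Neutral expectation for a diploid population: pi is roughly 4 * Ne * mu."""
    return 4 * effective_size * mutation_rate


# Illustrative inputs only (assumed values, not measurements):
# the same per-site mutation rate with two different effective population sizes
mu = 1e-8
for ne in (10_000, 1_000_000):
    print(ne, expected_neutral_pi(ne, mu))
```

Under this approximation, a hundredfold larger effective population size gives a hundredfold larger expected π at the same mutation rate.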

2. Purifying (negative) selection:

Purifying selection removes deleterious alleles from the population.
As a result, genetic variation is reduced near functional or constrained sites.
π decreases because many new mutations are selected against before they reach high frequency.

3. Positive (directional) selection / selective sweep:

When a beneficial mutation rapidly rises to fixation, linked neutral variation is lost through genetic hitchhiking.
This causes a sharp local reduction in π around the selected site.
After fixation, π may gradually recover as new mutations accumulate.
Genome-wide scans often detect such regions as “valleys” or “troughs” of π.
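The idea of scanning for valleys of π can be sketched as a sliding-window calculation along an alignment. The code below is a simplified illustration: the function names, the window and step sizes, the toy alignment, and the "below half the mean" cutoff are all assumptions made for this example. A low-π window is only a candidate, since purifying selection or a locally low mutation rate can also reduce diversity.

```python
from itertools import combinations


def window_pi(sequences, start, end):
    """Mean pairwise differences per site within alignment columns [start, end)."""
    pairs = list(combinations(sequences, 2))
    diffs = sum(
        sum(a != b for a, b in zip(s1[start:end], s2[start:end]))
        for s1, s2 in pairs
    )
    return diffs / (len(pairs) * (end - start))


def sliding_pi(sequences, window, step):
    """Return (window_start, pi) for each full window along the alignment."""
    length = len(sequences[0])
    return [
        (start, window_pi(sequences, start, start + window))
        for start in range(0, length - window + 1, step)
    ]


def find_troughs(scan, fraction=0.5):
    """Flag windows whose pi falls below `fraction` of the mean across windows.

    The cutoff is an arbitrary assumption for illustration only.
    """
    mean_pi = sum(pi for _, pi in scan) / len(scan)
    return [start for start, pi in scan if pi < fraction * mean_pi]


# Toy alignment (made up): the middle four columns carry no variation
toy = ["ACGTAAAAACGT", "ACGAAAAAACCT", "TCGTAAAAACGT", "ACGTAAAAGCGT"]
scan = sliding_pi(toy, window=4, step=4)
print(scan)                # [(0, 0.25), (4, 0.0), (8, 0.25)]
print(find_troughs(scan))  # [4]
```

Non-overlapping windows keep the example easy to check by hand; overlapping windows (a step smaller than the window) would give a smoother profile at the cost of correlated values.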

4. Balancing selection:

Balancing selection maintains multiple alleles over long periods.
This increases the average number of pairwise differences.
π increases, often forming local peaks of diversity.
