Posts

Mendel's Laws of Inheritance

In the nineteenth century, several theories attempted to explain the heredity of characteristics. Lamarck proposed the inheritance of acquired characteristics, in which traits are modified by use or disuse during an individual’s lifetime and are then inherited by offspring. Darwin, on the other hand, relied on blending inheritance, in which the hereditary material of the parents “mixes” in the offspring. Blending gives intermediate phenotypes and predicts that variation erodes over the generations. Mendel proposed that traits are determined by discrete hereditary factors (which we now call alleles) that are transmitted intact across generations (particulate inheritance). This explains two observations:
- why recessive phenotypes can disappear in one generation and reappear in later ones (the alleles persist, hidden, in heterozygotes);
- why offspring phenotypes can be either identical to or different from those of their parents.
Neither the inheritance of acquired characteristics nor blending inheritance explains these observations. Nowada...
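The disappearance and reappearance of a recessive phenotype can be reproduced with a few lines of code. The following is a minimal Python sketch of a single-locus (monohybrid) cross. It assumes one dominant allele "A" and one recessive allele "a", and the function names are illustrative. Each parent passes on one intact allele, and every gamete pairing is counted once.

```python
from collections import Counter
from itertools import product


def cross(parent1: str, parent2: str) -> Counter:
    """Offspring genotype counts for a single-locus cross, e.g. cross("Aa", "Aa")."""
    counts = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sort so "aA" and "Aa" count as the same genotype
        # (uppercase sorts before lowercase, giving "Aa").
        counts["".join(sorted(allele1 + allele2))] += 1
    return counts


def phenotypes(genotypes: Counter, dominant: str = "A") -> Counter:
    """Collapse genotypes into phenotypes: one dominant allele is enough to show it."""
    result = Counter()
    for genotype, n in genotypes.items():
        result["dominant" if dominant in genotype else "recessive"] += n
    return result


f1 = cross("AA", "aa")  # pure-breeding parents: every F1 offspring is Aa
f2 = cross("Aa", "Aa")  # crossing F1 heterozygotes: AA, Aa, Aa, aa
print("F1:", dict(f1), dict(phenotypes(f1)))
print("F2:", dict(f2), dict(phenotypes(f2)))
```

In F1 the recessive phenotype is absent, because the "a" allele is carried but masked in every heterozygote. In F2 it reappears in one of the four equally likely combinations, which is the classic 3:1 ratio.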

FST - Fixation Index - What is it? Interpretation and Examples

FST (fixation index) measures how much of the total genetic variation is due to differences between populations. In other words, it is a measure of genetic differentiation between populations. Its value ranges from 0 (genetically identical populations) to 1 (completely differentiated populations). The classic formula to define FST is:

$$F_{ST} = \frac{H_T - H_S}{H_T}$$

where H_T is the expected heterozygosity if all populations were pooled into one, and H_S is the mean heterozygosity within populations. Relation with migration: in a simple island model at equilibrium between migration and drift, FST takes the following value:

$$F_{ST} \approx \frac{1}{1 + 4Nm}$$

where Nm is the number of migrants per generation. So, if the number of migrants is high (high gene flow), FST tends to zero. With isolation (Nm close to zero), FST tends to 1. Relation with genetic variance: a third way of defining FST, and the most straightforward, is Wright's formula (as we'll see next):

$$F_{ST} = \frac{\mathrm{Var}(p)}{\bar{p}\,(1 - \bar{p})}$$

where Var(p) is the variance of allele frequencies among populations and p̄ is the mean allele fre...
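The three definitions can be compared numerically. The following is a minimal Python sketch for a single biallelic locus. It assumes every population is weighted equally and takes Var(p) as the population variance (dividing by the number of populations, not n − 1). Under these assumptions, the heterozygosity form and Wright's variance form give the same value. The function names and allele frequencies are hypothetical.

```python
from typing import Sequence


def fst_from_heterozygosity(freqs: Sequence[float]) -> float:
    """F_ST = (H_T - H_S) / H_T, with equal population weights."""
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity if all populations were one
    if h_t == 0:
        raise ValueError("locus is monomorphic overall; F_ST is undefined")
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-population value
    return (h_t - h_s) / h_t


def fst_from_variance(freqs: Sequence[float]) -> float:
    """Wright's F_ST = Var(p) / (p_bar * (1 - p_bar))."""
    p_bar = sum(freqs) / len(freqs)
    denom = p_bar * (1 - p_bar)
    if denom == 0:
        raise ValueError("locus is monomorphic overall; F_ST is undefined")
    var_p = sum((p - p_bar) ** 2 for p in freqs) / len(freqs)  # divide by n, not n - 1
    return var_p / denom


def fst_island(nm: float) -> float:
    """Approximate equilibrium F_ST in the island model: 1 / (1 + 4Nm)."""
    return 1 / (1 + 4 * nm)


freqs = [0.2, 0.8]  # hypothetical allele frequencies in two populations
print(fst_from_heterozygosity(freqs), fst_from_variance(freqs))
for nm in (0.1, 1, 10):
    print(f"Nm = {nm}: F_ST ~ {fst_island(nm):.3f}")
```

With these frequencies both functions give 0.36, up to floating-point rounding. The island-model values fall as Nm grows, matching the point that high gene flow pushes FST towards zero.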

Nucleotide diversity - π

Nucleotide diversity (π) measures the average number of nucleotide differences per site between two sequences chosen at random from a population. In simpler terms, it is the probability that two randomly chosen sequences differ at a given nucleotide site. The formula for π is:

$$\pi = \frac{1}{\binom{n}{2}} \sum_{i<j} \frac{d_{ij}}{L}$$

where n is the number of sequences (or individuals) in the population, d_ij is the number of nucleotide differences between sequences i and j, L is the alignment length, and $\binom{n}{2}$ is the number of possible pairs of sequences. Nucleotide diversity values of 0.01-0.05 are considered high (found in Drosophila, for example), while 0.001 is considered low (found in humans). Relationship between π and natural selection: 1. Neutral regions: π remains at its baseline level, which is determined mainly by the mutation rate (μ) and the effective population size (Ne):

$$\pi \approx 4 N_e \mu$$

In the absence of selection, π reflects only the balance between mutation introducing variation and drift removing it. Different neutral loci in the genome should show s...
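The formula translates almost directly into code. The following is a minimal Python sketch that assumes the sequences are already aligned to the same length. As a simplification, any mismatched character counts as a difference, including gaps or N. The alignment and the Ne and μ values are illustrative only, and the function names are hypothetical.

```python
from itertools import combinations
from math import comb
from typing import Sequence


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """pi = (sum over pairs i<j of d_ij / L) / C(n, 2)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned and of equal, non-zero length")
    # d_ij summed over every unordered pair of sequences
    total_diffs = sum(
        sum(a != b for a, b in zip(s_i, s_j))
        for s_i, s_j in combinations(seqs, 2)
    )
    return total_diffs / comb(n, 2) / length


def neutral_pi(ne: float, mu: float) -> float:
    """Expected neutral diversity for a diploid population: pi ~ 4 * Ne * mu."""
    return 4 * ne * mu


alignment = ["AAAA", "AAAT", "AATT"]  # toy alignment: d_ij = 1, 2, 1 over 3 pairs
print(nucleotide_diversity(alignment))
print(neutral_pi(10_000, 2.5e-8))  # illustrative parameter values
```

For the toy alignment, the three pairs differ at 1, 2 and 1 sites. That gives π = 4 / 3 / 4 ≈ 0.333 differences per site.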

From SRA to FASTQ - How to Download NGS Data - Using prefetch SRA and fastq-dump for Sequencing Reads

Next-generation sequencing (NGS) data generated for a publication is usually deposited in NCBI’s Sequence Read Archive (SRA). Sharing these data lets other scientists re-analyse them, learn from them, or use them in their own studies to reach other conclusions. All SRA records can be browsed at https://www.ncbi.nlm.nih.gov/sra. There you can search for a specific SRA record, for a species’ whole-genome sequencing data, or even for RNA-seq data. Once you have chosen a record, you need its run accession (an ID starting with SRR), which is shown on the record’s page. NCBI provides a command-line toolkit, the SRA Toolkit, for interacting with the database and with each SRA record. It can be installed by running the following commands:

$ sudo apt update
$ sudo apt install sra-toolkit

The two most used sra-toolkit commands are prefetch and fastq-dump. The prefetch command is used to download the compressed archives from SRA...
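If you download many runs, it can help to script the two steps. The following is a minimal Python sketch that assumes prefetch and fastq-dump are on your PATH after installing sra-toolkit. It also assumes fastq-dump is run in the same working directory where prefetch saved the archive. Only the basic `prefetch <accession>` call and fastq-dump's `--split-files` option are used. The helper names and the accession "SRR1234567" are hypothetical placeholders, and the script runs in dry-run mode by default.

```python
from __future__ import annotations

import re
import shutil
import subprocess

# Run accessions: SRR (NCBI), ERR (ENA) or DRR (DDBJ) followed by digits.
RUN_ACCESSION = re.compile(r"[SED]RR\d+")


def download_commands(accession: str) -> list[list[str]]:
    """Build the prefetch and fastq-dump command lines for one run accession."""
    if not RUN_ACCESSION.fullmatch(accession):
        raise ValueError(f"not a run accession: {accession!r}")
    return [
        ["prefetch", accession],  # download the compressed .sra archive
        ["fastq-dump", "--split-files", accession],  # convert to FASTQ; paired mates go to _1/_2 files
    ]


def download(accession: str, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is True, execute it in order."""
    for cmd in download_commands(accession):
        print("$", " ".join(cmd))
        if dry_run:
            continue
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH; is sra-toolkit installed?")
        subprocess.run(cmd, check=True)  # raise and stop if a step fails


download("SRR1234567")  # placeholder accession; the dry run only prints the commands
```

Replace the placeholder with the SRR accession you found on the SRA page. Then call `download(accession, dry_run=False)` to actually fetch and convert the data.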

How to obtain an Admixture bar plot using ANGSD (ngsTools)

Admixture bar plots are used to visualize the genetic structure of populations by assigning proportions of each individual's genome to different ancestral populations (K). Here are some key concepts:
- Ancestral populations (K): the number of distinct genetic populations assumed in the analysis. Users must choose the value of K.
- Individuals: each individual in the dataset is represented as a vertical bar in the plot.
- Ancestry proportions: the colors within each bar indicate the proportion of an individual's genome that comes from each ancestral population.
To obtain an Admixture bar plot, you first need to install ngsTools (instructions here). This software uses genotype likelihoods rather than hard genotype calls, and the analysis starts from BAM files. Sample data used in this tutorial can be downloaded here. Admixture proportions can be estimated from genotype likelihoods using ngsTools. Here are instructions for...
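To see how the estimated proportions turn into stacked bars, here is a minimal Python sketch. It draws each individual as a text bar split into K segments. It assumes the proportions are stored as a whitespace-separated table with one individual per line and one column per ancestral population. The function names and example values are hypothetical.

```python
from __future__ import annotations


def parse_q(text: str, tol: float = 1e-3) -> list[list[float]]:
    """Parse ancestry proportions: one individual per line, one column per cluster."""
    rows = []
    for line_no, line in enumerate(text.strip().splitlines(), start=1):
        row = [float(x) for x in line.split()]
        if abs(sum(row) - 1.0) > tol:
            raise ValueError(f"line {line_no}: proportions sum to {sum(row):.4f}, not 1")
        rows.append(row)
    return rows


def ascii_bars(q: list[list[float]], width: int = 20, symbols: str = "#*+o=") -> list[str]:
    """Draw each individual as a bar of `width` characters, one symbol per cluster."""
    bars = []
    for row in q:
        if not row or len(row) > len(symbols):
            raise ValueError("need between 1 and len(symbols) clusters per individual")
        bar, cum, prev = "", 0.0, 0
        for k, prop in enumerate(row):
            cum += prop
            # Round cumulative boundaries, not single segments, so the bar fills `width`.
            boundary = min(width, round(cum * width))
            bar += symbols[k] * (boundary - prev)
            prev = boundary
        bar += symbols[len(row) - 1] * (width - prev)  # absorb any rounding shortfall
        bars.append(bar)
    return bars


example = """0.95 0.05
0.50 0.50
0.10 0.90"""  # hypothetical K = 2 proportions for three individuals
for i, bar in enumerate(ascii_bars(parse_q(example)), start=1):
    print(f"ind{i} |{bar}|")
```

For a graphical version, you can use the same cumulative sums as the `bottom=` argument of matplotlib's `bar`. Draw one call per cluster, and each segment is stacked on top of the previous ones.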

How to plot a PCA from BAM files using ngsTools

Principal Component Analysis (PCA) is a statistical method used to reduce the dimensionality of large datasets, increasing interpretability while minimizing information loss. When applied to genetic data from BAM files, PCA can help identify patterns of genetic variation across samples. It transforms complex, multidimensional data into a set of orthogonal (uncorrelated) variables known as principal components, so you can more easily discern genetic similarities and differences among samples. This is particularly useful in population genomics for studying genetic diversity and population structure. To plot a PCA based on BAM files, you first need to install ngsTools (you can follow the instructions here). In the following tutorial we will rely on genotype likelihoods rather than hard genotype calls. Genotype likelihoods incorporate the uncertainty of calling genotypes from sequence data, allowing for more accurate and robust genetic analyses. Th...
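To build intuition for what a principal component is, here is a minimal Python sketch of the core computation. It centres the genotypes per site, builds an individual-by-individual covariance matrix, and takes its leading eigenvector (PC1) by power iteration. This is a simplification, not the ngsTools method. It uses hard genotype dosages (0, 1, 2) instead of genotype likelihoods, and the function names and toy genotypes are hypothetical.

```python
from __future__ import annotations

import math
import random


def covariance_matrix(genotypes: list[list[float]]) -> list[list[float]]:
    """Individual-by-individual covariance of site-centred genotype dosages."""
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(centred[i], centred[k])) / m for k in range(n)]
        for i in range(n)
    ]


def leading_pc(cov: list[list[float]], max_iter: int = 1000, tol: float = 1e-10,
               seed: int = 0) -> tuple[float, list[float]]:
    """Largest eigenvalue and its unit eigenvector (PC1 scores) via power iteration."""
    n = len(cov)
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    eigenvalue = 0.0
    for _ in range(max_iter):
        w = [sum(cov[i][k] * v[k] for k in range(n)) for i in range(n)]
        eigenvalue = math.sqrt(sum(x * x for x in w))  # ||C v|| approaches lambda
        if eigenvalue == 0:
            raise ValueError("no variance along the start vector")
        w = [x / eigenvalue for x in w]
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return eigenvalue, w
        v = w
    return eigenvalue, v


# Toy genotypes: individuals 1-2 and 3-4 carry opposite alleles at four sites.
genotypes = [[0, 0, 2, 2], [0, 0, 2, 2], [2, 2, 0, 0], [2, 2, 0, 0]]
variance, pc1 = leading_pc(covariance_matrix(genotypes))
print(variance, pc1)
```

In this toy example, the two groups get PC1 scores of opposite sign, which is the separation a PCA plot displays along its first axis.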

How to install ngsTools (2026)

To install ngsTools, you must first install several dependency libraries and software packages:

$ sudo apt update
$ sudo apt install git make gsl-bin libgsl-dbg libgsl-dev libgslcblas0 gcc zlib1g-dev libbz2-dev liblzma-dev libcurl4-openssl-dev coreutils samtools perl r-base g++-9
$ sudo cpan Getopt::Long && sudo cpan Graph::Easy && sudo cpan Math::BigFloat && sudo cpan IO::Zlib
$ sudo R -e "install.packages(c('optparse', 'ggplot2', 'reshape2', 'plyr', 'gtools', 'LDheatmap', 'ape', 'phangorn', 'plot3D'), repos = 'https://cloud.r-project.org')"

Now you can install ngsTools. Make sure your terminal's working directory is the folder where you want ngsTools to be installed:

$ git clone --recursive https://github.com/mfumagalli/ngsTools.git
$ cd ngsTools
$ sudo make CXX=g++-9
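Before running make, you may want to confirm that the build tools are actually available. The following is a minimal Python sketch that checks your PATH with `shutil.which`. The list of executables is an assumption about what the commands above provide (for example, that the r-base package supplies the `R` executable). The function name is hypothetical.

```python
from __future__ import annotations

import shutil
from typing import Callable, Optional

# Executables the build is assumed to need on PATH (assumed package-to-binary mapping).
REQUIRED_TOOLS = ["git", "make", "gcc", "g++-9", "samtools", "perl", "R"]


def missing_tools(required: list[str] = REQUIRED_TOOLS,
                  which: Callable[[str], Optional[str]] = shutil.which) -> list[str]:
    """Return the required executables that cannot be found on PATH."""
    return [tool for tool in required if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing before building ngsTools:", ", ".join(missing))
    else:
        print("All build tools found; you can run make.")
```

The `which` parameter defaults to the real lookup and is only exposed so the check can be tested without touching your system.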