
Master the Command Line: From FASTQ to VCF for NGS Analysis

This course is a complete hands-on guide to processing real next-generation sequencing (NGS) data, from raw FASTQ files to final VCF variant calls, using only command-line tools in a Linux environment.

You will learn to install and use essential bioinformatics tools such as FastQC, fastp, BWA, samtools, and bcftools. These tools form the core of many widely used NGS pipelines in genomics research. If you’re a Windows user, we’ll show you how to set up WSL (Windows Subsystem for Linux) so you can follow every step directly on your own machine.

The course is structured around short, focused lessons. Each one walks you through a specific task in the sequencing data pipeline: downloading data from NCBI’s SRA, performing quality control checks, trimming low-quality reads and adapters, aligning reads to a reference genome, processing alignment files, and calling SNPs and indels to generate clean, filtered VCF files.
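The steps above can be chained in several ways. The Python sketch below is one hypothetical illustration, not the course’s own scripts: it assumes paired-end reads named `sample_1.fastq` and `sample_2.fastq`, a reference called `ref.fa`, all tools already on the PATH, and a `QUAL<20` filter threshold chosen only for the example. The SRA download step is left out, and the helper names `pipeline_commands` and `run` are made up. By default it only prints the commands.

```python
"""Hypothetical dry-run sketch of a FASTQ-to-VCF pipeline driven from Python.

Assumed (not from the course): paired-end files <sample>_1.fastq / <sample>_2.fastq,
a reference FASTA, all tools on PATH, and an illustrative QUAL<20 filter.
"""
import subprocess
from typing import List


def pipeline_commands(sample: str, ref: str) -> List[str]:
    """Return the shell commands for one sample, in execution order."""
    r1, r2 = f"{sample}_1.fastq", f"{sample}_2.fastq"
    t1, t2 = f"{sample}_1.trimmed.fastq", f"{sample}_2.trimmed.fastq"
    bam = f"{sample}.sorted.bam"
    raw_vcf = f"{sample}.vcf.gz"
    return [
        f"fastqc {r1} {r2}",                                   # QC reports for the raw reads
        f"fastp -i {r1} -I {r2} -o {t1} -O {t2}",               # trim adapters and low-quality bases
        f"bwa index {ref}",                                     # index the reference for BWA
        f"samtools faidx {ref}",                                # FASTA index used by bcftools mpileup
        f"bwa mem {ref} {t1} {t2} | samtools sort -o {bam} -",  # align, then coordinate-sort from stdin
        f"samtools index {bam}",                                # BAM index for random access
        f"bcftools mpileup -f {ref} {bam} | bcftools call -mv -Oz -o {raw_vcf}",  # SNPs and indels
        f"bcftools filter -e 'QUAL<20' -Oz -o {sample}.filtered.vcf.gz {raw_vcf}",  # illustrative threshold
    ]


def run(commands: List[str], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(cmd)
        if not dry_run:
            # shell=True lets the pipes work; check=True raises on a non-zero exit status.
            # With a pipe, the shell reports only the last command's exit status.
            subprocess.run(cmd, shell=True, check=True)


if __name__ == "__main__":
    run(pipeline_commands("sample", "ref.fa"))
```

Printing the commands first makes it easy to compare them with each lesson before running anything. Because the aligner output is piped straight into `samtools sort`, no intermediate SAM file is written, which saves disk space on larger datasets.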

This course suits beginners and intermediate users alike, whether you’re a student, researcher, or bioinformatics enthusiast. You don’t need any prior experience with Linux or the command line. By the end of the course, you’ll have a complete working pipeline and the confidence to analyze real NGS datasets on your own.

What you’ll learn

Download and extract raw sequencing data from the NCBI Sequence Read Archive (SRA) using command-line tools
Assess and improve the quality of FASTQ files using FastQC and fastp
Align sequencing reads to a reference genome with BWA, and process SAM/BAM files using samtools
Call and filter SNPs and indels with bcftools, and learn how to interpret the resulting VCF files
Organize NGS analysis projects in a clean directory structure for reproducibility and clarity
Understand the structure of FASTQ, SAM, and VCF files and extract meaningful information from each format
Use standard Linux command-line tools to manipulate large genomic files efficiently
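To make the FASTQ structure mentioned in this list concrete, here is a minimal Python sketch that reads four-line records and decodes their quality strings. It is not course material: it assumes uncompressed input, exactly four lines per record, and Phred+33 quality encoding (Sanger / Illumina 1.8+). `parse_fastq`, `phred_scores`, and `mean_quality` are hypothetical helper names, and the two reads are invented.

```python
"""Minimal FASTQ sketch: read 4-line records and decode Phred+33 quality scores.

Assumed (not from the course): uncompressed input, exactly four lines per record,
Phred+33 (Sanger / Illumina 1.8+) quality encoding.
"""
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class FastqRecord:
    name: str      # read identifier without the leading '@'
    sequence: str  # base calls, e.g. A, C, G, T, N
    quality: str   # one ASCII quality character per base


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, checking the basic 4-line layout."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        header = header.rstrip("\r\n")
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Q = ord(char) - offset, so 'I' -> 40 and '#' -> 2 with Phred+33."""
    return [ord(c) - offset for c in quality]


def mean_quality(record: FastqRecord) -> float:
    """Average Phred score of a read (0.0 for an empty read)."""
    scores = phred_scores(record.quality)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Two invented reads, for illustration only.
    example = io.StringIO(
        "@read1\nACGTACGT\n+\nIIIIIIII\n"
        "@read2\nACGTNNNN\n+\nIIII####\n"
    )
    for rec in parse_fastq(example):
        print(rec.name, round(mean_quality(rec), 1))
    # read1 40.0
    # read2 21.0
```

In practice FastQC and fastp do this decoding for you, on compressed files too; the sketch only shows what each of the four lines in a record carries. A Phred score Q corresponds to an error probability of 10^(-Q/10), so Q40 means roughly a 1-in-10,000 chance that the base call is wrong.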

Are there any course requirements or prerequisites?

No prior experience with bioinformatics is required
Basic familiarity with the Linux terminal is helpful but not mandatory; key commands are explained step by step
An internet connection is required to download sequencing data and reference files
A computer with at least 4 GB of RAM is recommended for smoother performance during alignment and variant calling

Who this course is for

Students, researchers, and lab technicians who want to learn how to analyze NGS data from FASTQ to VCF using command-line tools
Biologists and geneticists with no programming background who need a practical, step-by-step introduction to genomic data analysis
Bioinformatics beginners looking to understand how tools like FastQC, fastp, BWA, samtools, and bcftools work together in a complete pipeline
Anyone who wants to build a reproducible and efficient workflow for variant calling using only free and open-source tools

LMG_BIO
