Master the Command Line: From FASTQ to VCF for NGS Analysis


This course is a complete hands-on guide to processing real next-generation sequencing (NGS) data from raw FASTQ files to final VCF variant calls - all using command-line tools in a Linux environment.

You will learn to install and use essential bioinformatics tools such as fastqc, fastp, bwa, samtools, and bcftools. These tools are the foundation of most modern NGS pipelines used in genomics research. If you’re a Windows user, we’ll show you how to set up WSL (Windows Subsystem for Linux) so you can follow every step on your own machine.

The course is structured around short, focused lessons. Each one walks you through a specific task in the sequencing data pipeline: downloading data from NCBI’s SRA, performing quality control checks, trimming low-quality reads and adapters, aligning reads to a reference genome, processing alignment files, and calling SNPs and indels to generate clean, filtered VCF files.
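
To make the sequence of stages concrete, here is a minimal Python sketch that only assembles the shell commands for each step, in order; it does not run them. It is an illustration under assumptions, not the course's exact commands: the accession `SRR0000000`, the directory names (`raw`, `qc`, `trimmed`, `aligned`, `variants`), the paired-end layout, the thread count, and the `QUAL<20` filter threshold are all hypothetical placeholders.

```python
def build_pipeline(accession, reference, threads=4):
    """Return the FASTQ-to-VCF commands, in order, as shell strings.

    Assumes paired-end reads and a hypothetical directory layout
    (raw/, qc/, trimmed/, aligned/, variants/) that already exists.
    """
    r1 = f"raw/{accession}_1.fastq"
    r2 = f"raw/{accession}_2.fastq"
    t1 = f"trimmed/{accession}_1.fastq"
    t2 = f"trimmed/{accession}_2.fastq"
    bam = f"aligned/{accession}.sorted.bam"
    raw_vcf = f"variants/{accession}.raw.vcf.gz"
    filtered_vcf = f"variants/{accession}.filtered.vcf.gz"

    return [
        # 1. Download the run from NCBI SRA and convert it to FASTQ
        f"prefetch {accession}",
        f"fasterq-dump --split-files --outdir raw {accession}",
        # 2. Quality control report
        f"fastqc {r1} {r2} --outdir qc",
        # 3. Trim adapters and low-quality bases
        f"fastp -i {r1} -I {r2} -o {t1} -O {t2}",
        # 4. Index the reference, align, and sort the alignments
        f"bwa index {reference}",
        f"bwa mem -t {threads} {reference} {t1} {t2} | samtools sort -o {bam} -",
        f"samtools index {bam}",
        # 5. Call SNPs and indels, then filter (threshold is an assumption)
        f"bcftools mpileup -f {reference} {bam} | bcftools call -mv -Oz -o {raw_vcf}",
        f"bcftools filter -e 'QUAL<20' -Oz -o {filtered_vcf} {raw_vcf}",
    ]


if __name__ == "__main__":
    # SRR0000000 is a placeholder, not a real run accession.
    for cmd in build_pipeline("SRR0000000", "ref/genome.fa"):
        print(cmd)
```

Printing the commands rather than executing them makes it easy to review each step before pasting it into a terminal.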

This course is ideal for beginners and intermediate users alike - whether you’re a student, researcher, or bioinformatics enthusiast. You don’t need any prior experience with Linux or the command line. By the end of the course, you’ll have a complete working pipeline and the confidence to analyze real NGS datasets on your own.


What you’ll learn


  • Download and extract raw sequencing data from the NCBI Sequence Read Archive (SRA) using command-line tools
  • Assess and improve the quality of FASTQ files using FastQC and fastp
  • Align sequencing reads to a reference genome with BWA, and process SAM/BAM files using samtools
  • Call and filter genomic variants with bcftools to produce VCF files, and understand how to interpret the results
  • Organize NGS analysis projects in a clean directory structure for reproducibility and clarity
  • Understand the structure of FASTQ, SAM, and VCF files and extract meaningful information from each format
  • Use standard Linux command-line tools to manipulate large genomic files efficiently
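
As a rough illustration of what the FASTQ, SAM, and VCF formats contain, the Python sketch below pulls a few fields out of each. It is a simplified example, not a replacement for samtools or bcftools: the function names are made up for this illustration, quality scores are assumed to be Phred+33 encoded, FASTQ records are assumed to span exactly four lines, and only the fixed leading columns of SAM and VCF lines are read.

```python
def parse_fastq(text):
    """Yield (read_id, sequence, quality) from 4-line FASTQ records."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ records must have exactly 4 lines each")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record {i // 4 + 1}")
        yield header[1:], seq, qual


def mean_phred(quality):
    """Mean base quality, assuming Phred+33 encoding."""
    return sum(ord(c) - 33 for c in quality) / len(quality)


def parse_sam_line(line):
    """Extract the main mandatory fields of one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    return {
        "read": fields[0],
        "chrom": fields[2],
        "pos": int(fields[3]),                 # 1-based leftmost position
        "mapq": int(fields[4]),
        "cigar": fields[5],
        "unmapped": bool(flag & 0x4),          # FLAG bit 0x4
        "reverse_strand": bool(flag & 0x10),   # FLAG bit 0x10
    }


def parse_vcf_line(line):
    """Extract the first eight fixed columns of one VCF data line."""
    chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    if "," in alt:
        kind = "multiallelic"
    elif len(ref) == 1 and len(alt) == 1:
        kind = "SNP"
    else:
        kind = "indel"
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "qual": None if qual == "." else float(qual),
        "filter": flt,
        "type": kind,
    }


if __name__ == "__main__":
    for read_id, seq, qual in parse_fastq("@r1\nACGT\n+\nIIII\n"):
        print(read_id, seq, mean_phred(qual))
    print(parse_sam_line("r1\t16\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII"))
    print(parse_vcf_line("chr1\t100\t.\tA\tG\t50\tPASS\tDP=10"))
```

The SNP/indel classification here only compares REF and ALT lengths, which is enough to read a simple VCF by eye but ignores symbolic alleles and other special cases.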

Are there any course requirements or prerequisites?

  • No prior experience with bioinformatics is required
  • Basic familiarity with the Linux terminal is helpful but not mandatory; key commands are explained step by step
  • An internet connection is required to download sequencing data and reference files
  • A computer with at least 4 GB of RAM is recommended for smoother performance during alignment and variant calling

Who is this course for?


  • Students, researchers, and lab technicians who want to learn how to analyze NGS data from FASTQ to VCF using command-line tools
  • Biologists and geneticists with no programming background who need a practical, step-by-step introduction to genomic data analysis
  • Bioinformatics beginners looking to understand how tools like fastqc, fastp, bwa, samtools, and bcftools work together in a complete pipeline
  • Anyone who wants to build a reproducible and efficient workflow for variant calling using only free and open-source tools


