How to Properly Filter Missing Genotypes in bcftools

A practical guide to filtering missing genotypes in VCF files using bcftools, with examples and best practices for handling missing data in variant calling.

Introduction

Missing genotypes are a common challenge in variant calling and downstream analysis. Whether you're working with whole-genome sequencing, exome data, or genotyping arrays, knowing how to filter missing data in bcftools is essential for maintaining data quality while preserving valuable variants.

In this guide, we'll walk through the different approaches to filtering missing genotypes using bcftools, explain the syntax, and provide practical examples you can apply to your own data.

What Are Missing Genotypes?

In VCF files, missing genotypes are represented by ./. (for diploid) or . (for haploid). These occur when:

- The sequencing depth at a position was too low to make a confident call
- The variant caller couldn't determine the genotype
- The sample was not sequenced at that position
- Quality filters excluded the genotype

Basic bcftools Filter Syntax

The bcftools filter and bcftools view commands both accept filtering expressions that can handle missing genotypes. Here's the basic syntax, combining a missingness threshold with quality and depth filters:

```bash
# Keep sites by missing-genotype rate, quality, and depth
bcftools view -i 'F_MISSING < 0.1 && QUAL > 30 && INFO/DP > 10' input.vcf.gz -o high_quality.vcf.gz
```

This command keeps only sites that meet all of the following:

- Less than 10% missing genotypes (F_MISSING < 0.1)
- Quality score above 30 (QUAL > 30)
- Depth greater than 10 (INFO/DP > 10)

Setting Missing Genotypes

You can also set low-quality genotypes to missing before filtering, for example genotypes whose GQ (genotype quality) falls below a chosen threshold. bcftools filter supports this through its `-S .` (`--set-GTs .`) option, which sets the genotypes of samples that fail the expression to missing.

The following expressions are useful when filtering on missingness:

| Expression | Description |
| --- | --- |
| F_MISSING | Fraction of samples with missing genotypes (0-1) |
| N_MISSING | Number of samples with missing genotypes |
| F_PASS(expr) | Fraction of samples that pass the expression given in parentheses |
| N_PASS(expr) | Number of samples that pass the expression given in parentheses |

Conclusion

Properly handling missing genotypes is crucial for variant analysis quality. bcftools provides flexible and efficient tools for filtering on missing data rates. Start with exploratory analysis to understand the missingness patterns in your data, then apply thresholds that suit your specific analysis requirements.
For more complex filtering needs, consider combining bcftools with other tools in your pipeline, and always validate your results by checking variant counts before and after filtering.
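
As a rough sketch of that before-and-after check, the helpers below count the records in two files and report how many the filter removed. The function names `count_records` and `report_filtering` are hypothetical, and the file names in the usage comment are placeholders. The only bcftools feature assumed is `bcftools view -H`, which prints records without the header.

```bash
# Sketch: compare record counts before and after a missingness filter.
# Helper names and file names are illustrative assumptions, not bcftools features.

# Count data records (header lines excluded) in a VCF/BCF file.
count_records() {
    # -H suppresses the header, so each output line is one record.
    bcftools view -H "$1" | wc -l | tr -d ' '
}

# Print how many records were kept and removed by a filtering step.
# Arguments: the unfiltered file, then the filtered file.
report_filtering() {
    local before after
    before=$(count_records "$1")
    after=$(count_records "$2")
    echo "before=${before} after=${after} removed=$((before - after))"
}

# Example usage (placeholder file names, 10% threshold as in the guide):
#   bcftools view -i 'F_MISSING < 0.1' input.vcf.gz -Oz -o filtered.vcf.gz
#   report_filtering input.vcf.gz filtered.vcf.gz
```

If a threshold removes far more records than expected, that is usually a cue to look at the per-site missingness in the data before tightening or relaxing the cutoff.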