IDEAL-GENOM Documentation
IDEAL-GENOM is a comprehensive Python package for automated, reproducible analysis of human genotype data. It provides end-to-end pipelines for genomic quality control (QC), post-imputation VCF processing, and genome-wide association studies (GWAS). The package builds on years of research experience at CGE Tübingen and integrates PLINK 1.9/2.0, GCTA, and BCFtools with rich reporting and visualizations.
Version: 1.1.0
🎯 Key Features
- Comprehensive Pipelines
Genomic QC: Sample QC, Ancestry QC, and Variant QC for case-control studies
GWAS Analysis: Generalized Linear Models (GLM) and Mixed Models (GLMM)
VCF Processing: Post-imputation filtering, normalization, and conversion to PLINK
Population Structure: FST statistics, PCA, UMAP visualization, and ancestry projection
- Advanced Analytics
Sample Quality Control: Missingness, sex verification, heterozygosity, relatedness (kinship/IBD)
Ancestry Analysis: Population stratification detection with 1000 Genomes reference
Variant Filtering: Hardy-Weinberg equilibrium, MAF, genotype rate, differential missingness
GWAS Tools: Association testing, top-hits extraction, gene annotation (Ensembl/RefSeq)
Dimensionality Reduction: PCA and UMAP for population structure visualization
- Modern Design
YAML Configuration: Single configuration file with clear, hierarchical structure
Flexible Pipeline System: Enable/disable steps, customize parameters per analysis
Multiple Interfaces: Command-line tool, Python API, Jupyter notebooks
Docker Support: Pre-configured container with all genomic tools installed
Automated Workflows: Pipeline executor handles dependencies and data flow
Rich Reporting: Publication-ready plots and comprehensive QC metrics
- Developer Friendly
Reproducible: All steps, parameters, and outputs logged
Extensible: Modular architecture for adding custom analysis steps
Well Documented: Comprehensive guides, API reference, and examples
Type Hints: Full type annotations for better IDE support
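Two of the variant filters listed under Advanced Analytics, MAF and Hardy-Weinberg equilibrium, reduce to simple arithmetic on genotype counts. The sketch below is only an illustration of those quantities and is not IDEAL-GENOM's implementation; the function names are hypothetical. It uses a chi-square statistic for brevity, whereas PLINK reports an exact test for Hardy-Weinberg equilibrium.

```python
from typing import Tuple


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Return the minor allele frequency from biallelic genotype counts.

    n_aa, n_ab, n_bb are the numbers of samples with genotypes AA, AB, BB.
    """
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    if total_alleles == 0:
        raise ValueError("no genotyped samples")
    freq_a = (2 * n_aa + n_ab) / total_alleles
    # The minor allele is whichever of A and B is rarer.
    return min(freq_a, 1.0 - freq_a)


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Return the chi-square statistic (1 degree of freedom) for HWE deviation."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped samples")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected: Tuple[float, float, float] = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Classes with zero expected count (monomorphic sites) are skipped.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


if __name__ == "__main__":
    counts = (50, 40, 10)  # made-up counts, for illustration only
    print("MAF:", minor_allele_frequency(*counts))
    print("HWE chi-square:", hwe_chi_square(*counts))
```

A variant would then be dropped when its MAF falls below, or its HWE test is more extreme than, the thresholds set in the configuration file.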
Quick Start
Installation
pip install ideal-genom
Basic Usage
# Generate a configuration template
ideal-genom template --output my_pipeline.yaml
# Edit the configuration file to match your data
nano my_pipeline.yaml
# Validate your configuration
ideal-genom validate --config my_pipeline.yaml
# Run the pipeline
ideal-genom run --config my_pipeline.yaml
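The validate and run commands above can also be chained from a script. The following is a minimal sketch that shells out to the same two commands; the helper names are hypothetical, and it assumes that `ideal-genom validate` exits with a non-zero status when the configuration is invalid, which the text above does not confirm.

```python
import subprocess
from typing import Callable, List


def build_commands(config_path: str) -> List[List[str]]:
    """Return the validate and run command lines for one configuration file."""
    return [
        ["ideal-genom", "validate", "--config", config_path],
        ["ideal-genom", "run", "--config", config_path],
    ]


def validate_then_run(
    config_path: str,
    runner: Callable[..., object] = subprocess.run,
) -> None:
    """Run validation first, then the pipeline.

    With check=True, subprocess.run raises CalledProcessError on a non-zero
    exit status, so the pipeline is not started if validation fails
    (assuming the CLI signals failure through its exit status).
    """
    for command in build_commands(config_path):
        runner(command, check=True)


if __name__ == "__main__":
    validate_then_run("my_pipeline.yaml")
```

Passing the runner as a parameter keeps the helper testable without the CLI installed.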
Python API
from ideal_genom.core.config import load_config
from ideal_genom.core.pipeline import PipelineExecutor
# Load configuration
config = load_config("my_pipeline.yaml")
# Create and execute pipeline
executor = PipelineExecutor(config)
executor.execute()
Available Pipelines
- QC Pipeline - Quality control for case-control studies
Sample QC: Individual-level quality control
Ancestry QC: Population structure and outlier detection
Variant QC: SNP-level quality control
Population Visualization: UMAP/t-SNE plots
- GWAS Pipeline - Genome-wide association analysis
Preparatory: LD pruning and PCA decomposition
GLM Analysis: Fixed effects association testing
GLMM Analysis: Mixed model with genetic relationship matrix
Annotation: Gene mapping and functional annotation
- VCF Pipeline - Post-imputation processing
VCF Processing: Filter, normalize, annotate, concatenate
PLINK Conversion: Convert to PLINK binary format
Quality Filtering: R² threshold, multiallelic handling
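To make the R² threshold in the VCF Pipeline concrete, here is a small sketch of the check applied to a single VCF INFO field. It is an illustration only, not the package's code (the pipeline delegates VCF work to BCFtools). The function names, the default threshold of 0.3, and the `R2` key are assumptions; imputation tools differ in the tag they write (for example `R2`, `DR2`, or `INFO`).

```python
from typing import Dict


def parse_info(info: str) -> Dict[str, str]:
    """Parse a VCF INFO column into a dict; flag entries map to ''."""
    fields: Dict[str, str] = {}
    if info == ".":  # VCF convention for an empty INFO column
        return fields
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else ""
    return fields


def passes_r2(info: str, threshold: float = 0.3, key: str = "R2") -> bool:
    """Return True if the imputation quality in INFO meets the threshold.

    Variants without the key, or with a non-numeric value, are rejected.
    """
    value = parse_info(info).get(key)
    if not value:
        return False
    try:
        return float(value) >= threshold
    except ValueError:
        return False


if __name__ == "__main__":
    # Made-up INFO strings, for illustration only.
    for info in ("AF=0.12;R2=0.85;IMPUTED", "AF=0.01;R2=0.12", "AF=0.5"):
        print(info, "->", passes_r2(info))
```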
Documentation Contents
User Guide
- Installation Guide
- Getting Started
- Configuration Guide
- Overview
- Configuration File Structure
- Getting Started with Configuration
- Pipeline Section
- QC Pipeline Configuration
- Settings Section
- Advanced Configuration Patterns
- Parameter Tuning Guidelines
- Common Configuration Examples
- Troubleshooting Configuration
- See Also
- Steps Configuration
- Advanced Configuration
- Performance Tuning
- Best Practices
- Troubleshooting
- Examples
API Reference
Additional Resources
Supported Tools
IDEAL-GENOM integrates the following genomic analysis tools:
PLINK 1.9: Classic PLINK for QC and association analysis
PLINK 2.0: Modern version with improved performance (AVX2 optimized)
GCTA: Genetic relationship matrix and mixed model analysis
BCFtools: VCF manipulation and quality filtering
The pipeline calls these tools automatically, but it does not install them: either install them separately or use the provided Docker image.
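Before running a pipeline outside Docker, it can help to confirm that the external tools are on `PATH`. The sketch below uses only the Python standard library. The executable names (`plink`, `plink2`, `gcta64`, `bcftools`) are assumptions based on common install names and may differ on your system or from what IDEAL-GENOM expects.

```python
import shutil
from typing import Dict, Iterable, List, Optional

# Assumed executable names; adjust to match your installation.
DEFAULT_TOOLS = ("plink", "plink2", "gcta64", "bcftools")


def find_tools(names: Iterable[str] = DEFAULT_TOOLS) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names: Iterable[str] = DEFAULT_TOOLS) -> List[str]:
    """Return the tool names that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND'}")
```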
Citation
If you use IDEAL-GENOM in your research, please cite:
@software{ideal_genom_2026,
title = {IDEAL-GENOM: Comprehensive Genomic Analysis Pipeline},
author = {Giraldo González, Luis and Tenghe, Amabel},
year = {2026},
version = {0.2.0},
url = {https://github.com/cge-tubingens/ideal-genom-qc}
}
Getting Help
Documentation: https://ideal-genom-qc.readthedocs.io/
Issues: https://github.com/cge-tubingens/cge-comrare-pipeline/issues
Examples: See the Examples page for complete workflows
License
IDEAL-GENOM is released under the MIT License. See the LICENSE file in the repository for details.