Core Modules

Core framework modules for pipeline execution, configuration, and CLI.

Pipeline Executor

Pipeline orchestration engine for ideal_genom.

class ideal_genom.core.pipeline.PipelineExecutor(config: Dict[str, Any], dry_run: bool = False)

Bases: object

Orchestrates execution of sub-pipeline classes based on configuration.

config

Pipeline configuration dictionary

Type: dict

steps

Dictionary of instantiated sub-pipeline objects

Type: dict

base_output_dir

Base output directory for all pipeline steps

Type: str

__init__(config: Dict[str, Any], dry_run: bool = False)

Initialize pipeline executor.

Parameters:

config (dict) – Pipeline configuration dictionary (from config.load_config)
dry_run (bool) – If True, skip directory creation and actual execution

execute() → None

Execute all pipeline steps sequentially.
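To make the sequential model of execute() concrete, here is a minimal sketch under stated assumptions. It is not the library's implementation: the MiniExecutor and EchoStep classes, the "steps" list with "name" and "enabled" keys, the run() method, and the executed list are all hypothetical. The sketch only shows enabled steps running in configuration order and dry_run suppressing execution. Directory creation is omitted.

```python
from typing import Any, Dict, List


class EchoStep:
    """Hypothetical sub-pipeline object; real step classes will differ."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.completed = False

    def run(self) -> None:
        # Stand-in for real work such as invoking PLINK.
        self.completed = True


class MiniExecutor:
    """Toy orchestrator that runs enabled steps one after another."""

    def __init__(self, config: Dict[str, Any], dry_run: bool = False) -> None:
        self.config = config
        self.dry_run = dry_run
        # Assumed schema: config["steps"] is a list of {"name": ..., "enabled": ...}.
        self.steps: Dict[str, EchoStep] = {
            step["name"]: EchoStep(step["name"])
            for step in config["steps"]
            if step.get("enabled", True)
        }
        self.executed: List[str] = []

    def execute(self) -> None:
        # Dicts keep insertion order, so steps run in the order they were configured.
        for name, step in self.steps.items():
            if self.dry_run:
                print(f"[dry run] would execute {name}")
                continue
            step.run()
            self.executed.append(name)


config = {"steps": [{"name": "step_a"}, {"name": "step_b", "enabled": False}, {"name": "step_c"}]}
executor = MiniExecutor(config)
executor.execute()
print(executor.executed)  # ['step_a', 'step_c']
```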

get_step_output(step_name: str, attribute: str = 'output_path') → Any

Get output from a completed step.

Parameters:

step_name (str) – Name of the step
attribute (str, optional) – Attribute to retrieve (default: ‘output_path’)

Returns:

Value of the requested attribute

Return type:

Any

Raises:

ValueError – If the step is not found or the attribute does not exist
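The lookup contract of get_step_output() can be sketched as a free function over a plain dictionary of step objects. This is an illustration, not the method itself: passing steps explicitly, resolving attributes with hasattr/getattr, and the step name step_a are assumptions, and the sketch does not check whether a step has actually completed.

```python
from types import SimpleNamespace
from typing import Any, Dict


def get_step_output(steps: Dict[str, Any], step_name: str, attribute: str = "output_path") -> Any:
    """Return one attribute of a named step, raising ValueError as documented."""
    if step_name not in steps:
        raise ValueError(f"Step '{step_name}' not found. Available steps: {sorted(steps)}")
    step = steps[step_name]
    if not hasattr(step, attribute):
        raise ValueError(f"Step '{step_name}' has no attribute '{attribute}'")
    return getattr(step, attribute)


# Hypothetical step objects standing in for instantiated sub-pipelines.
steps = {"step_a": SimpleNamespace(output_path="/results/step_a")}
print(get_step_output(steps, "step_a"))  # /results/step_a
```

In a pipeline, a lookup like this is one plausible way to feed one step's output_path into the input of the next step.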

get_pipeline_summary() → Dict[str, Any]

Get a summary of the pipeline configuration and status.

Returns: Pipeline summary including enabled steps, dependencies, and configuration
Return type: dict
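The documented summary contents (enabled steps, dependencies, configuration) could be assembled roughly as below. The configuration schema here is assumed, since this page does not specify it: a "steps" list whose entries carry "name", an optional "enabled" flag, and an optional "depends_on" list. The key names in the returned dictionary are likewise hypothetical.

```python
from typing import Any, Dict, List


def summarize_pipeline(config: Dict[str, Any]) -> Dict[str, Any]:
    """Collect enabled steps, their declared dependencies, and the configuration itself."""
    steps: List[Dict[str, Any]] = config.get("steps", [])
    enabled = [step for step in steps if step.get("enabled", True)]
    return {
        "total_steps": len(steps),
        "enabled_steps": [step["name"] for step in enabled],
        # "depends_on" is an assumed key; steps without it get an empty list.
        "dependencies": {step["name"]: list(step.get("depends_on", [])) for step in enabled},
        "configuration": config,
    }
```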

Configuration

Configuration loading and validation for ideal_genom pipelines.

exception ideal_genom.core.config.ConfigurationError

Bases: Exception

Raised when configuration is invalid.

ideal_genom.core.config.load_config(config_path: str) → Dict[str, Any]

Load pipeline configuration from YAML file.

Parameters: config_path (str) – Path to YAML configuration file
Returns: Parsed configuration dictionary
Return type: dict
Raises: ConfigurationError – If the configuration is invalid or the file is not found

ideal_genom.core.config.validate_config(config: Dict[str, Any]) → None

Validate pipeline configuration structure.

Parameters: config (dict) – Configuration dictionary to validate
Raises: ConfigurationError – If configuration structure is invalid

ideal_genom.core.config.validate_step(step: Dict[str, Any], index: int) → None

Validate a single pipeline step configuration.

Parameters:

step (dict) – Step configuration dictionary
index (int) – Step index in pipeline (for error messages)

Raises:

ConfigurationError – If step configuration is invalid
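One plausible division of labour between validate_config() and validate_step(), inferred from the index parameter being used for error messages, is for the former to check the top-level structure and then call the latter once per step. The sketch below assumes a "steps" list of mappings with a required "name" key and defines a local ConfigurationError so that it runs on its own. The actual rules enforced by ideal_genom are not documented here.

```python
from typing import Any, Dict


class ConfigurationError(Exception):
    """Local stand-in for ideal_genom.core.config.ConfigurationError."""


def validate_step(step: Dict[str, Any], index: int) -> None:
    # Assumed rule: each step is a mapping with a non-empty string "name".
    if not isinstance(step, dict):
        raise ConfigurationError(f"Step {index}: expected a mapping, got {type(step).__name__}")
    name = step.get("name")
    if not isinstance(name, str) or not name:
        raise ConfigurationError(f"Step {index}: missing or empty 'name'")


def validate_config(config: Dict[str, Any]) -> None:
    # Assumed rule: the top level is a mapping holding a non-empty "steps" list.
    if not isinstance(config, dict):
        raise ConfigurationError("Configuration must be a mapping")
    steps = config.get("steps")
    if not isinstance(steps, list) or not steps:
        raise ConfigurationError("'steps' must be a non-empty list")
    seen = set()
    for index, step in enumerate(steps):
        validate_step(step, index)  # the index only makes errors traceable
        # Assumption: names key the executor's steps dict, so duplicates would collide.
        if step["name"] in seen:
            raise ConfigurationError(f"Step {index}: duplicate name '{step['name']}'")
        seen.add(step["name"])
```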

Command Line Interface

Command-line interface for the IDEAL-GENOM-QC pipeline.

This module provides the main CLI entry point for running genomic quality control pipelines using YAML configuration files.

ideal_genom.core.cli.setup_logging(level: str = 'INFO') → None

Set up basic logging configuration.

Parameters: level (str) – Logging level (DEBUG, INFO, WARNING, ERROR)
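A minimal sketch of what setup_logging() might do with the standard logging module. The level-name lookup, the format string, the force=True reset (Python 3.8+), and raising ValueError for unknown names are choices made for this illustration, not documented behaviour.

```python
import logging


def setup_logging(level: str = "INFO") -> None:
    """Configure the root logger from a level name such as 'DEBUG' or 'WARNING'."""
    # Map the name to logging's numeric constant; non-level attributes are rejected.
    numeric_level = getattr(logging, level.upper(), None)
    if not isinstance(numeric_level, int):
        raise ValueError(f"Invalid log level: {level!r}")
    logging.basicConfig(
        level=numeric_level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace handlers installed by earlier calls (Python 3.8+)
    )
```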

ideal_genom.core.cli.validate_config_file(config_path: str) → Path

Validate that the configuration file exists and is readable.

Parameters: config_path (str) – Path to the configuration file
Returns: Validated configuration file path
Return type: Path
Raises: FileNotFoundError – If the configuration file does not exist
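The documented check can be expressed with pathlib and os.access. This sketch is an assumption about how the function behaves: it treats a directory as missing, and the extra PermissionError for unreadable files is not part of the documented contract, which lists only FileNotFoundError.

```python
import os
from pathlib import Path


def validate_config_file(config_path: str) -> Path:
    """Return the config path as a Path, or raise if it is missing or unreadable."""
    path = Path(config_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"Configuration file not found: {path}")
    if not os.access(path, os.R_OK):
        # Not in the documented contract; raised here to cover "readable".
        raise PermissionError(f"Configuration file is not readable: {path}")
    return path
```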

ideal_genom.core.cli.cmd_run(args: Namespace) → int

Execute the run command.

Parameters: args (argparse.Namespace) – Parsed command line arguments
Returns: Exit code (0 for success, 1 for failure)
Return type: int

ideal_genom.core.cli.cmd_validate(args: Namespace) → int

Execute the validate command.

Parameters: args (argparse.Namespace) – Parsed command line arguments
Returns: Exit code (0 for success, 1 for failure)
Return type: int

ideal_genom.core.cli.cmd_template(args: Namespace) → int

Execute the template command.

Parameters: args (argparse.Namespace) – Parsed command line arguments
Returns: Exit code (0 for success, 1 for failure)
Return type: int
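All three cmd_* handlers share the same exit-code convention: 0 for success, 1 for failure. A sketch of that convention follows. The run_with_exit_code() helper and the config attribute on the parsed arguments are hypothetical, and the handler body only checks that the file exists rather than performing real validation.

```python
import argparse
import logging
from pathlib import Path
from typing import Callable

logger = logging.getLogger(__name__)


def run_with_exit_code(action: Callable[[], None], description: str) -> int:
    """Run one command body and map the outcome to 0 (success) or 1 (failure)."""
    try:
        action()
    except Exception as exc:  # a CLI boundary is a reasonable place for a broad catch
        logger.error("%s failed: %s", description, exc)
        return 1
    return 0


def cmd_validate(args: argparse.Namespace) -> int:
    def action() -> None:
        path = Path(args.config)  # hypothetical attribute name
        if not path.is_file():
            raise FileNotFoundError(f"Configuration file not found: {path}")
        # A real handler would load and validate the YAML contents here.

    return run_with_exit_code(action, "validate")
```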

ideal_genom.core.cli.create_parser() → ArgumentParser

Create the command line argument parser.

Returns: Configured argument parser
Return type: argparse.ArgumentParser

ideal_genom.core.cli.main(argv: list | None = None) → int

Main CLI entry point.

Parameters: argv (list, optional) – Command line arguments (uses sys.argv if None)
Returns: Exit code
Return type: int

Executor

Command execution utilities for external genomic tools.

exception ideal_genom.core.executor.CommandExecutionError

Bases: Exception

Raised when a shell command fails.

ideal_genom.core.executor.shell_do(command: str | List[str], cwd: str | None = None, log_file: str | None = None, capture_output: bool = False, check: bool = True) → CompletedProcess

Execute a shell command for genomic analysis tools.

This is a wrapper around subprocess.run with logging and error handling tailored for genomic analysis pipelines (PLINK, GCTA, bcftools, etc.).

Parameters:

command (str or list of str) – Command to execute. Can be a string or list of arguments.
cwd (str, optional) – Working directory for command execution
log_file (str, optional) – Path to file where stdout/stderr should be logged
capture_output (bool, default=False) – If True, capture stdout and stderr in returned object
check (bool, default=True) – If True, raise CommandExecutionError on non-zero exit code

Returns:

Completed process with returncode, stdout, stderr

Return type:

subprocess.CompletedProcess

Raises:

CommandExecutionError – If command fails and check=True

Examples

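>>> from ideal_genom.core.executor import shell_do
>>> # Execute a PLINK command given as a single string
>>> shell_do("plink --bfile input --maf 0.01 --make-bed --out output")

>>> # Run in a specific working directory and log output to a file
>>> shell_do(
...     ["bcftools", "view", "-Oz", "-o", "output.vcf.gz", "input.vcf"],
...     cwd="/data/work",
...     log_file="/data/logs/bcftools.log"
... )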
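A simplified, self-contained sketch of the documented shell_do() contract, named shell_do_sketch() to keep it distinct from the real function. Assumptions: string commands are tokenized with shlex.split rather than run through a shell, output is appended to log_file when one is given (taking precedence over capture_output), and a local CommandExecutionError stands in for the library's exception.

```python
import shlex
import subprocess
from typing import List, Optional, Union


class CommandExecutionError(Exception):
    """Local stand-in for ideal_genom.core.executor.CommandExecutionError."""


def shell_do_sketch(
    command: Union[str, List[str]],
    cwd: Optional[str] = None,
    log_file: Optional[str] = None,
    capture_output: bool = False,
    check: bool = True,
) -> subprocess.CompletedProcess:
    # Assumption: strings are split into argv instead of being run via a shell.
    argv = shlex.split(command) if isinstance(command, str) else list(command)
    if log_file is not None:
        # In this sketch, logging to a file takes precedence over capture_output.
        with open(log_file, "a") as log:
            result = subprocess.run(argv, cwd=cwd, stdout=log, stderr=subprocess.STDOUT, text=True)
    else:
        result = subprocess.run(argv, cwd=cwd, capture_output=capture_output, text=True)
    if check and result.returncode != 0:
        raise CommandExecutionError(
            f"Command failed with exit code {result.returncode}: {' '.join(argv)}"
        )
    return result
```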

ideal_genom.core.executor.run_plink(args: List[str], log_file: str | None = None, cwd: str | None = None) → CompletedProcess

Execute a PLINK command.

Parameters:

args (list of str) – PLINK arguments (without ‘plink’ command itself)
log_file (str, optional) – Path to log file
cwd (str, optional) – Working directory

Returns:

Completed process

Return type:

subprocess.CompletedProcess

Examples

>>> from ideal_genom.core.executor import run_plink
>>> run_plink([
...     '--bfile', 'input',
...     '--maf', '0.01',
...     '--make-bed',
...     '--out', 'output'
... ])

ideal_genom.core.executor.run_plink2(args: List[str], log_file: str | None = None, cwd: str | None = None) → CompletedProcess

Execute a PLINK2 command.

Parameters:

args (list of str) – PLINK2 arguments (without ‘plink2’ command itself)
log_file (str, optional) – Path to log file
cwd (str, optional) – Working directory

Returns:

Completed process

Return type:

subprocess.CompletedProcess

ideal_genom.core.executor.run_gcta(args: List[str], log_file: str | None = None, cwd: str | None = None) → CompletedProcess

Execute a GCTA command.

Parameters:

args (list of str) – GCTA arguments (without ‘gcta64’ command itself)
log_file (str, optional) – Path to log file
cwd (str, optional) – Working directory

Returns:

Completed process

Return type:

subprocess.CompletedProcess

ideal_genom.core.executor.run_bcftools(args: List[str], log_file: str | None = None, cwd: str | None = None) → CompletedProcess

Execute a bcftools command.

Parameters:

args (list of str) – bcftools arguments (without ‘bcftools’ command itself)
log_file (str, optional) – Path to log file
cwd (str, optional) – Working directory

Returns:

Completed process

Return type:

subprocess.CompletedProcess
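The four tool wrappers differ only in the executable they prepend (plink, plink2, gcta64, bcftools), so the shared pattern can be sketched with functools.partial. This is an illustration only. The generic run_tool() helper is hypothetical, and it calls subprocess.run directly, so it raises subprocess.CalledProcessError rather than CommandExecutionError. It also assumes the binaries are on PATH. The documented functions plausibly delegate to shell_do() instead.

```python
import subprocess
from functools import partial
from typing import List, Optional


def run_tool(executable: str, args: List[str], log_file: Optional[str] = None,
             cwd: Optional[str] = None) -> subprocess.CompletedProcess:
    """Prepend the tool name to its arguments and run it, raising on a non-zero exit."""
    command = [executable, *args]
    if log_file is None:
        return subprocess.run(command, cwd=cwd, check=True, capture_output=True, text=True)
    # Append both stdout and stderr to the log file.
    with open(log_file, "a") as log:
        return subprocess.run(command, cwd=cwd, check=True, stdout=log,
                              stderr=subprocess.STDOUT, text=True)


# One wrapper per tool; each binary is assumed to be on PATH.
run_plink = partial(run_tool, "plink")
run_plink2 = partial(run_tool, "plink2")
run_gcta = partial(run_tool, "gcta64")
run_bcftools = partial(run_tool, "bcftools")
```

For example, run_bcftools(["index", "output.vcf.gz"]) would execute bcftools index output.vcf.gz.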