Core Modules

Core framework modules for pipeline execution, configuration, and CLI.

Pipeline Executor

Pipeline orchestration engine for ideal_genom.

class ideal_genom.core.pipeline.PipelineExecutor(config: Dict[str, Any], dry_run: bool = False)

Bases: object

Orchestrates execution of sub-pipeline classes based on configuration.

config

Pipeline configuration dictionary

Type:

dict

steps

Dictionary of instantiated sub-pipeline objects

Type:

dict

base_output_dir

Base output directory for all pipeline steps

Type:

str

__init__(config: Dict[str, Any], dry_run: bool = False)

Initialize pipeline executor.

Parameters:
  • config (dict) – Pipeline configuration dictionary (from config.load_config)

  • dry_run (bool) – If True, skip directory creation and actual execution

execute() → None

Execute all pipeline steps sequentially.

get_step_output(step_name: str, attribute: str = 'output_path') → Any

Get output from a completed step.

Parameters:
  • step_name (str) – Name of the step

  • attribute (str, optional) – Attribute to retrieve (default: ‘output_path’)

Returns:

Value of the requested attribute

Return type:

Any

Raises:

ValueError – If the step is not found or the attribute does not exist

get_pipeline_summary() → Dict[str, Any]

Get a summary of the pipeline configuration and status.

Returns:

Pipeline summary including enabled steps, dependencies, and configuration

Return type:

dict
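
The orchestration contract above (sequential execute(), a dry-run switch, and get_step_output() raising ValueError for an unknown step or attribute) can be illustrated with a toy stand-in. This is a minimal sketch, not the library's implementation: it takes already-built step objects instead of a configuration dictionary, and ToyStep, ToyPipelineExecutor, and the summary keys are hypothetical names.

```python
from typing import Any, Dict


class ToyStep:
    """Hypothetical sub-pipeline object exposing run() and an output_path attribute."""

    def __init__(self, name: str, output_path: str) -> None:
        self.name = name
        self.output_path = output_path
        self.completed = False

    def run(self) -> None:
        # A real step would invoke PLINK, bcftools, etc. here.
        self.completed = True


class ToyPipelineExecutor:
    """Toy stand-in for PipelineExecutor; not the library's implementation."""

    def __init__(self, steps: Dict[str, ToyStep], dry_run: bool = False) -> None:
        self.steps = steps  # step name -> instantiated sub-pipeline object
        self.dry_run = dry_run

    def execute(self) -> None:
        # Run every step in insertion order, one after the other.
        for step in self.steps.values():
            if self.dry_run:
                continue  # dry run: execute nothing
            step.run()

    def get_step_output(self, step_name: str, attribute: str = "output_path") -> Any:
        if step_name not in self.steps:
            raise ValueError(f"Step not found: {step_name!r}")
        step = self.steps[step_name]
        if not hasattr(step, attribute):
            raise ValueError(f"Step {step_name!r} has no attribute {attribute!r}")
        return getattr(step, attribute)

    def get_pipeline_summary(self) -> Dict[str, Any]:
        # Hypothetical summary layout; the real keys are not documented here.
        return {
            "steps": list(self.steps),
            "completed": [name for name, s in self.steps.items() if s.completed],
            "dry_run": self.dry_run,
        }
```

In the real class, steps are built from the config dictionary, and base_output_dir determines where each step writes its results.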

Configuration

Configuration loading and validation for ideal_genom pipelines.

exception ideal_genom.core.config.ConfigurationError

Bases: Exception

Raised when configuration is invalid.

ideal_genom.core.config.load_config(config_path: str) → Dict[str, Any]

Load pipeline configuration from YAML file.

Parameters:

config_path (str) – Path to YAML configuration file

Returns:

Parsed configuration dictionary

Return type:

dict

Raises:

ConfigurationError – If the configuration is invalid or the file is not found

ideal_genom.core.config.validate_config(config: Dict[str, Any]) → None

Validate pipeline configuration structure.

Parameters:

config (dict) – Configuration dictionary to validate

Raises:

ConfigurationError – If configuration structure is invalid

ideal_genom.core.config.validate_step(step: Dict[str, Any], index: int) → None

Validate a single pipeline step configuration.

Parameters:
  • step (dict) – Step configuration dictionary

  • index (int) – Step index in pipeline (for error messages)

Raises:

ConfigurationError – If step configuration is invalid
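
One plausible way for validate_config and validate_step to cooperate is for the former to check the top-level structure and delegate each entry to the latter, passing the entry's position so error messages can name the offending step. The sketch below assumes a top-level steps list whose entries must contain hypothetical name and module keys; the real schema is not documented in this section, and ConfigurationError is redefined locally so the snippet runs without ideal_genom installed.

```python
from typing import Any, Dict


class ConfigurationError(Exception):
    """Local stand-in for ideal_genom.core.config.ConfigurationError."""


# Hypothetical schema: the real required step keys are not documented here.
REQUIRED_STEP_KEYS = ("name", "module")


def validate_step(step: Dict[str, Any], index: int) -> None:
    # The index only serves to make error messages point at the right entry.
    if not isinstance(step, dict):
        raise ConfigurationError(f"Step {index}: expected a mapping, got {type(step).__name__}")
    missing = [key for key in REQUIRED_STEP_KEYS if key not in step]
    if missing:
        raise ConfigurationError(f"Step {index}: missing required key(s) {', '.join(missing)}")


def validate_config(config: Dict[str, Any]) -> None:
    # Check the top-level structure, then validate each step in order.
    if not isinstance(config, dict):
        raise ConfigurationError("Configuration must be a mapping")
    steps = config.get("steps")
    if not isinstance(steps, list) or not steps:
        raise ConfigurationError("Configuration must define a non-empty 'steps' list")
    for index, step in enumerate(steps):
        validate_step(step, index)
```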

Command Line Interface

Command-line interface for IDEAL-GENOM-QC pipeline.

This module provides the main CLI entry point for running genomic quality control pipelines using YAML configuration files.

ideal_genom.core.cli.setup_logging(level: str = 'INFO') → None

Set up basic logging configuration.

Parameters:

level (str) – Logging level (DEBUG, INFO, WARNING, ERROR)
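
A sketch of what setup_logging may amount to, built on the standard logging.basicConfig. The message format is an assumption, as is rejecting unknown level names with ValueError; force=True (Python 3.8+) replaces any handlers already attached to the root logger so the call takes effect even if logging was configured earlier.

```python
import logging


def setup_logging(level: str = "INFO") -> None:
    # Map a name such as "debug" or "INFO" to logging's numeric constant.
    numeric_level = getattr(logging, level.upper(), None)
    if not isinstance(numeric_level, int):
        # Assumption: unknown level names are rejected rather than ignored.
        raise ValueError(f"Invalid logging level: {level}")
    logging.basicConfig(
        level=numeric_level,
        format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",  # assumed format
        force=True,  # override handlers installed by earlier configuration
    )
```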

ideal_genom.core.cli.validate_config_file(config_path: str) → Path

Validate that the configuration file exists and is readable.

Parameters:

config_path (str) – Path to the configuration file

Returns:

Validated configuration file path

Return type:

Path

Raises:

FileNotFoundError – If configuration file doesn’t exist
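
The existence and readability check described above maps naturally onto pathlib and os.access. In this sketch, raising PermissionError for an unreadable file is an assumption; the documented behavior covers only FileNotFoundError.

```python
import os
from pathlib import Path


def validate_config_file(config_path: str) -> Path:
    path = Path(config_path)
    # is_file() is False for missing paths and for directories alike.
    if not path.is_file():
        raise FileNotFoundError(f"Configuration file not found: {config_path}")
    # Assumption: an unreadable file is reported as PermissionError.
    if not os.access(path, os.R_OK):
        raise PermissionError(f"Configuration file is not readable: {config_path}")
    return path
```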

ideal_genom.core.cli.cmd_run(args: Namespace) → int

Execute the run command.

Parameters:

args (argparse.Namespace) – Parsed command line arguments

Returns:

Exit code (0 for success, 1 for failure)

Return type:

int

ideal_genom.core.cli.cmd_validate(args: Namespace) → int

Execute the validate command.

Parameters:

args (argparse.Namespace) – Parsed command line arguments

Returns:

Exit code (0 for success, 1 for failure)

Return type:

int

ideal_genom.core.cli.cmd_template(args: Namespace) → int

Execute the template command.

Parameters:

args (argparse.Namespace) – Parsed command line arguments

Returns:

Exit code (0 for success, 1 for failure)

Return type:

int

ideal_genom.core.cli.create_parser() → ArgumentParser

Create the command line argument parser.

Returns:

Configured argument parser

Return type:

argparse.ArgumentParser

ideal_genom.core.cli.main(argv: list | None = None) → int

Main CLI entry point.

Parameters:

argv (list, optional) – Command line arguments (uses sys.argv if None)

Returns:

Exit code

Return type:

int
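
The cmd_* functions and main() suggest an argparse layout with one subcommand per command, each dispatching to a handler that returns an exit code. The sketch below follows that pattern; the program name, the positional config and output arguments, and the --dry-run flag are hypothetical, and the handlers are placeholders that only print.

```python
import argparse
from typing import List, Optional


def cmd_run(args: argparse.Namespace) -> int:
    # Placeholder: the real command would load args.config and execute the pipeline.
    print(f"run {args.config} (dry_run={args.dry_run})")
    return 0


def cmd_validate(args: argparse.Namespace) -> int:
    # Placeholder: the real command would validate args.config.
    print(f"validate {args.config}")
    return 0


def cmd_template(args: argparse.Namespace) -> int:
    # Placeholder: the real command would write a template configuration.
    print(f"template -> {args.output}")
    return 0


def create_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="ideal-genom")  # hypothetical program name
    subparsers = parser.add_subparsers(dest="command", required=True)

    run = subparsers.add_parser("run", help="Run a pipeline from a YAML config")
    run.add_argument("config")
    run.add_argument("--dry-run", action="store_true")  # hypothetical flag
    run.set_defaults(func=cmd_run)

    validate = subparsers.add_parser("validate", help="Validate a YAML config")
    validate.add_argument("config")
    validate.set_defaults(func=cmd_validate)

    template = subparsers.add_parser("template", help="Write a template config")
    template.add_argument("output")  # hypothetical destination argument
    template.set_defaults(func=cmd_template)
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    parser = create_parser()
    args = parser.parse_args(argv)  # argv=None falls back to sys.argv[1:]
    return args.func(args)  # dispatch to the handler bound by set_defaults
```

A console-script entry point would typically call sys.exit(main()) so that the returned integer becomes the process exit status.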

Executor

Command execution utilities for external genomic tools.

exception ideal_genom.core.executor.CommandExecutionError

Bases: Exception

Raised when a shell command fails.

ideal_genom.core.executor.shell_do(command: str | List[str], cwd: str | None = None, log_file: str | None = None, capture_output: bool = False, check: bool = True) → CompletedProcess

Execute a shell command for genomic analysis tools.

This is a wrapper around subprocess.run with logging and error handling tailored for genomic analysis pipelines (PLINK, GCTA, bcftools, etc.).

Parameters:
  • command (str or list of str) – Command to execute. Can be a string or list of arguments.

  • cwd (str, optional) – Working directory for command execution

  • log_file (str, optional) – Path to file where stdout/stderr should be logged

  • capture_output (bool, default=False) – If True, capture stdout and stderr in returned object

  • check (bool, default=True) – If True, raise CommandExecutionError on non-zero exit code

Returns:

Completed process with returncode, stdout, stderr

Return type:

subprocess.CompletedProcess

Raises:

CommandExecutionError – If command fails and check=True

Examples

>>> # Execute PLINK command
>>> shell_do("plink --bfile input --maf 0.01 --make-bed --out output")
>>> # Execute with working directory
>>> shell_do(
...     ["bcftools", "view", "-Oz", "input.vcf"],
...     cwd="/data/work",
...     log_file="/data/logs/bcftools.log"
... )
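
A minimal sketch of a subprocess.run wrapper with the documented parameters. Two choices here are assumptions rather than documented behavior: string commands are tokenized with shlex.split instead of being run through a shell, and output is captured whenever log_file is given so it can be appended to that file. CommandExecutionError is redefined locally so the snippet is self-contained.

```python
import logging
import shlex
import subprocess
from typing import List, Optional, Union

logger = logging.getLogger(__name__)


class CommandExecutionError(Exception):
    """Local stand-in for ideal_genom.core.executor.CommandExecutionError."""


def shell_do(
    command: Union[str, List[str]],
    cwd: Optional[str] = None,
    log_file: Optional[str] = None,
    capture_output: bool = False,
    check: bool = True,
) -> subprocess.CompletedProcess:
    # Assumption: strings are split into arguments; no shell is involved.
    args = shlex.split(command) if isinstance(command, str) else list(command)
    logger.info("Running: %s", shlex.join(args))
    # Capture whenever a log file is requested so the output can be written there.
    must_capture = capture_output or log_file is not None
    try:
        result = subprocess.run(args, cwd=cwd, capture_output=must_capture, text=True)
    except OSError as exc:  # e.g. the tool is not installed or not on PATH
        raise CommandExecutionError(f"Could not start {args[0]!r}: {exc}") from exc
    if log_file is not None:
        with open(log_file, "a", encoding="utf-8") as handle:
            handle.write(result.stdout or "")
            handle.write(result.stderr or "")
    if check and result.returncode != 0:
        raise CommandExecutionError(
            f"Command failed with exit code {result.returncode}: {shlex.join(args)}"
        )
    return result
```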

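ideal_genom.core.executor.run_plink(args: List[str], log_file: str | None = None, cwd: str | None = None) → CompletedProcess

Execute PLINK command.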

Parameters:
  • args (list of str) – PLINK arguments (without ‘plink’ command itself)

  • log_file (str, optional) – Path to log file

  • cwd (str, optional) – Working directory

Returns:

Completed process

Return type:

subprocess.CompletedProcess

Examples

>>> run_plink([
...     '--bfile', 'input',
...     '--maf', '0.01',
...     '--make-bed',
...     '--out', 'output'
... ])

ideal_genom.core.executor.run_plink2(args: List[str], log_file: str | None = None, cwd: str | None = None) → CompletedProcess

Execute PLINK2 command.

Parameters:
  • args (list of str) – PLINK2 arguments (without ‘plink2’ command itself)

  • log_file (str, optional) – Path to log file

  • cwd (str, optional) – Working directory

Returns:

Completed process

Return type:

subprocess.CompletedProcess

ideal_genom.core.executor.run_gcta(args: List[str], log_file: str | None = None, cwd: str | None = None) → CompletedProcess

Execute GCTA command.

Parameters:
  • args (list of str) – GCTA arguments (without ‘gcta64’ command itself)

  • log_file (str, optional) – Path to log file

  • cwd (str, optional) – Working directory

Returns:

Completed process

Return type:

subprocess.CompletedProcess

ideal_genom.core.executor.run_bcftools(args: List[str], log_file: str | None = None, cwd: str | None = None) → CompletedProcess

Execute bcftools command.

Parameters:
  • args (list of str) – bcftools arguments (without ‘bcftools’ command itself)

  • log_file (str, optional) – Path to log file

  • cwd (str, optional) – Working directory

Returns:

Completed process

Return type:

subprocess.CompletedProcess
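
The four tool wrappers share one shape: prepend the executable name to args and run the result. One way to express that is a generic helper specialized with functools.partial. In this sketch, run_tool is a hypothetical name, the executables are assumed to be on PATH under the names given in the parameter descriptions above (plink, plink2, gcta64, bcftools), and failures raise subprocess.CalledProcessError rather than CommandExecutionError.

```python
import subprocess
from functools import partial
from typing import List, Optional


def run_tool(
    executable: str,
    args: List[str],
    log_file: Optional[str] = None,
    cwd: Optional[str] = None,
) -> subprocess.CompletedProcess:
    # Hypothetical shared helper: callers pass only the tool's own arguments.
    command = [executable, *args]
    result = subprocess.run(command, cwd=cwd, capture_output=True, text=True)
    if log_file is not None:
        # Write the log before checking, so failures are logged too.
        with open(log_file, "a", encoding="utf-8") as handle:
            handle.write(result.stdout)
            handle.write(result.stderr)
    result.check_returncode()  # raises subprocess.CalledProcessError on failure
    return result


# Assumed executable names, taken from the parameter descriptions above.
run_plink = partial(run_tool, "plink")
run_plink2 = partial(run_tool, "plink2")
run_gcta = partial(run_tool, "gcta64")
run_bcftools = partial(run_tool, "bcftools")
```

Because the executable is bound once, each wrapper keeps the (args, log_file=None, cwd=None) call shape documented above, and adding another tool takes a single line.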