Visualization Modules
=====================

The ``ideal_genom.visualizations`` package provides functions for creating publication-ready plots for GWAS and genomic analysis.

Module Overview
---------------

.. contents:: Modules
   :local:
   :depth: 1

manhattan_type
--------------

Generate Manhattan and Miami plots for genome-wide association studies (GWAS).

**Module:** ``ideal_genom.visualizations.manhattan_type``

Features:
^^^^^^^^^

- Data processing and visualization of GWAS summary statistics
- Annotation of SNPs with gene information from various sources
- Highlighting and labeling of specific SNPs of interest
- Support for both Manhattan (single study) and Miami (two studies) plots

Key Functions:
^^^^^^^^^^^^^^

.. py:function:: compute_relative_pos(data, chr_col='CHR', pos_col='POS', p_col='p')

   Compute the relative position of probes/SNPs across chromosomes and add a -log10(p-value) column.

   :param data: Input DataFrame containing genomic data
   :type data: pandas.DataFrame
   :param chr_col: Column name for chromosome identifiers
   :type chr_col: str
   :param pos_col: Column name for base pair positions
   :type pos_col: str
   :param p_col: Column name for p-values
   :type p_col: str
   :return: DataFrame with added columns for relative positions and -log10(p-values)
   :rtype: pandas.DataFrame

.. py:function:: manhattan(df_gwas, plots_dir, pval_col='P', chr_col='CHR', pos_col='POS', snp_col='SNP', p_threshold=5e-8, annotate=None, annotation_type='ensembl', genome_build='38', api_request=True, highlight_snps=None, alpha=0.7, save_name='manhattan.jpeg', colors=None, chr_text_shift=None, fig_size=(18, 6), dpi=500)

   Generate a Manhattan plot from GWAS summary statistics.

   :param df_gwas: DataFrame containing GWAS results
   :type df_gwas: pandas.DataFrame
   :param plots_dir: Directory path where the plot will be saved
   :type plots_dir: str
   :param pval_col: Column name for p-values
   :type pval_col: str
   :param chr_col: Column name for chromosome
   :type chr_col: str
   :param pos_col: Column name for base pair position
   :type pos_col: str
   :param snp_col: Column name for SNP identifiers
   :type snp_col: str
   :param p_threshold: Genome-wide significance threshold
   :type p_threshold: float
   :param annotate: List of SNP IDs to annotate with gene names
   :type annotate: Optional[list]
   :param annotation_type: Source for gene annotation ('ensembl', 'refseq', or 'both')
   :type annotation_type: str
   :param genome_build: Genome build version ('37' or '38')
   :type genome_build: str
   :param api_request: Whether to use the API for annotation (if False, a local GTF file is used)
   :type api_request: bool
   :param highlight_snps: List of SNP IDs to highlight in a different color
   :type highlight_snps: Optional[list]
   :param alpha: Transparency level for points
   :type alpha: float
   :param save_name: Filename for saving the plot
   :type save_name: str
   :param colors: Custom colors for alternating chromosomes
   :type colors: Optional[list]
   :param chr_text_shift: Shift amount for chromosome labels
   :type chr_text_shift: Optional[float]
   :param fig_size: Figure size (width, height) in inches
   :type fig_size: tuple
   :param dpi: Resolution for saved figure
   :type dpi: int
   :return: True if successful
   :rtype: bool

.. py:function:: miami(df_gwas1, df_gwas2, plots_dir, pval_col='P', chr_col='CHR', pos_col='POS', snp_col='SNP', p_threshold=5e-8, annotate1=None, annotate2=None, annotation_type='ensembl', genome_build='38', api_request=True, highlight_snps1=None, highlight_snps2=None, alpha=0.7, save_name='miami.jpeg', colors=None, chr_text_shift=None, fig_size=(18, 12), dpi=500, plot1_label='Study 1', plot2_label='Study 2')

   Generate a Miami plot (back-to-back Manhattan plots) comparing two GWAS studies.

   :param df_gwas1: DataFrame containing GWAS results for first study
   :type df_gwas1: pandas.DataFrame
   :param df_gwas2: DataFrame containing GWAS results for second study
   :type df_gwas2: pandas.DataFrame
   :param plots_dir: Directory path where the plot will be saved
   :type plots_dir: str
   :param plot1_label: Label for the top plot
   :type plot1_label: str
   :param plot2_label: Label for the bottom plot
   :type plot2_label: str
   :return: True if successful
   :rtype: bool

   *Other parameters are the same as for* ``manhattan()``.

Usage Example:
^^^^^^^^^^^^^^

.. code-block:: python

   import pandas as pd
   from ideal_genom.visualizations.manhattan_type import manhattan, miami

   # Load GWAS summary statistics
   gwas_df = pd.read_csv("gwas_results.txt", sep="\t")

   # Generate Manhattan plot
   manhattan(
       df_gwas=gwas_df,
       plots_dir="./plots",
       pval_col='P',
       chr_col='CHR',
       pos_col='BP',
       snp_col='SNP',
       p_threshold=5e-8,
       annotate=['rs12345', 'rs67890'],  # Annotate specific SNPs
       annotation_type='ensembl',
       genome_build='38',
       save_name='my_manhattan.jpeg'
   )

   # Generate Miami plot comparing two studies
   gwas_df2 = pd.read_csv("gwas_results2.txt", sep="\t")

   miami(
       df_gwas1=gwas_df,
       df_gwas2=gwas_df2,
       plots_dir="./plots",
       plot1_label='Discovery cohort',
       plot2_label='Replication cohort',
       save_name='my_miami.jpeg'
   )

plots
-----

Functions for generating various plots for GWAS data analysis.

**Module:** ``ideal_genom.visualizations.plots``

Features:
^^^^^^^^^

- QQ plots for visualizing the distribution of p-values
- Beta-beta scatter plots for comparing effect sizes between studies
- Trumpet plots for visualizing power and effect sizes
- Support for both binary and quantitative traits

Key Functions:
^^^^^^^^^^^^^^

.. py:function:: qqplot_draw(df_gwas, plots_dir, lambda_val=None, pval_col='P', conf_color='lightgray', save_name='qq_plot.jpeg', fig_size=(10, 10), dpi=500)

   Create a Q-Q (Quantile-Quantile) plot from GWAS results.

   This function generates a Q-Q plot comparing observed vs expected -log10(p-values) from GWAS results, including confidence intervals and the genomic inflation factor (λ).

   :param df_gwas: DataFrame containing GWAS results with p-values
   :type df_gwas: pandas.DataFrame
   :param plots_dir: Directory path where the plot will be saved
   :type plots_dir: str
   :param lambda_val: Genomic inflation factor (calculated if None)
   :type lambda_val: Optional[float]
   :param pval_col: Column name for p-values
   :type pval_col: str
   :param conf_color: Color for confidence interval bands
   :type conf_color: str
   :param save_name: Filename for saving the plot
   :type save_name: str
   :param fig_size: Figure size (width, height) in inches
   :type fig_size: tuple
   :param dpi: Resolution for saved figure
   :type dpi: int
   :return: True if successful
   :rtype: bool

.. py:function:: beta_beta_plot(df1, df2, plots_dir, beta_col='BETA', se_col='SE', snp_col='SNP', pval_col='P', p_threshold=5e-8, annotate=None, annotation_type='ensembl', genome_build='38', api_request=True, save_name='beta_beta.jpeg', fig_size=(10, 10), dpi=500, x_label='Study 1', y_label='Study 2')

   Create a beta-beta scatter plot comparing effect sizes between two GWAS studies.

   :param df1: DataFrame containing GWAS results for first study
   :type df1: pandas.DataFrame
   :param df2: DataFrame containing GWAS results for second study
   :type df2: pandas.DataFrame
   :param plots_dir: Directory path where the plot will be saved
   :type plots_dir: str
   :param beta_col: Column name for effect sizes (beta)
   :type beta_col: str
   :param se_col: Column name for standard errors
   :type se_col: str
   :param snp_col: Column name for SNP identifiers
   :type snp_col: str
   :param pval_col: Column name for p-values
   :type pval_col: str
   :param p_threshold: Significance threshold for highlighting SNPs
   :type p_threshold: float
   :param annotate: List of SNP IDs to annotate
   :type annotate: Optional[list]
   :param annotation_type: Source for gene annotation
   :type annotation_type: str
   :param genome_build: Genome build version
   :type genome_build: str
   :param api_request: Whether to use the API for annotation
   :type api_request: bool
   :param save_name: Filename for saving the plot
   :type save_name: str
   :param fig_size: Figure size (width, height) in inches
   :type fig_size: tuple
   :param dpi: Resolution for saved figure
   :type dpi: int
   :param x_label: Label for x-axis
   :type x_label: str
   :param y_label: Label for y-axis
   :type y_label: str
   :return: True if successful
   :rtype: bool

.. py:function:: trumpet_plot_binary(df_gwas, plots_dir, beta_col='BETA', se_col='SE', snp_col='SNP', pval_col='P', p_threshold=5e-8, annotate=None, annotation_type='ensembl', genome_build='38', api_request=True, maf_col='MAF', n_cases=None, n_controls=None, prevalence=0.5, alpha_val=0.05, save_name='trumpet_binary.jpeg', fig_size=(10, 10), dpi=500)

   Create a trumpet plot for binary traits, showing power curves and effect sizes.

   :param df_gwas: DataFrame containing GWAS results
   :type df_gwas: pandas.DataFrame
   :param plots_dir: Directory path where the plot will be saved
   :type plots_dir: str
   :param beta_col: Column name for effect sizes
   :type beta_col: str
   :param se_col: Column name for standard errors
   :type se_col: str
   :param snp_col: Column name for SNP identifiers
   :type snp_col: str
   :param pval_col: Column name for p-values
   :type pval_col: str
   :param p_threshold: Significance threshold
   :type p_threshold: float
   :param annotate: List of SNP IDs to annotate
   :type annotate: Optional[list]
   :param maf_col: Column name for minor allele frequency
   :type maf_col: str
   :param n_cases: Number of cases in the study
   :type n_cases: Optional[int]
   :param n_controls: Number of controls in the study
   :type n_controls: Optional[int]
   :param prevalence: Disease prevalence
   :type prevalence: float
   :param alpha_val: Significance level for power calculation
   :type alpha_val: float
   :param save_name: Filename for saving the plot
   :type save_name: str
   :return: True if successful
   :rtype: bool

.. py:function:: trumpet_plot_quantitative(df_gwas, plots_dir, beta_col='BETA', se_col='SE', snp_col='SNP', pval_col='P', p_threshold=5e-8, annotate=None, annotation_type='ensembl', genome_build='38', api_request=True, maf_col='MAF', n_samples=None, alpha_val=0.05, save_name='trumpet_quantitative.jpeg', fig_size=(10, 10), dpi=500)

   Create a trumpet plot for quantitative traits, showing power curves and effect sizes.

   :param df_gwas: DataFrame containing GWAS results
   :type df_gwas: pandas.DataFrame
   :param plots_dir: Directory path where the plot will be saved
   :type plots_dir: str
   :param n_samples: Total number of samples in the study
   :type n_samples: Optional[int]
   :return: True if successful
   :rtype: bool

   *Other parameters are the same as for* ``trumpet_plot_binary()``.

Usage Example:
^^^^^^^^^^^^^^

.. code-block:: python

   import pandas as pd
   from ideal_genom.visualizations.plots import (
       qqplot_draw,
       beta_beta_plot,
       trumpet_plot_binary,
       trumpet_plot_quantitative
   )

   # Load GWAS results
   gwas_df = pd.read_csv("gwas_results.txt", sep="\t")

   # Generate QQ plot
   qqplot_draw(
       df_gwas=gwas_df,
       plots_dir="./plots",
       pval_col='P',
       save_name='my_qq_plot.jpeg'
   )

   # Beta-beta plot comparing two studies
   gwas_df2 = pd.read_csv("gwas_results2.txt", sep="\t")

   beta_beta_plot(
       df1=gwas_df,
       df2=gwas_df2,
       plots_dir="./plots",
       beta_col='BETA',
       se_col='SE',
       x_label='Discovery',
       y_label='Replication',
       save_name='my_beta_beta.jpeg'
   )

   # Trumpet plot for binary trait
   trumpet_plot_binary(
       df_gwas=gwas_df,
       plots_dir="./plots",
       n_cases=1000,
       n_controls=1000,
       prevalence=0.1,
       save_name='my_trumpet_binary.jpeg'
   )

   # Trumpet plot for quantitative trait
   trumpet_plot_quantitative(
       df_gwas=gwas_df,
       plots_dir="./plots",
       n_samples=2000,
       save_name='my_trumpet_quant.jpeg'
   )

zoom_heatmap
------------

Create zoomed heatmap visualizations of SNP associations, gene annotations, and linkage disequilibrium (LD) patterns.

**Module:** ``ideal_genom.visualizations.zoom_heatmap``

Features:
^^^^^^^^^

- Filter and annotate SNP data in a genomic region
- Calculate LD matrices using PLINK
- Generate three-panel plots with:

  1. Association plot with SNPs colored by functional consequences
  2. Gene track showing gene locations and orientations
  3. LD heatmap showing correlation patterns between SNPs

Key Functions:
^^^^^^^^^^^^^^

.. py:function:: filter_sumstats(data_df, lead_snp, snp_col, p_col, pos_col, chr_col, pval_threshold=5e-8, radius=10e6)

   Filter GWAS summary statistics based on a lead SNP, a p-value threshold, and a genomic region.

   :param data_df: DataFrame containing GWAS summary statistics
   :type data_df: pandas.DataFrame
   :param lead_snp: Lead SNP identifier to center the region around
   :type lead_snp: str
   :param snp_col: Column name for SNP identifiers
   :type snp_col: str
   :param p_col: Column name for p-values
   :type p_col: str
   :param pos_col: Column name for base pair positions
   :type pos_col: str
   :param chr_col: Column name for chromosome
   :type chr_col: str
   :param pval_threshold: P-value threshold for filtering
   :type pval_threshold: float
   :param radius: Genomic radius around lead SNP (in base pairs)
   :type radius: Union[float, int]
   :return: Filtered DataFrame
   :rtype: pandas.DataFrame

.. py:function:: compute_ld(plink_file, snp_list, output_dir, lead_snp=None, ld_window_kb=10000, ld_window_snps=10000, threads=1)

   Compute a linkage disequilibrium matrix for a list of SNPs using PLINK.

   :param plink_file: Path to PLINK binary file prefix (without .bed/.bim/.fam)
   :type plink_file: Union[str, Path]
   :param snp_list: List of SNP IDs for LD calculation
   :type snp_list: list
   :param output_dir: Directory to save output files
   :type output_dir: Union[str, Path]
   :param lead_snp: Lead SNP for coloring (optional)
   :type lead_snp: Optional[str]
   :param ld_window_kb: LD window size in kilobases
   :type ld_window_kb: int
   :param ld_window_snps: LD window size in number of SNPs
   :type ld_window_snps: int
   :param threads: Number of threads for PLINK
   :type threads: int
   :return: LD matrix as DataFrame
   :rtype: pandas.DataFrame

.. py:function:: create_zoom_heatmap(sumstats_df, plink_file, lead_snp, output_dir, snp_col='SNP', chr_col='CHR', pos_col='BP', p_col='P', beta_col='BETA', pval_threshold=5e-8, radius=500000, ld_window_kb=1000, genome_build='38', annotation_type='ensembl', api_request=True, fig_size=(14, 12), dpi=300, threads=1, save_name='zoom_heatmap.png')

   Create a comprehensive zoom heatmap plot with association, gene track, and LD panels.

   :param sumstats_df: DataFrame containing GWAS summary statistics
   :type sumstats_df: pandas.DataFrame
   :param plink_file: Path to PLINK binary file prefix
   :type plink_file: Union[str, Path]
   :param lead_snp: Lead SNP identifier to center the plot
   :type lead_snp: str
   :param output_dir: Directory to save output files
   :type output_dir: Union[str, Path]
   :param snp_col: Column name for SNP identifiers
   :type snp_col: str
   :param chr_col: Column name for chromosome
   :type chr_col: str
   :param pos_col: Column name for base pair position
   :type pos_col: str
   :param p_col: Column name for p-values
   :type p_col: str
   :param beta_col: Column name for effect sizes
   :type beta_col: str
   :param pval_threshold: P-value threshold for filtering
   :type pval_threshold: float
   :param radius: Genomic radius around lead SNP (in base pairs)
   :type radius: Union[float, int]
   :param ld_window_kb: LD window size in kilobases
   :type ld_window_kb: int
   :param genome_build: Genome build version ('37' or '38')
   :type genome_build: str
   :param annotation_type: Source for gene annotation ('ensembl', 'refseq', or 'both')
   :type annotation_type: str
   :param api_request: Whether to use the API for functional annotation
   :type api_request: bool
   :param fig_size: Figure size (width, height) in inches
   :type fig_size: tuple
   :param dpi: Resolution for saved figure
   :type dpi: int
   :param threads: Number of threads for PLINK
   :type threads: int
   :param save_name: Filename for saving the plot
   :type save_name: str
   :return: Path to saved figure
   :rtype: Path

Usage Example:
^^^^^^^^^^^^^^

.. code-block:: python

   import pandas as pd
   from pathlib import Path
   from ideal_genom.visualizations.zoom_heatmap import create_zoom_heatmap

   # Load GWAS summary statistics
   sumstats = pd.read_csv("gwas_results.txt", sep="\t")

   # Create zoom heatmap around a lead SNP
   create_zoom_heatmap(
       sumstats_df=sumstats,
       plink_file=Path("data/genotypes"),  # Without .bed/.bim/.fam extension
       lead_snp='rs12345',
       output_dir=Path("./plots"),
       snp_col='SNP',
       chr_col='CHR',
       pos_col='BP',
       p_col='P',
       beta_col='BETA',
       pval_threshold=5e-8,
       radius=500000,  # 500 kb on each side of the lead SNP
       genome_build='38',
       annotation_type='ensembl',
       api_request=True,
       save_name='rs12345_zoom.png'
   )

Notes
-----

**Dependencies:**

- matplotlib
- seaborn
- pandas
- numpy
- textalloc (for label positioning)
- pyensembl (for gene annotations)
- PLINK 1.9 or 2.0 (for LD calculations)

**Annotation Sources:**

All plotting functions support gene annotation from:

- **Ensembl**: Via REST API or local GTF files
- **RefSeq**: Via local GTF files
- **Both**: Combined annotations from both sources

**Genome Builds:**

Supported genome builds are GRCh37/hg19 ('37') and GRCh38/hg38 ('38').

**Output Formats:**

- JPEG format for Manhattan, Miami, QQ, beta-beta, and trumpet plots
- PNG format for zoom heatmaps (recommended for better quality with complex graphics)
- All plots are publication-ready with customizable DPI

See Also
--------

- :doc:`gwas_modules` - GWAS analysis modules that generate data for visualization
- :doc:`Helpers` - Annotation utilities used by visualization functions
- :doc:`api_overview` - Complete API reference