Contributing Guide
We welcome contributions to IDEAL-GENOM-QC! This guide will help you get started with contributing to the project, whether you’re fixing bugs, adding features, improving documentation, or helping with testing.
Getting Started
Development Setup
Fork and clone the repository:
# Fork on GitHub, then clone your fork
git clone https://github.com/YOUR_USERNAME/IDEAL-GENOM-QC.git
cd IDEAL-GENOM-QC
Set up development environment:
# Install Poetry (if not already installed)
curl -sSL https://install.python-poetry.org | python3 -
# Install dependencies
poetry install
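# Activate virtual environment
# (Poetry 2.x no longer bundles `poetry shell`; use `poetry env activate`
# or install the poetry-plugin-shell plugin)
poetry shell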
Install development dependencies:
# Install additional development tools
poetry install --with dev
# Install pre-commit hooks
pre-commit install
Verify installation:
# Run tests
pytest
# Check code style
black --check .
flake8 .
Project Structure
Understanding the codebase structure:
IDEAL-GENOM-QC/
├── ideal_genom_qc/          # Main package
│   ├── __init__.py
│   ├── SampleQC.py          # Sample quality control
│   ├── AncestryQC.py        # Ancestry analysis
│   ├── VariantQC.py         # Variant quality control
│   ├── PopStructure.py      # Population structure analysis
│   ├── UMAPplot.py          # UMAP visualization
│   ├── Helpers.py           # Utility functions
│   └── get_references.py    # Reference data handling
├── tests/                   # Test suite
├── docs/                    # Documentation
├── notebooks/               # Example notebooks
├── data/                    # Reference data
└── pyproject.toml           # Project configuration
Types of Contributions
We welcome several types of contributions:
Bug Reports
Before submitting a bug report:
Check existing issues to avoid duplicates
Test with the latest version
Gather system information and error logs
Bug report template:
**Bug Description**
A clear description of what the bug is.
**To Reproduce**
Steps to reproduce the behavior:
1. Configuration used
2. Command executed
3. Error encountered
**Expected Behavior**
What you expected to happen.
**Environment**
- OS: [e.g., Ubuntu 20.04]
- Python version: [e.g., 3.9.7]
- IDEAL-GENOM-QC version: [e.g., 0.1.0]
- PLINK versions: [e.g., 1.9, 2.0]
**Additional Context**
- Configuration files
- Log files
- Sample data characteristics
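The Environment fields in the bug report template can be gathered with a short script. The following is a minimal sketch, not a tool shipped with the project: the distribution name `ideal-genom-qc` and the executable names `plink` and `plink2` are assumptions and may differ on your system.

```python
import platform
import shutil
from importlib import metadata


def collect_environment(dist_name: str = "ideal-genom-qc") -> dict:
    """Gather the values asked for in the bug report's Environment section."""
    try:
        # dist_name is an assumed distribution name; adjust it if yours differs.
        package_version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        package_version = "not installed"

    # "plink" and "plink2" are assumed executable names for PLINK 1.9 and 2.0.
    plink_paths = {
        exe: shutil.which(exe) or "not found" for exe in ("plink", "plink2")
    }

    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "package": package_version,
        "plink": plink_paths,
    }


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Pasting the printed output into the Environment section gives maintainers the OS, Python version, installed package version, and PLINK locations in one step.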
Feature Requests
Feature request template:
**Feature Description**
A clear description of what you want to achieve.
**Use Case**
Why is this feature needed? What problem does it solve?
**Proposed Solution**
How would you like this implemented?
**Alternatives Considered**
What other solutions have you considered?
**Additional Context**
Any other context or screenshots about the feature request.
Code Contributions
Development workflow:
Create a feature branch:
git checkout -b feature/new-qc-method
# or
git checkout -b bugfix/fix-memory-leak
Make your changes:
Follow the existing code style
Add tests for new functionality
Update documentation as needed
Keep commits atomic and well-described
Test your changes:
# Run all tests
pytest
# Test specific modules
pytest tests/test_sample_qc.py
# Run with coverage
pytest --cov=ideal_genom_qc
Check code quality:
# Format code
black .
# Check style
flake8 .
# Type checking
mypy ideal_genom_qc/
Commit and push:
git add .
git commit -m "feat(sample_qc): add contamination detection method"
git push origin feature/new-qc-method
Create pull request:
Use the PR template
Reference any related issues
Include screenshots for UI changes
Wait for review and address feedback
Documentation Contributions
Types of documentation improvements:
API documentation improvements
Tutorial enhancements
Example additions
Typo fixes
Translation (future)
Documentation workflow:
# Install documentation dependencies
poetry install --with docs
# Build documentation locally
cd docs/
make html
# Open in browser (macOS; use xdg-open on Linux)
open build/html/index.html
Testing Contributions
Help improve test coverage:
# Check current coverage
pytest --cov=ideal_genom_qc --cov-report=html
# Open the HTML report (macOS; use xdg-open on Linux)
open htmlcov/index.html
Types of tests needed:
Unit tests for individual functions
Integration tests for complete workflows
Performance tests for large datasets
Cross-platform compatibility tests
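As an illustration of the first item, a unit test for an individual function might look like the sketch below. The helper `missing_rate` is hypothetical and exists only to give the tests something to exercise; it is not part of the `ideal_genom_qc` package. pytest collects any function whose name starts with `test_`.

```python
from typing import Optional, Sequence


def missing_rate(calls: Sequence[Optional[int]]) -> float:
    """Return the fraction of missing genotype calls (None) in a sample.

    Hypothetical helper used only to demonstrate test layout.
    """
    if not calls:
        raise ValueError("calls must not be empty")
    return sum(call is None for call in calls) / len(calls)


def test_missing_rate_counts_none_values():
    # One of the four calls is missing.
    assert missing_rate([0, 1, None, 2]) == 0.25


def test_missing_rate_is_zero_for_complete_data():
    assert missing_rate([0, 1, 2]) == 0.0


def test_missing_rate_is_one_when_all_calls_missing():
    assert missing_rate([None, None]) == 1.0
```

Each test covers one behavior and carries a descriptive name, so a failure points directly at the broken case. The empty-input error path can be covered the same way with `pytest.raises(ValueError)`.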
Code Style Guidelines
Python Style
We follow PEP 8 with some modifications:
Line length: 88 characters (Black default)
Imports: Use isort for import sorting
Docstrings: Use Google-style docstrings
Type hints: Use type hints for public APIs
Example function:
from pathlib import Path

import pandas as pd


def calculate_kinship_matrix(
    input_path: Path,
    output_path: Path,
    maf_threshold: float = 0.01,
    missing_threshold: float = 0.1,
) -> pd.DataFrame:
    """Calculate kinship matrix for sample relatedness analysis.

    Args:
        input_path: Path to input PLINK files
        output_path: Path for output files
        maf_threshold: Minor allele frequency threshold
        missing_threshold: Maximum missing data rate

    Returns:
        DataFrame containing kinship coefficients

    Raises:
        FileNotFoundError: If input files don't exist
        ValueError: If thresholds are out of valid range
    """
    # Implementation here
    pass
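The `Raises` section of a docstring is easiest to keep accurate when the checks live in small helpers. The sketch below shows one possible shape for them. Both helper names are hypothetical, the valid range of 0 to 1 is an assumption, and the `.bed`/`.bim`/`.fam` check assumes the input is a PLINK binary fileset.

```python
from pathlib import Path


def validate_threshold(name: str, value: float) -> float:
    """Return value unchanged if it lies in [0, 1]; otherwise raise ValueError."""
    # The [0, 1] range is an assumption; a stricter bound may suit some thresholds.
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return value


def require_plink_files(prefix: Path) -> None:
    """Raise FileNotFoundError unless prefix.bed, prefix.bim and prefix.fam exist."""
    # Append the extension to the file name so prefixes containing dots stay intact.
    missing = [
        str(prefix.parent / (prefix.name + ext))
        for ext in (".bed", ".bim", ".fam")
        if not (prefix.parent / (prefix.name + ext)).is_file()
    ]
    if missing:
        raise FileNotFoundError(f"Missing PLINK files: {', '.join(missing)}")
```

Calling these helpers at the top of a function keeps the documented exceptions and the actual behavior in step.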
Documentation Style
RestructuredText: Use .rst format for documentation
Clear examples: Include working code examples
Cross-references: Link between related sections
Screenshots: Include for UI elements
Example documentation:
Sample Quality Control
======================

The :class:`SampleQC` class performs comprehensive quality control
on individual samples in your genomic dataset.

Basic Usage
-----------

.. code-block:: python

    from ideal_genom_qc import SampleQC

    qc = SampleQC(
        input_path="data/input",
        input_name="mydata",
        output_path="data/output",
        output_name="clean_data"
    )
    qc.run_sample_qc()
Git Workflow
Branch Naming
Use descriptive branch names:
feature/add-contamination-detection
bugfix/fix-memory-leak-in-pca
docs/improve-api-documentation
test/add-integration-tests
Commit Messages
Follow the Conventional Commits format, with a blank line separating the header, body, and footer:
type(scope): description

[optional body]

[optional footer]
Examples:
feat(ancestry): add support for custom reference populations
fix(sample_qc): resolve memory leak in kinship calculation
docs(api): add examples to SampleQC class documentation
test(variant_qc): add unit tests for HWE calculation
Types:
- feat: New feature
- fix: Bug fix
- docs: Documentation
- test: Tests
- refactor: Code refactoring
- perf: Performance improvement
- style: Code style changes
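The header format can be checked mechanically, for example in a local hook. The following is a minimal sketch rather than a tool the project ships. The names `COMMIT_TYPES`, `HEADER_PATTERN`, and `is_valid_header` are hypothetical, and treating the scope as optional is an assumption borrowed from the Conventional Commits convention.

```python
import re

# Commit types listed in this guide.
COMMIT_TYPES = ("feat", "fix", "docs", "test", "refactor", "perf", "style")

# type, optional (scope), then ": " and a non-empty description.
HEADER_PATTERN = re.compile(
    rf"^(?P<type>{'|'.join(COMMIT_TYPES)})"
    r"(?:\((?P<scope>[a-z0-9_\-]+)\))?"
    r": (?P<description>\S.*)$"
)


def is_valid_header(message: str) -> bool:
    """Check only the first line of a commit message against the pattern."""
    lines = message.splitlines()
    return bool(lines) and HEADER_PATTERN.match(lines[0]) is not None


if __name__ == "__main__":
    for msg in (
        "feat(ancestry): add support for custom reference populations",
        "Fixed stuff",
    ):
        print(f"{msg!r}: {is_valid_header(msg)}")
```

Only the header line is validated; the body and footer are free-form, so they are left unchecked here.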
Pull Request Process
PR Template
Pull request template:
## Description
Brief description of what this PR does.
## Type of Change
- [ ] Bug fix
- [ ] New feature
- [ ] Documentation update
- [ ] Performance improvement
- [ ] Refactoring
## Testing
- [ ] Tests pass locally
- [ ] Added new tests for changes
- [ ] Tested on sample datasets
## Documentation
- [ ] Updated API documentation
- [ ] Updated user documentation
- [ ] Added/updated examples
## Checklist
- [ ] Code follows style guidelines
- [ ] Self-review completed
- [ ] Commented hard-to-understand areas
- [ ] No merge conflicts
## Related Issues
Fixes #123
Related to #456
Review Process
What reviewers look for:
Correctness: Does the code do what it’s supposed to do?
Testing: Are there adequate tests?
Documentation: Is the code well-documented?
Style: Does it follow project conventions?
Performance: Will it negatively impact performance?
Compatibility: Will it break existing functionality?
Responding to feedback:
Address all comments
Ask for clarification if needed
Update tests and documentation
Force-push updates to your branch
Release Process
Versioning
We use semantic versioning (semver):
MAJOR: Incompatible API changes
MINOR: New functionality (backward compatible)
PATCH: Bug fixes (backward compatible)
Examples:
- 0.1.0 → 0.1.1 (bug fix)
- 0.1.1 → 0.2.0 (new feature)
- 0.2.0 → 1.0.0 (major API change)
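The bump rules can be expressed in a few lines of code. This is an illustrative sketch only: `bump_version` is a hypothetical helper, and it handles plain `MAJOR.MINOR.PATCH` strings without pre-release or build suffixes.

```python
def bump_version(version: str, part: str) -> str:
    """Return the next version after bumping 'major', 'minor', or 'patch'."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        # Incompatible API change: reset minor and patch.
        return f"{major + 1}.0.0"
    if part == "minor":
        # Backward-compatible feature: reset patch.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # Backward-compatible bug fix.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"part must be 'major', 'minor', or 'patch', got {part!r}")


if __name__ == "__main__":
    print(bump_version("0.1.0", "patch"))
    print(bump_version("0.1.1", "minor"))
    print(bump_version("0.2.0", "major"))
```

Bumping a higher component resets the lower ones to zero, which is why a new feature on 0.1.1 yields 0.2.0 rather than 0.2.1.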
Changelog
We maintain a changelog following Keep a Changelog:
# Changelog
## [Unreleased]
### Added
- New contamination detection method
### Fixed
- Memory leak in PCA calculation
## [0.1.0] - 2025-01-15
### Added
- Initial release
- Sample QC functionality
- Ancestry analysis
- Variant QC
- UMAP visualization
Community Guidelines
Code of Conduct
We are committed to providing a welcoming and inclusive environment. Please:
Be respectful and constructive
Welcome newcomers and help them learn
Focus on what’s best for the community
Use inclusive language
Be patient with questions and mistakes
Communication
Preferred channels:
GitHub Issues: Bug reports, feature requests
GitHub Discussions: General questions, ideas
Pull Request comments: Code-specific discussions
Email: Security issues, private matters
Communication guidelines:
Be clear and concise
Provide context and examples
Use searchable, descriptive titles
Follow up on conversations
Tag relevant maintainers when needed
Recognition
Contributors will be recognized in:
Authors file: Major contributors
Release notes: Feature contributors
Documentation: Example providers
GitHub: All contributors via GitHub’s contributor graph
Types of recognition:
Code contributions
Documentation improvements
Bug reports and testing
Community support
Translations (future)
Getting Help
If you need help contributing:
Read existing issues and PRs for examples
Start with “good first issue” labels
Ask questions in GitHub discussions
Join our community calls (when available)
Reach out to maintainers directly
Thank you for contributing to IDEAL-GENOM-QC! 🎉