Contributing Guide
==================
We welcome contributions to IDEAL-GENOM-QC! This guide will help you get started with contributing to the project, whether you're fixing bugs, adding features, improving documentation, or helping with testing.
Getting Started
---------------
Development Setup
^^^^^^^^^^^^^^^^^
1. **Fork and clone the repository:**
.. code-block:: bash
# Fork on GitHub, then clone your fork
git clone https://github.com/YOUR_USERNAME/IDEAL-GENOM-QC.git
cd IDEAL-GENOM-QC
2. **Set up development environment:**
.. code-block:: bash
# Install Poetry (if not already installed)
curl -sSL https://install.python-poetry.org | python3 -
# Install dependencies
poetry install
# Activate virtual environment
poetry shell
3. **Install development dependencies:**
.. code-block:: bash
# Install additional development tools
poetry install --with dev
# Install pre-commit hooks
pre-commit install
4. **Verify installation:**
.. code-block:: bash
# Run tests
pytest
# Check code style
black --check .
flake8 .
Project Structure
^^^^^^^^^^^^^^^^^
Understanding the codebase structure:
.. code-block:: text
IDEAL-GENOM-QC/
├── ideal_genom_qc/ # Main package
│ ├── __init__.py
│ ├── SampleQC.py # Sample quality control
│ ├── AncestryQC.py # Ancestry analysis
│ ├── VariantQC.py # Variant quality control
│ ├── PopStructure.py # Population structure analysis
│ ├── UMAPplot.py # UMAP visualization
│ ├── Helpers.py # Utility functions
│ └── get_references.py # Reference data handling
├── tests/ # Test suite
├── docs/ # Documentation
├── notebooks/ # Example notebooks
├── data/ # Reference data
└── pyproject.toml # Project configuration
Types of Contributions
----------------------
We welcome several types of contributions:
Bug Reports
^^^^^^^^^^^
**Before submitting a bug report:**
- Check existing issues to avoid duplicates
- Test with the latest version
- Gather system information and error logs
**Bug report template:**
.. code-block:: text
**Bug Description**
A clear description of what the bug is.
**To Reproduce**
Steps to reproduce the behavior:
1. Configuration used
2. Command executed
3. Error encountered
**Expected Behavior**
What you expected to happen.
**Environment**
- OS: [e.g., Ubuntu 20.04]
- Python version: [e.g., 3.9.7]
- IDEAL-GENOM-QC version: [e.g., 0.1.0]
- PLINK versions: [e.g., 1.9, 2.0]
**Additional Context**
- Configuration files
- Log files
- Sample data characteristics
Feature Requests
^^^^^^^^^^^^^^^^
**Feature request template:**
.. code-block:: text
**Feature Description**
A clear description of what you want to achieve.
**Use Case**
Why is this feature needed? What problem does it solve?
**Proposed Solution**
How would you like this implemented?
**Alternatives Considered**
What other solutions have you considered?
**Additional Context**
Any other context or screenshots about the feature request.
Code Contributions
^^^^^^^^^^^^^^^^^^
**Development workflow:**
1. **Create a feature branch:**
.. code-block:: bash
git checkout -b feature/new-qc-method
# or
git checkout -b bugfix/fix-memory-leak
2. **Make your changes:**
- Follow the existing code style
- Add tests for new functionality
- Update documentation as needed
- Keep commits atomic and well-described
3. **Test your changes:**
.. code-block:: bash
# Run all tests
pytest
# Test specific modules
pytest tests/test_sample_qc.py
# Run with coverage
pytest --cov=ideal_genom_qc
4. **Check code quality:**
.. code-block:: bash
# Format code
black .
# Check style
flake8 .
# Type checking
mypy ideal_genom_qc/
5. **Commit and push:**
.. code-block:: bash
git add .
git commit -m "Add new QC method for contamination detection"
git push origin feature/new-qc-method
6. **Create pull request:**
- Use the PR template
- Reference any related issues
- Include screenshots for UI changes
- Wait for review and address feedback
Documentation Contributions
^^^^^^^^^^^^^^^^^^^^^^^^^^^
**Types of documentation improvements:**
- API documentation improvements
- Tutorial enhancements
- Example additions
- Typo fixes
- Translation (future)
**Documentation workflow:**
.. code-block:: bash
# Install documentation dependencies
poetry install --with docs
# Build documentation locally
cd docs/
make html
# Open in browser
open build/html/index.html
Testing Contributions
^^^^^^^^^^^^^^^^^^^^^
**Help improve test coverage:**
.. code-block:: bash
# Check current coverage
pytest --cov=ideal_genom_qc --cov-report=html
open htmlcov/index.html
**Types of tests needed:**
- Unit tests for individual functions
- Integration tests for complete workflows
- Performance tests for large datasets
- Cross-platform compatibility tests
Code Style Guidelines
---------------------
Python Style
^^^^^^^^^^^^
We follow PEP 8 with some modifications:
- **Line length:** 88 characters (Black default)
- **Imports:** Use `isort` for import sorting
- **Docstrings:** Use Google-style docstrings
- **Type hints:** Use type hints for public APIs
**Example function:**
.. code-block:: python
def calculate_kinship_matrix(
input_path: Path,
output_path: Path,
maf_threshold: float = 0.01,
missing_threshold: float = 0.1
) -> pd.DataFrame:
"""Calculate kinship matrix for sample relatedness analysis.
Args:
input_path: Path to input PLINK files
output_path: Path for output files
maf_threshold: Minor allele frequency threshold
missing_threshold: Maximum missing data rate
Returns:
DataFrame containing kinship coefficients
Raises:
FileNotFoundError: If input files don't exist
ValueError: If thresholds are out of valid range
"""
# Implementation here
pass
Documentation Style
^^^^^^^^^^^^^^^^^^^
- **RestructuredText:** Use .rst format for documentation
- **Clear examples:** Include working code examples
- **Cross-references:** Link between related sections
- **Screenshots:** Include for UI elements
**Example documentation:**
.. code-block:: rst
Sample Quality Control
======================
The :class:`SampleQC` class performs comprehensive quality control
on individual samples in your genomic dataset.
Basic Usage
-----------
.. code-block:: python
from ideal_genom_qc import SampleQC
qc = SampleQC(
input_path="data/input",
input_name="mydata",
output_path="data/output",
output_name="clean_data"
)
qc.run_sample_qc()
Git Workflow
------------
Branch Naming
^^^^^^^^^^^^^
Use descriptive branch names:
- `feature/add-contamination-detection`
- `bugfix/fix-memory-leak-in-pca`
- `docs/improve-api-documentation`
- `test/add-integration-tests`
Commit Messages
^^^^^^^^^^^^^^^
Follow conventional commit format:
.. code-block:: text
type(scope): description
[optional body]
[optional footer]
**Examples:**
.. code-block:: text
feat(ancestry): add support for custom reference populations
fix(sample_qc): resolve memory leak in kinship calculation
docs(api): add examples to SampleQC class documentation
test(variant_qc): add unit tests for HWE calculation
**Types:**
- `feat`: New feature
- `fix`: Bug fix
- `docs`: Documentation
- `test`: Tests
- `refactor`: Code refactoring
- `perf`: Performance improvement
- `style`: Code style changes
Pull Request Process
--------------------
PR Template
^^^^^^^^^^^
**Pull request template:**
.. code-block:: text
## Description
Brief description of what this PR does.
## Type of Change
- [ ] Bug fix
- [ ] New feature
- [ ] Documentation update
- [ ] Performance improvement
- [ ] Refactoring
## Testing
- [ ] Tests pass locally
- [ ] Added new tests for changes
- [ ] Tested on sample datasets
## Documentation
- [ ] Updated API documentation
- [ ] Updated user documentation
- [ ] Added/updated examples
## Checklist
- [ ] Code follows style guidelines
- [ ] Self-review completed
- [ ] Commented hard-to-understand areas
- [ ] No merge conflicts
## Related Issues
Fixes #123
Related to #456
Review Process
^^^^^^^^^^^^^^
**What reviewers look for:**
1. **Correctness:** Does the code do what it's supposed to do?
2. **Testing:** Are there adequate tests?
3. **Documentation:** Is the code well-documented?
4. **Style:** Does it follow project conventions?
5. **Performance:** Will it negatively impact performance?
6. **Compatibility:** Will it break existing functionality?
**Responding to feedback:**
- Address all comments
- Ask for clarification if needed
- Update tests and documentation
- Force-push updates to your branch
Release Process
---------------
Versioning
^^^^^^^^^^
We use semantic versioning (semver):
- **MAJOR:** Incompatible API changes
- **MINOR:** New functionality (backward compatible)
- **PATCH:** Bug fixes (backward compatible)
**Examples:**
- `0.1.0` → `0.1.1` (bug fix)
- `0.1.1` → `0.2.0` (new feature)
- `0.2.0` → `1.0.0` (major API change)
Changelog
^^^^^^^^^
We maintain a changelog following `Keep a Changelog `_:
.. code-block:: text
# Changelog
## [Unreleased]
### Added
- New contamination detection method
### Fixed
- Memory leak in PCA calculation
## [0.1.0] - 2025-01-15
### Added
- Initial release
- Sample QC functionality
- Ancestry analysis
- Variant QC
- UMAP visualization
Community Guidelines
--------------------
Code of Conduct
^^^^^^^^^^^^^^^
We are committed to providing a welcoming and inclusive environment. Please:
- Be respectful and constructive
- Welcome newcomers and help them learn
- Focus on what's best for the community
- Use inclusive language
- Be patient with questions and mistakes
Communication
^^^^^^^^^^^^^
**Preferred channels:**
- **GitHub Issues:** Bug reports, feature requests
- **GitHub Discussions:** General questions, ideas
- **Pull Request comments:** Code-specific discussions
- **Email:** Security issues, private matters
**Communication guidelines:**
- Be clear and concise
- Provide context and examples
- Use searchable, descriptive titles
- Follow up on conversations
- Tag relevant maintainers when needed
Recognition
-----------
Contributors will be recognized in:
- **Authors file:** Major contributors
- **Release notes:** Feature contributors
- **Documentation:** Example providers
- **GitHub:** All contributors via GitHub's contributor graph
**Types of recognition:**
- Code contributions
- Documentation improvements
- Bug reports and testing
- Community support
- Translations (future)
Getting Help
------------
**If you need help contributing:**
- Read existing issues and PRs for examples
- Start with "good first issue" labels
- Ask questions in GitHub discussions
- Join our community calls (when available)
- Reach out to maintainers directly
**Resources:**
- `GitHub Flow `_
- `Poetry documentation `_
- `pytest documentation `_
- `Sphinx documentation `_
Thank you for contributing to IDEAL-GENOM-QC! 🎉