diff --git a/README.md b/README.md
index 1dd3733..a44680b 100644
--- a/README.md
+++ b/README.md
@@ -2,44 +2,72 @@ # Hostile
 
-Rapid FASTQ decontamination by host depletion. Accepts paired fastq.gz files as arguments and outputs paired fastq.gz files. Downloads and caches a custom human reference genome to `$XDG_DATA_DIR`. Replaces read headers with incrementing integers for speed and privacy. Python package with CLI and Python API. Installs with conda/mamba.
+Rapid FASTQ decontamination by host subtraction. Accepts Illumina or ONT fastq[.gz] input and outputs fastq.gz files. Downloads and caches a custom human T2T + HLA reference genome to `$XDG_DATA_DIR` when run for the first time. Replaces read headers with incrementing integers for speed and privacy. Python package with CLI and Python API. Installs with conda/mamba. Please read the [BioRxiv preprint](https://www.biorxiv.org/content/10.1101/2023.07.04.547735) for further information, and open a GitHub issue if you encounter problems.
 
 
 
 ## Install
 
+### Conda
+
+```bash
+curl -OJ https://raw.githubusercontent.com/bede/hostile/main/environment.yml
+conda env create -f environment.yml # Mamba is faster
+conda activate hostile
+pip install hostile
+
+# Test
+hostile clean --fastq1 tests/data/mixed_human_100_1.fastq.gz --fastq2 tests/data/mixed_human_100_2.fastq.gz
+```
+
+
+
+### Docker
+
+*Coming soon*
+
+
+
+### Development install
+
 ```bash
 git clone https://github.com/bede/hostile.git
 cd hostile
 conda env create -f environment.yml # Use mamba if impatient
 conda activate hostile
-pip install .
+pip install --editable '.[dev]'
+pytest
 ```
+
 
 
 
 ## Command line usage
 
 ```bash
 % hostile clean --help
-usage: hostile clean [-h] --fastq1 FASTQ1 --fastq2 FASTQ2 [--aligner {bowtie2,minimap2}] [--out-dir OUT_DIR] [--threads THREADS] [--debug]
+usage: hostile clean [-h] --fastq1 FASTQ1 [--fastq2 FASTQ2] [--aligner {bowtie2,minimap2}] [--custom-index CUSTOM_INDEX] [--out-dir OUT_DIR]
+                     [--threads THREADS] [--debug]
 
-Remove human reads from paired fastq.gz files
+Remove human reads from paired fastq(.gz) files
 
 options:
   -h, --help            show this help message and exit
-  --fastq1 FASTQ1       path to forward fastq.gz file
-  --fastq2 FASTQ2       path to reverse fastq.gz file
+  --fastq1 FASTQ1       path to forward fastq(.gz) file
+  --fastq2 FASTQ2       optional path to reverse fastq(.gz) file
+                        (default: None)
   --aligner {bowtie2,minimap2}
                         alignment algorithm
                         (default: bowtie2)
+  --custom-index CUSTOM_INDEX
+                        path to custom index
+                        (default: None)
   --out-dir OUT_DIR     output directory for decontaminated fastq.gz files
-                        (default: /Users/bede/Research/Git/hostile)
+                        (default: /root/hostile/tests/data)
   --threads THREADS     number of CPU threads to use
-                        (default: 10)
+                        (default: 1)
   --debug               show debug messages
-                        (default: False)
-
+                        (default: False)
                         (default: False)
 ```
 
 
 
@@ -64,7 +92,6 @@ Cleaning: 100%|█████████████████████
         "reads_removed_proportion": 0.0
     }
 ]
-
 ```
 
 
@@ -72,20 +99,12 @@ Cleaning: 100%|█████████████████████
 ## Python usage
 
 ```python
+from pathlib import Path
 from hostile.lib import clean_paired_fastqs
 
-decontamination_statistics = clean_paired_fastqs(fastqs=[("h37rv_10.r1.fastq.gz", fastq2="h37rv_10.r1.fastq.gz")])
-```
-
-
-
-## Development
+stats = clean_paired_fastqs(
+    fastqs=[(Path("h37rv_10.r1.fastq.gz"), Path("h37rv_10.r1.fastq.gz"))]
+)
 
-```bash
-git clone https://github.com/bede/hostile.git
-cd hostile
-conda env create -f environment.yml # Use mamba if impatient
-conda activate hostile
-pip install --editable '.[dev]'
-pytest
+print(stats)
 ```