Adding `--seed` flag to customize the seed when downsampling #29
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,11 +1,17 @@ | ||
# YEAT | ||
|
||
YEAT, **Y**our **E**verday **A**ssembly **T**ool, is an update to [`asm_tools`](https://github.com/bioforensics/asm_tools). It uses a Snakemake workflow to preprocess, downsample, and assemble paired-end fastq files with SPAdes. | ||
YEAT, **Y**our **E**verday **A**ssembly **T**ool, is an update to [`asm_tools`](https://github.com/bioforensics/asm_tools). It uses a Snakemake workflow to preprocess, downsample, and assemble paired-end fastq files with various assemblers such as SPAdes, MEGAHIT, and Unicycler. | ||
|
||
<p align="center"> | ||
<img width="220" alt="Screen Shot 2022-02-02 at 10 57 31 AM" src="https://user-images.githubusercontent.com/33472323/152189781-2bfdc62b-f554-42d5-8f78-f94ab2b133eb.png"> | ||
</p> | ||
## Installation | ||
|
||
``` | ||
git clone https://github.com/bioforensics/yeat.git | ||
cd yeat | ||
conda env create --name yeat --file environment.yml | ||
conda activate yeat | ||
pip install . | ||
``` | ||
|
||
## Usage: | ||
|
||
```$ yeat {read1} {read2} --outdir {path} --sample {name}``` | ||
```$ yeat {config} {read1} {read2} --outdir {path} --sample {name}``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,13 +4,14 @@ channels: | |
- bioconda | ||
- defaults | ||
dependencies: | ||
- black=21.10b0 | ||
- black=22.10 | ||
- fastp>=0.23 | ||
- fastqc>=0.11 | ||
- gzip>=1.7 | ||
- mash>=2.3 | ||
- megahit>=1.2 | ||
- pytest-cov>=3.0 | ||
- python>=3.9 | ||
YEAT cannot install if the user's Python version is < 3.9. Added this to allow users to upgrade if needed.

Might want to add or update an entry in the change log describing why only Python >=3.9 is supported now.
||
- quast>=5.0 | ||
- seqtk>=1.3 | ||
- snakemake>=6.10 | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,6 +12,7 @@ | |
import pandas as pd | ||
from pathlib import Path | ||
import pytest | ||
from random import randint | ||
from yeat import cli | ||
from yeat.cli import InitAction | ||
from yeat.tests import data_file | ||
|
@@ -94,7 +95,7 @@ def test_unicycler(capsys, tmp_path): | |
@pytest.mark.long | ||
@pytest.mark.parametrize( | ||
"downsample,num_contigs,largest_contig,total_len", | ||
[("2000", 71, 5120, 69189), ("-1", 56, 35168, 199940)], | ||
[("2000", 79, 5294, 70818), ("-1", 56, 35168, 199940)], | ||
) | ||
def test_custom_downsample_input( | ||
downsample, num_contigs, largest_contig, total_len, capsys, tmp_path | ||
|
@@ -108,6 +109,8 @@ def test_custom_downsample_input( | |
wd, | ||
"-d", | ||
downsample, | ||
"--seed", | ||
"0", | ||
] | ||
args = cli.get_parser().parse_args(arglist) | ||
cli.main(args) | ||
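The hunk above pins `--seed 0` so that the downsampled assembly metrics stay stable between runs. A minimal sketch of why a fixed seed makes subsampling reproducible, using only Python's standard library. YEAT's actual downsampling step is not shown in this diff, so the `downsample` helper, its signature, and the "negative count keeps everything" convention are assumptions made for illustration.

```python
import random


def downsample(records, n, seed=None):
    """Pick n records without replacement.

    Hypothetical helper, not YEAT's implementation. With seed=None the
    generator is seeded from system entropy, so each call differs.
    """
    rng = random.Random(seed)  # private generator; global random state is untouched
    # Assumed convention: a negative n (or n >= len) means "no downsampling".
    if n < 0 or n >= len(records):
        return list(records)
    return rng.sample(records, n)


reads = [f"read_{i}" for i in range(10_000)]

# Same seed, same subsample: exact expected values can be asserted on.
assert downsample(reads, 2000, seed=0) == downsample(reads, 2000, seed=0)

# A negative count leaves the input untouched under the assumed convention.
assert downsample(reads, -1) == reads
```

With an unseeded call the selected reads change on every run, which is why a test without `--seed` can only check that metrics fall inside a range.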
|
@@ -164,3 +167,25 @@ def test_custom_coverage_input(coverage, capsys, tmp_path): | |
assert df.iloc[12]["sample_contigs"] == 56 # num_contigs | ||
assert df.iloc[13]["sample_contigs"] == 35168 # largest_contig | ||
assert df.iloc[14]["sample_contigs"] == 199940 # total_len | ||
|
||
|
||
@pytest.mark.long | ||
@pytest.mark.parametrize("execution_number", range(3)) | ||
def test_random_downsample_seed(execution_number, capsys, tmp_path): | ||
wd = str(tmp_path) | ||
arglist = [ | ||
data_file("megahit.cfg"), | ||
data_file("short_reads_1.fastq.gz"), | ||
data_file("short_reads_2.fastq.gz"), | ||
"--outdir", | ||
wd, | ||
"-d", | ||
"2000", | ||
] | ||
args = cli.get_parser().parse_args(arglist) | ||
cli.main(args) | ||
quast_report = Path(wd).resolve() / "analysis" / "quast" / "megahit" / "report.tsv" | ||
df = pd.read_csv(quast_report, sep="\t") | ||
assert 61 <= df.iloc[12]["sample_contigs"] <= 91 # 76 +-20% of avg num_contigs | ||
assert 4183 <= df.iloc[13]["sample_contigs"] <= 6273 # 5228 +-20% of avg largest_contig | ||
assert 59515 <= df.iloc[14]["sample_contigs"] <= 89271 # 74393 +-20% of avg total_len | ||
This is my take on this suggestion. I was pretty liberal on my +- buffer range to catch the randomness from The way I determined my medium for each assert was:

Above the function, there is a decorator. When this function is executed with pytest, the function is called 3 times. Since the seed is random by default, we do not need to specify the seed.

Looks good! A couple of comments. I have no idea what information I think the assert `num_contigs == pytest.approx(76, abs=15)`
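The ±20% windows used in `test_random_downsample_seed` can be reproduced with a few lines of arithmetic. The sketch below assumes the bounds were rounded inward (ceiling for the lower bound, floor for the upper), which happens to match the numbers in the diff. The helper names `tolerance_window` and `within_abs` are hypothetical, and `math.isclose` stands in here as a standard-library analogue of an absolute-tolerance comparison like `pytest.approx(76, abs=15)`.

```python
import math


def tolerance_window(center, fraction=0.20):
    """Integer bounds (lo, hi) for center +/- fraction, rounded inward."""
    lo = math.ceil(center * (1 - fraction))
    hi = math.floor(center * (1 + fraction))
    return lo, hi


def within_abs(value, center, abs_tol):
    """True if value lies within abs_tol of center (no relative tolerance)."""
    return math.isclose(value, center, rel_tol=0.0, abs_tol=abs_tol)


# Centers taken from the comments next to the asserts in the diff.
assert tolerance_window(76) == (61, 91)            # num_contigs
assert tolerance_window(5228) == (4183, 6273)      # largest_contig
assert tolerance_window(74393) == (59515, 89271)   # total_len

# Absolute-tolerance form: accepts 61..91 for a center of 76 and abs_tol=15.
assert within_abs(79, 76, abs_tol=15)
assert not within_abs(92, 76, abs_tol=15)
```

A relative window scales with the metric, while an absolute tolerance has to be chosen per metric; either form keeps the test robust to the run-to-run variation of an unseeded downsample.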
Black version `21.10b0` has package incompatibility errors with newer versions of `click`. If a user has click version `>8.1`, Black will crash. To fix this, users will need to downgrade click to `8.0`. This problem has been fixed in Black `22.3` and up: psf/black#2964
It doesn't much matter which version of Black is used, as long as it's used consistently. So you're welcome to upgrade and pin a newer version that doesn't have these issues. But that's often best left to a dedicated thread, since it can result in numerous trivial formatting changes that add a lot of noise and clutter to an existing PR.