-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fastq from unmapped reads #189
Conversation
But the current code already does this? Lines 846 to 898 in bc55df3
It even extracts the unmapped data to either BAM, FastQ depending on the users choice. I think we need something else :-( |
@maxibor are these fastq reads pre Trimming and merging, but without human reads? |
in this PR, yes |
But this step works on post AR fastq files |
Sorry - that close was courtesy of Maia |
I think you are good to go for testing once python version fixed. Minor thing: the help message/description is unspecific of what actually is being output. Thus maybe Alex' confusion |
Updated Conda env so Pysam should now come. But the test should fail because the Docker container isn't rebuilt as of now with the new Pysam dependancy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Official review)
Looking more in depth (please correct me if I misunderstand anything).
Major
I can't comment on the python section. However,
-
The BAM file as input into this tool should be immediately after BWA (
sorted.bam
) as it should include all possible mapped reads, not just ones that mapped exactly. i.e. if you use the reads post-samtools filter -q 37
, the discarded mapped reads would not be filtered by the new module, because they are removed from the mapped BAM file after thesamtools filter
. -
Is L921 flipped? Shouldn't it be
if (params.singleEnd) {
- as you only indicate a single output file in that conditional block?
Minor
For the process name (L906), flag itself (L363, L218) and help message (L108), I suggest the following extra precision instead of -unmap
:
--strip_input_fastq Create pre-Adapter Removal FASTQ files without reads that mapped to reference (e.g. for public upload of privacy sensitive non-host data)
publishDir should renamed to something like /samtools/stripped_fastq
rather than unmapped_fastq
, as the latter file already exists with the --extract_unmapped
functionality.
Equally, the output FASTQ names should be e.g. stripped.fwd.fq.gz
and stripped.rev.fq.gz
or something. I think R1
/R2
would be dangerous for novices re-analyzing the data with the same pipeline, if they forget to add underscores (i.e. there are two R1
/R2
s in the name). This may then lead to funky input regex for a EAGER2 re-run and subsequent errors.
The actual names/flags can be discussed here - you don't have to go exactly with my 'stripped' suggestion.
Followed the suggestions of @jfy133 |
@ivelsko owes you a beer ;) |
@apeltzer Can you have a look and merge if ok ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good 👍
Just failing because of missing |
To answer #187
Comes with a python script (using pysam) to recreate fastq files from the unmapped reads.
However this is using Python3 and the Python in the conda env is currently 2.7 :(