Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename BAM-specific flags #76

Closed
jfy133 opened this issue Nov 4, 2018 · 7 comments
Closed

Rename BAM-specific flags #76

jfy133 opened this issue Nov 4, 2018 · 7 comments

Comments

@jfy133
Copy link
Member

jfy133 commented Nov 4, 2018

Is your feature request related to a problem? Please describe.
The flag --bam_keep_mapped_only may be misleading. While the help message says Only consider mapped reads for downstream analysis. Unmapped reads are extracted to separate output., the actual flag may suggest to the user that by 'keeping', unmapped reads would be discarded completely.

The flag --bam_filter_reads is also confusing. While the help message says Keep all reads in BAM file for downstream analysis the flag suggests something is being filtered (either specific reads retained, or filtered out).

Describe the solution you'd like
Perhaps rename the --bam_keep_mapped_only flag to be more specific such as --bam_analyse_mapped_only

Perhaps rename --bam_filter_reads to --bam_retain_all_reads. Optionally, change function of flag and name to e.g.: ``--bam_discard_unmapped`.

@apeltzer
Copy link
Member

apeltzer commented Nov 4, 2018

I agree here, should rename these more appropriately...

@apeltzer
Copy link
Member

apeltzer commented Nov 5, 2018

Would you mind having a look at this PR @jfy133 ?

@jfy133
Copy link
Member Author

jfy133 commented Nov 9, 2018

To continue discussion of the PR:

I think we've been getting our wires crossed slightly. To re-conceptualise this step - the main question is what do we want to do with the unmapped reads. So I think the BAM related flags after mapping should always refer to that - as AFAIK there would be no case where we wouldn't downstream analyse mapped reads (or at least shouldn't).

So what do you think of the following scenarios:

1) Keep mapped/unmapped together
--bam_retain_unmapped keep mapped and unmapped reads in the output BAM file (default). [pipeline description: leave output of samtools view on bwa * output SAM file as is]

2) Separate mapped/unmapped
--bam_separate_unmapped separate mapped and unmapped reads into independent BAM files. Downstream analysis performed on mapped reads only. [pipeline description: perform samtools view on bwa * output SAM file twice, once with -F 4 flag for mapped reads and once with -f 4 to for unmapped reads, each redirected into independent BAM files ]

3) Discard mapped/unmapped
--bam_discard_unmapped discard unmapped reads. Downstream analysis performed on mapped reads only. [pipeline description: perform samtools view on bwa * output SAM file once with -F 4 flag for mapped reads]

Misc: Discard mapped/unmapped optionally convert unmapped BAM to FASTQ
I would also propose an extra flag, or alternatively options to --bam_separate_unmapped:

--unmapped_to_fastq converts unmapped read BAM file to FASTQ for independent downstream
analysis (e.g. metagenomic analysis)

OR

--bam_separate_unmapped separate mapped and unmapped reads into independent BAM files, with unmapped reads in either BAM or FASTQ format. Downstream analysis performed on mapped reads only. Values: BAM, FASTQ [pipeline description: perform samtools view on bwa * output SAM file twice, once with -F 4 flag for mapped reads and once with -f 4 to for unmapped reads, each redirected into independent BAM files. If Value FASTQ perform samtools fastq on output BAM of samtools view -f 4 into a gzipped FASTQ file. Remove corresponding unmapped BAM ]

@EisenRa
Copy link

EisenRa commented Nov 11, 2018

This sounds great!

I would second the option for converting BAM to FASTQ, as it's one less step to do post-pipeline.

@apeltzer
Copy link
Member

Yes, absolutely! I will make this possible exactly as proposed in #76 (comment) !

@apeltzer
Copy link
Member

I Implemented 1-3 in the exact same way.

Furthermore, I added an optional step for users to set --bam_unmapped_keep_type with options bam,fastq.gz to either keep or remove the corresponding files. If you speciy keep bam, then the fastq won#t be extracted, otherwise the fastq will be created for unmapped reads and the bam is deleted :-)

@apeltzer apeltzer added this to the V2.1 "Ulm" milestone Nov 19, 2018
apeltzer added a commit that referenced this issue Nov 19, 2018
Adjustments for #76 , rename certain options to be more explicit
@jfy133
Copy link
Member Author

jfy133 commented Dec 9, 2018

Going into testing now with #97 . will close for now.

@jfy133 jfy133 closed this as completed Dec 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants