Use read pairing information in kneaddata_bowtie2_discordant_pairs #16

wbazant · 2021-02-17T21:32:44Z

I had a problem with kneaddata_bowtie2_discordant_pairs taking increasingly more memory based on the input size. I've tracked it down to this script having to re-order the alignment information, for which it was storing IDs in a set.

I have reimplemented the script to avoid this step, by letting it assume the two paired read files are indeed paired, and making bowtie2 runs separately for each mate with --reorder. The output is then more easily interpreted by iterating through the results.

The changed script no longer works for out of order paired reads. It accommodates one mate of the pair being truncated, and there's enough smartness in the script to check that its assumption is holding true, but not more. I'm not sure how controversial it is - if kneaddata ever worked for that case, or if the previous step of Trimmomatic will fix it - and if there is any reason for a bioinformatics tool to support for out of order paired reads, but I want to explicitly note it because it's I guess a drawback.

which avoids the problem of having to pair up the reads afterwards so the memory usage doesn't depend on the input size Add more logging, turned on after the --verbose option (if not verbose, print only the counts, as before) Add --bypass-trf for unit tests that are meant to be bowtie2 only

wbazant · 2021-02-17T21:42:26Z

How the script sorts inputs into outputs or really anything major was not intended to be changed. For the sake of being thorough, here are behaviour changes that come with the change:

different order of reads in the outputs (i.e. it is no longer a pseudorandom order)
different order of creation times in the outputs, when one runs ls -l it looks
I've added --verbose to the API which turns on extra logging. The default is to log just the summary counts but I see now that I've changed the format, I'll happily change it back if you want:

# previous
02/08/2021 06:58:33 PM - kneaddata.utilities - DEBUG: b'13578393 reads; of these:\n  13578393 (100.00%) were unpaired; of these:\n    13469836 (99.20%) aligned 0 times\n    16602 (0.12%) aligned exactly 1 time\n    91955 (0.68%) aligned >1 times\n0.80% overall alignment rate\npair1_aligned : 48537\npair2_aligned : 48537\npair1_unaligned : 6363994\npair2_unaligned : 6363994\norphan1_aligned : 8875\norphan2_aligned : 2608\norphan1_unaligned : 581249\norphan2_unaligned : 160599\n'

# new
02/16/2021 04:46:29 PM - kneaddata.utilities - DEBUG: b'2021-02-16 16:45:57,331 kneaddata_bowtie2_discordant_pairs - Paired reads: 444231 both aligned, 5969951 both unaligned, 0 only pair1 aligned, 0 only pair2 aligned\n2021-02-16 16:46:24,551 kneaddata_bowtie2_discordant_pairs - Orphan reads: 59896 aligned, 690133 unaligned\n'

wbazant · 2021-07-27T14:08:21Z

A note about this fork: it errors out when given paired-end fastqs where the reads don't match between mates, with an error message like
ValueError: sam files do not match on line 5: found IDs C5D92ACXX141105:3:1101:10398:4098/1 and C5D92ACXX141105:3:1101:10374:27723/2.

I actually quite like this behaviour, and exclude those from my pipeline.

…mate

wbazant changed the title ~~Use read pairing information in kneaddata_discordant_pairs~~ Use read pairing information in kneaddata_bowtie2_discordant_pairs Feb 18, 2021

wbazant mentioned this pull request Feb 23, 2021

Description in FASTA title prevents correct reformatting #17

Open

Allow for a file less than 100 lines

379a7c8

rdemko2332 added 4 commits November 4, 2022 12:52

Adding argument for specifying if mates are equal or different

a2bae7b

Changing value error message

38a5781

Adding query_mate_separator option, no logic for finding queryId and …

fa3b217

…mate

Improving error messages

969a06f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use read pairing information in kneaddata_bowtie2_discordant_pairs #16

Use read pairing information in kneaddata_bowtie2_discordant_pairs #16

wbazant commented Feb 17, 2021 •

edited

Loading

wbazant commented Feb 17, 2021

wbazant commented Jul 27, 2021

Use read pairing information in kneaddata_bowtie2_discordant_pairs #16

Are you sure you want to change the base?

Use read pairing information in kneaddata_bowtie2_discordant_pairs #16

Conversation

wbazant commented Feb 17, 2021 • edited Loading

wbazant commented Feb 17, 2021

wbazant commented Jul 27, 2021

wbazant commented Feb 17, 2021 •

edited

Loading