Fix bug in the samblaster module #1176

Merged (1 commit) on May 4, 2020

haizi-zh (Contributor) commented May 1, 2020

The samblaster module parses samblaster log files and extracts the duplicate-marking statistics via regex pattern matching. However, the regex fails to match some samblaster logs, such as the example below:

(generated by samblaster 0.1.25 processing bam files aligned by bwa mem)

samblaster: Version 0.1.25(3.2 KiB/s) with 1 file(s) remaining
samblaster: Inputting from stdin
samblaster: Outputting to stdout
samblaster: Loaded 84 header sequence entries.
samblaster:
samblaster: Pair Type        Type_ID_Count   %Type/All_IDs Dup_ID_Count  %Dups/Type_ID_Count  %Dups/All_Dups  %Dups/All_IDs
samblaster: ---------------------------------------------------------------------------------------------------------------
samblaster: Both Unmapped            2645         0.077              0           0.000             0.000          0.000
samblaster: Orphan/Singleton        24303         0.706           1952           8.032             0.212          0.057
samblaster: Both Mapped           3416627        99.217         920794          26.950            99.788         26.739
samblaster: Total                 3443575       100.000         922746          26.796           100.000         26.796
samblaster:
samblaster: Marked      922746 of    3443575 (26.796%) total read ids as duplicates using 49304k memory in 4.779S CPU seconds and 14S wall time.

Previously the regex was:

dups_regex = "samblaster: (Removed|Marked) (\d+) of (\d+) \((\d+.\d+)%\) read ids as duplicates"

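It can't match the last line of the log above, for two reasons: the numbers in that line are padded with multiple spaces, while the pattern expects exactly one space between tokens, and the line reads "total read ids as duplicates", while the pattern expects "read ids as duplicates" immediately after the percentage.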
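The mismatch can be reproduced with a short Python check. This is only a sketch: `old_regex` is the pattern quoted above, while `new_regex` is a hypothetical relaxed pattern written for illustration and is not necessarily the exact pattern merged in this PR.

```python
import re

# The final summary line from the samblaster 0.1.25 log above.
log_line = (
    "samblaster: Marked      922746 of    3443575 (26.796%) total read ids "
    "as duplicates using 49304k memory in 4.779S CPU seconds and 14S wall time."
)

# The previous pattern, as quoted in this PR.
old_regex = r"samblaster: (Removed|Marked) (\d+) of (\d+) \((\d+.\d+)%\) read ids as duplicates"

# Hypothetical relaxed pattern (an assumption, not the confirmed fix):
# \s+ tolerates padded columns, and (?:total\s+)? makes the extra word optional.
new_regex = (
    r"samblaster: (Removed|Marked)\s+(\d+)\s+of\s+(\d+)\s+"
    r"\((\d+\.\d+)%\)\s+(?:total\s+)?read ids as duplicates"
)

old_match = re.search(old_regex, log_line)
new_match = re.search(new_regex, log_line)

print(old_match)           # None: the old pattern does not match
print(new_match.groups())  # ('Marked', '922746', '3443575', '26.796')
```

Because the `total` group is optional and `\s+` also accepts a single space, a pattern of this shape would still match log lines in the older, unpadded format.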



ewels added a commit to MultiQC/test-data that referenced this pull request May 4, 2020
ewels (Member) commented May 4, 2020

Great stuff, thanks! 👍

@ewels ewels merged commit 48185ea into MultiQC:master May 4, 2020
ewels added a commit that referenced this pull request May 4, 2020