report median read length for fastqc #1745
Branch updated from 3efaad6 to b99d9c3.
Yeah, I wrote this code back before long reads were really a thing, when it was a fairer assumption 😅
I'm a bit nervous about the breaking change as this is by far the most used module. As such, I'd prefer to keep the previous average and add this in as an additional value if possible. At least then the same value as before will be present in the data exports for downstream analysis.
Also, I'd like the column title to be slightly different, e.g. Median Read Length, so that there's a visible difference between reports.
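A minimal sketch of what the reviewer is asking for: keep the existing average exactly as before and add the median alongside it, under a distinctly titled column. For simplicity it works on raw read lengths (the real module works from FastQC's binned output), and the dictionary keys, header fields and the "Average Read Length" title are illustrative assumptions, not MultiQC's actual column definitions; only the "Median Read Length" title comes from this thread.

```python
from statistics import mean, median
from typing import Dict, List


def read_length_stats(read_lengths: List[int]) -> Dict[str, float]:
    """Report both statistics; the mean is kept so exported values don't change."""
    return {
        "avg_sequence_length": mean(read_lengths),  # previous value, unchanged (assumed key)
        "median_sequence_length": median(read_lengths),  # new, additional value (hypothetical key)
    }


# Hypothetical column headers: distinct titles make it obvious which value a report shows.
LENGTH_HEADERS: Dict[str, Dict[str, str]] = {
    "avg_sequence_length": {"title": "Average Read Length", "suffix": " bp"},
    "median_sequence_length": {"title": "Median Read Length", "suffix": " bp"},
}

stats = read_length_stats([100, 150, 150, 150, 20])
for key, value in stats.items():
    print(f"{LENGTH_HEADERS[key]['title']}: {value}{LENGTH_HEADERS[key]['suffix']}")
```

Because the old key is untouched, anything downstream that reads the average from the data exports keeps working; the median is purely additive.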
Signed-off-by: Josh Chorlton
multiqc/modules/fastqc/fastqc.py (outdated)
```diff
 try:
-    hide_seq_length = False if max(seq_lengths) - min(seq_lengths) > 10 else True
+    hide_seq_length = False if max(avg_seq_lengths) - min(avg_seq_lengths) > 10 else True
```
Suggested change:
```diff
-hide_seq_length = False if max(avg_seq_lengths) - min(avg_seq_lengths) > 10 else True
+hide_seq_length = False if max(median_seq_lengths) - min(median_seq_lengths) > 10 else True
```
This should really just be:
```python
hide_seq_length = max(median_seq_lengths) - min(median_seq_lengths) <= 10
```
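The simplified check is equivalent to the original ternary, since `False if spread > 10 else True` is just `spread <= 10`. The sketch below wraps it in a hypothetical helper (`should_hide_length_column` is not a MultiQC function) and mirrors the `try:` seen in the diff: `max()` and `min()` raise `ValueError` on an empty sequence, and returning `True` (hide the column) in that case is an assumption about the fallback, not something shown in this thread.

```python
from typing import Sequence


def should_hide_length_column(median_seq_lengths: Sequence[float], threshold: float = 10) -> bool:
    """Hide the length column when all samples have (nearly) the same median length."""
    try:
        # Same result as `False if spread > threshold else True`, without the redundant ternary.
        return max(median_seq_lengths) - min(median_seq_lengths) <= threshold
    except ValueError:
        # max()/min() fail on an empty sequence (no samples); hiding is an assumed default.
        return True


print(should_hide_length_column([150, 151]))
print(should_hide_length_column([100, 150]))
```

Note the boundary: a spread of exactly 10 bp still hides the column, matching the original `> 10` test.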
Co-authored-by: Phil Ewels
Looks great. Thanks for the tweaks!
Uses the median read length for FastQC instead of the mean. Note that this is a breaking change; another option is to add an extra column and keep both.
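The median is much better suited to sequencing reads: read lengths are often far from normally distributed (e.g. a long tail of short reads after trimming), and unlike the mean, the median isn't dragged around by that skew.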
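FastQC reports read lengths as a histogram (its Sequence Length Distribution module), where a row can be a single length such as "35" or a range such as "30-34", so the median has to come from weighted counts rather than raw lengths. A minimal sketch follows, assuming the rows have already been parsed into `(label, count)` pairs; the function names are hypothetical and this is not necessarily how the PR computes it. Each bin is collapsed to its midpoint, so the result is approximate for ranged bins.

```python
from typing import Iterable, Optional, Tuple


def bin_midpoint(label: str) -> float:
    """Convert a length label such as "35" or "30-34" to a single value."""
    if "-" in label:
        start, end = label.split("-", 1)
        return (int(start) + int(end)) / 2
    return float(label)


def median_read_length(rows: Iterable[Tuple[str, int]]) -> Optional[float]:
    """Weighted median of a binned read-length histogram (None if there are no reads)."""
    bins = sorted((bin_midpoint(label), count) for label, count in rows if count > 0)
    total = sum(count for _, count in bins)
    if total == 0:
        return None
    # 0-based positions of the middle read(s); they coincide when total is odd.
    lo_pos, hi_pos = (total - 1) // 2, total // 2
    lo_val: Optional[float] = None
    hi_val: Optional[float] = None
    seen = 0
    for value, count in bins:
        seen += count  # reads at positions < seen lie in this bin or earlier ones
        if lo_val is None and lo_pos < seen:
            lo_val = value
        if hi_pos < seen:
            hi_val = value
            break
    assert lo_val is not None and hi_val is not None
    return (lo_val + hi_val) / 2


print(median_read_length([("150", 90), ("30-34", 10)]))
```

For a sample where 90 reads are 150 bp and 10 fall in the 30-34 bp bin, this gives 150.0, while the count-weighted mean of the bin midpoints is 138.2: a small tail of short reads shifts the mean but not the median.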
There are issues on the fastp repo asking it to output more stats, but they don't seem to have much traction. I suppose I could try to PR the changes there too, but the maintainers seem less responsive.
You can see a subtle difference in lengths on the testdata repo:
Old (mean): [screenshot]
New (median): [screenshot]
CHANGELOG.md has been updated