Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

report median read length for fastqc #1745

Merged
merged 10 commits into from
Dec 18, 2022
Merged

Conversation

jchorl
Copy link
Contributor

@jchorl jchorl commented Aug 14, 2022

Uses median read length for fastqc, instead of mean. Note that this is a breaking change. Another option is to add an extra column, to keep both.

The median is way more suitable for sequencing reads as it doesn't assume the length is normally distributed.

There are issues on fastp to output more stats, but they don't seem to have much traction. I suppose I could also try to PR them, but they seem less responsive.

You can see a subtle difference in lengths on the testdata repo:
Old:
Screenshot 2022-08-14 at 23-02-01 MultiQC Report

New:
Screenshot 2022-08-14 at 22-59-41 MultiQC Report

  • This comment contains a description of changes (with reason)
  • CHANGELOG.md has been updated

Copy link
Member

@ewels ewels left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wrote this code back before long reads were really a thing and it was more of a fair assumption 😅

I'm a bit nervous about the breaking change as this is by far the most used module. As such, I'd prefer to keep the previous average and add this in as an additional value if possible. At least then the same value as before will be present in the data exports for downstream analysis.

Also, I'd like the column title to be slightly different, eg. Median Read Length so that there's a visible difference between reports.

@jchorl
Copy link
Contributor Author

jchorl commented Dec 13, 2022

Done! There is now an optional median column (hidden by default):
Screenshot 2022-12-13 at 05-15-32 MultiQC Report

@ewels ewels self-requested a review December 15, 2022 16:07
multiqc/modules/fastqc/fastqc.py Outdated Show resolved Hide resolved
multiqc/modules/fastqc/fastqc.py Outdated Show resolved Hide resolved
try:
hide_seq_length = False if max(seq_lengths) - min(seq_lengths) > 10 else True
hide_seq_length = False if max(avg_seq_lengths) - min(avg_seq_lengths) > 10 else True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
hide_seq_length = False if max(avg_seq_lengths) - min(avg_seq_lengths) > 10 else True
hide_seq_length = False if max(median_seq_lengths) - min(median_seq_lengths) > 10 else True

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should really just be:

hide_seq_length = max(median_seq_lengths) - min(median_seq_lengths) <= 10

multiqc/modules/fastqc/fastqc.py Outdated Show resolved Hide resolved
@jchorl
Copy link
Contributor Author

jchorl commented Dec 16, 2022

Suggestions incorporated and tested:
Screenshot 2022-12-16 at 21-46-50 MultiQC Report

Will cut an issue shortly on fastqc

@ewels ewels self-requested a review December 18, 2022 20:24
Copy link
Member

@ewels ewels left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Thanks for the tweaks!

@ewels ewels merged commit 6dd4d83 into MultiQC:master Dec 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants