FastQ Screen plots don't work with %No Hits 0.00 #1126

Closed
FelixKrueger opened this issue Mar 11, 2020 · 17 comments
Labels
bug: core Bug in the main MultiQC code module: change

Comments

@FelixKrueger

Description of bug:
While experimenting with Nextflow, I found that MultiQC plots the results from FastQ Screen fine when it is run in standard mode, but the bars for the organism in question (here E. coli) are missing if FastQ Screen was run in --bisulfite mode.

MultiQC Error log:


/bi/apps/multiqc/1.7/lib/python3.7/site-packages/multiqc-1.7-py3.7.egg/EGG-INFO/scripts/multiqc:229: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn('MultiQC Version {} now available!'.format(remote_version))
[WARNING]         multiqc : MultiQC Version v1.8 now available!
[INFO   ]         multiqc : This is MultiQC v1.7
[INFO   ]         multiqc : Template    : default
[INFO   ]         multiqc : Searching '.'
[WARNING]    fastq_screen : Couldn't find number of reads with no hits for 'Ecoli_10K_methylated_R1_screen'
[WARNING]    fastq_screen : Couldn't find number of reads with no hits for 'Ecoli_10K_methylated_R2_screen'
[INFO   ]    fastq_screen : Found 2 reports
[INFO   ]         multiqc : Compressing plot data
[INFO   ]         multiqc : Report      : multiqc_report.html
[INFO   ]         multiqc : Data        : multiqc_data
[INFO   ]         multiqc : MultiQC complete

File that triggers the error:
This file works fine:
Ecoli_standardMode_R1_screen.txt

This one doesn't:
Ecoli_bisulfiteMode_R1_screen.txt

Additional context
I think the issue might be related to the fact that the *screen.txt file contains an additional section listing the bisulfite strand orientations found in the sample:

#Fastq_screen version: 0.14.0	#Aligner: Bismark/bowtie2	#Reads in subset: 100000
Genome	#Reads_processed	#Unmapped	%Unmapped	#One_hit_one_genome	%One_hit_one_genome	#Multiple_hits_one_genome	%Multiple_hits_one_genome	#One_hit_multiple_genomes	%One_hit_multiple_genomes	Multiple_hits_multiple_genomes	%Multiple_hits_multiple_genomes
Cat	10000	10000	100.00	0	0.00	0	0.00	0	0.00	0	0.00
Chicken	10000	10000	100.00	0	0.00	0	0.00	0	0.00	0	0.00
Cow	10000	10000	100.00	0	0.00	0	0.00	0	0.00	0	0.00
Drosophila	10000	10000	100.00	0	0.00	0	0.00	0	0.00	0	0.00
Human	10000	10000	100.00	0	0.00	0	0.00	0	0.00	0	0.00
Macaque	10000	10000	100.00	0	0.00	0	0.00	0	0.00	0	0.00
Mouse	10000	10000	100.00	0	0.00	0	0.00	0	0.00	0	0.00
Opossum	10000	10000	100.00	0	0.00	0	0.00	0	0.00	0	0.00
Pig	10000	10000	100.00	0	0.00	0	0.00	0	0.00	0	0.00
Rat	10000	9909	99.09	0	0.00	0	0.00	9	0.09	82	0.82
Zebrafish	10000	10000	100.00	0	0.00	0	0.00	0	0.00	0	0.00
Arabidopsis	10000	10000	100.00	0	0.00	0	0.00	0	0.00	0	0.00
Grape	10000	10000	100.00	0	0.00	0	0.00	0	0.00	0	0.00
Potato	10000	10000	100.00	0	0.00	0	0.00	0	0.00	0	0.00
Tomato	10000	10000	100.00	0	0.00	0	0.00	0	0.00	0	0.00
Pseudomonas	10000	9987	99.87	0	0.00	0	0.00	2	0.02	11	0.11
Massilia_oculi	10000	9998	99.98	0	0.00	0	0.00	0	0.00	2	0.02
Ecoli	10000	0	-0.00	9259	92.59	579	5.79	35	0.35	127	1.27
Lambda	10000	9977	99.77	0	0.00	0	0.00	23	0.23	0	0.00
MT	10000	10000	100.00	0	0.00	0	0.00	0	0.00	0	0.00
PhiX	10000	10000	100.00	0	0.00	0	0.00	0	0.00	0	0.00
Wasp	10000	9999	99.99	0	0.00	0	0.00	1	0.01	0	0.00
Vectors	10000	9933	99.33	0	0.00	0	0.00	67	0.67	0	0.00
Worm	10000	9990	99.90	0	0.00	0	0.00	0	0.00	10	0.10
Yeast	10000	10000	100.00	0	0.00	0	0.00	0	0.00	0	0.00

%Hit_no_genomes: 0.00


#Bisulfite read orientation results
Library	#Original_top_strand	%Original_top_strand	#Complementary_to_original_top_strand	%Complementary_to_original_top_strand	#Complementary_to_original_bottom_strand	%Complementary_to_original_bottom_strand	#Original_bottom_strand	%Original_bottom_strand
Rat	3	33.33	3	33.33	2	22.22	1	11.11
Pseudomonas	0	0.00	2	100.00	0	0.00	0	0.00
Ecoli	2335	25.12	2324	25.01	2350	25.29	2285	24.59
Lambda	2	8.70	12	52.17	5	21.74	4	17.39
Wasp	0	0.00	0	0.00	0	0.00	1	100.00
Vectors	12	17.91	26	38.81	14	20.90	15	22.39

MultiQC doesn't seem to extract %Hit_no_genomes: 0.00 correctly. I did a few experiments, deleting the last section so that the file ends with %Hit_no_genomes: 0.00 like the standard-mode files, but it still fails. If you change the number from 0.00 to any other number, the plots for E. coli and No hits start showing, which suggests it doesn't like the value 0.00? Sometimes it assigned values to mouse, though, so something weird is going on...

Thanks for looking into this.

@FelixKrueger
Author

As it happens, I just found the same in an unrelated RNA-seq experiment:

/bi/apps/multiqc/1.7/lib/python3.7/site-packages/multiqc-1.7-py3.7.egg/EGG-INFO/scripts/multiqc:229: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn('MultiQC Version {} now available!'.format(remote_version))
[WARNING]         multiqc : MultiQC Version v1.8 now available!
[INFO   ]         multiqc : This is MultiQC v1.7
[INFO   ]         multiqc : Template    : default
[INFO   ]         multiqc : Searching '.'
Searching 1154 files..  [####################################]  100%
[INFO   ]         bowtie2 : Found 50 reports
[INFO   ]        cutadapt : Found 99 reports
[WARNING]    fastq_screen : Couldn't find number of reads with no hits for 'lane7267_ATGCGCAG_TTCTAGCT_Smarca5_B06_L001_R1_screen'
[WARNING]    fastq_screen : Couldn't find number of reads with no hits for 'lane7267_TGCAGCTA_GCGTAAGA_Smarca5_D11_L001_R1_screen'
[WARNING]    fastq_screen : Couldn't find number of reads with no hits for 'lane7267_ACTGAGCG_CCTAGAGT_Smarca5_C08_L001_R1_screen'
[WARNING]    fastq_screen : Couldn't find number of reads with no hits for 'lane7267_ACTCGCTA_AAGGCTAT_Smarca5_F01_L001_R4_screen'
[INFO   ]    fastq_screen : Found 50 reports
[ERROR  ]    fastq_screen : No counts found for 'No hits' ('lane7267_ACTCGCTA_AAGGCTAT_Smarca5_F01_L001_R4_screen'). Could be malformed or very old FastQ Screen results.
[ERROR  ]    fastq_screen : No counts found for 'No hits' ('lane7267_ACTGAGCG_CCTAGAGT_Smarca5_C08_L001_R1_screen'). Could be malformed or very old FastQ Screen results.
[ERROR  ]    fastq_screen : No counts found for 'No hits' ('lane7267_ATGCGCAG_TTCTAGCT_Smarca5_B06_L001_R1_screen'). Could be malformed or very old FastQ Screen results.
[ERROR  ]    fastq_screen : No counts found for 'No hits' ('lane7267_TGCAGCTA_GCGTAAGA_Smarca5_D11_L001_R1_screen'). Could be malformed or very old FastQ Screen results.
[INFO   ]          fastqc : Found 197 reports
[INFO   ]         multiqc : Compressing plot data
[WARNING]         multiqc : Previous MultiQC output found! Adjusting filenames..
[WARNING]         multiqc : Use -f or --force to overwrite existing reports instead
[INFO   ]         multiqc : Report      : multiqc_report_1.html
[INFO   ]         multiqc : Data        : multiqc_data_1
[INFO   ]         multiqc : MultiQC complete

I think MultiQC is not handling a value of 0.00% no hits correctly. I'll change the title of the issue accordingly.

@FelixKrueger FelixKrueger changed the title FastQ Screen plots don't work in bisulfite mode FastQ Screen plots don't work with %No Hits 0.00 Mar 11, 2020
@ewels
Member

ewels commented Mar 11, 2020

Good spot!

Could you please pop some example data up to https://github.com/ewels/MultiQC_TestData/tree/master/data/modules/fastq_screen as a PR? That would be fab ✨

@FelixKrueger
Author

Are you challenging my Github skills again? Will try to make it work!

@ewels
Member

ewels commented Mar 11, 2020

Yeah, looks like the current code uses a mahoosive regular expression:
https://github.com/ewels/MultiQC/blob/063ac69e28d7b109cc9165d47c0d3f9de3f21e13/multiqc/modules/fastq_screen/fastq_screen.py#L71

This is pretty ugly and should really be rewritten..

@ewels
Copy link
Member

ewels commented Mar 29, 2020

Just noticed that you have a new warning from logger that I've never seen before 🤦‍♂

DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn('MultiQC Version {} now available!'.format(remote_version))

@ewels ewels added this to the MultiQC v1.9 milestone Mar 29, 2020
@ewels ewels added the fix label Mar 29, 2020
@ewels ewels added bug: core Bug in the main MultiQC code and removed fix labels May 28, 2020
@ewels
Member

ewels commented May 28, 2020

Ok, problem (1) solved - these warnings are now gone:

[WARNING]    fastq_screen : Couldn't find number of reads with no hits for 'Ecoli_10K_methylated_R1_screen'
[WARNING]    fastq_screen : Couldn't find number of reads with no hits for 'Ecoli_10K_methylated_R2_screen'

This was simply because the code tested `if nohits_pct:`, which evaluates to false when the percentage is zero (hence the warning). I changed it to the more explicit `if nohits_pct is not None:`, so that chunk of code now runs properly, the warning is gone, and the plot contains the "No Hits" category again.
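The difference between the two checks can be shown in isolation. This is a minimal sketch: the function names are hypothetical stand-ins, not MultiQC's actual code, and `nohits_pct` represents the parsed `%Hit_no_genomes` value (or `None` if the line was never found).

```python
def has_nohits_truthy(nohits_pct):
    # Truthiness test: 0.0 is falsy, so a genuine 0.00% looks "missing"
    return bool(nohits_pct)


def has_nohits_explicit(nohits_pct):
    # Explicit None check: only a value that was never parsed counts as missing
    return nohits_pct is not None


for value in (None, 0.0, 12.5):
    print(value, has_nohits_truthy(value), has_nohits_explicit(value))
```

Only the `0.0` case differs between the two functions, which matches the symptom: every sample with exactly 0.00% no hits triggered the warning.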

The original bisulfite issue is separate, though, I think. The plot doesn't show any alignment to E. coli, even though the report file says there should be. Looking into that now.

Phil

ewels added a commit that referenced this issue May 28, 2020
Now handles values in a more sensible way instead of using a huge regex.

Fixes error in #1126
@ewels
Copy link
Member

ewels commented May 28, 2020

I tracked down the error: E. coli had -0.00% of reads unmapped in your bisulfite example. The minus sign broke the regex, so the parser simply skipped that line.
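The failure mode can be reproduced with a much smaller pattern. The real MultiQC regex is not reproduced in this thread, so both patterns below are hypothetical simplifications that only cover the first few columns of a row.

```python
import re

# Hypothetical fixed-layout pattern: genome, two integer counts, then an
# unsigned two-decimal percentage. '-0.00' cannot match (\d+\.\d+).
UNSIGNED_ROW = re.compile(r"^(\S+)\t(\d+)\t(\d+)\t(\d+\.\d+)")
# Same pattern with an optional leading minus sign on the percentage.
SIGNED_ROW = re.compile(r"^(\S+)\t(\d+)\t(\d+)\t(-?\d+\.\d+)")

line = "Ecoli\t10000\t0\t-0.00\t9259\t92.59"
print(UNSIGNED_ROW.match(line))         # no match, so the row is silently skipped
print(SIGNED_ROW.match(line).group(4))  # the percentage field, minus sign included
```

Patching the pattern works for this one case, but every new formatting quirk needs another patch, which is the argument for dropping the fixed regex altogether.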

I didn't really like the huge regex, so I have completely rewritten the parsing code to instead loop through the columns and dynamically take the header fields. This is a bit more generic and hopefully less vulnerable to these kinds of hiccups.
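A header-driven parser along these lines could look like the sketch below. It is a minimal illustration, not the code from the commit: the function name and return shape are assumptions. It takes column names from the `Genome` row and stops when the bisulfite `Library` table begins.

```python
def parse_screen(text):
    """Hypothetical sketch: parse the main table of a FastQ Screen *_screen.txt
    file by reading column names from the 'Genome' header row.

    Returns ({genome: {column_name: value}}, nohits_pct), where nohits_pct is
    None if no '%Hit_no_genomes:' line was found."""
    header = None
    genomes = {}
    nohits_pct = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            # Blank lines, the version line and section titles carry no counts
            continue
        if line.startswith("%Hit_no_genomes:"):
            # float() accepts '0.00' and '-0.00' alike
            nohits_pct = float(line.split(":", 1)[1])
            continue
        fields = line.split("\t")
        if fields[0] == "Genome":
            header = fields[1:]
            continue
        if fields[0] == "Library":
            # The bisulfite orientation table starts here; not handled in this sketch
            break
        if header is not None and len(fields) == len(header) + 1:
            # Pair every value with its column name instead of relying on positions
            genomes[fields[0]] = {
                name: float(value) for name, value in zip(header, fields[1:])
            }
    return genomes, nohits_pct
```

Because values are converted with `float()` rather than matched against a digit pattern, a row like `Ecoli 10000 0 -0.00 ...` is kept, and the caller can test `nohits_pct is not None` rather than relying on truthiness.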

Phil

@FelixKrueger
Copy link
Author

Great, thanks. So maybe @StevenWingett should also look into why FastQ Screen reports -0.00% in the first place?
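This thread doesn't show how FastQ Screen computes %Unmapped, so the cause of -0.00 is unknown here. One common way such a value can appear is a tiny negative floating-point residue, for example if a percentage is derived by subtraction. That is an assumption, not a diagnosis. The sketch shows that two-decimal formatting keeps the sign, and offers a hypothetical guard (`format_pct`) that snaps such values to zero.

```python
# Assumption for illustration: a percentage that should be exactly zero ends up
# as a tiny negative residue after floating-point arithmetic.
residue = -1e-12
print("%.2f" % residue)  # the sign survives rounding to two decimals


def format_pct(value):
    # Hypothetical guard: anything that rounds to zero is printed as +0.00
    if round(value, 2) == 0:
        value = 0.0
    return "%.2f" % value


print(format_pct(residue))
print(format_pct(99.09))
```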

@ewels
Copy link
Member

ewels commented May 28, 2020

Maybe 😉

I'm feeling nice this morning and was thinking of adding a plot for you if there is Bisulfite data in the FastQ Screen report.

Does FastQ Screen itself plot this data somehow? Could you attach the output here so that I can mimic it if so?

Phil

@FelixKrueger
Copy link
Author

Definitely need to make use of your good mood this morning. FastQ Screen produces something like this:
Ecoli_10K_methylated_R1_val_1_screen.html.zip
Thanks!

@FelixKrueger
Copy link
Author

It could be nice to scale the plot so that it doesn't stretch over four A4 pages and requires a lot of scrolling :)

ewels added a commit that referenced this issue May 28, 2020
@ewels
Copy link
Member

ewels commented May 28, 2020

> It could be nice to scale the plot so that it doesn't stretch over four A4 pages and requires a lot of scrolling :)

Come on now, give me some credit 😉 I made it so that the bar plot is tabbed, with tabs sorted by the total read count across all samples (hidden if 0). The y-axis is kept consistent across all species:

[Screenshot: fastqc_screen_bisulfite]
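The tab logic described above can be sketched as follows. This is an illustration under assumptions, not the MultiQC implementation: `bisulfite_tabs` is a hypothetical name, and the input is assumed to be the parsed bisulfite orientation table of each sample. Tabs are ordered by total reads across samples, zero-read genomes get no tab, and one y-axis maximum is shared by all tabs.

```python
def bisulfite_tabs(samples):
    """Hypothetical sketch.

    samples: {sample_name: {genome: {strand: read_count}}}
    Returns (genome tab order, shared y-axis maximum)."""
    totals = {}
    ymax = 0
    for genomes in samples.values():
        for genome, strands in genomes.items():
            sample_total = sum(strands.values())
            totals[genome] = totals.get(genome, 0) + sample_total
            # The tallest stacked bar in any tab sets one y-axis for all tabs
            ymax = max(ymax, sample_total)
    # Most reads first; genomes with zero reads across all samples get no tab
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    order = [genome for genome, total in ranked if total > 0]
    return order, ymax


example = {
    "Ecoli_10K_methylated_R1": {
        "Ecoli": {"OT": 2335, "CTOT": 2324, "CTOB": 2350, "OB": 2285},
        "Lambda": {"OT": 2, "CTOT": 12, "CTOB": 5, "OB": 4},
        "Human": {"OT": 0, "CTOT": 0, "CTOB": 0, "OB": 0},
    }
}
print(bisulfite_tabs(example))
```

Sharing one y-axis keeps bar heights comparable when switching tabs, at the cost of small genomes like Lambda showing very short bars.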

I have also updated the main plot so that (a) the fancy version isn't shown if you have a mixture of different genomes (fixes a previously unreported bug), (b) zero-count genomes are hidden, and (c) genomes are sorted by total read count across all samples.
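Point (a) amounts to checking that all samples were screened against the same genomes before drawing the combined plot. A minimal sketch, with hypothetical function names, assuming each sample's genome names are available as a list:

```python
def same_genome_set(samples):
    """Hypothetical check: True if every sample was screened against the same
    set of genomes. samples maps sample name -> list of genome names."""
    return len({frozenset(genomes) for genomes in samples.values()}) <= 1


def choose_plot(samples):
    # Only draw the combined ("fancy") plot when all samples share one genome set
    return "fancy" if same_genome_set(samples) else "standard"


print(choose_plot({"R1": ["Human", "Mouse"], "R2": ["Mouse", "Human"]}))
print(choose_plot({"R1": ["Human", "Mouse"], "R2": ["Ecoli"]}))
```

Comparing sets ignores genome order; a stricter variant could compare tuples if the combined plot relies on a fixed column order.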

See an example report: multiqc_report.html

@FelixKrueger
Copy link
Author

Looking great! Thanks a lot!

@ewels
Copy link
Member

ewels commented May 28, 2020

There was also an annoying bug where the ymax was maintained even when you clicked "Percentages". That's now fixed, so the plot should work as you expect.

@ewels ewels closed this as completed May 28, 2020
ewels added a commit to MultiQC/test-data that referenced this issue May 28, 2020
@StevenWingett
Copy link

Hello everyone. Since everyone is looking at FastQ Screen, I thought I should join in. @FelixKrueger, could you point to the file that is generating -0.00. Thanks :-)

@FelixKrueger
Copy link
Author

To be honest, I don't know, Steven; it is somewhere on the cluster... You were looking over my shoulder when we generated this report, don't you remember where it is?...

@StevenWingett
Copy link

@FelixKrueger No, I thought it was rude to look at your screen. I'll dig around.....
