Heatmap by Falco #8

Closed
sum732 opened this issue Aug 20, 2020 · 2 comments
Labels
enhancement New feature or request

Comments


sum732 commented Aug 20, 2020

Hello,

Thanks for writing Falco. I really liked the processing speed: 515M reads in 45 min, while FastQC is still processing.
The results are comparable, and the dynamic nature of the plots is good. However, I am not sure about the heatmap. FastQC shows a "cold map", which makes it much easier to spot issues. In the Falco heatmap, what is the color spectrum based on, and how is the heatmap best interpreted? Additionally, the tiles on the heatmap have a different range from FastQC's. How are the tiles captured, and how is the range set?

I am using v0.11.9 of FastQC and v0.2.1 of Falco (via conda).

Thanks in advance for your help with this.

@guilhermesena1 guilhermesena1 added the enhancement New feature or request label Aug 23, 2020
Collaborator

guilhermesena1 commented Aug 23, 2020

Hello,

Thank you for the kind words about falco!
The heatmap values are generated similarly to FastQC. For each tile, we calculate the average quality of all bases, and for each position within a tile, we calculate the z-score of that position's average quality relative to the global tile average (i.e. the number of standard deviations above or below the average). If there is a problem at a specific position, its z-score will be strongly negative. Within any tile, if z < -5, the module outputs a warning, and if z < -10, the module outputs an error.
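
To make the description above concrete, here is a minimal Python sketch of one possible reading of it. It is not falco's actual code: the function names, the use of the population standard deviation across positions, and the input layout (one list of per-position average qualities per tile) are all assumptions made for illustration. Only the -5 and -10 thresholds come from the explanation above.

```python
from statistics import mean, pstdev


def tile_z_scores(position_means):
    """Z-score of each position's average quality within ONE tile.

    position_means: average base quality at each read position of the tile
    (hypothetical input layout, assumed for this sketch).
    """
    tile_avg = mean(position_means)   # global average for the tile
    tile_sd = pstdev(position_means)  # assumption: population std. dev.
    if tile_sd == 0:
        # All positions are identical, so nothing deviates from the average.
        return [0.0] * len(position_means)
    return [(q - tile_avg) / tile_sd for q in position_means]


def tile_status(z_scores, warn_below=-5.0, fail_below=-10.0):
    """Apply the thresholds quoted above: z < -5 warns, z < -10 is an error."""
    lowest = min(z_scores)
    if lowest < fail_below:
        return "fail"
    if lowest < warn_below:
        return "warn"
    return "pass"


if __name__ == "__main__":
    # A tile where a single position has much lower quality than the rest.
    tile = [36.0] * 49 + [10.0]
    print(tile_status(tile_z_scores(tile)))
```

Under this reading, a single bad position stands out as one strongly negative z-score while the remaining positions stay close to zero.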

The legend on the right side shows the association between colors and values, which is different for each dataset and is based on the z-score distribution. Red tiles are the larger values, and blue tiles are the smaller values (the ones to look out for). I think I understand your concern and the issue with the heatmaps, though. Whenever there is an outlier with a very low z-score, all the other tiles have their colors "bumped up" to red. I think the right solution is an absolute color scale (say, between -10 and +10) that every tile abides by. I will implement this change in the next version of falco, and I really appreciate you bringing this up.
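
The difference between a per-dataset color range and an absolute one can be sketched as follows. This is an illustration only, not the planned falco implementation: `z_to_rgb` is a hypothetical helper, and the linear blue-to-red ramp is an assumption. The -10 to +10 range is the one suggested above.

```python
def z_to_rgb(z, z_min=-10.0, z_max=10.0):
    """Map a z-score to an (R, G, B) color on a fixed blue-to-red scale.

    Values outside [z_min, z_max] are clamped, so one extreme outlier
    cannot shift the colors of every other tile.
    """
    clamped = max(z_min, min(z_max, z))
    t = (clamped - z_min) / (z_max - z_min)  # 0.0 = lowest, 1.0 = highest
    return (round(255 * t), 0, round(255 * (1 - t)))


if __name__ == "__main__":
    # With a fixed scale, the outlier at -50 is drawn the same as -10,
    # and the colors of the other values do not depend on it.
    for z in (-50.0, -10.0, 0.0, 10.0):
        print(z, z_to_rgb(z))
```

Because the range no longer depends on the data, the same color means the same z-score in every report, which is what makes problem tiles recognizable at a glance.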

For further reference, the FastQC documentation explains these values in detail:
https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/12%20Per%20Tile%20Sequence%20Quality.html

I hope that helps!

Author

sum732 commented Aug 24, 2020

Great, thanks for taking the time to explain and expand on the heatmap.
Yes, I agree with this approach to the color distribution, so that any issues are easy to identify at a quick glance. Looking forward to the update.
Best,

@sum732 sum732 closed this as completed Aug 24, 2020