Review MultiQC Custom Content in Main Table #90

apeltzer · 2018-11-19T07:06:33Z

I'd like to have an improved report in the end that summarizes certain important metrics in a smart way, for example (but not limited to) the ones below. Naming is something I'd rather not standardize too much, since different labs tend to use different names for certain metrics.

"Cluster Factor": (Number of duplicate reads / Total number of reads in sample)
- Ideally, this would be summarized based on information retrieved from some modules running in the pipeline.
"Target Efficiency": (Captured bases / targeted bases)

...

Could you please add things that are of specific interest to you?
These here are on my ToDo list:

mt/Nuc ratio
mapped reads / % mapped reads
merged reads / % merged reads
GC content

Schmutzi / estimates for contamination
....

Any ideas what you'd like to see here @EisenRa @jfy133 @JudithNeukamm ?

EisenRa · 2018-11-25T07:18:20Z

@apeltzer looking awesome so far!

Things that I think would be great are:

DamageProfiler: both C->T/G->A, and fragment length distribution.
In the 'General Statistics' section, % of genome with coverage >= 2, 3, 4. (Currently there is 1, 5, 10, etc.)
Qualimap mapping scores

jfy133 · 2018-12-10T10:50:48Z

Not so much a metric, but it would be good to have a 'number of reads after quality filtering' column when that flag is enabled.

apeltzer · 2018-12-10T12:06:28Z

To summarize a bit what Phil said, I guess we'll need a custom plugin for MultiQC to provide some of these here.

So we should create a list of required metrics and how to compute them and have these available then.

jfy133 · 2018-12-13T20:31:40Z

For reference, these are some of the typical EAGER 1 headers. I will update definitions (as I understand them) over the next couple of days.

Sample Name: Input sample name
# of Raw Reads prior Clip & Merge (C&M): total number of raw reads in all FASTQ files
# reads after C&M prior mapping: total number of reads in input file for mapping after adapter removal (including either merged only, or all merged/high quality singletons as requested). From AdapterRemoval.settings file.
# of Merged Reads: Number of reads that were merged - not including clipped singletons (comment: I think redundant with below) From AdapterRemoval.settings file.
% Merged Reads: Percentage of reads that were merged. Calculated from AdapterRemoval.settings file.
# reads not attempted to map: (comment: I was never really sure what this was)
# mapped reads prior RMDup: Number of reads that mapped to the reference file, prior remove duplicate step. From samtools stats.
# of Duplicates removed: Number of exact duplicate reads removed (e.g. PCR duplicates), i.e. two or more reads with the same start coordinate on the reference (samtools markduplicates), or the same start and end coordinates (DeDup). From deduplication tool log.
Mapped Reads after RMDup: Number of reads remaining after removal of exact duplicate reads (e.g. PCR duplicates). From DeDup log.
Endogenous DNA (%): Percentage of on-target mapped and de-duplicated reads over the total number of reads in the library after adapter removal and merging (on and off-target). From AdapterRemoval and samtools logs and manual calculation.
Cluster Factor: Ratio of mapped reads before de-duplication over mapped reads after de-duplication. Calculated as number of reads pre-RMDup over number of reads post-RMDup. Higher values suggest a low-complexity or over-amplified library. From DeDup log and manual calculation.
Mean Coverage: The average number of reads covering each base across the entire reference genome after deduplication. Also known as depth or fold coverage. Note: this does not measure evenness of coverage. From qualimap output.
std. dev. Coverage: The standard deviation of the number of reads covering each base across the reference genome after de-duplication. From qualimap output.
Coverage >= 1X in %: Percentage of bases covered at least one time across the whole genome. Also known as breadth coverage after deduplication. From qualimap output.
Coverage >= 2X in %: Percentage of bases covered at least two times across the whole genome. Also known as breadth coverage after deduplication. From qualimap output.
Coverage >= 3X in %: Percentage of bases covered at least three times across the whole genome. Also known as breadth coverage after deduplication. From qualimap output.
Coverage >= 4X in %: Percentage of bases covered at least four times across the whole genome. Also known as breadth coverage after deduplication. From qualimap output.
Coverage >= 5X in %: Percentage of bases covered at least five times across the whole genome. Also known as breadth coverage after deduplication. From qualimap output.
# of reads on mitochondrium: Number of reads that aligned to the 'mitochondrium' or selected 'chromosome' or FASTA entry in the reference genome. From samtools stats [or qualimap?].
AVG Coverage on mitochondrium: Average number of reads covering each base across the 'mitochondrium' or selected 'chromosome' or FASTA entry in the reference genome after deduplication.
MT/NUC Ratio: The ratio of the number of reads aligned to the mitochondrium (or selected 'chromosome' or FASTA entry in the reference genome) to the number of reads aligned to all other chromosomes or FASTA entries. From samtools stats and manual calculation.
DMG 1st Base 3': Frequency of G->A substitutions from reference at the 1st base of the 3' end of each de-duplicated read. This represents typical ancient DNA damage from deamination. A higher value indicates more damage, if higher than the 2nd base. From DamageProfiler log.
DMG 2nd Base 3': Frequency of G->A substitutions from reference at the 2nd base of the 3' end of each de-duplicated read. This represents typical ancient DNA damage from deamination. A higher value indicates more damage. From DamageProfiler log.
DMG 1st Base 5': Frequency of C->T substitutions from reference at the 1st base of the 5' end of each de-duplicated read. This represents typical ancient DNA damage from deamination. A higher value indicates more damage, if higher than the 2nd base. From DamageProfiler log.
DMG 2nd Base 5': Frequency of C->T substitutions from reference at the 2nd base of the 5' end of each de-duplicated read. This represents typical ancient DNA damage from deamination. A higher value indicates more damage. From DamageProfiler log.
average fragment length: Mean length in base pairs of all aligned reads after de-duplication. From DamageProfiler log and manual calculation.
median fragment length: Median length in base pairs of all aligned reads after de-duplication. From DamageProfiler log and manual calculation.
GC content in %: Average GC content of the genome. From qualimap output.
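The three "manual calculation" columns in this list could be sketched roughly as below. This is only an illustration in Python: the function and parameter names are made up here, the cluster factor is assumed to be reads pre-RMDup over reads post-RMDup (so that higher values mean more duplication), and which mapped-read count feeds the endogenous DNA percentage should follow whichever definition is finally agreed on.

```python
# Hypothetical helpers; names and signatures are illustrative, not part of EAGER.

def endogenous_pct(mapped_reads: int, reads_after_cm: int) -> float:
    """Mapped reads as a percentage of all reads left after Clip & Merge."""
    if reads_after_cm == 0:
        return 0.0
    return 100.0 * mapped_reads / reads_after_cm


def cluster_factor(mapped_pre_rmdup: int, mapped_post_rmdup: int) -> float:
    """Assumed definition: mapped reads before dedup over mapped reads after.

    A value of 1.0 means no duplicates were removed; higher values point to
    a low-complexity or over-amplified library.
    """
    if mapped_post_rmdup == 0:
        return float("nan")
    return mapped_pre_rmdup / mapped_post_rmdup


def mt_nuc_ratio(mt_reads: int, total_mapped_reads: int) -> float:
    """Reads on the mitochondrial entry over reads on all other entries."""
    nuclear_reads = total_mapped_reads - mt_reads
    if nuclear_reads == 0:
        return float("nan")
    return mt_reads / nuclear_reads
```

For example, 1200 mapped reads before and 1000 after duplicate removal would give a cluster factor of 1.2 under this assumed definition.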

jfy133 · 2019-01-09T20:15:58Z

Additional stuff from EAGER1 modules which wasn't in original ReportTable

Genotyping stuff, maybe from GATK? I don't know what would be useful there, or whether there is a fast stats scanner (e.g. maybe bcftools stats?) -> nice for pathogen stuff would be the number of multi-allelic sites ("number of multiallelic sites" field in output)

Additional stuff which has been added to EAGER2 already
FastP

# reads after PolyG trimming: Number of reads that were trimmed of poly-G tails. From fastp html report. [Might need to consider whether to also calculate this after C&M, which has a length filter...]

jfy133 · 2019-02-15T14:17:40Z

I can potentially supply an R/tidyverse script that converts a MultiQC JSON to a nice table. However, a couple of things to note:

1. I will still need a way to get a list of the IDs of things when I need to collapse stats across multi-lane samples.
2. Apparently the MultiQC JSON doesn't meet JSON standards because the 'missing data' values in the module stats files vary (i.e. NA vs NaN vs null etc.), so I have to do some hacky things to clean that up. Might be worth speaking to the MultiQC crew to standardise that somehow (which would require per-module reporting of what each 'missing data' value stands for).
3. Not sure how I will deal with DeDup stats of multi-lane things.
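The clean-up described in the second point might look roughly like this. It is a Python sketch rather than the R/tidyverse script mentioned above, and the set of strings treated as missing is an assumption, since each module may use its own marker.

```python
import json
import math

# Assumption: these strings all mean "missing"; real modules may differ.
MISSING_STRINGS = {"NA", "NaN", "null"}


def normalise_missing(value):
    """Recursively replace NaN/Infinity floats and 'NA'-like strings with None."""
    if isinstance(value, dict):
        return {key: normalise_missing(item) for key, item in value.items()}
    if isinstance(value, list):
        return [normalise_missing(item) for item in value]
    if isinstance(value, float) and (math.isnan(value) or math.isinf(value)):
        return None
    if isinstance(value, str) and value.strip() in MISSING_STRINGS:
        return None
    return value


def load_lenient_json(text: str):
    """Parse JSON that may contain bare NaN/Infinity literals, then normalise it.

    Python's json module accepts these non-standard literals by default,
    which is what makes the lenient first pass possible.
    """
    return normalise_missing(json.loads(text))
```

After this pass, `json.dumps(cleaned, allow_nan=False)` should produce output that strict JSON parsers accept, with every missing value written as `null`.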

Point two references:
rstudio/DT#496
jeroen/jsonlite#70
jeroen/jsonlite#94

jfy133 · 2019-04-01T18:46:58Z

Another value to consider for the polyG trimming is its effect on GC content, but we need to decide how to display this (i.e. which modules to get this info from).

jfy133 · 2019-04-01T19:26:34Z

MultiQC ReportTable Requests

In all cases: either adapterremoval_config functionality (a la Qualimap),
or additional columns exported to GeneralStats which are already in
multiqc_adapter_removal.txt

Adapter Removal

total
full-length_cp
truncated_cp
retained_reads

Samtools

mapped_passed_pct
quality filtered info?

Qualimap

mean_coverage

apeltzer · 2019-12-04T10:15:25Z

Need to update this according to the requirements we have. There have been a lot of updates on this one recently, e.g. support for mtnucratio in MultiQC and mean coverage updates in MultiQC using QualiMap.

jfy133 · 2019-12-05T15:53:13Z

Most needed stuff is there, but no endogenous DNA. Maybe a small Python script and a MultiQC module are required? @aidaanva would you be interested?

It would just need to take a single samtools flagstat file and divide the mapped field (5th row) by the total (1st) and spit out a percent.
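Something along these lines, as a rough sketch only (the function name is made up, and it matches the "in total" and "mapped (" lines by text rather than by row number, in case the row order differs between samtools versions):

```python
import re

# Each flagstat line starts with "<QC-passed> + <QC-failed> <description>".
TOTAL_RE = re.compile(r"^(\d+) \+ (\d+) in total")
MAPPED_RE = re.compile(r"^(\d+) \+ (\d+) mapped \(")


def endogenous_from_flagstat(flagstat_text: str) -> float:
    """Return mapped / total as a percentage, using QC-passed counts only."""
    total = None
    mapped = None
    for line in flagstat_text.splitlines():
        total_match = TOTAL_RE.match(line)
        mapped_match = MAPPED_RE.match(line)
        if total_match:
            total = int(total_match.group(1))
        elif mapped_match:
            mapped = int(mapped_match.group(1))
    if total is None or mapped is None:
        raise ValueError("input does not look like samtools flagstat output")
    if total == 0:
        return 0.0
    return 100.0 * mapped / total
```

The total here is whatever went into the BAM, so for a true endogenous DNA value the flagstat file would have to come from a BAM that still contains the unmapped reads.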

aidaanva · 2019-12-10T09:45:56Z

I can do it

apeltzer · 2019-12-10T09:57:13Z

Your script could actually run in the multiqc process step: simply do the division as requested, then create some custom content for MultiQC itself :-) https://github.com/ewels/MultiQC/blob/master/docs/custom_content.md
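As a rough idea of what that custom content could look like, here is a Python sketch that builds a General Statistics entry. The `id`, the column key `endogenous_dna`, and the function name are hypothetical; the `plot_type: generalstats` layout with `pconfig` and `data` follows my reading of the custom content docs linked above and should be checked against them.

```python
import json


def generalstats_custom_content(endogenous_by_sample: dict) -> dict:
    """Build a MultiQC custom content structure for the General Statistics table.

    `endogenous_by_sample` maps sample name -> endogenous DNA percentage.
    """
    return {
        "id": "endogenous_dna",  # hypothetical section id
        "plot_type": "generalstats",
        "pconfig": [
            {
                "endogenous_dna": {  # hypothetical column key
                    "title": "Endogenous DNA (%)",
                    "min": 0,
                    "max": 100,
                    "suffix": "%",
                }
            }
        ],
        "data": {
            sample: {"endogenous_dna": pct}
            for sample, pct in endogenous_by_sample.items()
        },
    }


def write_custom_content(path: str, endogenous_by_sample: dict) -> None:
    """Write the structure as JSON; MultiQC looks for file names ending in _mqc.json."""
    with open(path, "w") as handle:
        json.dump(generalstats_custom_content(endogenous_by_sample), handle, indent=2)
```

Calling `write_custom_content("endogenous_mqc.json", {"sampleA": 12.5})` in the work directory before MultiQC runs should then be enough for the column to be picked up, assuming the file name pattern above is right.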

jfy133 · 2020-02-29T04:43:22Z

Endogenous DNA added. Will close for 2.1, but can reopen in the future for the next MultiQC release.

apeltzer added this to the V2.1 "Ulm" milestone Nov 19, 2018

jfy133 added the major label Dec 9, 2018

jfy133 changed the title ~~MultiQC Custom Content in Main Table~~ Review MultiQC Custom Content in Main Table Dec 4, 2019

jfy133 self-assigned this Dec 4, 2019

jfy133 assigned aidaanva Dec 10, 2019

jfy133 closed this as completed Feb 29, 2020
