`bclconvert`: incorrect stats for single-read data #1697

ajeffs · 2022-05-31T23:38:27Z

Description of bug

bcl-convert module in v1.13.dev0 reports incorrect Lane and Sample Statistics for single-read data:

"Yield (Mb)" is shown as twice the actual yield.
The actual "Yield (Mb)" is used as "Bases (Mb) >=Q30 (PF)".
"% Bases >= Q30 (PF)" is therefore half the correct value.

It appears the bcl-convert module is calculating these values itself, incorrectly, instead of pulling the correct data directly from the bcl-convert report files (Quality_Metrics.csv etc.).

The bcl-convert module shows correct values for paired-end data.

Example for Lane Stats below (1x51 bp; Sample Stats behaves the same):

bcl-convert:

bcl2fastq (correct values):

File that triggers the error

No response

MultiQC Error log

$ multiqc . ../../Stats -e snippy -i "OG6761 QC report" -n "OG6761-multiqc-report.html"
 /// MultiQC 🔍 | v1.13.dev0

|           multiqc | Report title: OG6761 QC report
|           multiqc | Excluding modules 'snippy'
|           multiqc | Search path : /mnt/HCS/aozan/fastq/220321_VH00684_22_AAAM322HV/bcl2fastq/OG6761/qc
|           multiqc | Search path : /mnt/HCS/aozan/fastq/220321_VH00684_22_AAAM322HV/bcl2fastq/Stats
|         searching | ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 152/152  
|            fastqc | Found 72 reports
|           multiqc | Compressing plot data
|           multiqc | Report      : OG6761-multiqc-report.html
|           multiqc | Data        : OG6761-multiqc-report_data
|           multiqc | MultiQC complete

ewels · 2022-06-01T10:09:24Z

Huh, that sounds odd and potentially serious. Is it just v1.13dev, not v1.12? There's been a few changes in the past months on this module - #1595 and (issue: #1563)

@andrei-seleznev / @yanick - any ideas what could be going on here?

ajeffs · 2022-06-01T19:58:57Z

Apologies, haven't used v1.12, only v1.13. I note the "Yield Mb" reported by bcl-convert module is the clusters multiplied by two times the read length, instead of the actual read length, e.g., in the provided example for Lane 1, 585*(2*51)=59670, when it should be 585*51=29835. The %>=Q30 is then half what it should be because the reported yield is twice what it should be.
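The arithmetic above can be checked with a short sketch. The cluster count and read length are the Lane 1 figures quoted in this comment; the Q30 base count is a made-up value used only to show the halving effect, and the function name is hypothetical rather than anything in MultiQC.

```python
# Lane 1 figures quoted in the comment above (1x51 bp run).
CLUSTERS = 585
READ_LENGTH = 51

# Hypothetical number of bases at >=Q30, for illustration only.
Q30_BASES = 28000


def lane_stats(clusters, cluster_length, q30_bases):
    """Return (yield, percent of bases >= Q30) for a given cluster length."""
    total_yield = clusters * cluster_length
    percent_q30 = 100 * q30_bases / total_yield
    return total_yield, percent_q30


# What the module reported: cluster length assumed to be 2 x read length.
wrong_yield, wrong_pct = lane_stats(CLUSTERS, 2 * READ_LENGTH, Q30_BASES)

# What a single-read run should give: cluster length equals read length.
right_yield, right_pct = lane_stats(CLUSTERS, READ_LENGTH, Q30_BASES)

print(wrong_yield, right_yield)
print(round(wrong_pct, 2), round(right_pct, 2))
```

Because the yield is the denominator of the Q30 percentage, doubling it halves the percentage, which matches the symptoms in the original report.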


andrei-seleznev · 2022-06-03T00:36:39Z

> Huh, that sounds odd and potentially serious. Is it just v1.13dev, not v1.12? There's been a few changes in the past months on this module - #1595 and (issue: #1563)
>
> @andrei-seleznev / @yanick - any ideas what could be going on here?

The bclconvert module simply assumes paired-end reads. Line 244 of bclconvert.py:

`return {"run_id": run_id, "read_length": int(read_length), "cluster_length": int(read_length) * 2}`

I'll add some code to do a check and adjustment for single-end there and put in a PR.
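A minimal sketch of what such a check could look like, not the code from the eventual PR. It assumes the run's RunInfo.xml lists each read as a `Read` element with `NumCycles` and `IsIndexedRead` attributes; the function name and the sample XML strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def cluster_length_from_runinfo(xml_text):
    """Sum the cycles of all non-index reads listed in a RunInfo.xml string.

    Assumes each read is a <Read> element carrying NumCycles and
    IsIndexedRead ("Y" or "N") attributes.
    """
    root = ET.fromstring(xml_text)
    total_cycles = 0
    for read in root.iter("Read"):
        if read.get("IsIndexedRead") == "N":
            total_cycles += int(read.get("NumCycles"))
    return total_cycles


# Hypothetical single-read run: one 51-cycle read plus one index read.
SINGLE_READ = """<RunInfo><Run Id="example_run"><Reads>
<Read Number="1" NumCycles="51" IsIndexedRead="N"/>
<Read Number="2" NumCycles="8" IsIndexedRead="Y"/>
</Reads></Run></RunInfo>"""

# Hypothetical paired-end run: two 151-cycle reads plus two index reads.
PAIRED_END = """<RunInfo><Run Id="example_run"><Reads>
<Read Number="1" NumCycles="151" IsIndexedRead="N"/>
<Read Number="2" NumCycles="8" IsIndexedRead="Y"/>
<Read Number="3" NumCycles="8" IsIndexedRead="Y"/>
<Read Number="4" NumCycles="151" IsIndexedRead="N"/>
</Reads></Run></RunInfo>"""

single_length = cluster_length_from_runinfo(SINGLE_READ)
paired_length = cluster_length_from_runinfo(PAIRED_END)
print(single_length, paired_length)
```

Summing the cycles of the non-index reads, rather than multiplying one read length by a fixed factor, gives the read length for single-read runs and would also cope with paired reads of different lengths.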


@alexomics

Entries in the squashed commit message that concern bclconvert:

* bclconvert checks RunInfo xml if reads are singleend or pairedend and sets clusterlength appropriately. resolves MultiQC#1697
* bclconvert now handles different r1 and r2 lengths instead of assuming they are the same
* CHANGELOG.md bclconvert fix issue link typo and note single-index paired-end data handled

ewels added the `bug: module` label (Bug in a MultiQC module) Jun 3, 2022

ewels changed the title ~~Incorrect lane and sample statistics for single-read data (BCL Convert module, v1.13.dev0)~~ bclconvert: incorrect stats for single-read data Jun 3, 2022

andrei-seleznev added a commit to andrei-seleznev/MultiQC that referenced this issue Jun 7, 2022

bclconvert singleeand reads now have cluster length equal to read length, this resolves MultiQC#1697

2b902b2

andrei-seleznev mentioned this issue Jun 7, 2022

Fix issue where bclconvert always assumed paired-end reads when setting cluster-length #1701

Merged

1 task

ewels closed this as completed in a6a283f Sep 9, 2022

ewels mentioned this issue Oct 4, 2022

Incorrect sequencing yield calculated by bcl convert module when using asymmetric read lengths #1774

Closed
