RSeQC TIN module - sample name cleaning #1484

ewels · 2021-07-06T05:36:58Z

Originally posted by @guidohooiveld in #737 (comment)

Thanks Phil and Erik for creating the MultiQC RSeQC TIN submodule; much appreciated!

Earlier today I updated MultiQC to the latest development version and ran it again on a folder containing various QC output files, including TIN.
Et voila, the 2 columns (one of which is hidden) were indeed added to the General Statistics table. Nice, and thanks!

One comment/question, though, regarding the sample names used for the TIN values in the General Statistics table: these are not the same as those used for the other RSeQC modules. This makes the table 'less nice' and more difficult to read. See 1st screenshot below.

I think this is because RSeQC writes the full name of the BAM file into the TIN "summary" file (the txt file *out.summary.txt; see its content below). The MultiQC TIN module parses this name from the file and then uses it as the sample name in the General Statistics table.

Therefore: would you have any suggestion to prevent this from happening, so that only the 'base name' is used in the table? Maybe by somehow applying the fn_clean_sample_names function on the fly?
Note that I am not an expert on how to do this and it may be too naive a thought... but since the 'other' files seem to be correctly recognized and their names cleaned (see 2nd screenshot), this may be feasible.

Thus, in summary: the General Statistics table uses the full name present in the TIN summary file (*out.summary.txt), e.g. "P26-1-6h_Aligned.sortedByCoord.out.bam", whereas using only the sample ID (base name) "P26-1-6h" would be preferable.
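To illustrate what "cleaning" the name means here, a minimal sketch in the spirit of MultiQC's suffix-based sample name cleaning: the name is truncated at the first occurrence of each suffix in turn. The suffix list and the helper name `clean_sample_name` are hypothetical and chosen for this example only; they are not MultiQC's actual defaults or API.

```python
# Hypothetical suffix list for this example; NOT MultiQC's real defaults.
CLEAN_SUFFIXES = [".bam", ".out", ".sortedByCoord", "_Aligned"]


def clean_sample_name(name: str, suffixes=CLEAN_SUFFIXES) -> str:
    """Truncate `name` at the first occurrence of each suffix, in order."""
    for suffix in suffixes:
        # str.split returns [name] unchanged when the suffix is absent
        name = name.split(suffix)[0]
    return name


# Both the BAM name inside the summary file and the summary file name
# itself reduce to the same base sample ID.
print(clean_sample_name("P26-1-6h_Aligned.sortedByCoord.out.bam"))
print(clean_sample_name("P26-1-6h_Aligned.sortedByCoord.out.summary.txt"))
```

Truncating at the first match keeps the logic simple, but it can over-trim a sample ID that happens to contain one of the suffixes (e.g. a name with ".out" in the middle), so the suffix list needs to be chosen with the naming scheme in mind.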

Content of the TIN summary file (P26-1-6h_Aligned.sortedByCoord.out.summary.txt):

Bam_file	TIN(mean)	TIN(median)	TIN(stdev)
P26-1-6h_Aligned.sortedByCoord.out.bam	53.72327495737302	53.34221052273402	18.530355596890026
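The summary above is a tab-separated table with one header row. A minimal sketch of parsing it with the Python standard library shows why the full BAM file name ends up as the sample name: the only identifier in the file is the `Bam_file` column. The function name `parse_tin_summary` and the output keys (`tin_mean`, etc.) are hypothetical; this is not the MultiQC module's code.

```python
import csv
import io


def parse_tin_summary(text: str) -> dict:
    """Parse a tin.py summary table (tab-separated, header row first)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    results = {}
    for row in reader:
        # The key is the raw BAM file name, exactly as written by RSeQC
        results[row["Bam_file"]] = {
            "tin_mean": float(row["TIN(mean)"]),
            "tin_median": float(row["TIN(median)"]),
            "tin_stdev": float(row["TIN(stdev)"]),
        }
    return results


summary = (
    "Bam_file\tTIN(mean)\tTIN(median)\tTIN(stdev)\n"
    "P26-1-6h_Aligned.sortedByCoord.out.bam\t53.72327495737302\t"
    "53.34221052273402\t18.530355596890026\n"
)
print(parse_tin_summary(summary))
```

Without an extra name-cleaning step, each key would appear verbatim in the General Statistics table.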

An example file is present in my previous post in this thread (#737 (comment)).

Below is a screenshot of a folder containing, for one sample, the output of STAR as well as RSeQC and Picard. All relevant files are correctly recognized by MultiQC, and their names are properly 'cleaned' when used in the MultiQC report. Hence my (naive) thought above...


ewels · 2021-07-06T17:44:16Z

@guidohooiveld - fixed in v1.12dev. If you install the dev version it should now work for you as you expect.

Sorry for the inconvenience!

Phil

guidohooiveld · 2021-07-06T21:05:51Z

Thanks Phil for the prompt action taken! It is indeed working now as expected.

ewels added the labels "bug: module" (Bug in a MultiQC module) and "priority: high" on Jul 6, 2021

ewels mentioned this issue Jul 6, 2021

RSeqQC: parse Transcript Integrity Number #737

Closed

ewels closed this as completed in ed61446 Jul 6, 2021

drpatelh mentioned this issue Dec 17, 2021

Add tin.py output to MultiQC report nf-core/rnaseq#746

Closed

