Usage of indexed files #31

famosab · 2024-11-18T08:17:16Z

While working on the nf-core modules I encountered two things which I could not determine from the docs:

1. Does the tool use the indexes of the sorted BAM files if they are present? And does MuSE expect the index to be passed alongside the BAM file?
2. What exactly is meant by "faidx indexed reference genome file"? When I tested the tool, the genome FASTA file alone seemed to be sufficient.

jiyunmaths · 2024-11-27T14:54:42Z

Hi @famosab, yes, both MuSE 1 and MuSE 2 use the index files of the BAM files to efficiently access reads at specific genomic locations. If these index files are missing, please generate them with a tool such as Samtools or GATK before running MuSE.
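
As a pre-flight check before running MuSE, a wrapper script could verify that an index sits next to each BAM. The sketch below is a minimal illustration and not part of MuSE: the helper name `find_bam_index` is hypothetical, and it assumes the common naming conventions (`sample.bam.bai` or `sample.bam.csi` as written by `samtools index`, `sample.bai` as written by Picard/GATK). Which of these names MuSE itself accepts is not stated in this thread.

```python
from pathlib import Path
from typing import Optional


def find_bam_index(bam_path: str) -> Optional[Path]:
    """Return the first index file found next to a BAM, or None.

    Hypothetical helper: it only checks common naming conventions
    and does not validate the contents of the index.
    """
    bam = Path(bam_path)
    candidates = [
        bam.with_name(bam.name + ".bai"),  # sample.bam.bai (samtools index default)
        bam.with_suffix(".bai"),           # sample.bai (Picard/GATK style)
        bam.with_name(bam.name + ".csi"),  # sample.bam.csi (samtools index -c)
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

If the function returns `None`, running `samtools index sample.bam` first would create the missing index.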
For the reference genome index file, MuSE 1 required it to quickly retrieve the reference allele at a specified genomic position. In contrast, MuSE 2 leverages an in-memory dictionary data structure that stores the entire reference genome, enabling it to access the reference allele directly without needing an index file. As a result, the reference genome index file is no longer required for MuSE 2.
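
To illustrate the difference described above, the sketch below loads FASTA records into a plain dictionary keyed by contig name, so that the reference allele at a position becomes a direct lookup and no `.fai` file is needed. This is an illustration in Python under stated assumptions, not MuSE's actual implementation: `load_reference` and `reference_allele` are hypothetical names, and positions are assumed to be 1-based.

```python
from typing import Dict, Iterable, List


def load_reference(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA lines into {contig_name: sequence}, held fully in memory."""
    parts: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Contig name is the first whitespace-separated token of the header.
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line.upper())
    return {contig: "".join(chunks) for contig, chunks in parts.items()}


def reference_allele(genome: Dict[str, str], contig: str, pos: int) -> str:
    """Return the reference base at a 1-based position (assumed convention)."""
    return genome[contig][pos - 1]


# Tiny made-up example; with a real file, pass an open file handle instead.
demo = [">chr1 demo", "ACGT", "ttga", ">chr2", "GGCC"]
genome = load_reference(demo)
print(reference_allele(genome, "chr1", 5))  # prints T
```

The trade-off is memory: every contig is kept in RAM, whereas a faidx index lets a tool seek to a position on disk and read only the bases it needs.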

famosab · 2024-11-29T13:13:09Z

Thanks for the detailed answer :)
Maybe you could extend the docs with this information about the indices!

famosab mentioned this issue Nov 18, 2024

Add module muse/call nf-core/modules#5630

Merged


famosab closed this as completed Nov 29, 2024
