Minimap2 --split-prefix option #887

YasirKusay · 2022-03-13T04:02:04Z

Hi, I would like to know more about the --split-prefix option.

I have a very large (1200 GB) index that I want to align against, and I of course don't want to load the entire index into memory. To see how it works, I tested minimap2 --split-prefix on an assembly of 17000 contigs using relatively small resources (48 GB of RAM, 16 threads and 150 GB of disk space). I expected the command to fail, as it would run out of disk space to store the index partitions, but it actually ran to completion (taking 134 minutes). I am confused: did alignment happen against the entire index, or did something else happen?

If this helps, I did inspect the index partitions during execution and they were about 20 MB each.

YasirKusay · 2022-03-14T12:36:57Z

Hi, this is just an extension to the above. I would like to be able to run minimap2 incrementally (e.g. load 4 GB of the index into RAM, align, move on to the next 4 GB of the index, etc.). I tried running the program on default settings with 5 GB and 10 GB of RAM just to see what would happen, but both runs were killed: 24781 Killed and 17044 Killed respectively.

I don't know why minimap2 failed, but I thought that it loaded 4 GB of the index per alignment before moving on to the next index batch.

hasindu2008 · 2022-03-14T13:13:59Z

Here is some information from the first implementation of --split-prefix:

It loads the index part by part, while iteratively mapping the queries to each index partition. The intermediate results will be saved as temporary files. Finally, it will go through all the temporary files and merge the results. Detailed methodology is available at https://www.nature.com/articles/s41598-019-40739-8 and technical information in https://static-content.springer.com/esm/art%3A10.1038%2Fs41598-019-40739-8/MediaObjects/41598_2019_40739_MOESM1_ESM.pdf.
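The map-then-merge flow described above can be illustrated with a toy sketch. This is not minimap2's code: the in-memory tuple layout stands in for the temporary files, the function name is hypothetical, and the merge rule (keep the highest-scoring hit per read) is a deliberate simplification of what the linked paper describes.

```python
def merge_partition_hits(per_partition_hits):
    """Keep, for every read, the highest-scoring hit seen across all parts.

    per_partition_hits: one list per index part, each holding
    (read_name, target_name, score) tuples. This layout is hypothetical
    and only stands in for the per-part temporary files.
    """
    best = {}
    for hits in per_partition_hits:
        for read, target, score in hits:
            # Replace the stored hit only if this part found a better one.
            if read not in best or score > best[read][1]:
                best[read] = (target, score)
    return best


# Two index parts, mapped one after the other against the same reads.
parts = [
    [("read1", "contigA", 120), ("read2", "contigB", 40)],
    [("read1", "contigX", 200), ("read3", "contigY", 75)],
]
merged = merge_partition_hits(parts)
print(merged)
```

Here read1 has a hit in both parts, and only the merge step can tell that the hit in the second part is the better one.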

Here are some example commands we originally used for testing on the human genome:
https://github.com/hasindu2008/minimap2-arm/tree/master/misc/idxtools

Is the 1200GB index a fasta file or a minimap2 index?
When you say 4GB of the index, are you referring to the -I 4G option? As far as I am aware, it means 4 gigabases of the reference, which can become something like 10-12 GB in memory depending on the reference characteristics. Minimap2 also loads a batch of queries by default (1G as I remember), and those plus the intermediate data structures need extra RAM.
The disk usage for temporary files will depend on the size of your query reads rather than the reference. What is the size of your fastq?
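As a rough illustration of the arithmetic, the number of index parts can be estimated from the total reference bases and the -I batch size. The function name and the 100 Gbases figure are hypothetical, and the result is only a lower bound if whole sequences are kept together within a part.

```python
def estimate_parts(total_ref_bases, bases_per_part=4_000_000_000):
    """Lower-bound estimate of index parts for a given -I batch size (in bases)."""
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-total_ref_bases // bases_per_part)


total_bases = 100_000_000_000  # hypothetical reference of 100 Gbases
n_parts = estimate_parts(total_bases)
print(n_parts)
```

Each part is mapped in its own pass, so peak RAM is driven by the size of one part plus the query batch, not by the number of parts.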

YasirKusay · 2022-03-14T14:20:51Z

Hi @hasindu2008, Thank you for your reply.

I was actually confused by the --split-prefix option initially, as I assumed that the temporary files were partitioned indexes rather than the results, so thank you for clearing that up.

The 1200GB index is the actual minimap2 index.
I am not referring to the -I 4G index as I already have the index. Does it still load 4 gigabases of the index while doing the alignment with the default settings?

My primary concern is to be able to use the entire 1200 GB index, with as little RAM as possible. Based on what you have told me, I think that I can achieve this using the --split-prefix option. Does that mean the default settings for alignments will load the entire index at once? If not, what is the difference between the default settings and --split-prefix option?

hasindu2008 · 2022-03-15T01:35:43Z

If you created the minimap2 index with default options, it will have created an index with multiple parts, each with 4 Gbases. So yes, it will load only 4 gigabases at a time with default options.

The default settings will not load the whole index; it will still iteratively map part by part and output the mappings for each part separately, without merging them. In summary, if --split-prefix is used, the mappings will be more accurate than without it. https://www.nature.com/articles/s41598-019-40739-8 contains all the information.
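For reference, a run with --split-prefix could be assembled roughly as below. This is a sketch only: the file names and the helper function are placeholders, and the flags used (-a, -t, -o, --split-prefix) are standard minimap2 options. The command is printed rather than executed.

```python
import shlex


def build_minimap2_command(index_path, reads_path, out_path, tmp_prefix, threads=16):
    """Build an argv list for a minimap2 run that merges per-part results."""
    return [
        "minimap2",
        "-a",                          # write SAM output
        "-t", str(threads),            # number of threads
        "--split-prefix", tmp_prefix,  # prefix for the temporary per-part files
        "-o", out_path,                # output file
        index_path,                    # multi-part index (or reference FASTA)
        reads_path,                    # query reads
    ]


# All file names here are placeholders, not paths from this thread.
cmd = build_minimap2_command("ref.mmi", "reads.fq", "out.sam", "tmp_split")
print(shlex.join(cmd))
```

Dropping the --split-prefix argument gives the default behaviour discussed above, where each part's mappings are written out without the final merge.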

YasirKusay · 2022-03-15T01:53:25Z

Thanks for your reply!

I will now use minimap2 with the --split-prefix option (I did notice that --split-prefix and the default settings took similar times).

On a side note, has this tool ever been benchmarked using the NCBI NT index?

YasirKusay closed this as completed Aug 23, 2023
