Minimap2 --split-prefix option #887
Hi, this is just an extension to the above. I would like to be able to run minimap2 incrementally (e.g. load 4 GB of the index into RAM, align, move on to the next 4 GB of the index, etc.). I tried running the program on default settings with 5 GB and 10 GB of RAM just to see what would happen, but it failed with the errors "24781 Killed" and "17044 Killed" respectively. I don't know why minimap2 failed; I thought that it loaded 4 GB of the index per alignment before moving on to the next index batch.
Here is some information from the first implementation of the --split-prefix option. It loads the index part by part, while iteratively mapping the queries to each index partition. The intermediate results are saved as temporary files. Finally, it goes through all the temporary files and merges the results. The detailed methodology is available at https://www.nature.com/articles/s41598-019-40739-8 and the technical information is in https://static-content.springer.com/esm/art%3A10.1038%2Fs41598-019-40739-8/MediaObjects/41598_2019_40739_MOESM1_ESM.pdf. Here are some example commands we originally used for testing on the human genome:
Hi @hasindu2008, thank you for your reply. I was actually confused by the --split-prefix option initially, as I assumed that the temporary files were partitioned indexes rather than the results, so thank you for clearing that up.
My primary concern is to be able to use the entire 1200 GB index with as little RAM as possible. Based on what you have told me, I think that I can achieve this using the --split-prefix option. Does that mean the default settings for alignments will load the entire index at once? If not, what is the difference between the default settings and the --split-prefix option?
If you created the minimap2 index with default options, it will be built in multiple parts, each covering 4 Gbases. So yes, with default options only 4 gigabases are loaded at a time. The default settings do not load the whole index; minimap2 still maps iteratively, part by part, and outputs the mappings for each part separately. In summary, if --split-prefix is used, the per-part results are merged at the end, so the mappings will be more accurate than without it. https://www.nature.com/articles/s41598-019-40739-8 contains all the information.
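To make the difference concrete, here is a minimal toy sketch in Python. It is not minimap2's algorithm: shared k-mer counting stands in for real alignment, in-memory lists stand in for the temporary files, and all names (`map_reads_to_part`, `map_without_merge`, `map_with_merge`, `READS`, `PARTS`) are hypothetical. It only illustrates why reporting each part independently can leave a read with several competing "best" hits, while merging after all parts have been processed keeps one.

```python
def kmers(seq, k=4):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def map_reads_to_part(reads, part, k=4):
    """Toy mapper: score each read against each reference in ONE index part.

    The score is the number of distinct shared k-mers (a stand-in for a
    real alignment score, assumed here purely for illustration).
    """
    hits = []
    for read_id, read_seq in reads.items():
        read_kmers = kmers(read_seq, k)
        for ref_name, ref_seq in part.items():
            score = len(read_kmers & kmers(ref_seq, k))
            if score > 0:
                hits.append((read_id, ref_name, score))
    return hits


def best_per_read(hits):
    """Keep only the highest-scoring hit for every read."""
    best = {}
    for read_id, ref_name, score in hits:
        if read_id not in best or score > best[read_id][1]:
            best[read_id] = (ref_name, score)
    return best


def map_without_merge(reads, parts):
    """Process parts one at a time and report the best hit of EACH part."""
    output = []
    for part in parts:  # only one part is "loaded" at a time
        best = best_per_read(map_reads_to_part(reads, part))
        output.extend((r, name, s) for r, (name, s) in best.items())
    return output


def map_with_merge(reads, parts):
    """Process parts one at a time, keep intermediate hits, merge at the end."""
    intermediate = []  # stands in for the temporary files on disk
    for part in parts:
        intermediate.append(map_reads_to_part(reads, part))
    all_hits = [hit for hits in intermediate for hit in hits]
    return best_per_read(all_hits)


# Made-up example data: a two-part "index" and a single read.
PARTS = [{"chrA": "ACGTACGTTTGACC"}, {"chrB": "ACGTACGTAAGGCCTT"}]
READS = {"r1": "ACGTACGTAAGG"}
```

With this data, `map_without_merge(READS, PARTS)` returns two records for `r1` (one per part), whereas `map_with_merge(READS, PARTS)` returns only the higher-scoring hit on `chrB`.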
Thanks for your reply! I will now use minimap2 with the --split-prefix option (I did notice that runs with --split-prefix and with the default settings took similar amounts of time). On a side note, has this tool ever been benchmarked using the NCBI NT index?
Hi, I would like to know more about the --split-prefix option.
I have a very large (1200 GB) index that I want to align against, and I of course don't want to load the entire index into memory. I wanted to test out minimap2 --split-prefix on an assembly of 17000 contigs using relatively small resources (48 GB of RAM, 16 threads and 150 GB of disk space) just to see how it worked. I expected the command to fail (as it would run out of disk space to store the index partitions), but it actually ran to completion (taking 134 minutes). I am confused: did alignment happen against the entire index, or did something else happen?
If this helps, I did inspect the index partitions during execution and they were about 20 MB each.