mmseqs createindex --split not generating correct number of splits #432
Comments
The seg-fault errors that I'm getting with
Note that sometimes when I re-run the command, I instead get the error:
System memory should not be the cause; I've got ~800 GB free. Maybe I'm missing a "hidden" input file (i.e., one of the files associated with the main input files, which are generally not mentioned in any of the docs). The input files that are present:
If I had to guess, there's probably something wrong with the |
I introduced the two additional splits because of #338, though that wasn't very effective at reducing peak memory use. The error looks like memory corruption, though. I am not really sure how to reproduce the issue locally.
The only change was to remove the |
The next step would be to try a MMseqs2 build instrumented with ASan. Sadly ASan doesn't support static builds so you would have to compile MMseqs2 yourself:
The new binary in
Removing the |
Here's the output from the ASan run:
Thanks, I suspected that this might have been the problem. I'll update you once we figure out how to fix this.
Ah sorry, it makes a lot of sense that this doesn't work. Iterative-profile searches currently won't work together with the taxonomy workflow, since the alignment positions computed in the taxonomy workflow don't refer to the same things that the iterative-profile-search workflow expects. I am not sure this type of search makes sense. Could you explain your use case for combining these two? I am not sure if it's fixable with the current protocol; we might just disallow taxonomy in combination with iterative-profile searches instead.
Thanks for looking more into the issue. I carried over the iterative search parameters from some other |
Expected Behavior
I expect --split 16 for mmseqs createindex to generate 16 *.idx files. Instead, I'm getting 18:
Pipeline software (e.g., snakemake) generally requires keeping track of all (important) output files produced; otherwise, untracked output files can accidentally be deleted, which is causing some downstream problems (e.g., seg-fault errors for mmseqs taxonomy).
Steps to Reproduce (for bugs)
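For anyone hitting the same output-tracking problem, here is a minimal sketch of a check that could run after the indexing step. It compares the files actually present in an output directory against the list a pipeline rule declared. The function name `untracked_outputs` and the file names in the usage note are hypothetical and for illustration only; the real index file names depend on the MMseqs2 version and are not taken from this thread.

```python
from pathlib import Path


def untracked_outputs(out_dir, declared):
    """Return the names of files in out_dir that were not declared as outputs.

    out_dir  -- directory the tool wrote its output files into
    declared -- iterable of file names the pipeline rule expects
    """
    # Collect the names of all regular files currently in the directory.
    present = {p.name for p in Path(out_dir).iterdir() if p.is_file()}
    # Anything present but not declared is at risk of being deleted
    # or ignored by the pipeline software.
    return sorted(present - set(declared))
```

Called with a declared list of 16 split files (for example, hypothetical names such as `db.idx.0` through `db.idx.15`), the function would return any additional files found on disk, which makes a mismatch between the requested and the produced number of splits visible before a downstream step fails.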
Your Environment
OS:
Ubuntu 18.04.5