MMSeqs2 DB slimmer #316
Comments
A short update: I started working on this, but found some potential weirdness in result2msa that I want to look into before pushing the changes. Should be done in the next few days. |
Thank you very much @milot-mirdita It will be very useful to keep our DBs slim :-) |
I added a |
Awesome! @ChiaraVanni will test it and we will get back to you if we find any problems. Many thanks! |
We had a bug that Martin fixed yesterday in 18588bb. |
Hi @milot-mirdita
We got a few alignment DBs, along with the index and dbtype files. Looking at the alignment DB files, it seems that they have the cluster DB format, and the number of entries has decreased substantially. Any suggestions on converting the output of Many thanks! |
You can use
|
Thanks @milot-mirdita This works perfectly. Many thanks! |
Yes if you just pass the clustering db to Usually I use something like:
Here we use |
Worked beautifully! Thanks @milot-mirdita |
Following up on this issue, I was confused when I tried to select only the 10 most divergent sequences in each cluster with the following command: |
filterresult should behave equivalently to HH-suite's It should keep all entries, assuming the other filtering parameters are also fulfilled, by default |
Hi
this is not an issue but a potential enhancement we discussed with @martin-steinegger.
We have a seed clustering database that is continuously updated with new sequences. The size of the DB is growing quite fast, and eventually we will have problems storing and distributing it. As we have many redundant sequences in each cluster, we thought that a module that takes a DB and filters it based on a criterion similar to --diff from result2msa or result2profile would be very useful to keep only informative sequences in the clusters.
Thanks
Antonio
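To make the request above concrete, here is a toy Python sketch of one way to "keep only informative sequences" in a cluster: a greedy selection that starts from the representative and repeatedly adds the member that is least similar to everything already kept. This is only an illustration under assumptions. It is not how MMseqs2 or HH-suite implement their diversity filtering, the function names `identity` and `pick_diverse` are hypothetical, and it assumes the cluster members are already aligned to the same length.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pick_diverse(members: list[str], k: int) -> list[int]:
    """Greedily pick up to k member indices from one cluster.

    Index 0 is assumed to be the cluster representative and is always kept.
    Each round adds the member whose highest identity to the already chosen
    members is lowest (ties go to the lower index).
    """
    if not members or k <= 0:
        return []
    chosen = [0]
    while len(chosen) < min(k, len(members)):
        best, best_score = None, None
        for i in range(len(members)):
            if i in chosen:
                continue
            # Similarity of candidate i to its closest already-kept member.
            score = max(identity(members[i], members[j]) for j in chosen)
            if best_score is None or score < best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen


# Hypothetical cluster: representative first, then three members.
cluster = ["AAAA", "AAAT", "TTTT", "AATT"]
print(pick_diverse(cluster, 2))  # keeps the representative and the most distant member
```

The quadratic loop is fine for a sketch; a real module working on large clusters would need the precomputed alignment results rather than recomputing identities.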