CDD database #410

Closed
fmaumusINRA opened this issue Feb 9, 2021 · 5 comments

Comments

@fmaumusINRA

Hello,

I am doing some protein work. Among the databases offered for download, would it be possible to add a profile library corresponding to the CDD database? I need to search for protein domains.

Thanks,
Florian

@milot-mirdita
Member

milot-mirdita commented Feb 9, 2021

This should work:

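# Download the CDD alignments (one FASTA-format MSA per domain model)
wget https://ftp.ncbi.nih.gov/pub/mmdb/cdd/fasta.tar.gz
# Convert the archive into an MMseqs2 database, one entry per file
mmseqs tar2db fasta.tar.gz cddmsa
# Strip the ".FASTA" suffix from the entry names in the lookup file
sed 's|\.FASTA||g' cddmsa.lookup > cddmsa.lookup_tmp
mv -f cddmsa.lookup_tmp cddmsa.lookup
# Drop the first sequence of each MSA (the consensus, hence "wo_cons")
mmseqs apply cddmsa cddmsa_wo_cons -- awk '/^>/ { i++ } i > 1 { print; }'
# Mark the result as an MSA database (dbtype 11, written as 4 raw bytes)
awk 'BEGIN { printf("%c%c%c%c",11,0,0,0); exit; }' > cddmsa_wo_cons.dbtype
# Reuse the cleaned-up lookup file for the filtered database
ln -s cddmsa.lookup cddmsa_wo_cons.lookup
# Build the profile database; --match-mode 1 picks match columns by gap fraction
mmseqs msa2profile cddmsa_wo_cons cdd --match-mode 1

# Search the query sequences against the CDD profiles
mmseqs easy-search QUERY.fasta cdd res.m8 tmp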

Could you check whether this performs roughly as you expect? If it works well, I can add it to the databases workflow.

(I fixed a few things shortly after I posted this.)
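
To see what the awk filter passed to mmseqs apply does on its own, here is a minimal sketch that runs the same awk program on a toy alignment. The sequences and names are made up for illustration, and the assumption is that the first record of each CDD alignment is the consensus sequence, as the cddmsa_wo_cons name suggests.

```shell
# Toy MSA (hypothetical data): the first record stands in for the consensus.
msa='>consensus
MKV-LA
>seq1
MKVALA
>seq2
MRV-LA'

# Same awk program as in the recipe: count header lines and
# print only once the second header has been reached.
filtered=$(printf '%s\n' "$msa" | awk '/^>/ { i++ } i > 1 { print; }')

printf '%s\n' "$filtered"
```

Only the records from the second header onward are printed, so the first sequence never reaches msa2profile.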

@fmaumusINRA
Author

It works like a charm, thanks a lot.
I think it is a very useful database. I don't know why, but I can find Pfam domains that were missed when using the Pfam database (both seed and full).

@milot-mirdita
Member

Could you share one query that finds a PFAM match in CDD and not in PFAM(-full/seed)?

The PFAM databases are essentially built with the same procedure as described above.

@fmaumusINRA
Author

Sorry, I think that was the case when I was using Pfam as the query database while experimenting...
Anyway, thanks a lot for providing CDD database!!

@milot-mirdita
Member

I added the CDD to the databases downloader. Thanks for the suggestion!
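
For readers landing here later, fetching the database through the downloader might look roughly like the sketch below. The fetch_cdd helper is hypothetical, and "CDD" is assumed to be the name under which the entry is listed by mmseqs databases; running mmseqs databases without arguments should show the available names.

```shell
# Hypothetical helper; "CDD" is the assumed entry name in `mmseqs databases`.
fetch_cdd() {
  out_db="$1"   # where to write the profile database, e.g. cdd
  tmp_dir="$2"  # scratch directory, e.g. tmp
  if ! command -v mmseqs >/dev/null 2>&1; then
    echo "mmseqs not found in PATH" >&2
    return 1
  fi
  mmseqs databases CDD "$out_db" "$tmp_dir"
}

# Usage (not run here):
#   fetch_cdd cdd tmp
#   mmseqs easy-search QUERY.fasta cdd res.m8 tmp
```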
