Change TaxonomyProvider useCache to True in MMseqsUtils #3

Open
piehld opened this issue Sep 19, 2023 · 0 comments

piehld commented Sep 19, 2023

Alternatively, add reload() and testCache() methods (in MMseqsUtils.py) to determine whether the taxonomy data should be rebuilt from scratch.
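A minimal sketch of what a reload()/testCache() pair could look like. Everything here is hypothetical: the class `CachedTaxonomySketch`, the `fetchFunc` callable, and the JSON cache file name are illustrative stand-ins, not the actual TaxonomyProvider or MMseqsUtils API.

```python
import json
import os
import tempfile


class CachedTaxonomySketch:
    """Hypothetical stand-in; not the real TaxonomyProvider interface."""

    def __init__(self, cacheDirPath, fetchFunc, useCache=True):
        # Assumed cache location and format (a single JSON file)
        self.__cacheFilePath = os.path.join(cacheDirPath, "taxonomy-cache.json")
        self.__fetchFunc = fetchFunc  # callable performing the expensive download
        self.__data = {}
        self.reload(useCache=useCache)

    def testCache(self, minCount=1):
        """Return True if the loaded data holds at least minCount entries."""
        return len(self.__data) >= minCount

    def reload(self, useCache=True):
        """Load from the cache file when allowed and valid; otherwise rebuild."""
        if useCache and os.path.exists(self.__cacheFilePath):
            try:
                with open(self.__cacheFilePath, "r", encoding="utf-8") as ifh:
                    self.__data = json.load(ifh)
            except (OSError, ValueError):
                self.__data = {}
            if self.testCache():
                return True
        # Cache missing, unreadable, empty, or bypassed: rebuild from scratch
        self.__data = self.__fetchFunc()
        with open(self.__cacheFilePath, "w", encoding="utf-8") as ofh:
            json.dump(self.__data, ofh)
        return self.testCache()


def demo(useCache):
    """Construct the provider six times and count how often the fetch runs."""
    calls = []

    def fakeFetch():
        calls.append(1)
        return {"9606": "Homo sapiens"}

    with tempfile.TemporaryDirectory() as tmpDir:
        for _ in range(6):
            provider = CachedTaxonomySketch(tmpDir, fakeFetch, useCache=useCache)
            assert provider.testCache()
    return len(calls)
```

In this sketch, `demo(useCache=True)` triggers the fetch once and `demo(useCache=False)` triggers it on every construction, which mirrors the repeated downloads described in this issue.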

Currently, in etl.build_exdb_resources.UpdateTargetsCofactors, the taxonomy data (https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz) is downloaded six times in a single run, once during cacheTaxonomy and five more times during searchDatabases:

... running   UpdateTargetsCofactors()
2023-09-19 08:57:32,208 - luigi-interface - INFO - [MainThread] - Starting workflow ProteinTargetSequenceExecutionWorkflow (full)
2023-09-19 08:57:33,444 - root - INFO - [MainThread] - Running cacheTaxonomy...
2023-09-19 08:57:35,886 - rcsb.utils.taxonomy.TaxonomyProvider - INFO - [MainThread] - Taxonomy primary fetch status (True) using 'https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz'
2023-09-19 08:58:16,205 - rcsb.utils.taxonomy.TaxonomyProvider - INFO - [MainThread] - Taxonomy lengths name 2528571 node 2528676 merge 73969
...
...
2023-09-19 09:59:32,570 - root - INFO - [MainThread] - Running searchDatabases...
2023-09-19 09:59:39,341 - __name__ - INFO - [MainThread] - convertalis status is True
2023-09-19 09:59:39,697 - __name__ - INFO - [MainThread] - Starting search result with (8531) records
2023-09-19 09:59:39,733 - __name__ - INFO - [MainThread] - Query match count 742
2023-09-19 09:59:39,733 - rcsb.workflow.targets.ProteinTargetSequenceWorkflow - INFO - [MainThread] - Query sequences with matches 'sabdab' (742) bitScore filter (False)
2023-09-19 09:59:40,046 - rcsb.workflow.targets.ProteinTargetSequenceWorkflow - INFO - [MainThread] - Completed searching sabdab targets (status True) (cutoff=0.95) at 2023 09 19 09:59:40 (7.4758 seconds)
2023-09-19 10:00:05,435 - __name__ - INFO - [MainThread] - convertalis status is True
2023-09-19 10:00:09,308 - __name__ - INFO - [MainThread] - useTaxonomy flag (True)
2023-09-19 10:00:13,438 - rcsb.utils.taxonomy.TaxonomyProvider - INFO - [MainThread] - Taxonomy primary fetch status (True) using 'https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz'
2023-09-19 10:00:36,127 - rcsb.utils.taxonomy.TaxonomyProvider - INFO - [MainThread] - Taxonomy lengths name 2528571 node 2528676 merge 73969
2023-09-19 10:00:36,127 - __name__ - INFO - [MainThread] - Starting search result with (89015) records
2023-09-19 10:00:37,255 - __name__ - INFO - [MainThread] - Query match count 2839
2023-09-19 10:00:37,259 - rcsb.workflow.targets.ProteinTargetSequenceWorkflow - INFO - [MainThread] - Query sequences with matches 'card' (2839) bitScore filter (False)
2023-09-19 10:00:42,039 - rcsb.workflow.targets.ProteinTargetSequenceWorkflow - INFO - [MainThread] - Completed searching card targets (status True) (cutoff=0.95) at 2023 09 19 10:00:42 (61.9924 seconds)
2023-09-19 10:01:24,274 - __name__ - INFO - [MainThread] - convertalis status is True
2023-09-19 10:01:28,952 - __name__ - INFO - [MainThread] - useTaxonomy flag (True)
2023-09-19 10:01:32,687 - rcsb.utils.taxonomy.TaxonomyProvider - INFO - [MainThread] - Taxonomy primary fetch status (True) using 'https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz'
2023-09-19 10:02:09,622 - rcsb.utils.taxonomy.TaxonomyProvider - INFO - [MainThread] - Taxonomy lengths name 2528571 node 2528676 merge 73969
2023-09-19 10:02:09,622 - __name__ - INFO - [MainThread] - Starting search result with (106156) records
2023-09-19 10:02:11,032 - __name__ - INFO - [MainThread] - Query match count 4180
2023-09-19 10:02:11,036 - rcsb.workflow.targets.ProteinTargetSequenceWorkflow - INFO - [MainThread] - Query sequences with matches 'drugbank' (4180) bitScore filter (False)
2023-09-19 10:02:16,465 - rcsb.workflow.targets.ProteinTargetSequenceWorkflow - INFO - [MainThread] - Completed searching drugbank targets (status True) (cutoff=0.95) at 2023 09 19 10:02:16 (94.4264 seconds)
2023-09-19 10:03:54,283 - __name__ - INFO - [MainThread] - convertalis status is True
2023-09-19 10:04:02,151 - __name__ - INFO - [MainThread] - useTaxonomy flag (True)
2023-09-19 10:04:05,087 - rcsb.utils.taxonomy.TaxonomyProvider - INFO - [MainThread] - Taxonomy primary fetch status (True) using 'https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz'
2023-09-19 10:04:45,656 - rcsb.utils.taxonomy.TaxonomyProvider - INFO - [MainThread] - Taxonomy lengths name 2528571 node 2528676 merge 73969
2023-09-19 10:04:45,657 - __name__ - INFO - [MainThread] - Starting search result with (173617) records
2023-09-19 10:04:48,781 - __name__ - INFO - [MainThread] - Query match count 6684
2023-09-19 10:04:48,788 - rcsb.workflow.targets.ProteinTargetSequenceWorkflow - INFO - [MainThread] - Query sequences with matches 'chembl' (6684) bitScore filter (False)
2023-09-19 10:04:57,883 - rcsb.workflow.targets.ProteinTargetSequenceWorkflow - INFO - [MainThread] - Completed searching chembl targets (status True) (cutoff=0.95) at 2023 09 19 10:04:57 (161.4171 seconds)
2023-09-19 10:06:25,748 - __name__ - INFO - [MainThread] - convertalis status is True
2023-09-19 10:06:31,521 - __name__ - INFO - [MainThread] - useTaxonomy flag (True)
2023-09-19 10:06:34,666 - rcsb.utils.taxonomy.TaxonomyProvider - INFO - [MainThread] - Taxonomy primary fetch status (True) using 'https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz'
2023-09-19 10:07:12,424 - rcsb.utils.taxonomy.TaxonomyProvider - INFO - [MainThread] - Taxonomy lengths name 2528571 node 2528676 merge 73969
2023-09-19 10:07:12,426 - __name__ - INFO - [MainThread] - Starting search result with (131516) records
2023-09-19 10:07:14,207 - __name__ - INFO - [MainThread] - Query match count 8840
2023-09-19 10:07:14,214 - rcsb.workflow.targets.ProteinTargetSequenceWorkflow - INFO - [MainThread] - Query sequences with matches 'pharos' (8840) bitScore filter (False)
2023-09-19 10:07:20,676 - rcsb.workflow.targets.ProteinTargetSequenceWorkflow - INFO - [MainThread] - Completed searching pharos targets (status True) (cutoff=0.95) at 2023 09 19 10:07:20 (142.7927 seconds)
2023-09-19 10:07:45,967 - __name__ - INFO - [MainThread] - convertalis status is True
2023-09-19 10:07:50,188 - __name__ - INFO - [MainThread] - useTaxonomy flag (True)
2023-09-19 10:07:53,175 - rcsb.utils.taxonomy.TaxonomyProvider - INFO - [MainThread] - Taxonomy primary fetch status (True) using 'https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz'
2023-09-19 10:08:28,294 - rcsb.utils.taxonomy.TaxonomyProvider - INFO - [MainThread] - Taxonomy lengths name 2528571 node 2528676 merge 73969
2023-09-19 10:08:28,295 - __name__ - INFO - [MainThread] - Starting search result with (89015) records
2023-09-19 10:08:29,210 - __name__ - INFO - [MainThread] - Query match count 2582
2023-09-19 10:08:29,250 - rcsb.workflow.targets.ProteinTargetSequenceWorkflow - INFO - [MainThread] - Query sequences with matches 'card' (2582) bitScore filter (True)
2023-09-19 10:08:32,958 - rcsb.workflow.targets.ProteinTargetSequenceWorkflow - INFO - [MainThread] - Completed searching card targets (status True) (cutoff=0.95) at 2023 09 19 10:08:32 (72.2820 seconds)
2023-09-19 10:08:32,959 - root - INFO - [MainThread] - Running buildFeatures...
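One way to confirm the number of downloads from a log like the one above is to count the fetch messages. This is a rough sketch; the function name `countTaxonomyFetches` is hypothetical, and the pattern assumes the message text stays exactly as it appears in this log. The sample uses three lines copied from the log.

```python
import re

# Matches the fetch message emitted by the taxonomy provider in the log above
FETCH_PATTERN = re.compile(r"Taxonomy primary fetch status \((True|False)\)")


def countTaxonomyFetches(logLines):
    """Count log lines that report a taxonomy download attempt."""
    return sum(1 for line in logLines if FETCH_PATTERN.search(line))


SAMPLE_LOG = [
    "2023-09-19 08:57:35,886 - rcsb.utils.taxonomy.TaxonomyProvider - INFO - [MainThread] - "
    "Taxonomy primary fetch status (True) using 'https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz'",
    "2023-09-19 08:58:16,205 - rcsb.utils.taxonomy.TaxonomyProvider - INFO - [MainThread] - "
    "Taxonomy lengths name 2528571 node 2528676 merge 73969",
    "2023-09-19 10:00:13,438 - rcsb.utils.taxonomy.TaxonomyProvider - INFO - [MainThread] - "
    "Taxonomy primary fetch status (True) using 'https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz'",
]

print(countTaxonomyFetches(SAMPLE_LOG))  # prints 2
```

Running the same function over the full log file would give the per-run download count, which can be rechecked after any caching change.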
