Checking existence of specific genome in getCollection function #53

Closed
TakahiroYamada opened this issue Jun 1, 2020 · 2 comments

@TakahiroYamada

Hello, I'm Takahiro Yamada. Thank you very much for the very convenient R package, biomartr.
I tried to retrieve a genome from the NCBI GenBank database and encountered the following error.

> biomartr::getCollection(db = "genbank" , "GCA_900240375.1")
Starting collection retrieval (genome, proteome, cds, gff/gtf, rna, repeat masker, assembly stats) for GCA_900240375.1 ...
|===============================================================================| 100%   54 MB
Unfortunatey, no entry for 'GCA_900240375.1' was found in the 'refseq' database. Please consider specifying 'db = genbank' or 'db = ensembl' or 'db = ensemblgenomes' or 'db = uniprot' to check whether 'GCA_900240375.1' is available in these databases.
 error: No entry was found for organism GCA_900240375.1. Could the name be misspelled?

This genome is registered in NCBI GenBank: it has a GenBank assembly accession but no RefSeq assembly accession (please see the following URL).
https://www.ncbi.nlm.nih.gov/assembly/GCA_900240375.1

I suspect the error is caused by the following code in getCollection.R:

getCollection <-
        function(db = "refseq",
                 organism,
                 reference = TRUE,
                 release = NULL,
                 gunzip = FALSE,
                 remove_annotation_outliers = FALSE,
                 path = file.path("_db_downloads","collections")
        ) {
        
        new_name <- stringr::str_replace_all(organism," ","_")
        message("Starting collection retrieval (genome, proteome, cds, gff/gtf, rna, repeat masker, assembly stats) for ", new_name, " ...")
            
        org_exists <- is.genome.available(db = "refseq", organism, details = TRUE) 
...

Should the line org_exists <- is.genome.available(db = "refseq", organism, details = TRUE) instead be org_exists <- is.genome.available(db = db, organism, details = TRUE), so that the genome is checked against the user-specified database?

I also confirmed that is.genome.available(db = "genbank" , "GCA_900240375.1") returns TRUE.
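
To illustrate the suspected cause, here is a minimal sketch that does not call biomartr at all. The names is_available_stub, check_hardcoded, and check_forwarded are hypothetical, and the lookup table is an assumption based only on the results reported above; it is not how is.genome.available() is actually implemented.

```r
# Hypothetical stand-in for is.genome.available(); NOT the biomartr implementation.
# Assumption: the accession is present in GenBank and absent from RefSeq.
is_available_stub <- function(db, organism) {
  known <- list(genbank = "GCA_900240375.1", refseq = character(0))
  organism %in% known[[db]]
}

# Suspected current behaviour: the database is hard-coded, so `db` is ignored.
check_hardcoded <- function(db = "refseq", organism) {
  is_available_stub(db = "refseq", organism)
}

# Proposed behaviour: forward the user-supplied `db` to the availability check.
check_forwarded <- function(db = "refseq", organism) {
  is_available_stub(db = db, organism)
}

check_hardcoded(db = "genbank", "GCA_900240375.1")  # FALSE
check_forwarded(db = "genbank", "GCA_900240375.1")  # TRUE
```

With the hard-coded value, the check fails even though db = "genbank" was requested, which would match the 'refseq' wording in the error message above.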

Thank you!

Takahiro

HajkD added a commit that referenced this issue Jun 2, 2020
… RefSeq and not in other databases due to a constant used in is.genome.available() rather than a variable (Many thanks to Takahiro Yamada for catching the bug) #53
@HajkD
Member

HajkD commented Jun 2, 2020

Hi Takahiro,

Thank you so much for contacting me and absolutely brilliant that you caught this bug!
Thank you very much!

I have already fixed this bug, and the function should now work as expected.

Thank you!

Hajk

@HajkD HajkD closed this as completed Jun 4, 2020
@TakahiroYamada
Author

Great!
Thank you very much for fixing it so promptly!

Takahiro
