Error in downloading bacterial group #34

Closed
mehmetdirenc opened this issue Nov 5, 2018 · 7 comments


@mehmetdirenc

Hi,

I just started using this package to download Proteobacteria and its subgroups, and I got an error when it was downloading Yersinia nurmii from RefSeq, as shown below.

Starting genome retrieval of 'Yersinia nurmii' from refseq ...

|=============================================================================================================================| 100% 38 MB
Error: The FTP site 'ftp://ftp.ncbi.nlm.nih.gov/' cannot be reached. Are you connected to the internet? Is the the FTP site 'NA/NA_genomic.fna.gz' currently available?
In addition: Warning message:
It seems like there are some files in download folder that are neither pre-downloaded species files nor doc_ or md5checksum files.

Also, if I may ask, how did you classify the output of the listGroups() function? Is it up to date, and is it taken from the NCBI database?

Great package btw :)

Best,
Mehmet

@HajkD
Member

HajkD commented Nov 6, 2018

Hi Mehmet,

Many thanks for contacting me. I am happy if you find biomartr useful.

Regarding your question: biomartr::listGroups(db = "refseq", kingdom = "bacteria") retrieves all entries currently available from NCBI RefSeq. By changing the db argument you can also retrieve the available entries from NCBI Genbank, ENSEMBL, and ENSEMBLGENOMES, if this helps. You can find further details here: https://ropensci.github.io/biomartr/articles/Sequence_Retrieval.html#analogous-computations-can-be-performed-for-groups-and-subgroups

Regarding your genome download of Yersinia nurmii:

When running the function:

biomartr::is.genome.available(db = "refseq", organism = "Yersinia nurmii", details = TRUE)
Only a non-reference genome assembly is available for 'Yersinia nurmii'. Please make sure to specify the argument 'reference = FALSE' when running any get*() function.
# A tibble: 1 x 21
  assembly_access… bioproject biosample wgs_master refseq_category  taxid species_taxid
  <chr>            <chr>      <chr>     <chr>      <chr>            <int>         <int>
1 GCF_001112925.1  PRJNA2241… SAMEA148… CPYD00000… na              685706        685706
# ... with 14 more variables: organism_name <chr>, infraspecific_name <chr>,
#   isolate <chr>, version_status <chr>, assembly_level <chr>, release_type <chr>,
#   genome_rep <chr>, seq_rel_date <date>, asm_name <chr>, submitter <chr>,
#   gbrs_paired_asm <chr>, paired_asm_comp <chr>, ftp_path <chr>,
#   excluded_from_refseq <chr>

You can see that a non-reference genome assembly is available at NCBI RefSeq, which should be downloadable. I will look into why the genome download doesn't work properly and will get back to you soon.
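To illustrate why the message above asks for reference = FALSE, here is a minimal R sketch that inspects the refseq_category column of a details table like the one printed. The data frame is a hand-built mock of that single row, and needs_non_reference() is a hypothetical helper, not part of biomartr. Treating both "reference genome" and "representative genome" as reference assemblies is an assumption, not necessarily how biomartr decides internally.

```r
# Hand-built mock of the single row printed above (only the columns used here).
details <- data.frame(
  assembly_accession = "GCF_001112925.1",
  refseq_category = "na",
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when no assembly is flagged as a reference or
# representative genome, i.e. a get*() call would need reference = FALSE.
needs_non_reference <- function(details) {
  !any(details$refseq_category %in% c("reference genome", "representative genome"))
}

needs_non_reference(details)  # TRUE for Yersinia nurmii's "na" category
```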

Many thanks for making me aware of this.

Best wishes,
Hajk

@ashoks773

Hi Hajk,

I am trying to download the bacterial genomes using 'biomartr':
biomartr::meta.retrieval(kingdom = "bacteria", db = "refseq", type = "genome")

And I am getting the following error:
Starting genome retrieval of 'Campylobacter hyointestinalis' from refseq ...

The download session seems to have timed out at the FTP site 'NA/NA_genomic.fna.gz'. This could be due to an overload of queries to the databases. Please restart this function to continue the data retrieval process or wait for a while before restarting this function in case your IP address was logged due to an query overload on the server side.
Error: Please provide a valid file path to your genome assembly file.

It seems the assembly file is not available for this genome. How can I skip this genome and continue downloading the others?
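One general R pattern for keeping a batch going when a single genome fails is to wrap each per-organism download in tryCatch() and record the failures instead of aborting. The sketch assumes you loop over organisms yourself; download_one() is a hypothetical stand-in that deliberately fails for one organism. It is not a biomartr function and does not reflect how meta.retrieval() works internally.

```r
# Hypothetical stand-in for a per-organism download call; it fails for one
# organism to mimic a missing FTP path, and otherwise returns a fake file path.
download_one <- function(organism) {
  if (organism == "Campylobacter hyointestinalis") {
    stop("Please provide a valid file path to your genome assembly file.")
  }
  paste0(organism, ".fna.gz")
}

organisms <- c("Escherichia coli", "Campylobacter hyointestinalis", "Yersinia nurmii")

# Try each organism; on error, store the message instead of stopping the loop.
results <- lapply(organisms, function(org) {
  tryCatch(
    list(organism = org, file = download_one(org), error = NA_character_),
    error = function(e) list(organism = org, file = NA_character_,
                             error = conditionMessage(e))
  )
})

# Collect the organisms that failed so they can be retried or skipped later.
failed <- Filter(function(r) !is.na(r$error), results)
message("Failed: ", paste(vapply(failed, `[[`, character(1), "organism"), collapse = ", "))
```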

@HajkD
Member

HajkD commented Mar 17, 2021

Hi Ashok,

Many thanks for contacting me.

Have you tried setting reference = TRUE?

biomartr::meta.retrieval(kingdom = "bacteria", db = "refseq", type = "genome", reference = TRUE)

It seems like NCBI doesn't provide FTP path information for some strain versions without a reference genome. This is a limitation on the NCBI side. But I am happy to see whether I can build a check into the retrieval function to skip this case.

Does this help?

Cheers,
Hajk

HajkD added a commit that referenced this issue Mar 17, 2021
…ir `species summary files` for species without reference genomes. As a result `meta.retrieval()` stopped working, because no FTP paths were found for some species. This issue was now fixed by adding the filter rule `!is.na(ftp_path)` into all `get*()` functions (Many thanks for making me aware of this issue Ashok Kumar Sharma #34 and Dominik Merges #72)
@HajkD
Member

HajkD commented Mar 17, 2021

This issue is fixed now by adding the filter rule !is.na(ftp_path) into all get*() functions.

Hence, after installing the developer version of biomartr, you should be able to run your initial code with reference = FALSE (the default) without problems.

biomartr::meta.retrieval(kingdom = "bacteria", db = "refseq", type = "genome")

Let me know if this works for you.

Cheers,
Hajk
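Here is a minimal base R sketch of the `!is.na(ftp_path)` idea on a made-up assembly summary table; the organism names and example.org paths are hypothetical. It also shows one plausible way an NA path becomes the literal 'NA/NA_genomic.fna.gz' seen in the error messages in this thread: paste0() turns NA into the string "NA". Whether biomartr builds its URLs exactly this way is an assumption; the `<ftp_path>/<basename>_genomic.fna.gz` layout follows NCBI's file naming convention.

```r
# Hypothetical excerpt of an assembly summary table; one row lacks an FTP path.
assembly_summary <- data.frame(
  organism_name = c("Species A", "Species B", "Species C"),
  refseq_category = c("reference genome", "na", "na"),
  ftp_path = c("ftp://example.org/GCF_A", NA, "ftp://example.org/GCF_C"),
  stringsAsFactors = FALSE
)

# Without a filter, pasting an NA path silently yields the string "NA".
broken <- paste0(NA, "/", NA, "_genomic.fna.gz")

# Keep only rows that actually carry an FTP path.
downloadable <- assembly_summary[!is.na(assembly_summary$ftp_path), ]

# Build genome URLs as <ftp_path>/<basename(ftp_path)>_genomic.fna.gz.
genome_urls <- file.path(
  downloadable$ftp_path,
  paste0(basename(downloadable$ftp_path), "_genomic.fna.gz")
)
print(genome_urls)
```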

@ashoks773

Hi Hajk,

Thank you so much for the quick response. I have started downloading these genomes again and will get back to you if I get any errors. Thank you.

Best,
Ashok

@HajkD
Member

HajkD commented Mar 19, 2021

Hi Ashok,

I now also added the !is.na(ftp_path) filter rule when working with reference = TRUE (see #72).

Please let me know whether it now works for you, and then I can close this issue.

Cheers,
Hajk

@HajkD
Member

HajkD commented Sep 27, 2023

I assume this has been solved.

HajkD closed this as completed Sep 27, 2023
3 participants