FastaDB::compact - Error could not locate file .../round-1/sampleDB-1.fa #49

Closed
reubwn opened this issue Dec 9, 2019 · 5 comments

@reubwn

reubwn commented Dec 9, 2019

Hello,

I'm getting the following error right at the beginning of RepeatModeler v2.0:

FastaDB::compact - Error could not locate file /full/path/to/RM_17736.MonDec91431122019/round-1/sampleDB-1.fa!
 at /rds/general/user/rnowell/home/anaconda3/envs/repeatmasker/bin/RepeatModeler line 829.

The file in question is present, but empty, which is probably causing the error. But what is causing it to be empty? The (SPAdes) assembly I am trying to analyse is quite fragmented - could that be the issue here? The full output to stdout is given below, and thanks in advance for any advice.

RepeatModeler Version 2.0
=========================
Search Engine = rmblast
LTR Structural Analysis: Enabled
Random Number Seed: 1575901870
Database = scaffolds.fasta ...
  - Sequences = 22742
  - Bases = 172240959
  - N50 = 18556
  - Contig Histogram:
  Size(bp)                                                        Count
  -----------------------------------------------------------------------
  117436-125816 |                                                   [ 3 ]
  109057-117436 |                                                   [ 1 ]
  100678-109057 |                                                   [ 4 ]
  92299-100678  |                                                   [ 3 ]
  83920-92299   |                                                   [ 16 ]
  75540-83919   |                                                   [ 20 ]
  67161-75540   |                                                   [ 33 ]
  58782-67161   |                                                   [ 57 ]
  50403-58782   |                                                   [ 105 ]
  42024-50403   |                                                   [ 215 ]
  33644-42023   |*                                                  [ 381 ]
  25265-33644   |**                                                 [ 746 ]
  16886-25265   |****                                               [ 1513 ]
  8507-16886    |*********                                          [ 3185 ]
  128-8507      |************************************************** [ 16460 ]

Using output directory = /full/path/to/RM_17736.MonDec91431122019
Storage Throughput = fair ( 347.71 MB/s )

Ready to start the sampling process.
INFO: The runtime of RepeatModeler heavily depends on the quality of the assembly
      and the repetitive content of the sequences.  It is not imperative
      that RepeatModeler completes all rounds in order to obtain useful
      results.  At the completion of each round, the files ( consensi.fa, and
      families.stk ) found in:
      /full/path/to/RM_17736.MonDec91431122019/ 
      will contain all results produced thus far. These files may be 
      manually copied and run through RepeatClassifier should the program
      be terminated early.


RepeatModeler Round # 1
========================
Searching for Repeats
 -- Sampling from the database...
   - Gathering up to 40000000 bp
FastaDB::compact - Error could not locate file /full/path/to/RM_17736.MonDec91431122019/round-1/sampleDB-1.fa!
 at ~/anaconda3/envs/repeatmasker/bin/RepeatModeler line 829.

@jebrosen
Member

jebrosen commented Dec 9, 2019

at ~/anaconda3/envs/repeatmasker/bin/RepeatModeler line 829.

Which programs did you install from bioconda, and which from elsewhere? There are several issues with the bioconda packages, some of which are in the process of being fixed (for example bioconda/bioconda-recipes#9988). If you can replicate this without bioconda, that would be ideal. Since this error occurred before the LTR pipeline, you shouldn't need to install all of those dependencies.
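
For readers trying to answer the same question on their own system, here is a minimal Python sketch (not part of RepeatModeler) that reports where each executable resolves on PATH and whether its real location lies inside the active conda environment. The tool list is an assumption for illustration: RepeatModeler and RepeatMasker come from this thread, while BuildDatabase and rmblastn are guesses at other executables worth checking. The function name and its parameters are hypothetical. It relies on the CONDA_PREFIX environment variable, which conda sets when an environment is activated.

```python
import os
import shutil

# Assumed list of executables to inspect; edit to match your own install.
TOOLS = ["RepeatModeler", "RepeatMasker", "BuildDatabase", "rmblastn"]


def classify_tools(tools, conda_prefix=None, search_path=None):
    """Map each tool to (path_on_PATH, real_path, origin).

    origin is "conda" if the real (symlink-resolved) path lies under
    conda_prefix, "other" if it lies elsewhere, "missing" if not found.
    """
    if conda_prefix is None:
        conda_prefix = os.environ.get("CONDA_PREFIX")
    prefix = os.path.realpath(conda_prefix) if conda_prefix else None

    report = {}
    for tool in tools:
        found = shutil.which(tool, path=search_path)
        if found is None:
            report[tool] = (None, None, "missing")
            continue
        # Resolve symlinks: a tool linked into the env's bin/ may live elsewhere.
        real = os.path.realpath(found)
        inside = prefix is not None and os.path.commonpath([real, prefix]) == prefix
        report[tool] = (found, real, "conda" if inside else "other")
    return report


if __name__ == "__main__":
    for name, (found, real, origin) in classify_tools(TOOLS).items():
        print(f"{name}: {origin} (PATH: {found}, real: {real})")
```

Resolving symlinks matters when a manually installed program has been linked into a conda environment's bin/ directory: the PATH entry points into the environment, but the real file does not.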

@reubwn
Author

reubwn commented Dec 9, 2019

Hmm, good question. I installed the latest RepeatModeler-2.0 and RepeatMasker-4.1.0 packages manually, but then linked them into the bin/ dir of an older conda environment that I had set up previously. So neither RepeatModeler nor RepeatMasker is installed directly from bioconda, but some of the Perl modules and other dependencies might be. I will try to replicate it outside of the conda environment and get back to you.

@reubwn
Author

reubwn commented Dec 10, 2019

Update: this particular error does not occur if I install all the dependencies outside of conda, so it looks like you were right that it was an issue with the conda-based install. I'm happy to share anything else that might expedite a solution, since the conda route is so useful (when it works). Thanks!

@jebrosen
Member

I took a look at the code around the error point - it looks pretty unlikely to succeed before that point and then fail. So it's possible this was a fluke, unless you ran into it multiple times and got the same error.

Some contributors are currently working on fixing and testing the bioconda package(s), so the best thing to do on that front is to wait a bit and try again.

@reubwn
Author

reubwn commented Dec 10, 2019

Yes - I think the actual error occurred upstream, as the file ../round-1/sampleDB-1.fa had been written but was empty (no sequence data). Something to do with sampling the input fasta or database files? Outside of conda, the program takes some time before printing the histogram, whereas the failed run progresses rapidly to the error message. Anyway, thanks again for your time and help.
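
For anyone hitting the same symptom, here is a minimal Python sketch (again, not part of RepeatModeler) that tells apart a missing file, an empty file, and a FASTA file that actually contains sequence. The function name fasta_summary is hypothetical, and the file to check is passed on the command line rather than assumed.

```python
import os
import sys


def fasta_summary(path):
    """Return (n_records, n_bases) for a FASTA file, or None if it is missing."""
    if not os.path.exists(path):
        return None
    n_records = 0
    n_bases = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                n_records += 1  # header line starts a new record
            else:
                n_bases += len(line)  # sequence line
    return n_records, n_bases


if __name__ == "__main__":
    # Usage: python check_fasta.py <path to the sample FASTA file>
    result = fasta_summary(sys.argv[1])
    if result is None:
        print("file is missing")
    elif result == (0, 0):
        print("file exists but contains no sequence data")
    else:
        print(f"{result[0]} records, {result[1]} bases")
```

Running it on both the input assembly and the round-1 sample file would show whether the sampling step received sequence data and wrote none out, or never received any to begin with.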

reubwn closed this as completed Dec 12, 2019