LTR_retriever failed to generate a file #230

Open
cyycyj opened this issue Dec 8, 2023 · 4 comments

cyycyj commented Dec 8, 2023

Dear Robert, Jeb and Francisco,

Describe the issue

I am working on a plant genome (~500 Mb, het=1.29%). As the title says, LTR_retriever failed to generate a file when I ran RepeatModeler. I am not sure whether this is the same problem as the pinned issue #202.

Reproduction steps

Please find the files below. I had to convert them to .txt files to meet GitHub's attachment rules. To clarify what each one is:

01.repeatmodeler.sh.txt: the Slurm script I submitted.
01.repeatmodeler1.out.txt & 01.repeatmodeler1.err.txt: the stdout and stderr output of the Slurm job.
LTR_retriever.log: the LTR_retriever log mentioned in 01.repeatmodeler1.err.txt.
LTR.identifier.pl.txt: the script named in the error in LTR_retriever.log.

Log output

01.repeatmodeler.sh.txt
01.repeatmodeler1.out.txt
01.repeatmodeler1.err.txt
LTR_retriever.log
LTR.identifier.pl.txt

Environment (please include as much of the following information as you can find out):

  • How did you install RepeatModeler? e.g. manual installation from repeatmasker.org, bioconda, the Dfam TE Tools container, or as part of another bioinformatics tool?

manual installation from repeatmasker.org

  • Which version of RepeatModeler do you have? The output of RepeatModeler without any options will be a help page with the version of the program displayed at the top.

RepeatModeler-2.0.5

  • Which version of RepeatMasker is this RepeatModeler installation using? Have you installed RepBase RepeatMasker Edition for RepeatMasker, or the full Dfam database?

RepeatMasker-4.1.6, dfam38_full.0.h5+dfam38_full.5.h5.gz+RepBaseRepeatMaskerEdition-20181026.tar.gz. You can find detailed information in Dfam-consortium/RepeatMasker#238

  • Operating system and version. The output of uname -a and lsb_release -a can be used to find this.

Linux cln01 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

@cyycyj cyycyj added the bug label Dec 8, 2023
@rmhubley
Member

rmhubley commented Dec 8, 2023

Thanks for this wonderfully detailed bug report. You identified the cause of the error:

Invalid value for shared scalar at /data/miniconda3/envs/repeat/share/LTR_retriever/bin/LTR.identifier.pl line 114, line 10083.

This is a problem in the LTR_retriever program itself, related to data passing in a multithreaded run. Please also report it at https://github.com/oushujun/LTR_retriever so the authors are aware of the issue. I would highly recommend avoiding conda when using RepeatModeler/RepeatMasker (and their dependencies), as we have had nothing but problems with bad recipes and mismatched dependencies. It is a bit strange that in your RepeatModeler log output (*.out.txt), LTR_retriever is the only dependency for which RepeatModeler couldn't ascertain its version.

Search Engine = rmblast 2.14.1+
Threads = 128
Dependencies: TRF 4.09, RECON , RepeatScout 1.0.6, RepeatMasker 4.1.6
LTR Structural Analysis: Enabled ( GenomeTools 1.6.5, LTR_Retriever ,
                                   Ninja 0.97-cluster_only, MAFFT 7.520,
                                   CD-HIT 4.8.1 )

Love the 128 threads by the way.....heavy metal!
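
A blank version like this can also be spotted mechanically. The following is a minimal sketch, assuming the header keeps the layout quoted above (comma-separated "Name version" entries after "Dependencies:" and inside "Enabled ( ... )"); `missing_versions` is a hypothetical helper, not part of RepeatModeler.

```shell
# List tools that appear without a version in a RepeatModeler header.
# Assumption: the header has the layout shown in the log quoted above.
missing_versions() {
  # $1: path to the RepeatModeler stdout log (e.g. the *.out.txt attachment)
  awk '
    /^Dependencies:/ { grab = 1 }          # start of the dependency listing
    grab {
      line = $0
      sub(/^Dependencies:/, "", line)
      sub(/^LTR Structural Analysis: Enabled \(/, "", line)
      gsub(/\)/, "", line)
      n = split(line, parts, ",")
      for (i = 1; i <= n; i++) {
        # An entry with a single word has a name but no version.
        m = split(parts[i], words, " ")
        if (m == 1) print words[1]
      }
      if ($0 ~ /\)/) grab = 0              # closing parenthesis ends the listing
    }
  ' "$1"
}
```

Run on the header above, this prints RECON as well as LTR_Retriever, since RECON is also listed without a version there, so the output needs to be read with some judgment.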

Another thing you can try is to run LTR_retriever on its own to see if you can reproduce this without having to go through the trouble of running inside of RepeatModeler. Simply run:

% /data/miniconda3/envs/repeat/share/LTR_retriever/bin/LTR_retriever -repeatmasker /data/biosoft/RepeatMasker -blastplus /data/miniconda3/envs/repeat/bin -cdhit_path /data/miniconda3/envs/repeat/bin -trf_path /data/miniconda3/envs/repeat/bin/trf -genome seq.fa -inharvest /data/genome_assembly/genome/P-2/10.repeat/primary/01.repeat/01.RepeatModeler/RM_202283.ThuDec71609352023/LTR_217736.FriDec80530182023/raw-struct-results.txt -noanno -threads 128

This is what you would need to do to report this to the LTR_retriever group, and maybe even provide them with that raw-struct-results.txt file.

For this RepeatModeler run, you should know that the bulk of the results are still intact even if this step fails. It looks like RepeatModeler found 2,117 families (LTRs included). Depending on your use case, you should be able to use this as a starting point for library curation, genome masking, etc., then rerun just the LTR structural finding and merge the results at a later time. To run just that portion of the analysis, you would do:

%  <RepeatModeler Directory>/LTRPipeline -threads 128  P-2.primary.fa
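
For the later merge, the simplest possible approach is to concatenate the two family libraries. This is only a naive sketch, assuming both outputs are plain FASTA files; `merge_libraries` and the file names in the usage line are hypothetical. It keeps the first record for each header ID and does not detect near-identical families under different names, which would likely need a clustering step (for example with CD-HIT, already listed among the dependencies above).

```shell
# Concatenate two FASTA libraries, keeping the first record per header ID.
# Assumption: inputs are plain FASTA; the ID is the header text up to the
# first whitespace. Redundant families with different IDs are NOT removed.
merge_libraries() {
  # $1, $2: input FASTA files; merged FASTA is written to stdout
  awk '
    /^>/ { id = $1; keep = !(id in seen); seen[id] = 1 }  # new record: seen before?
    keep                                                  # print lines of kept records
  ' "$1" "$2"
}

# Usage (file names are placeholders):
#   merge_libraries repeatmodeler-families.fa ltr-families.fa > combined-families.fa
```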

@cyycyj
Author

cyycyj commented Dec 9, 2023

Dear Robert,

Thank you for your detailed and quick reply! I found that this issue may come from a problem in the conda distribution of LTR_retriever, as others have encountered a similar issue before (oushujun/LTR_retriever#159). I have reinstalled it manually, and this time RepeatModeler seems to detect LTR_retriever's version correctly.

RepeatModeler Version 2.0.5
===========================
Using output directory = /data/genome_assembly/genome/P-2/10.repeat/primary/01.repeat/01.RepeatModeler/RM_114078.SatDec91134272023
Search Engine = rmblast 2.14.1+
Threads = 128
Dependencies: TRF 4.09, RECON , RepeatScout 1.0.6, RepeatMasker 4.1.6
LTR Structural Analysis: Enabled ( GenomeTools 1.6.5, LTR_Retriever v2.9.5,
                                   Ninja 0.97-cluster_only, MAFFT 7.520,
                                   CD-HIT 4.8.1 )

I have also reported this issue to LTR_retriever (oushujun/LTR_retriever#160), and I am working on reproducing it as you described above. I will update you once there is something new.

What's more, HPC users like me do not have root access for installing software or packages, so using conda to build dependencies can be a good choice.

By the way, I also think 128 threads is heavy metal, but I love Britpop like Coldplay, lol

@cyycyj
Author

cyycyj commented Dec 9, 2023

Oops! Something new happened: the Slurm job terminated in round 5.

...
FATAL ERROR: RepeatModeler giving up. One or more
batches failed!  Unfortunately this type of error
cannot be recovered from. Please submit the following
details to the feedback page at the repeatmasker
website:

       http://www.repeatmasker.org

RepeatModeler Version: 2.0.5
Search Engine: rmblast [ 2.14.1+ ]
Command Line: /data/biosoft/RepeatModeler-2.0.5/RepeatModeler-database /data/genome_assembly/genome/P-2/10.repeat/primary/01.repeat/00.BuildDatabase/P-2.primary -threads 128 -LTRStruct
Batch Number: 2981
Disk Space:
Filesystem          1K-blocks           Used     Available Use% Mounted on
/wfbdnxy       16447657790160 10574408071604 5873249718556  65% /data

System Memory:
Further details about this problem may be found in
the directory: /data/genome_assembly/genome/P-2/10.repeat/primary/01.repeat/01.RepeatModeler/RM_114078.SatDec91134272023

I suspect it may be a memory issue. Would you mind giving me an email address so that I can share the original round 5 folder with you?
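
One way to test the memory hypothesis on a Slurm cluster is to look at the job's accounting record. This is a minimal sketch, assuming job accounting is enabled and `sacct` is available; `oom_steps` is a hypothetical helper and the job ID in the usage line is a placeholder.

```shell
# Print the job steps that Slurm recorded as OUT_OF_MEMORY, with their MaxRSS.
# Reads, on stdin, the output of:
#   sacct --parsable2 --noheader --format=JobID,State,MaxRSS
oom_steps() {
  awk -F'|' '$2 ~ /^OUT_OF_MEMORY/ { print $1, $3 }'
}

# Usage (replace <jobid> with the real Slurm job ID):
#   sacct -j <jobid> --parsable2 --noheader --format=JobID,State,MaxRSS | oom_steps
```

An empty result does not rule out memory pressure, since a failed batch inside RepeatModeler may not be recorded by Slurm as an out-of-memory event.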

@simone-says

I'm having an issue with the clustering step of the LTRPipeline, but I re-ran it with an older version, in a RepeatModeler Singularity container rather than the TE Tools container, and it worked. How can I merge the RepeatModeler and LTRPipeline results before running RepeatMasker?
