Save RepeatMasker results #54
Comments
Hi @CeciliaDeng, thank you for raising this issue. Nextflow has a native method for saving outputs. For example, here is how the TE lib from EDTA is saved: We won't have to parse .nextflow ourselves. Step 3, however, needs some additional work. The script needs to be converted to a Nextflow module and tested using a small test file. For example, see how the
Awesome that this issue will be taken care of in v0.4.0. Thank you @GallVp
I am going to use the official converter as it is more likely to receive an update when RepeatMasker is updated.
I didn't use rmOutToGFF3.pl because the information in column 9 is poor (e.g. "Target=FAM 24 180"). Alternatively, perhaps the ".out" file can be saved as well? Then it would be up to the users to convert ".out" to gff3 themselves, with whatever information they like to add in column 9.
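For anyone converting the ".out" file themselves, here is a minimal sketch of how column 9 could be filled with custom attributes. It is not the official converter and not the script mentioned in this thread. It assumes the usual RepeatMasker .out layout (a 3-line header, then whitespace-separated rows with the score in column 1, query name and coordinates in columns 5 to 7, strand in column 9, repeat name in column 10 and class/family in column 11). The function name `rmout_to_gff3`, the `rm_` ID prefix and the `repeat_class` attribute are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: convert RepeatMasker .out rows to GFF3 with custom attributes.
# Assumption: standard .out layout, data rows begin with a numeric score.
rmout_to_gff3() {
    awk '
        BEGIN { OFS = "\t"; print "##gff-version 3" }
        # Header and blank lines do not start with a number, so they are skipped.
        $1 ~ /^[0-9]+$/ && NF >= 14 {
            # RepeatMasker writes "C" for the complement strand.
            strand = ($9 == "C") ? "-" : "+"
            n++
            print $5, "RepeatMasker", "dispersed_repeat", $6, $7, $1, strand, ".",
                  "ID=rm_" n ";Name=" $10 ";repeat_class=" $11
        }
    ' "$@"
}

# Hypothetical usage (output file name is an assumption):
#   rmout_to_gff3 Rhap1.fa.out > Rhap1.TE.gff3
```

The attribute string on the last `print` line is the part to edit if other information is wanted in column 9. The sketch does not add "##sequence-region" lines, so the result may still need a validation pass before loading.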
Yes, the outputs are saved if the
@GallVp in this case (generating gff3) you might only need to enable the "-gff" option of RepeatMasker; then you won't need to test rmOutToGFF3.pl in the pipeline.
Thank you @ting-hsuan-chen. This is actually a better option.
The only issue with '-gff' is that the file will fail gff3 validation when it is loaded into fairGenomes. The output.gff3 from @ting-hsuan-chen's code can be directly added as TE.gff3 and loaded as a TE track in JB2.
Forgot I wrote this ages ago. It's old and may or may not work, but it was used for converting RepeatMasker 1 .out files to gff3 for loading in WebApollo (JBrowse 1.6):
/output/hrarnc/software/bin/convert_repeatmasker_out_2_gff3.pl
USAGE: /output/hrarnc/software/bin/convert_repeatmasker_out_2_gff3.pl -r=repeatmasker.out [-o=output.gff3 -source=RepeatMasker -type=dispersed_repeat]
It suffers from allowing just one type (the value in column 3) in the .out to .gff3 conversion.
It was not written to add "##sequence-region ..." as I have another script that does it.
@GallVp - yeah, it's not published, not supported etc., like any other of my code that you might have used in your systems at some point.
Thanks @CeciliaDeng and @rosscrowhurst. I'll investigate these issues and report back my findings here. I am removing the
Here are my findings. The test file is attached. The test scripts are also attached.
I want to use number 4 but it has no tests or test data. @rosscrowhurst can you kindly create a repo and publish it through PFR org? I think we should have a As the pipeline will be published, I cannot include untested custom scripts.
Test data and scripts for the above comment.
Got permission from @rosscrowhurst to include his script. I'll add unit tests.
Hi @GallVp and @jasonshiller, The output of RepeatMasker can be used for visualization and other applications. Can we please keep the results in final/ if genepao/pangene runs this step? Thank you.
My current approach is:
1. Get info from .nextflow.log
   grep REPEATMASK .nextflow.log | grep COMPLETED > tmp.list
2. Find the RepeatMasker folder from the tmp.list, and copy (or move) the files to final/. For example,
   cp -p work/8d/eb6f63fca54c6eaaa28923947e4577/Rhap1/* results/final/Rhap1/RepeatMask/
3. Convert the .out file to gff3 using the script from @ting-hsuan-chen:
   cd results/final/Rhap1/RepeatMask; /workspace/cflthc/script/KRIP_TE/09_benchmarking/RMout2gff3.sh Rhap1.fa.out
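The first two steps above could be combined into a small helper, sketched here. It assumes that each completed-task line in .nextflow.log carries a "workDir: <path>" field; that is how such lines commonly look, but the exact log layout is an assumption and may differ between Nextflow versions. The function name `repeatmask_workdirs` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: print the work directories of completed REPEATMASK tasks.
# Assumption: completion lines look roughly like
#   "... name: REPEATMASK (Rhap1); status: COMPLETED; ...; workDir: /path/work/8d/eb6f... ]"
repeatmask_workdirs() {
    grep REPEATMASK "$@" | grep COMPLETED |
        # Keep only the path that follows "workDir: " (up to a space, ";" or "]").
        sed -n 's/.*workDir: \([^] ;]*\).*/\1/p'
}

# Hypothetical usage, mirroring the cp command above:
#   for d in $(repeatmask_workdirs .nextflow.log); do
#       cp -p "$d"/Rhap1/* results/final/Rhap1/RepeatMask/
#   done
```

This only removes the manual lookup in tmp.list. Saving the files through Nextflow's own output publishing, as suggested in the comments, would make the log parsing unnecessary.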