Inflate operation failed: invalid distance too far back terminate called after throwing an instance of 'std::runtime_error' #64

Open
CarlosAmadeo7 opened this issue Oct 29, 2024 · 6 comments

@CarlosAmadeo7

Hello rpvg team:
I successfully generated the .gamp file for the transcriptome with vg mpmap, with no problems at all. But when I run rpvg, I get this error:

Running rpvg (commit: cd5160d)
Random number generator seed: 1730236892
Fragment length distribution parameters found in alignment (mean: 151.096, standard deviation: 43.1828)
Loaded graph, GBWT and r-index (6.607 seconds, 10.2174 GB)
[E::bgzf_uncompress] Inflate operation failed: invalid distance too far back
terminate called after throwing an instance of 'std::runtime_error'
what(): [vg::io::MessageIterator] obsolete, invalid, or corrupt input at message 47907863952 group 41477367607
/cm/local/apps/slurm/var/spool/job17498325/slurm_script: line 25: 39401 Aborted

What is weird is that I ran rpvg previously with two different gamp files, and they ran okay, but this one is not working properly.

The command I am using is this one:

singularity exec -B /work /work/public/singularity/rpvg_latest.sif rpvg -t 32 -g $xg_path -p $gwbt_path -f $txt_gz_path -a mpmap_03_control.gamp -o rpvg --inference-model transcripts

where xg_path, gwbt_path, and txt_gz_path are where the files are located.
I used the same command to run rpvg before but using different gamp files and it was ok.
I would appreciate any help
Best

@jeizenga
Collaborator

It's possibly a truncated file. Can you share how you made the GAMP?

@CarlosAmadeo7
Author

Sure
It is the same line of code I used to generate my previous 2 gamp files:

singularity exec -B /work/ /work/alfaroqc/apps/vg_v1.57.0.sif vg mpmap -t 32 -x $xg_path -g $gcsa_path -d $dist_path -f $read_1_3 -f $read_2_3 > mpmap_03_control.gamp

where xg_path, gcsa_path, and dist_path are where the files are located, as well as the paired-end reads: read_1_3 and read_2_3

The output I obtained is this one:

[vg mpmap] elapsed time 0 s: Executing command: /vg/bin/vg mpmap -t 32 -x /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/minigraph_cactus_grch38.xg -g /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/minigraph_cactus_grch38.gcsa -d /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/graph.dist -f /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/Testing_reads_RNA_seq/Misha_reads/S03_R1.fq -f /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/Testing_reads_RNA_seq/Misha_reads/S03_R2.fq
[vg mpmap] elapsed time 0 s: Loading graph from /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/minigraph_cactus_grch38.xg
[vg mpmap] elapsed time 4 s: Completed loading graph
[vg mpmap] elapsed time 4 s: Graph is in XG format. XG is a good graph format for most mapping use cases. PackedGraph may be selected if memory usage is too high. See vg convert if you want to change graph formats.
[vg mpmap] elapsed time 4 s: Identifying reference paths
[vg mpmap] elapsed time 5 s: Loading GCSA2 from /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/minigraph_cactus_grch38.gcsa
[vg mpmap] elapsed time 5 s: Loading distance index from /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/graph.dist (in background)
[vg mpmap] elapsed time 8 s: Completed loading distance index
[vg mpmap] elapsed time 9 s: Completed loading GCSA2
[vg mpmap] elapsed time 9 s: Loading LCP from /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/minigraph_cactus_grch38.gcsa.lcp
[vg mpmap] elapsed time 9 s: Memoizing GCSA2 queries (in background)
[vg mpmap] elapsed time 12 s: Completed loading LCP
[vg mpmap] elapsed time 13 s: Completed memoizing GCSA2 queries
[vg mpmap] elapsed time 13 s: Building null model to calibrate mismapping detection
[vg mpmap] elapsed time 15 s: Mapping reads from /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/Testing_reads_RNA_seq/Misha_reads/S03_R1.fq and /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/Testing_reads_RNA_seq/Misha_reads/S03_R2.fq using 32 threads
[vg mpmap] elapsed time 50.4 m: Mapped 5000000 read pairs
[vg mpmap] elapsed time 1.7 h: Mapped 10000000 read pairs
[vg mpmap] elapsed time 2.5 h: Mapped 15000000 read pairs
[vg mpmap] elapsed time 3.3 h: Mapped 20000000 read pairs
[vg mpmap] elapsed time 4.1 h: Mapped 25000000 read pairs
[vg mpmap] elapsed time 5.0 h: Mapped 30000000 read pairs
[vg mpmap] elapsed time 5.8 h: Mapped 35000000 read pairs
[vg mpmap] elapsed time 6.6 h: Mapped 40000000 read pairs
[vg mpmap] elapsed time 7.4 h: Mapped 45000000 read pairs
[vg mpmap] elapsed time 8.3 h: Mapped 50000000 read pairs
[vg mpmap] elapsed time 9.1 h: Mapped 55000000 read pairs
[vg mpmap] elapsed time 9.9 h: Mapped 60000000 read pairs
[vg mpmap] elapsed time 10.8 h: Mapped 65000000 read pairs
[vg mpmap] elapsed time 11.6 h: Mapped 70000000 read pairs
[vg mpmap] elapsed time 12.4 h: Mapped 75000000 read pairs
[vg mpmap] elapsed time 12.9 h: Mapping finished. Mapped 77863987 read pairs.

The output looks similar to the previous gamp files generated.

@jeizenga
Collaborator

Well, if this became truncated, it probably happened after vg mpmap, since it seems to have exited successfully. Would the handling after this have allowed truncation (e.g. downloading from a remote source)? Another possibility is that some extra output got mixed into/tacked onto the output. In any case, I suspect the error originates earlier in the pipeline than rpvg. One quick check would be to run vg filter -M -t <N_THREADS> alns.gamp > /dev/null to see if vg can read it.
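
A cheaper first check for truncation, before reading the whole file, is to look at its last bytes. This is only a minimal sketch, assuming the GAMP is BGZF-compressed (the bgzf_uncompress error above suggests it is) and that the writer appended the standard 28-byte BGZF end-of-file block when it closed the file cleanly. bgzf_has_eof is a hypothetical helper name, not part of vg.

```shell
# Hypothetical helper (not part of vg): succeed only if the file ends with
# the standard 28-byte BGZF end-of-file block.
bgzf_has_eof() {
    # The empty BGZF block, written as hex in 4-byte groups.
    eof_hex="1f8b0804""00000000""00ff0600""42430200""1b000300""00000000""00000000"
    # Hex-dump the last 28 bytes; -v stops od from collapsing repeated lines.
    tail_hex=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
    [ "$tail_hex" = "$eof_hex" ]
}

# Usage (file name taken from the rpvg command above):
#   bgzf_has_eof mpmap_03_control.gamp && echo "EOF block present" || echo "EOF block missing"
```

A missing EOF block is a strong hint that the file was cut short or had something appended to it. A present EOF block does not prove the rest of the file is intact, so the full read with vg filter is still worth doing.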

@CarlosAmadeo7
Author

Hello there!
I tried to verify the integrity of the GAMP files, and I get this error when I try to convert one to JSON, e.g.: vg view -a mpmap_05_treatment.gamp > /dev/null

The error is the following:
/cm/local/apps/slurm/var/spool/job17529652/slurm_script: line 12: cd: /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/Testing_reads/Misha_reads: No such file or directory
terminate called after throwing an instance of 'std::runtime_error'
what(): [io::ProtobufIterator] tag "MGAM" for Protobuf that should be "GAM"
━━━━━━━━━━━━━━━━━━━━
Crash report for vg v1.57.0 "Franchini"
Stack trace (most recent call last):
#14 Object "/vg/bin/vg", at 0x5f4c5d, in _start
#13 Object "/vg/bin/vg", at 0x1f6ae5f, in __libc_start_main
#12 Object "/vg/bin/vg", at 0x5c497e, in main
#11 Object "/vg/bin/vg", at 0xd73feb, in vg::subcommand::Subcommand::operator()(int, char**) const
#10 Object "/vg/bin/vg", at 0xd831bb, in main_view(int, char**)
#9 Object "/vg/bin/vg", at 0xf4c040, in vg::get_input_file(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void (std::istream&)>)
#8 Object "/vg/bin/vg", at 0xd7fdd1, in std::_Function_handler<void (std::istream&), main_view(int, char**)::{lambda(std::istream&)#9}>::_M_invoke(std::_Any_data const&, std::istream&)
#7 Object "/vg/bin/vg", at 0xc2e370, in void vg::io::for_each<vg::Alignment>(std::istream&, std::function<void (long, vg::Alignment&)> const&)
#6 Object "/vg/bin/vg", at 0x64b262, in vg::io::ProtobufIterator<vg::Alignment>::fill_value()
#5 Object "/vg/bin/vg", at 0x1ea6ed8, in __cxa_throw
#4 Object "/vg/bin/vg", at 0x1ea6d76, in std::terminate()
#3 Object "/vg/bin/vg", at 0x1ea6d0b, in __cxxabiv1::__terminate(void (*)())
#2 Object "/vg/bin/vg", at 0x5c150a, in __gnu_cxx::__verbose_terminate_handler() [clone .cold]
#1 Object "/vg/bin/vg", at 0x5c3ea7, in abort
#0 Object "/vg/bin/vg", at 0x14e247b, in raise
ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug.
Please include this entire error log in your bug report!

What surprises me is that I get the same error for all 6 files, yet rpvg worked for the first 2 but not for the rest.
I checked the quality of the reads and they look good.
One thing I just realized is that these reads were quality-filtered and had their adapters removed before I mapped them with vg mpmap.
I know that vg mpmap has a function for read quality control too. Do you think doing those extra steps before mapping the reads could have caused the "instance of 'std::runtime_error'" failure?
I would appreciate your feedback.
Best

@CarlosAmadeo7
Author

Hello rpvg team
I hope you are doing ok
Following my previous message.
I reran vg mpmap with the raw reads (FASTQ files) and the mapping completed successfully. However, when I tried to estimate transcript expression with rpvg, I got the same kind of error as before:

Running rpvg (commit: cd5160d)
Random number generator seed: 1731688828
Fragment length distribution parameters found in alignment (mean: 168.714, standard deviation: 39.9725)
Loaded graph, GBWT and r-index (8.02975 seconds, 10.2174 GB)
[E::bgzf_read_block] Failed to read BGZF block data at offset 14478733203 expected 11737 bytes; hread returned 4187
terminate called after throwing an instance of 'std::runtime_error'
what(): [vg::io::MessageIterator] obsolete, invalid, or corrupt input at message 948877392674410 group 948870234112000
/cm/local/apps/slurm/var/spool/job17651630/slurm_script: line 26: 55680 Aborted singularity exec -B /work /work/public/singularity/rpvg_latest.sif rpvg -t 32 -g $xg_path -p $gwbt_path -f $txt_gz_path -a $mpmap_path -o rpvg --inference-model transcripts

I do not understand why I am getting this 'std::runtime_error'.
Can you please give me insight into what is happening?
I would really appreciate it.
I hope for your prompt response.
Best

@jeizenga
Collaborator

This error means that, one way or another, your file isn't in the input format that rpvg is expecting, so the file parsing is failing. However, there are lots of reasons why that could happen, so it's difficult to say definitively without probing around the pipeline a bit to check.

The vg view command you used above would have been looking for a GAM file, not a GAMP. For a GAMP, you should use the -K flag, not -a. If that runs without error, it would suggest that the error is occurring in rpvg's parsing rather than vg mpmap's writing.
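
As a rough sketch of that check, the earlier command can be repeated with -K in place of -a. gamp_is_readable is a hypothetical wrapper name, and the VG variable is only an assumption to let you point it at however vg is launched on your system (for example a small script that calls it through singularity). If vg view needs an explicit output format for GAMP input, vg view --help should say so.

```shell
# Hypothetical wrapper: try to parse a GAMP with `vg view -K` and discard the output.
# VG defaults to plain `vg`; set it to another command name if vg is launched differently.
gamp_is_readable() {
    "${VG:-vg}" view -K "$1" > /dev/null
}

# Usage (file name taken from the rpvg command earlier in this thread):
#   gamp_is_readable mpmap_03_control.gamp && echo "vg can read it" || echo "vg cannot read it"
```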

Another test could be to map a small number of reads (e.g. 5000) and see if you have the same error.
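
One way to get such a subset is to take the first reads from each FASTQ file. This is a minimal sketch, assuming plain (uncompressed) FASTQ with exactly 4 lines per record and mates in the same order in both files. fastq_head and the sub_R1.fq / sub_R2.fq output names are hypothetical.

```shell
# Hypothetical helper: copy the first N records of a FASTQ file,
# assuming exactly 4 lines per record (no wrapped sequence lines).
fastq_head() {
    head -n "$(( $1 * 4 ))" "$2" > "$3"
}

# Usage with the read variables from the mpmap command above:
#   fastq_head 5000 "$read_1_3" sub_R1.fq
#   fastq_head 5000 "$read_2_3" sub_R2.fq
```

Mapping sub_R1.fq and sub_R2.fq with the same vg mpmap command and then running rpvg on the resulting GAMP should show quickly whether the failure depends on the size of the file.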
