Assertion `hasNodeId(node_id)' failed #43

wtulle · 2023-02-24T14:07:23Z

Hi, I need help with an error I get when running the rpvg command.
I'm getting "paths_index.cpp:73: uint32_t PathsIndex::nodeLength(uint32_t) const: Assertion `hasNodeId(node_id)' failed" error, but I have no idea what that means.
The full command is:
rpvg -t 2 -g ref.xg -p pantranscriptome.gbwt -a mpmap.gamp -o rpvg --inference-model haplotype-transcripts

Any idea where the problem could come from: the .xg, the .gbwt, or the .gamp input?
Thanks!

jonassibbesen · 2023-02-27T09:32:26Z

Hi, from the error it looks like the pantranscriptome does not match the graph that was given as input (ref.xg). Could you share the command lines you used when running vg rna and vg mpmap? Thanks!

wtulle · 2023-02-27T12:38:57Z

Hi, here are the full commands I used:

bgzip -c contigs.vcf > contigs.vcf.gz && tabix contigs.vcf.gz
vg construct -r ref.fa -v contigs.vcf.gz -a > ref.vg
vg index -x ref.xg -T ref.vg
vg autoindex --workflow mpmap -t 2 --prefix vg_rna --ref-fasta ref.fa --vcf contigs.vcf.gz --tx-gff kortha.gtf

vg mpmap -t 2 -x vg_rna.spliced.xg -g vg_rna.spliced.gcsa -d vg_rna.spliced.dist -f r1.fq.gz -f r2.fq.gz > mpmap.gamp
vg rna -n kortha.gtf ref.vg -i pantranscriptome.txt -b pantranscriptome.gbwt > pantranscriptome.txt.gz
rpvg -t 2 -g ref.xg -p pantranscriptome.gbwt -f pantranscriptome.txt -a mpmap.gamp -o rpvg --inference-model haplotype-transcripts

Thanks!

jeizenga · 2023-02-27T22:05:48Z

To co-generate the pantranscriptome and the spliced graph/indexes, you should use vg autoindex --workflow mpmap --workflow rpvg. That should guarantee that all of the indexes are consistent with each other. In general, I would recommend against mixing indexes from the automatic vg autoindex pipeline and manual pipelines, which could use different construction parameters.
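As a rough illustration of this advice, the sketch below flags input files that do not share the vg autoindex prefix. It assumes (this is an assumption, not a documented guarantee) that every index written by one vg autoindex run is named "<prefix>.<something>". The helper name files_outside_prefix is hypothetical, and the file names are the ones from the commands earlier in this thread. A shared prefix only hints that files came from the same run; it does not prove the indexes are consistent.

```python
from pathlib import Path


def files_outside_prefix(prefix, paths):
    """Return the paths whose file name does not start with "<prefix>.".

    Assumption: a single vg autoindex run names all of its outputs
    "<prefix>.<something>", so a different stem suggests another pipeline.
    """
    stem = Path(prefix).name + "."
    return [p for p in paths if not Path(p).name.startswith(stem)]


# Files passed to vg mpmap in the thread (all produced by vg autoindex).
mapping_inputs = ["vg_rna.spliced.xg", "vg_rna.spliced.gcsa", "vg_rna.spliced.dist"]
# Files passed to rpvg in the thread (built manually with vg index / vg rna).
rpvg_inputs = ["ref.xg", "pantranscriptome.gbwt"]

print(files_outside_prefix("vg_rna", mapping_inputs))  # []
print(files_outside_prefix("vg_rna", rpvg_inputs))     # ['ref.xg', 'pantranscriptome.gbwt']
```

Here the reads were mapped against the autoindex graph while rpvg was given the manually built graph and GBWT, which is the kind of mix that the recommendation above is meant to avoid.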

wtulle · 2023-03-01T12:38:46Z

Sorry to bother you again. I tried your suggestion, but now I'm getting another error I can't resolve:
vg autoindex --workflow mpmap --workflow rpvg --prefix vg_rna --ref-fasta ref.fa --vcf contigs.vcf.gz --tx-gff kortha.gff

The output:
[IndexRegistry]: Checking for phasing in VCF(s).
error:[vg autoindex] Input is not sufficient to create indexes
Inputs
GTF/GFF
Reference FASTA
VCF
are insufficient to create target index Haplotype-Transcript GBWT

I've tried adding the parameter --gfa ref.gfa, with a similar result:

[IndexRegistry]: Checking for phasing in VCF(s).
[IndexRegistry]: Provided: VCF
[IndexRegistry]: Checking for haplotype lines in GFA.
[IndexRegistry]: Provided: Reference GFA
error:[vg autoindex] Input is not sufficient to create indexes

jeizenga · 2023-03-01T17:14:58Z

The issue is that vg autoindex isn't finding phased variants in the VCF, which are required to form the haplotype-specific transcripts in the pantranscriptome. Is your VCF un-phased?

ld9866 · 2023-03-08T00:14:06Z

Hello!
Have you solved the problem?
I had the same problem

jeizenga · 2023-03-08T00:41:48Z

Have you already checked for phasing in your VCF?

ld9866 · 2023-03-08T00:43:52Z

Hello!
I would like to ask what "phasing in your VCF" means. Do you mean that the VCF needs to be processed with bcftools for quality control?

jeizenga · 2023-03-08T00:54:26Z

VCFs can express either phased or unphased genotypes. Phased genotypes link together the alleles at multiple loci as occurring on the same haplotype. Unphased genotypes simply assert the alleles at each locus without specifying what combination of alleles co-occur on each haplotype. The pantranscriptome is built from haplotype-specific transcripts, so you need phased genotypes in order to specify the haplotype sequences. There's more detail about phasing in the VCF format in section 1.4.2 of the file specification.

ld9866 · 2023-03-08T01:23:29Z

example.vcf.zip
I'm sorry to bother you again!
I have read the document you recommended, but I am still not clear on what to do next with my VCF. Could you please help me?
Best wishes

jeizenga · 2023-03-08T01:56:40Z

The most common way you'd see it expressed is if the genotypes were separated with a bar (e.g. 0|1) rather than a slash (e.g. 0/1), but it looks like your genotypes all have ploidy of 1. Does that make sense for your organism? It could be that this is an unhandled edge case in vg autoindex.
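A minimal sketch of how one might check this on a VCF, assuming an uncompressed, tab-separated file whose FORMAT column starts with GT (as the VCF specification requires when GT is present). The function names and the toy records are made up for illustration; the sketch only counts genotype classes and does not reproduce the check that vg autoindex performs.

```python
import re
from collections import Counter


def classify_genotype(gt):
    """Classify one GT value such as "0|1", "0/1", "1" or "."."""
    alleles = re.split(r"[|/]", gt)
    if all(a == "." for a in alleles):
        return "missing"
    if len(alleles) == 1:
        return "haploid"   # ploidy 1: no separator, so no phasing information
    if "/" in gt:
        return "unphased"  # at least one slash separator
    return "phased"        # only bar separators


def summarize_vcf(lines):
    """Count genotype classes over all samples in an iterable of VCF lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10 or fields[8].split(":")[0] != "GT":
            continue  # no sample columns or no GT field
        for sample in fields[9:]:
            counts[classify_genotype(sample.split(":")[0])] += 1
    return counts


# Toy records (not taken from any file in this thread).
toy = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2",
    "chr1\t100\t.\tA\tG\t.\t.\t.\tGT\t0|1\t0/1",
    "chr1\t200\t.\tC\tT\t.\t.\t.\tGT\t1\t.",
]
print(dict(summarize_vcf(toy)))
```

For a real file the same function can be fed an open text handle, for example `summarize_vcf(open("contigs.vcf"))`. A file where every genotype lands in the "haploid" class would match the situation described above.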

ld9866 · 2023-03-08T02:05:03Z

Thanks for your reply!
First, I used Minigraph-Cactus to build a pangenome, which produced primates-pg.vcf.gz and primates-pg.gfa.gz.
Second, I ran "vg autoindex --workflow mpmap -t 20 --prefix vg_rna_gfa2 --ref-fasta ref.fa --vcf primates-pg.vcf.gz --tx-gff ref.gtf", which works, and vg mpmap works well too. However, I found that I do not get the pantranscriptome.gbwt and pantranscriptome.txt.gz files that appear in your example.
I saw that you suggested "vg autoindex --workflow mpmap --workflow rpvg", so I ran "vg autoindex --workflow mpmap --workflow rpvg -t 20 --prefix vg_rna_gfa2 --ref-fasta ref.fa --vcf primates-pg.vcf.gz --tx-gff ref.gtf", but it printed:

[IndexRegistry]: Checking for haplotype lines in GFA.
error:[vg autoindex] Input is not sufficient to create indexes
Inputs
GTF/GFF
Reference FASTA
Reference GFA w/ Haplotypes
are insufficient to create target index Haplotype-Transcript GBWT

This is exactly the crux of my problem, and I find it very difficult to understand. Sorry to bother you again.
Best regards,

jeizenga · 2023-03-08T02:24:36Z

I think the issue is further upstream in this pipeline: you're not getting a phased VCF from your Minigraph-Cactus workflow. I would think that Minigraph-Cactus could (in theory) produce phased VCFs, so I'm not sure why you're not getting one. I would recommend making a support request/issue to the Minigraph-Cactus developers to see if it's possible to get a phased VCF as output. It's also a bit puzzling to me that you're getting haploid genotypes on chromosome 1 in the example you gave me, since that's a diploid chromosome in primates.

ld9866 · 2023-03-08T02:28:21Z

Ok, thank you for your reply. Wish you all the best!
We will ask the developer about this, thanks.

wtulle · 2023-03-08T12:03:59Z

Hello, sorry for the delay. I phased my vcf using vcf_phase.py (https://ppp.readthedocs.io/en/latest/PPP_pages/Functions/vcf_phase.html). With this phased vcf the autoindex command is working now.

wtulle · 2023-03-08T13:26:45Z

Now, I'm getting an error running rpvg:
rpvg -t 2 -g ref.xg -p pantranscriptome.gbwt -f pantranscriptome.txt -a mpmap.gamp -o rpvg --inference-model haplotype-transcripts

Error:
Running rpvg (commit: d0478d0)
Random number generator seed: 1678281590
Fragment length distribution parameters found in alignment (mean: 774.566, standard deviation: 19.276)
Loaded graph and GBWT (9.37961 seconds, 6.22832 GB)
[E::bgzf_read_block] Failed to read BGZF block data at offset 110622063 expected 6159 bytes; hread returned 2687
terminate called after throwing an instance of 'std::runtime_error'
what(): [vg::io::MessageIterator] obsolete, invalid, or corrupt input at message 7248584834528 group 7242883472583

jonassibbesen · 2023-03-08T13:38:46Z

Hi, I am not sure why this is happening. Would you be able to share the data? You can send it to [email protected] Thanks!
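One thing that may be worth ruling out before sharing the data is a truncated alignment file, since the message above reports a BGZF block that could not be read in full. The sketch below checks whether a file ends with the standard 28-byte BGZF end-of-file marker defined in the SAM/BAM specification. It assumes the GAMP was written as a complete BGZF stream that ends with this marker, which is an assumption about how the file was produced and not something confirmed in this thread. The function name has_bgzf_eof is hypothetical, and the demo files are synthetic.

```python
import os
import tempfile

# Empty BGZF block used as the end-of-file marker (SAM/BAM specification).
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff0600 42430200"
    "1b000300 00000000 00000000"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF


# Synthetic demo: placeholder bytes stand in for real compressed blocks.
with tempfile.TemporaryDirectory() as tmp:
    complete = os.path.join(tmp, "complete.bin")
    truncated = os.path.join(tmp, "truncated.bin")
    with open(complete, "wb") as out:
        out.write(b"placeholder-blocks" + BGZF_EOF)
    with open(truncated, "wb") as out:
        out.write(b"placeholder-blocks" + BGZF_EOF[:10])
    complete_ok = has_bgzf_eof(complete)
    truncated_ok = has_bgzf_eof(truncated)

print(complete_ok, truncated_ok)  # True False
```

A missing marker would suggest that the mapping job was interrupted or the file was cut short during a copy. A present marker does not rule out corruption elsewhere in the file.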

jonassibbesen · 2023-03-08T13:40:25Z

(Replying to ld9866's comment above about the "are insufficient to create target index Haplotype-Transcript GBWT" error from vg autoindex.)

Hi, see related issue: #46 (comment)

ld9866 · 2023-03-08T13:42:51Z

Thank you for your help!

ld9866 · 2023-03-08T13:44:14Z

Thank you!
I am trying your suggestion.

abcyulongwang mentioned this issue Mar 8, 2023

How to genotype the result file primates-pg.vcf.gz of minigraph_cactus？ ComparativeGenomicsToolkit/cactus#954

Closed

jeizenga mentioned this issue Mar 16, 2023

Treat haploid VCFs as phased in vg autoindex vgteam/vg#3889

Merged
