-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
interoperability with vg - error:[vg::SmallSnarlSimplifier] Invalid graph on iteration 0 #270
Comments
I think In general, we want vg to be fully compatible with PGGB. The main issue now (as you're aware, I'm sure) is that the topologies coming out of PGGB are often too complex for vg to index and/or map to effectively. Hopefully there'll be some progress on this in the coming months (it's on all our radars). Unfortunately I don't think |
In cases when the topology of a PGGB graph is too complex for vg to index or map, does it make any sense to use the consensus graphs output by |
Theoretically you could use such a consensus graph, however the paths as well as the nodes are different from the original graph. You would have to write some lines of code to translate the mapping from the node space in the consensus graph to the nodes space in the full graph. |
It looks like Specifically, the path To fix this, you could create those missing edges (maybe by editing new L lines into the GFA), or you would somehow cut the path You can check with |
Oh sorry, I'm way behind the thread here, this is known already. Still no lead on why the PGGB and/or ODGI squeeze output might be missing edges? @ekg do you try to enforce the constraint that paths must follow graph edges in those tools? |
This is definitely a bug. It should be that all edges are covered by paths
and all paths only traverse real edges.
My guess is that this is a problem with odgi squeeze. We are making some
stronger checks in smoothxg to confirm that everything is embedded.
There is also a slight possibility that it's another downstream step like
gfaffix that's introduced a problem.
…On Tue, Feb 21, 2023, 15:14 Adam Novak ***@***.***> wrote:
Oh sorry, I'm way behind the thread here, this is known already.
Still no lead on why the PGGB and/or ODGI squeeze output might be missing
edges? @ekg <https://github.com/ekg> do you try to enforce the constraint
that paths must follow graph edges in those tools?
—
Reply to this email directly, view it on GitHub
<#270 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABDQENDXRHVGJO4UQXUGQLWYUV3VANCNFSM6AAAAAATOYGECM>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Pinging @danydoerr. He mentioned to me that in previous gfaffix versions the graph may not have been perfectly valid. Maybe this is the case here? |
The bug is in GFAffix 1.2.x, but was fixed in March 2022. Did you run GFAffix with |
I think, like @ekg , that the bug must be in odgi squeeze. Going back to the first odgi files from PGGB, I can validate the chr4.vg successfully, but get an error for chr4 after doing odgi squeeze. PGGB -> OG per chromosome -> ODGI squeeze -> OG to GFA -> vg convert GFA to VG -> vg validate In code
|
Odgi squeeze needs to be rewritten to avoid this pattern.
And we should have validation of the graph as a standard step in the
pipeline.
…On Fri, Feb 24, 2023, 03:45 Colin Davenport ***@***.***> wrote:
I think, like @ekg <https://github.com/ekg> , that the bug must be in
odgi squeeze.
Going back to the first odgi files from PGGB, I can validate the chr4.vg
successfully, but get an error for chr4 after doing odgi squeeze.
PGGB -> OG per chromosome -> ODGI squeeze -> OG to GFA -> vg convert GFA
to VG -> vg validate
In code
# convert og to gfa
odgi view -i chr5.og -g -t 10 > chr5.gfa
odgi view -i chr4.og -g -t 10 > chr4.gfa
odgi view -i AT_pangenome_test.og -g -t 10 > AT_pangenome_test.gfa (all chr, via odgi squeeze )
# gfa to vg
vg convert -t 8 -g -v chr5.gfa > chr5.vg
vg convert -t 8 -g -v chr4.gfa > chr4.vg
vg convert -t 8 -g -v AT_pangenome_test.gfa > AT_pangenome_test.vg
# check file
vg validate chr5.vg -ok
vg validate chr4.vg -ok
vg validate AT_pangenome_test.vg -FAILS (all chr, via odgi squeeze)
errors:
vg convert -t 8 -g -v AT_pangenome_test.gfa > AT_pangenome_test.vg
vg validate AT_pangenome_test.vg
graph invalid: missing edge between 50276th step (2932976:0) and 50277th step (2934155:1) of path AT_Col-0_hq-v1.1.chr4
graph invalid: missing edge between 50277th step (2934155:0) and 50276th step (2932976:0) of path AT_Col-0_hq-v1.1.chr4
graph invalid: missing edge between 102606th step (2934276:1) and 102607th step (3646278:0) of path AT_Col-0_hq-v1.1.chr4
graph invalid: missing edge between 102607th step (3646278:1) and 102606th step (2934276:1) of path AT_Col-0_hq-v1.1.chr4
graph: invalid
# but chr4 appears fine if validated by itself without using odgi squeeze
odgi view -i chr4.og -g -t 10 > chr4.gfa
vg convert -t 8 -g -v chr4.gfa > chr4.vg
vg validate chr4.vg
graph: valid
—
Reply to this email directly, view it on GitHub
<#270 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABDQEIMQHO74QHC73WTKOTWZB7MDANCNFSM6AAAAAATOYGECM>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
@colindaven, which version of Moreover, do you have a 'small enough` example to deterministically reproduce the bug? |
@AndreaGuarracino it was the latest odgi release 0.8.2 I think, I repeated it last Monday and got the same problem. Please try to squeeze these five odgi files from 3 A. thaliana genomes to reproduce the error. I think you'll have to squeeze all 5 chromosomes, just a minimal version using chr3 and chr4 did not cause an error for me. https://drive.google.com/drive/folders/16zh4UrcMXqpNo6lD3CaZqmE5OCQPZwPI?usp=sharing Hope you can reproduce this. |
@colindaven thank you for the files. I was not able to reproduce the issue. I am using the odgi version
v0.8.2-92-gbfae0b3
vg version
vg version v1.40.0 "Suardi"
cat to_squeeze.txt
chr1.og
chr2.og
chr3.og
chr4.og
chr5.og
odgi squeeze -f to_squeeze.txt -o all.og -t 48 -P
odgi validate -i all.og -t 48 -P # no problems here
odgi view -i all.og -g > all.gfa
vg convert -t 8 -g -v all.gfa > all.vg
vg validate all.vg # no problems here
graph: valid |
@AndreaGuarracino Thanks for this. Yes, you're correct, the odgi I originally used was from a pipeline with a container written before you fixed the bug in 2022. I can confirm the current bioconda release of odgi squeeze now runs without problems on this data. Its good that odgi validate will pick up any incorrect graphs without needing a gfa-> VG conversion. I've upgraded a mapping pipeline to check any problems with inputted odgi files right at the start to avoid these errors in future. Thanks again. |
Hi,
thanks for your efforts with PGGB and odgi first up.
I've generated quite a few graphs in PGGB and am trying to work with them in vg and with graphaligner. In general, I can map short and long reads and sometimes call SNPs, yet problems do occur.
I have read multiple issues from the vg team saying they test on minigraph-cactus graphs rather than PGGB graphs. Do you see any reason why the following error might occur, and how I might solve it ? Could vg simplify be too old, or the two ecosystems actually be somewhat incompatible ?
Workflow
PGGB -> GFA per chromosome -> ODGI squeeze -> OG to GFA -> vg convert GFA to VG -> vg simplify (fails) -> vg to GFA
My goal is to simplify the graph prior to alignment to increase mapping speed and decrease downstream problems with SNP calling and SV genotyping.
Thanks
The text was updated successfully, but these errors were encountered: