seqwish output with simulated data #69
Hi Eugene,
I would try running smoothxg on the output. The edyeet alignments do not
use affine gap penalties, which makes their representation of indels
imprecise. But even better alignments (such as those made by kssw/minimap2)
are not mutually normalized, and will produce complex looping motifs in,
e.g., low-complexity sequence such as microsatellites. Realigning the graph
locally with POA (in smoothxg) normalizes the alignments relative to
each other. The resulting graph tends to be smaller than the one made
directly by seqwish.
The pangenome graph builder (pggb) makes some attempts to link all these
steps together, if you want to get a sense of a typical approach that we
are using.
Also, increasing the segment length can reduce collapse, which might
otherwise appear to introduce more variation.
The tags are probably coming out of odgi view. They are designed to trick
Bandage into displaying coverage for the nodes. I think RC is the number of
path steps on the node. DP is a metric that is scaled by the length to meet
Bandage's expectations.
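
To check what the tags hold in a particular GFA, one option is to read them off the S lines and compare them with a direct count of path steps per node taken from the P lines. The following is a minimal Python sketch that assumes a GFA1 file with tab-separated S and P records. The function name `parse_gfa_coverage` and the toy input are hypothetical, and the DP/RC values in the toy input are placeholders invented for illustration, not values produced by odgi or seqwish.

```python
from collections import Counter


def parse_gfa_coverage(gfa_text):
    """Collect per-segment length, optional tags, and path-step counts from GFA1 text."""
    nodes = {}
    steps = Counter()
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            name, seq = fields[1], fields[2]
            tags = {}
            for tag in fields[3:]:
                # Optional fields look like KEY:TYPE:VALUE, e.g. RC:i:8
                key, typ, value = tag.split(":", 2)
                tags[key] = int(value) if typ == "i" else value
            nodes[name] = {"length": len(seq), "tags": tags}
        elif fields[0] == "P":
            # Path steps are comma-separated segment names with a +/- suffix
            for step in fields[2].split(","):
                steps[step[:-1]] += 1
    for name, node in nodes.items():
        node["steps"] = steps[name]
    return nodes


# Toy graph: two paths over three segments; tag values are made-up placeholders.
EXAMPLE = "\n".join([
    "H\tVN:Z:1.0",
    "S\t1\tACGT\tDP:i:2\tRC:i:8",
    "S\t2\tG\tDP:i:1\tRC:i:1",
    "S\t3\tTT\tDP:i:2\tRC:i:4",
    "P\ta\t1+,2+,3+\t*",
    "P\tb\t1+,3+\t*",
])

if __name__ == "__main__":
    for name, node in parse_gfa_coverage(EXAMPLE).items():
        print(name, node["length"], node["steps"], node["tags"])
```

Printing the counted steps next to the DP and RC values for a few nodes of a real graph should make it clear which tag is the raw step count and which one is scaled by node length.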
On Fri, Dec 4, 2020, 23:21 Eugene Goltsman wrote:
Hi Erik,
I am doing some precision/recall analysis on a simulated set of 13 samples
where each "sample" is a random mutant of a real-life plant chromosome. I
introduced exactly 200 SVs per sample, and the types are deletions,
inversions, tandem duplications, and translocations. The variant sizes are
fixed at 500bp and 10kb. After using edyeet+seqwish to construct the graphs
with these sequences, plus the original reference, I now have 14 graphs of
increasing complexity and would like to see how well the variants can be
"deconstructed" from them. So I took the GFA->vg route for each graph and
used 'vg snarls' to get the bubbles out. It reports a lot more variants
than what I had introduced, even in a 2-sample graph. My suspicion is that
edyeet misaligned some of the regions, and I want to try it again with more
stringent parameters. Do you think this is something worth pursuing, or is
edyeet not designed to handle this scenario?
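
For reference, the precision/recall bookkeeping described above could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not the analysis actually run: variants are reduced to (start, end) intervals on the reference, a call counts as a true positive when both breakpoints fall within a tolerance of a simulated SV, and the function name `match_calls`, the 50 bp tolerance, and the toy coordinates are all hypothetical.

```python
def match_calls(truth, calls, tolerance=50):
    """Greedy one-to-one matching of called intervals against simulated ones.

    truth, calls: lists of (start, end) tuples on the same reference.
    tolerance: maximum allowed offset in bp at each breakpoint (assumed value).
    """
    unmatched = list(calls)
    tp = 0
    for t_start, t_end in truth:
        for call in unmatched:
            c_start, c_end = call
            if abs(c_start - t_start) <= tolerance and abs(c_end - t_end) <= tolerance:
                unmatched.remove(call)  # each call may match only one truth SV
                tp += 1
                break
    fp = len(unmatched)      # calls that match no simulated SV
    fn = len(truth) - tp     # simulated SVs that were never recovered
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return tp, fp, fn, precision, recall


if __name__ == "__main__":
    # Toy data: three simulated SVs, five reported bubbles (three of them tiny).
    truth = [(1000, 1500), (20000, 30000), (50000, 50500)]
    calls = [(1010, 1495), (20000, 30020), (70000, 70003),
             (80000, 80001), (90000, 90002)]
    print(match_calls(truth, calls))
```

With this kind of tally, small spurious bubbles show up as false positives and lower precision without affecting recall, so filtering calls by size before matching is one way to separate them from the 500bp and 10kb events.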
Another question is about the GFA tags that seqwish puts in. Sorry if this
is described in some obvious place, but what are the DP: RC: tags for?