High heterogeneity in sequences identity #247

Open
lsoldini opened this issue Nov 9, 2022 · 2 comments

Comments

@lsoldini

lsoldini commented Nov 9, 2022

Hello,

I am planning to do some transcriptomic analysis using vg mpmap and rpvg. To get there, I want to first build a .gfa graph with pggb.

The peculiarity is that I have one chromosome-level, haplotype-resolved assembly of the whole genome, plus a second assembly of the same quality that covers only one chromosome. That chromosome carries large inversions of several Mb, whereas all other chromosomes are very similar or identical.

How would you build a graph from such data?

I was thinking I could complete the partial assembly with the chromosomes from the other one, and then run pggb on the two "full" assemblies, on the assumption that all identical regions would be recognized and collapsed. Alternatively, should I do separate pggb runs (one with everything except the divergent chromosome, the other with only that chromosome) and merge the .gfa files afterwards?

Edit: the issue title reflects my concern that having most regions at 100% identity and one chromosome at a considerably lower identity could be a problem.

@subwaystation
Member

So you have the following data sets:

  • haplotype-resolved assembly of all chromosomes (a)
  • haplotype-resolved assembly of one chromosome (b)
  • assembly of one chromosome with large inversions (c)

Are the chromosomes of (b) and (c) the same? Did you take a look at https://pggb.readthedocs.io/en/latest/rst/tutorials/divergence_estimation.html in order to measure the actual sequence divergence?
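To make the idea of "measuring divergence" concrete, here is a minimal sketch of a Mash-style distance computed from the exact k-mer Jaccard index of two sequences. It is only an illustration of the formula, not the procedure from the linked tutorial: the function names and the default `k` are assumptions, and it ignores reverse complements and MinHash sketching, which real tools rely on for whole chromosomes.

```python
import math


def kmers(seq, k):
    """Return the set of all k-mers in seq (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mash_distance(seq_a, seq_b, k=15):
    """Mash-style distance D = -ln(2j / (1 + j)) / k from the exact Jaccard index j.

    Hypothetical helper for illustration; real tools estimate j from sketches
    and use canonical (strand-independent) k-mers.
    """
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    union = a | b
    if not union:
        raise ValueError("both sequences are shorter than k")
    j = len(a & b) / len(union)
    if j == 0:
        return 1.0  # no shared k-mers: treat as maximally distant
    # Clamp to [0, 1] so identical inputs give exactly 0.0
    return max(0.0, min(1.0, -math.log(2 * j / (1 + j)) / k))


def approx_identity_percent(seq_a, seq_b, k=15):
    """Rough percent identity implied by the distance above."""
    return 100.0 * (1.0 - mash_distance(seq_a, seq_b, k))
```

The largest distance among the sequences that need to align (here, presumably the two versions of the divergent chromosome) is the number that would inform the choice of pggb's `-p` (percent identity) setting.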

@lsoldini
Author

lsoldini commented Jan 12, 2023

Sorry, that was not clear. There are two populations, A and B, with basically no divergence between them except for one chromosome (say population A has version AX and population B has version BX of chromosome X). I have:

  • One haplotype-resolved assembly of all chromosomes except X (common to both A and B)
  • One haplotype-resolved assembly of chromosome AX
  • One haplotype-resolved assembly of chromosome BX

Chromosomes AX and BX have diverged considerably because of loss of recombination and large inversions.

I want to build a graph over the whole genome for further use with the vg toolkit. To do so, I'd like to use two assemblies in which all chromosomes are the same except one. Would this be a problem for pggb, given that the sequence divergence setting would be tuned for the one chromosome that differs?
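As a minimal sketch of the "two full assemblies" input described above (not a statement on whether it is the right strategy), the shared chromosomes can be duplicated under both sample names and every header rewritten in PanSN style (`sample#haplotype#contig`), which is the naming pggb expects. The sample names `A` and `B`, the haplotype number `1`, the contig names, and the helper `pansn_rename` are all hypothetical placeholders.

```python
def pansn_rename(fasta_text, sample, haplotype):
    """Rewrite FASTA headers as >sample#haplotype#contig (PanSN style).

    Hypothetical helper: only the first whitespace-delimited token of each
    header is kept as the contig name; sequence lines pass through unchanged.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                raise ValueError("FASTA header without a name")
            out.append(f">{sample}#{haplotype}#{parts[0]}")
        elif line:
            out.append(line)
    return "\n".join(out) + "\n"


# Toy stand-ins for the three assemblies described in the thread.
shared = ">chr1 shared autosome\nACGT\n>chr2\nGGCC\n"  # all chromosomes except X
ax = ">chrX\nAAAA\n"                                   # chromosome X, version AX
bx = ">chrX\nTTTT\n"                                   # chromosome X, version BX

# Each "full" assembly = shared chromosomes + its own version of X.
combined = (
    pansn_rename(shared, "A", 1) + pansn_rename(ax, "A", 1)
    + pansn_rename(shared, "B", 1) + pansn_rename(bx, "B", 1)
)
print(combined, end="")
```

The combined FASTA would then be compressed with bgzip, indexed with `samtools faidx`, and passed to pggb via `-i`, with `-n` set to the number of haplotypes. A haplotype-resolved assembly would normally contribute one entry per haplotype rather than the single `1` used here.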
