Smudgeplot predicts polyploidy #182

Open
leio0326 opened this issue Dec 8, 2024 · 8 comments

leio0326 commented Dec 8, 2024

About your genome
Hello @KamilSJaron
We have a genome of approximately 15 Gb, but its karyotype is unclear. When we scaffolded it with Hi-C data, we found that it is not a diploid species, so we used smudgeplot to predict the ploidy. We have about 800 Gb of Illumina data, and the results are as follows:
[Image: smudgeplot result (WeChat image 20241208165229)]

The prediction suggests a hexaploid, but why does the plot show a 5A3B smudge, and why do most k-mer pairs have a B/(A+B) ratio of 1/2? Can you help me figure out what the ploidy is? Thank you very much!
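For readers following along, a smudge label such as 5A3B can be unpacked into the copy numbers it implies. The sketch below is a minimal illustration, assuming the label simply counts copies of two k-mer variants A and B; the helper name `smudge_stats` is hypothetical and not part of smudgeplot.

```python
import re
from fractions import Fraction


def smudge_stats(label):
    """Return (total copies, minor ratio B/(A+B)) for a label like 'AAB' or '5A3B'.

    Hypothetical helper for illustration; not part of smudgeplot.
    """
    counts = {"A": 0, "B": 0}
    # Each token is an optional repeat count followed by a letter, e.g. '5A' or 'B'.
    for number, letter in re.findall(r"(\d*)([AB])", label):
        counts[letter] += int(number) if number else 1
    total = counts["A"] + counts["B"]
    if total == 0:
        raise ValueError(f"no A/B copies found in {label!r}")
    minor = min(counts["A"], counts["B"])
    return total, Fraction(minor, total)


for label in ["AB", "AAB", "AAAAAB", "5A3B"]:
    total, ratio = smudge_stats(label)
    print(label, total, ratio)
```

Under this reading, 5A3B implies eight copies in total with a ratio of 3/8, which would not fit a hexaploid and may point to a scaling problem rather than a real genome structure.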

@KamilSJaron
Owner

Hi, what is the species? Can you also post the GenomeScope plot here? I think your 1n coverage was inferred incorrectly; that should be clear from the other plot.

@leio0326
Author

leio0326 commented Dec 9, 2024

@KamilSJaron
It is an insect, and arthropods do not exhibit polyploidy. This is the GenomeScope result (k-mers counted with KMC):
[Image: GenomeScope linear_plot]

However, the genome we assembled from HiFi reads (a single individual) is 14-15 Gb, with a BUSCO score of 98.5%.

@KamilSJaron
Owner

I have to say @leio0326, being asked for interpretational help without even being told the species name is a bit of a bummer for me... What is in it for me?

Notice that GenomeScope predicts a very different 1n coverage, and I think in this case it is right. I recommend rerunning the plotting part of smudgeplot with that coverage as a prior; you might get a more meaningful plot.
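To see why the 1n coverage matters so much, recall that the copy number of a smudge is read off by dividing the coverage of each k-mer in a pair by the 1n coverage. A minimal sketch with made-up numbers follows; the coverages are hypothetical and not taken from this dataset, and `implied_structure` is an illustrative helper, not smudgeplot's API.

```python
def implied_structure(cov_a, cov_b, one_n_cov):
    """Convert k-mer pair coverages into a structure label given a 1n coverage.

    cov_a, cov_b: coverages of the two k-mers in a pair.
    one_n_cov: assumed coverage of a single haplotype (the 1n prior).
    """
    copies_a = round(cov_a / one_n_cov)
    copies_b = round(cov_b / one_n_cov)
    # List the more abundant k-mer first, as in labels like AAB.
    hi, lo = max(copies_a, copies_b), min(copies_a, copies_b)
    return "A" * hi + "B" * lo


# Hypothetical pair: both k-mers at about 40x coverage.
pair = (40.0, 40.0)
print(implied_structure(*pair, one_n_cov=40.0))  # read as a diploid AB
print(implied_structure(*pair, one_n_cov=10.0))  # the same pair read as AAAABBBB
```

The ratio B/(A+B) stays at 1/2 in both cases; only the inferred copy number changes, which is how an underestimated 1n coverage can make a simple structure look highly polyploid.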

@leio0326
Author

leio0326 commented Dec 9, 2024

Actually, we don't know the name of this species either. It is a new species; we only know that it belongs to Archaeognatha, and we don't have enough taxonomic experience to say more. Thank you very much for clearing up my confusion.

@leio0326 leio0326 closed this as completed Dec 9, 2024
@KamilSJaron
Owner

Oh wow, that's a super cool clade; thanks for sharing (it really makes a huge difference to me - especially because I keep track of how these plots look in various clades...).

One thing I noticed just now is that GenomeScope predicts ~5 Gbp, which is too little given what you think about the genome. I therefore imagine you are assembling separate haplotypes, and that those are fairly divergent too (given that the first peak is so massive compared to the rest of the plot). The only question is how many haplotypes you have. Given that your peaks are not spaced by thirds (triploids are very conspicuous), I would imagine you have a tetraploid (getting the smudgeplot right might help you clarify that).
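The size comparison can be made explicit. A rough sketch, assuming the ~5 Gbp GenomeScope figure approximates a single haplotype and using the 14-15 Gb assembly size reported earlier in this thread; `size_ratio` is a hypothetical helper, and this is back-of-the-envelope arithmetic rather than a ploidy estimate.

```python
def size_ratio(assembly_gbp, haploid_estimate_gbp):
    """Return how many times larger the assembly is than the haploid estimate."""
    return assembly_gbp / haploid_estimate_gbp


# Approximate figures quoted in this thread.
genomescope_estimate_gbp = 5.0      # GenomeScope genome size estimate
assembly_range_gbp = (14.0, 15.0)   # HiFi assembly size range

for assembly_gbp in assembly_range_gbp:
    ratio = size_ratio(assembly_gbp, genomescope_estimate_gbp)
    print(f"{assembly_gbp:.0f} Gb / {genomescope_estimate_gbp:.0f} Gbp = {ratio:.1f}x")
```

A ratio near 3 means the assembly holds roughly three times the estimated haploid content, which fits the idea of divergent haplotypes being assembled separately. It does not by itself give the number of haplotypes, because regions that collapse in the assembly would lower the ratio.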

@leio0326 leio0326 reopened this Dec 10, 2024
@leio0326
Author

leio0326 commented Dec 10, 2024

Thank you for your answer. When I assembled with hifiasm (-p), two haplotypes appeared: one is ~10 Gb and the other is ~6 Gb. I am currently trying to merge them and scaffold them into chromosomes. I also tried to draw the smudgeplot from a KMC dump (with a command like smudgeplot.py hetkmers -o shibing_1_1000 < kmcdb_L1_U1000.dump), but the job may have been killed by the administrator because the data was too large and used too much memory.

Oh, I have another question: can the FastK results (.ktab and .hist) be used to run GenomeScope? I see they are all binary files; do I need to convert the format?

@KamilSJaron
Owner

FastK can be used to create a k-mer histogram for GenomeScope! This tutorial shows how to do that: https://github.com/KamilSJaron/smudgeplot/wiki/BGA24

Also, all Smudgeplot versions >= v0.3 work only with FastK and are incompatible with KMC dumps (the tradeoff is a ~40x speedup and proper multithreading, so the new version should work better for you).

@leio0326
Author

I ran GenomeScope again on the FastK results and changed the 1n value when redrawing the smudgeplot (-n 43.5):

[Image: GenomeScope linear_plot]
[Image: smudgeplot]

This time the prediction is diploid, but the predicted result differs too much from the actual assembly. Is it possible that a whole-genome duplication has occurred?
