Interpreting smudgeplot output #117

Open
eweinheimer opened this issue Jun 12, 2023 · 10 comments
Labels
0.2.5 Double-hung Smudgeplot done with the 0.2.5 Double-hung with curtains version 0.3.0 Oriel Upcoming version of smudgeplot genomescope included smudgeplot included if smudgeplot was posted with the quesiton / problem

Comments

@eweinheimer

eweinheimer commented Jun 12, 2023

Hello,

Just hoping to get some insight into these smudgeplots that I have generated from Illumina HiSeq data. These are for two closely related tree species. I've also attached the genomescope plots, for which I ran diploid and tetraploid models. Previous karyotyping studies have shown populations of these species to be diploid, triploid, tetraploid, and octaploid, though tetraploid and octaploid are the most frequently observed ploidy levels. 2C DNA content estimates across these studies, as well as our own, seem to be inconsistent, but based on our flow cytometry estimates, the haploid genome size should be ~500 Mbp. Haploid chromosome number is 13.

I am trying to reconcile these findings from the literature with what I'm seeing in these plots. I've fiddled around with the k-mer length, and that has changed the estimated heterozygosity and genome size, but ploidy is still predicted to be diploid. I'm having trouble being convinced of that, though, and I see from the information on this page that the model can sometimes be wrong for a variety of reasons, some of which we may be dealing with here. I'd be curious to hear your interpretation of these plots and any suggestions you may have for teasing apart this issue further. Mainly hoping to see if we can interpret anything about genome size, heterozygosity, or auto- vs. allopolyploidy here.

Thank you in advance!
Species 1: [attached plots: tor_smudge, tor_gscope4, tor_gscope2]
Species 2: [attached plots: rob_smudge, rob_gscope4, rob_gscope2]

@KamilSJaron KamilSJaron added smudgeplot included if smudgeplot was posted with the quesiton / problem genomescope included labels Jun 12, 2023
@KamilSJaron
Owner

Hi,

I will give you a short answer first, and I hope I will be able to get back to you in a month or so with a much better solution.

I think you have an allotetraploid. The A <-> B divergence is already very substantial, so smudgeplot has trouble picking up the AABB signal. The thing is, Gene Myers managed to find a logical mishap we left in smudgeplot: it causes a greater "drop" of the higher-ploidy k-mers into the lower-ploidy brackets (namely, when there are too many overlapping variants in polyploids). That is likely the reason why smudgeplot is telling you diploid. If you would like to try Gene's algorithm, you can try https://github.com/thegenemyers/MERQURY.FK. But we are working on merging the two tools together, so bear with me!

Sorry for not describing it here in more detail, the explanation is very nuanced, I need to write it up properly.

But in any case, all polyploids (and this applies doubly to allotetraploids) will over time go from an AABB or AAAB signal towards an AB signal, as the non-recombining homoeologous genomic copies diverge. Those are actually the border cases we should probably call degenerated tetraploids.

I think you can quite safely use the tetraploid genomescope model; it estimates the AABB structure quite nicely.

@eweinheimer
Author

Thanks so much, this explanation was very helpful. I will definitely look into using Gene Myers' program and look forward to hearing more from you when you have the chance. In the meantime, I will proceed with my analysis assuming allotetraploidy.

@KamilSJaron
Owner

@weinei18 we now have a beta version working with a PloidyPlot back end and a Smudgeplot front end. If you would like to give it a try, I can send you instructions on how to get started.

@KamilSJaron
Owner

I will also close this issue for now, but do get in touch if anything comes up.

@eweinheimer
Author

@KamilSJaron Yes, I would be very interested to try the beta version!

@KamilSJaron
Owner

Excellent, presumably you already have smudgeplot, but if you don't, download the repository

git clone https://github.com/KamilSJaron/smudgeplot.git

then, from inside the cloned smudgeplot directory, also pull the development branch

git pull origin sploidyplot

Now you have downloaded the beta version. There is a README file with installation instructions and everything you need to know to run it (hopefully; it's a beta, after all). Let me know if it works for you.

@KamilSJaron KamilSJaron reopened this Aug 23, 2023
@KamilSJaron
Owner

I reopened the issue and added it to the project directory, so once you manage to get the new version plots, we can compare them here.

@KamilSJaron KamilSJaron added 0.3.0 Oriel Upcoming version of smudgeplot 0.2.5 Double-hung Smudgeplot done with the 0.2.5 Double-hung with curtains version labels Aug 23, 2023
@eweinheimer
Author

eweinheimer commented Aug 29, 2023

@KamilSJaron got it working! Didn't have any issues running it, other than those due to my own failure to install things properly. Here are the updated plots for Species 1 for a haploid coverage of 50x and the commands I ran to get there.

FastK -v -t4 -k21 -M100 -T56 JP-Vtor-GenomeSR1_R1_001_val_1.fq.gz JP-Vtor-GenomeSR1_R2_001_val_2.fq.gz -NVtor_FastK_Table
PloidyPlot -e10 -v -T56 -oVtor_kmerpairs Vtor_FastK_Table
smudgeplot/exec/smudgeplot.py plot -n 50 -t Vtor -o Vtor_smudge2.0 Vtor_kmerpairs_text.smu
[attached plots: Vtor_smudge_beta_log1, Vtor_smudge_beta1]
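
For anyone wanting to repeat these three steps on another sample, here is a minimal Python sketch that only assembles the same command lines without running them. The function name `build_commands` and its parameters are hypothetical. All flags are copied verbatim from the commands above rather than taken from tool documentation, and the `_text.smu` suffix is assumed to be what PloidyPlot writes, based solely on the file name used above.

```python
def build_commands(prefix, reads, k=21, haploid_cov=50):
    """Assemble the FastK -> PloidyPlot -> smudgeplot command lines.

    Flags are reproduced as posted in this thread; only the sample
    prefix, read files, k-mer length and haploid coverage vary.
    """
    table = f"{prefix}_FastK_Table"   # name passed to FastK via -N
    pairs = f"{prefix}_kmerpairs"     # name passed to PloidyPlot via -o
    return [
        ["FastK", "-v", "-t4", f"-k{k}", "-M100", "-T56", *reads, f"-N{table}"],
        ["PloidyPlot", "-e10", "-v", "-T56", f"-o{pairs}", table],
        # Assumption: the k-mer pair file is named <pairs>_text.smu, as above.
        ["smudgeplot/exec/smudgeplot.py", "plot", "-n", str(haploid_cov),
         "-t", prefix, "-o", f"{prefix}_smudge2.0", f"{pairs}_text.smu"],
    ]


if __name__ == "__main__":
    reads = [
        "JP-Vtor-GenomeSR1_R1_001_val_1.fq.gz",
        "JP-Vtor-GenomeSR1_R2_001_val_2.fq.gz",
    ]
    for cmd in build_commands("Vtor", reads):
        print(" ".join(cmd))
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` once the tools are installed. The sketch keeps assembly and execution separate so the commands can be inspected first.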

@KamilSJaron
Owner

Oh my gosh, thank you so much for this!!! They look amazing.

I recently found out the AB smudge is possibly a tiny bit misplaced; I will be pushing more changes soon.

Biologically speaking, it is interesting that you still have such a strong AB smudge. The smudgeplot clearly indicates a complicated genome structure: it has by far too many duplications for a diploid, but too many "diploid" loci for a tetraploid. It could just as well be a degenerated tetraploid or something even more complex. It does look interesting! What's the species?
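
As a reading aid for the smudge names used in this thread, here is a minimal sketch of where each genotype pattern would be expected to sit. It assumes the usual smudgeplot coordinates (normalized minor k-mer coverage B / (A + B) against total pair coverage A + B) and the haploid coverage of 50x quoted earlier. The function name `expected_smudge` is hypothetical and not part of smudgeplot, and real smudges are noisy clouds around these idealized points.

```python
from fractions import Fraction


def expected_smudge(genotype, haploid_cov):
    """Return (minor-coverage ratio, total pair coverage) for e.g. 'AABB'.

    Idealized: each genomic copy contributes exactly haploid_cov coverage.
    """
    a = genotype.count("A")
    b = genotype.count("B")
    if a + b != len(genotype) or min(a, b) == 0:
        raise ValueError("genotype must contain only A and B, at least one of each")
    copies = a + b
    # The rarer k-mer of the pair defines the normalized minor coverage.
    return Fraction(min(a, b), copies), copies * haploid_cov


if __name__ == "__main__":
    for genotype in ("AB", "AAB", "AAAB", "AABB"):
        ratio, total = expected_smudge(genotype, 50)
        print(f"{genotype:5s} ratio={ratio} total_coverage={total}")
```

Under these assumptions AB and AABB share the ratio 1/2 and differ only in total coverage (2n versus 4n), which is why the haploid coverage estimate matters so much when deciding between a diploid and a tetraploid reading.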

@eweinheimer
Author

Certainly! I will continue to check back for the new changes. I'm not able to share the species name at this time, but I can say this... Our current theory is that the ancestor of the clade was a hybrid. Old cytogenetic studies and our genomescope/smudgeplot analyses show higher ploidies and weird patterns in many species within this particular genus, at least on a continental scale, and sister clades appear to be almost exclusively diploid. Divergence time is estimated to be 20-40 Mya, so the degenerated tetraploid theory could fit quite nicely. Still a lot to be teased apart because there is very little information out there about them.

Thank you for your insight, it has given us a lot to consider. Very interesting indeed!
