Request for feedback on smudgeplot optimization and interpretation (diploid? haploid?) #133
Hi, I am in a hurry preparing for the GenomeScope workshop tomorrow, so in short:
The Crypto data is a lot messier; it's really hard to say whether that's because there is no clear separation of peaks in the k-mer spectra. Leishmania does look quite funky. By the way, the smudgeplot does look nice, and it shows quite clearly that the GenomeScope model is wrong. I would try to rerun it with a coverage prior (the one smudgeplot suggested), perhaps trying both diploid and tetraploid models.
When you run GenomeScope and smudgeplot, the results need to make sense together. The 1n coverage should be at least more or less consistent between the two. Both tools can take coverage priors, which should allow you to make them converge somewhere close to the prior value. It's parameter So, your The same with Leishmania. What your original smudgeplot showed was a smudge indicating that the 1n coverage is ~58 (and that is a solid and very reliable-looking smudge). Give that prior to GenomeScope; I think that would be the right model.
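As a rough illustration of the consistency check described above, here is a minimal Python sketch that compares the two 1n coverage estimates. The function name, the 15% tolerance, and the GenomeScope value are assumptions made up for the example; only the ~58 smudgeplot estimate comes from this thread. This is not part of either tool.

```python
def compare_1n_coverage(genomescope_cov, smudgeplot_cov, tolerance=0.15):
    """Classify how two 1n k-mer coverage estimates relate to each other.

    tolerance is an arbitrary relative threshold chosen for illustration.
    """
    ratio = genomescope_cov / smudgeplot_cov
    if abs(ratio - 1.0) <= tolerance:
        return "consistent"
    # A ratio near 2 or 1/2 hints that one model anchored on a different
    # peak (for example, treating the 2n peak as 1n).
    for factor in (2.0, 0.5):
        if abs(ratio / factor - 1.0) <= tolerance:
            return "off by a factor of two"
    return "inconsistent"


smudgeplot_1n = 58.0    # 1n coverage suggested by the smudgeplot in this thread
genomescope_1n = 29.5   # hypothetical GenomeScope estimate, for illustration only

print(compare_1n_coverage(genomescope_1n, smudgeplot_1n))
```

If the estimates come out a factor of two apart, that is the kind of disagreement a coverage prior is meant to resolve: rerun the model with the more trustworthy estimate as the prior and check whether the two tools then agree.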
Hi Kamil and folks,
Thanks for creating this clever method to examine genome structure using k-mers and releasing the new beta version for us to test out. I'm hoping to get some feedback on how to optimize these plots and interpret the results from running it on our data. I have been using jellyfish/GenomeScope2.0 and smudgeplot (the latest sploidyplot branch) using FASTK for this analysis.
We are working on two protist species of unknown ploidy, one called TM and the other called TC. We are trying to determine the ploidy of these two species using your software:
TM species -- Genomescope2.0 plot (p=2 yielded the lowest error):
TM species -- FASTK and smudgeplot commands
TM species -- Smudgeplot:
Smudgeplot seemed to run without errors or warnings in the case of species TM
TC species -- Genomescope2.0 plot (p=2 yielded the lowest error):
TC species -- FASTK and smudgeplot commands
TC species -- Smudgeplot:
Smudgeplot gave the following warning in the case of TC:
First, how do I resolve this warning/error? Reading it, the first part says to "rerun analysis with lower L" but then the second part advises that "increasing L might help remove erroneous kmers". Should I go lower or higher? :P How do I vary L in `smudgeplot plot`, and how do I determine what is an appropriate L? (i.e., what level would you suggest given the k-mer coverage in the GenomeScope plot?)

Second, do you think that these data are consistent with the smudgeplot prediction that these organisms are diploid? One of the questions we also had was whether GenomeScope2.0 and smudgeplot give enough information to distinguish diploid from haploid organisms. In the next comment, I will also show data from running GenomeScope2.0 and smudgeplot on a known haploid organism for comparison.
Thank you!