Smudgeplot interpretation - allopolyploid or coverage too low? #164
Comments
Hi @RSchley, are you coming on Friday? I think the plot would get a lot nicer with the upcoming version. I think it boils down to how "allo" your allotetraploid is. If the parental species are very distinct, the subgenomes will behave more or less like diploids at the k-mer level, and I think that's what you might be seeing. That is good to know too, because then you should expect to assemble two subgenomes, and there is a chance you will struggle to map resequencing reads (I would be very strict with coverage cutoffs when calling variants).
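To make the point above concrete, here is a minimal sketch of where heterozygous k-mer pairs would be expected to land on a smudgeplot under an idealised model (no errors, even coverage). It is an illustration only, not smudgeplot's own code; the function name `smudge_position` and the per-haplotype coverage of 10x are hypothetical.

```python
from fractions import Fraction

def smudge_position(genotype: str, haploid_cov: float):
    """Idealised smudge location for a k-mer pair genotype such as 'AAB'.

    Returns (normalised minor k-mer coverage, total coverage of the pair).
    """
    a = genotype.count("A")
    b = genotype.count("B")
    minor = min(a, b)          # copies carrying the rarer k-mer of the pair
    total = a + b              # all genome copies covered by the pair
    return Fraction(minor, total), total * haploid_cov

# Hypothetical per-haplotype k-mer coverage, chosen only for illustration.
for genotype in ["AB", "AAB", "AAAB", "AABB"]:
    ratio, cov = smudge_position(genotype, haploid_cov=10.0)
    print(genotype, ratio, cov)
```

Under this model, a tetraploid whose four copies are similar gives pairs at 4n total coverage (AAAB at 1/4 or AABB at 1/2). If the two subgenomes are divergent enough that their k-mers no longer pair up, pairs are only found within each subgenome and sit at AB (1/2, 2n), which looks diploid.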
Hi Kamil,
I was keen to register for the new smudgeplot tutorial but will unfortunately be away on Friday. I will definitely re-run everything with the new version.
OK, good to know. Our preliminary results suggest hybridisation is fairly widespread in Inga, but polyploidy is quite restricted (from the nQuire results, only ~7 out of 190 species surveyed were putative polyploids). Thanks for the advice. In other work with broader resequencing of Inga I have indeed been very strict with coverage cutoffs, so that's good!
Are there any parameters for FastK, GenomeScope or smudgeplot you would suggest changing? Is a k-mer size of 21 OK? Is the cleaning regime OK? Should I remove duplicates (even though these were PCR-free libraries)? I have played around with the parameters before and recovered more or less the same results.
Finally, is the apparently incorrect number of chromosomes a signal of allopolyploidy?
Thanks again
@RSchley the new version is released! Even conda has it now.
Hi there Kamil et al.,
First off, thanks so much for developing this amazing tool. I am having a pervasive problem with interpreting my smudgeplots - specifically, I am wondering whether they represent cases of inadequate coverage or allopolyploidy that looks like diploidy.
I am working on tropical trees with an average genome size of about 1 Gbp across 13 chromosomes. Most are diploid, but there are some known tetraploids in the group for which flow cytometry has been performed (e.g. here).
I ran an initial rough ploidy estimate across hundreds of species in my study genus (Inga) using nQuire (i.e. based on allelic ratios, although I know these methods can be problematic). Based on this, I did whole-genome resequencing at 40x on the species inferred to be tetraploid. I then ran smudgeplot and GenomeScope using the parameters suggested in the OhKnow Kmer course as well as the newer BGA tutorials, with the following commands:
There were cases of genuinely low coverage, where the error peak could not be distinguished from the genomic peaks:
However, there were multiple species with good coverage that should be tetraploid but still had low normalised minor k-mer coverage. These, and all the other species sequenced, were inferred to be diploid (although in many of these cases the wrong number of chromosomes was inferred: 16 or 15 instead of 13).
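As a rough sanity check on the coverage question, read coverage can be converted to expected k-mer coverage with the standard relation c_k = c × (L − k + 1) / L. The sketch below assumes 150 bp reads (the read length is not stated in this thread) and assumes the 40x figure is relative to the ~1 Gbp haploid genome size; sequencing errors are ignored, so the result is an upper bound.

```python
def kmer_coverage(read_cov: float, read_len: int, k: int) -> float:
    """Expected error-free k-mer coverage for a given read coverage."""
    return read_cov * (read_len - k + 1) / read_len

READ_COV = 40    # from the post: resequencing at 40x
READ_LEN = 150   # assumption: read length is not given in the thread
K = 21           # k-mer size mentioned in the thread

total = kmer_coverage(READ_COV, READ_LEN, K)
print(f"total k-mer coverage:        {total:.1f}x")
print(f"per haplotype if diploid:    {total / 2:.1f}x")
print(f"per haplotype if tetraploid: {total / 4:.1f}x")
```

With these assumed numbers the total k-mer coverage is about 35x, which leaves under 9x per haplotype if the sample really is tetraploid. Whether that is enough to separate the 1n peak from the error peak depends on the error rate and heterozygosity of each sample.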
How should I understand my smudgeplot? Is it a case of divergent subgenomes in an allopolyploid or too low coverage? Or, since multiple sequencing runs were performed to get the required coverage, is it some methodological artefact from combining multiple rounds of sequencing?
Thanks so much
Rowan