Smudgeplot interpretation - allopolyploid or coverage too low? #164

Open
RSchley opened this issue Oct 16, 2024 · 3 comments
Comments

@RSchley

RSchley commented Oct 16, 2024

Hi there Kamil et al.,

First off, thanks so much for developing this amazing tool. I keep running into the same problem when interpreting my smudgeplots: I can't tell whether they reflect inadequate coverage or allopolyploidy that looks like diploidy.

I am working on tropical trees with an average genome size of about 1 Gbp across 13 chromosomes. There are some known tetraploids in the group already for which flow cytometry was performed (e.g. here), although most are diploid.

I ran an initial rough ploidy estimate across hundreds of species in my study genus (Inga) using nQuire (i.e. based on allelic ratios, although I know these methods can be problematic). Based on this, I did whole-genome resequencing at 40x on species inferred to be tetraploid. I then ran smudgeplot and genomescope using the suggested parameters in the OhKnow Kmer course as well as the newer BGA tutorials, with the following commands:


######################################################## Trim with fastp ########################################################
## remember, it's best to trim reads for smudgeplot (see https://github.com/KamilSJaron/smudgeplot?tab=readme-ov-file#smudgeplot)

## Run fastp v0.23.4, with:
#  Bases with quality <22 counted as unqualified (-q 22)
#  Reads shorter than 30 bp removed (-l 30)
#  Adapters detected and removed (--detect_adapter_for_pe)
#  Base correction in overlapping reads (-c)
#  Poly-G tails trimmed (-g)

fastp -q 22 -l 30 --detect_adapter_for_pe -c -g -i "$DATADIR"/V*/01_raw_reads/"$SEQNAME"_*_R1_*.fastq.gz -I "$DATADIR"/V*/01_raw_reads/"$SEQNAME"_*_R2_*.fastq.gz -o "$DATADIR"/All_fastp_trimmed/"$SEQNAME"_R1_trimmed.fastq.gz -O "$DATADIR"/All_fastp_trimmed/"$SEQNAME"_R2_trimmed.fastq.gz  

########################################################## Run genomescope ##########################################################


#First, you need to generate a kmer database with FastK 
## Make kmer database by running FastK with kmer size of 21, 4 threads
echo 'Making Kmer database'
FastK -v -t4 -k21 -M16 -T4 "$DATADIR"/All_fastp_trimmed/"$SEQNAME"_R*_trimmed.fastq.gz -N"$SEQNAME"


#Make histogram with Histex
Histex -G "$SEQNAME" > "$SEQNAME"_k"$KMER"_GS.hist

# Run GenomeScope
genomescope2 -p 2 -i ../"$SEQNAME"_k"$KMER"_GS.hist -o ./"$SEQNAME"_dip -n "$SEQNAME"_dip -k 21 --verbose
genomescope2 -p 3 -i ../"$SEQNAME"_k"$KMER"_GS.hist -o ./"$SEQNAME"_tri -n "$SEQNAME"_tri -k 21 --verbose
genomescope2 -p 4 -i ../"$SEQNAME"_k"$KMER"_GS.hist -o ./"$SEQNAME"_tet -n "$SEQNAME"_tet -k 21 --verbose


########################################################## Running smudgeplots ##########################################################
# from BGA workshop version https://github.com/BGAcademy23/smudgeplot

#Run smudgeplot
/home/rschley/scratch/apps/smudgeplot/exec/smudgeplot.py hetmers -L 10 -t 4 --verbose -o "$SEQNAME"_k21_pairs ../"$SEQNAME".ktab

#Plot smudgeplot
/home/rschley/scratch/apps/smudgeplot/exec/smudgeplot.py plot -t "$SEQNAME" -o "$SEQNAME"_k21_smudgeplot "$SEQNAME"_k21_pairs_text.smu

There were cases of genuinely low coverage, where the error peak couldn't be distinguished from the actual peaks:

image
image
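
One rough way to check whether the error peak is separable from the genomic peaks is to look for the first local minimum in the k-mer histogram. The sketch below assumes the histogram is a two-column text file (coverage, then number of k-mers), the layout GenomeScope reads. `first_valley` is a hypothetical helper name, not part of FastK or smudgeplot, and the heuristic is naive: noise in a real histogram can trigger it early.

```shell
# Hypothetical helper (not part of FastK, GenomeScope or smudgeplot).
# Assumes a two-column histogram: coverage, number of k-mers.
# Prints the coverage just before the counts first start rising again,
# i.e. a naive estimate of the valley between the error peak and the
# first genomic peak. Prints NA if the counts never rise.
first_valley() {
    awk 'NR > 1 && $2 > prev_count { print prev_cov; found = 1; exit }
         { prev_cov = $1; prev_count = $2 }
         END { if (!found) print "NA" }' "$1"
}

# Usage (file name as in the commands above):
#   first_valley "$SEQNAME"_k"$KMER"_GS.hist
```

If this prints `NA`, the counts never rise again after the error peak, which would be consistent with coverage too low to separate the two. A numeric result could serve as a starting point for choosing the `-L` threshold, to be checked against the plotted histogram.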

However, there were multiple cases of species with great coverage which should be tetraploid but still had low normalised minor k-mer coverage. These and all others sequenced were inferred to be diploid (although many of these cases infer the wrong number of chromosomes - 16 or 15 instead of 13).

image
image

How should I understand my smudgeplot? Is it a case of divergent subgenomes in an allopolyploid or too low coverage? Or, since multiple sequencing runs were performed to get the required coverage, is it some methodological artefact from combining multiple rounds of sequencing?
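
As a back-of-the-envelope check on the coverage question, k-mer coverage is lower than read coverage by a factor of (L - k + 1) / L for reads of length L, and it is then shared among the haplotypes. The sketch below is only illustrative: `kmer_cov_per_haplotype` is a hypothetical helper, the 150 bp read length is an assumption (the read length is not stated here), and it assumes the 40x figure is depth relative to a single haplotype-length genome.

```shell
# Hypothetical helper; all inputs are assumptions supplied by the user.
# kmer_cov_per_haplotype READ_COV READ_LEN K PLOIDY
# Converts read depth to k-mer depth, then divides by the number of haplotypes.
kmer_cov_per_haplotype() {
    awk -v c="$1" -v len="$2" -v k="$3" -v p="$4" 'BEGIN {
        printf "%.1f\n", c * (len - k + 1) / len / p
    }'
}

kmer_cov_per_haplotype 40 150 21 2   # diploid:    prints 17.3
kmer_cov_per_haplotype 40 150 21 4   # tetraploid: prints 8.7
```

Under these assumptions a tetraploid would sit at roughly 8.7x per haplotype, which is below the `-L 10` threshold used in the `hetmers` call, so it may be worth comparing that threshold with the 1n peak in the GenomeScope fit.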

Thanks so much
Rowan

@KamilSJaron
Owner

Hi @RSchley, are you coming on Friday? I think the plot will look a lot nicer with the upcoming version.

I think it boils down to how allo your allotetraploid is. If the parental species are very distinct, they will become... more or less diploids at the k-mer level, and I think that's what you might be seeing. That is good to know too, because then you really expect to also assemble two subgenomes, and there is a chance you will struggle to map resequencing reads (I would be very strict with coverage cutoffs when calling variants).
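
To make the "diploid at the k-mer level" point concrete, here is a minimal sketch of where a heterozygous k-mer pair lands on a smudgeplot, given the coverage of each k-mer in the pair and the haploid (1n) coverage. `smudge_position` is a hypothetical helper, not smudgeplot's own code, and the coverage values are made up for illustration.

```shell
# Hypothetical helper, not part of smudgeplot.
# smudge_position COV_MINOR COV_MAJOR HAPLOID_COV
# Prints: normalised minor k-mer coverage, then total pair coverage in units of 1n.
smudge_position() {
    awk -v a="$1" -v b="$2" -v n="$3" 'BEGIN {
        printf "%.2f %d\n", a / (a + b), int((a + b) / n + 0.5)
    }'
}

smudge_position 20 20 20   # prints "0.50 2" -> AB
smudge_position 20 60 20   # prints "0.25 4" -> AAAB
smudge_position 40 40 20   # prints "0.50 4" -> AABB
```

If the two subgenomes are diverged enough that their k-mers differ at more than one position, pairs spanning the subgenomes are not picked up as hetmers. What remains is heterozygosity within each subgenome, which gives pairs like the first call (1n + 1n), so the plot is dominated by AB.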

@RSchley
Author

RSchley commented Oct 16, 2024

Hi Kamil, I was keen to register for the new smudgeplot tutorial but will unfortunately be away on Friday. Definitely will re-run everything with the new version.

OK - good to know. Our prelim results suggest hybridisation is apparently fairly widespread in Inga, but polyploidy is quite restricted (from the nQuire results only ~7 out of 190 species surveyed were putative polyploids). Thanks for the advice - in other work with broader resequencing of Inga I have indeed been very strict with coverage cutoffs so that's good!

Are there any parameters for FastK, GenomeScope or smudgeplot you would suggest changing? Is a kmer size of 21 ok? Is the cleaning regime ok? Should I remove duplicates (despite the fact these were PCR-free libraries)? I have had a bit of a play around with parameters before and have recovered more-or-less the same results.

Finally - is the apparently incorrect number of chromosomes a signal of allopolyploidy?

Thanks again
Rowan

@KamilSJaron
Owner

@RSchley the new version is released! Even conda has it now.
