Editing quantification with multiple sgRNAs #499

Open
jsromanowski opened this issue Oct 25, 2024 · 1 comment

Comments

jsromanowski commented Oct 25, 2024

Hello CRISPResso Team -

I am currently using CRISPRessoWGS to quantify editing in CRISPR-treated samples with 2 sgRNAs. The cut sites are close together: 4 bases apart, to be precise. I noticed that when I input both sgRNAs (sgRNA1 and sgRNA2), CRISPRessoWGS quantifies editing at both sgRNA sites within an individual run, where I'd like editing quantified at only one sgRNA's site (i.e. the 'CRISPResso_on_sgRNA1' analysis includes alleles that have deletions from sgRNA2's cut site, as shown in the allele output table). My guess is that the insertions/deletions from sgRNA2's editing extend into the quantification window of sgRNA1, which causes CRISPRessoWGS to count them, but I don't think I can shrink this window since the default window of analysis is 1 base on either side of the sgRNA cut site (correct me if I'm wrong). This is an issue because the insertions and deletions from sgRNA2's edits do not overlap the sgRNA1 cut site, so even when sgRNA1's cut site is specified, sgRNA2's cut site is also analyzed, which over-counts sgRNA1's modified reads.

This brings me to my question: is there a way to ensure only one sgRNA cut site's editing is quantified at a time, instead of both sgRNAs' cut sites? Or better yet, is there a way CRISPRessoWGS can quantify editing at both sgRNA1's and sgRNA2's cut sites without double-counting editing events (i.e. sgRNA1's edited alleles are not counted again in sgRNA2's editing analysis)? Perhaps extending the quantification window to encompass both cut sites might work? I hope this makes sense.

Great work on this package, by the way! Any help would be appreciated.

Best,

Joe

Member

kclem commented Oct 25, 2024

Hi @jsromanowski,

Thanks for using CRISPResso, and I hope I can clear up some confusion here.

  1. How CRISPResso Works (or at least the parts that are relevant to your question).

The quantification window (the set of bases in which an edit causes a read to be considered 'modified') is set early in the pipeline based on the sgRNA positions and the user-specified parameters for quantification window size and offset.
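To make the window idea concrete, here is a minimal sketch of how a window could be derived from a guide's position. It is not CRISPResso's actual code: the function names and the `center` and `size` parameters are hypothetical stand-ins for the offset and window size mentioned above, the amplicon is a made-up toy sequence, and the guide is assumed to match the forward strand exactly.

```python
def cut_site_index(amplicon: str, guide: str, center: int = -3) -> int:
    """Return the 0-based index of the base just left of the predicted cut.

    `center` is the offset from the 3' end of the guide (hypothetical name).
    Assumes the guide occurs on the forward strand of the amplicon.
    """
    start = amplicon.find(guide)
    if start == -1:
        raise ValueError("guide not found on the forward strand of the amplicon")
    return start + len(guide) + center - 1


def quantification_window(cut_index: int, size: int, amplicon_len: int) -> set[int]:
    """Return the indices of the `size` bases on either side of the cut."""
    lo = max(0, cut_index - size + 1)
    hi = min(amplicon_len - 1, cut_index + size)
    return set(range(lo, hi + 1))


# Toy 40-bp amplicon; the two toy guides are offset by 4 bases.
amplicon = "ACGTTGCAAGCTTAGGCATCGATCCGTAAGCTTGACCTGA"
sgrna1 = amplicon[5:25]
sgrna2 = amplicon[9:29]

cut1 = cut_site_index(amplicon, sgrna1)
cut2 = cut_site_index(amplicon, sgrna2)
window1 = quantification_window(cut1, size=1, amplicon_len=len(amplicon))
window2 = quantification_window(cut2, size=1, amplicon_len=len(amplicon))

# With both guides supplied, the bases of both windows are quantified together.
combined_window = window1 | window2
print(cut1, cut2, sorted(combined_window))
```

With a size of 1 each guide contributes two bases (one on either side of its cut), so two guides whose cuts are 4 bases apart give a combined window of four bases.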

After the quantification window is set, reads are aligned to the reference amplicon, and any read with an edit at any quantification window base is marked 'modified'. In other words, the edits in a read aren't assigned to a specific guide; the program only notes that an edit is present in the quantification window. The benefit is that reads are not double-counted according to whether they were edited at 1 or 2 sgRNA target sites. Instead, the program reports the number of reads edited at any base in the quantification window.
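As a rough illustration of that rule (a sketch on toy data, not the real pipeline), each read below is reduced to the set of reference positions it edits, and a read counts as modified if any of those positions falls in the combined window. The window positions and read names are made up for the example.

```python
def is_modified(edited_positions: set[int], window: set[int]) -> bool:
    """A read is 'modified' if any of its edits touches the window."""
    return bool(edited_positions & window)


# Hypothetical combined window for two guides with cuts 4 bases apart.
window = {21, 22} | {25, 26}

# Toy reads, each described by the reference positions it edits.
reads = {
    "unedited": set(),
    "del_at_sgRNA1": {20, 21, 22},
    "del_at_sgRNA2": {25, 26, 27},
    "del_spanning_both": set(range(21, 27)),
    "snp_outside_window": {3},
}

status = {name: is_modified(pos, window) for name, pos in reads.items()}
n_modified = sum(status.values())
print(status, n_modified)
```

Each read is counted once, whichever guide's bases it touches; nothing in this rule records which guide caused the edit.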

At the end of the analysis, plots and allele tables are produced, some for the whole amplicon and others zoomed in on each cut site. However, all reads aligned to the amplicon contribute to each plot, so the reads (and corresponding mutations) that appear in the sgRNA1 plot will also appear in the sgRNA2 plot if they are in the same window. There may be slight variation between the sgRNA1 and sgRNA2 plots because the plots show different bases in their plotting windows, and alleles with the same sequence within the plotting window are collapsed. Note that the sgRNA1 plot doesn't contain only reads that were edited by sgRNA1, and the sgRNA2 plot doesn't contain only reads edited by sgRNA2.
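The collapsing step can be pictured with a small sketch. This is an illustration under simplifying assumptions, not CRISPResso's plotting code: reads are toy aligned strings of equal length with `-` marking deleted bases, and the two plotting windows are arbitrary slices chosen for the example.

```python
from collections import Counter


def collapse_alleles(aligned_reads: list[str], start: int, end: int) -> Counter:
    """Count alleles after trimming every read to the plotting window [start, end)."""
    return Counter(read[start:end] for read in aligned_reads)


# Toy aligned reads against a 12-bp reference; '-' marks a deleted base.
aligned_reads = [
    "ACGTACGTACGT",  # unedited
    "ACG-ACGTACGT",  # deletion near the first cut site only
    "ACG-ACGTAC-T",  # deletions near both cut sites
    "ACGTACGTAC-T",  # deletion near the second cut site only
]

# Hypothetical plotting windows around each cut site.
plot_sgrna1 = collapse_alleles(aligned_reads, 1, 6)
plot_sgrna2 = collapse_alleles(aligned_reads, 7, 12)
print(plot_sgrna1)
print(plot_sgrna2)
```

All four reads contribute to both tables. Four distinct alleles collapse to two in each window, and which reads are merged differs between the windows, which is why the two plots can look slightly different.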

  2. The Difficulties Associated With Doing What You Want To Do (at least as I understand it).

Especially if your sgRNAs are close together, it's hard to tell which modifications arise from each guide, particularly long deletions. For a single sgRNA, we have seen that long deletions are not really 'centered' at the sgRNA cut site, meaning that they could extend left or right from the predicted cut site in a pretty unpredictable way. Because of this, if we see a deletion that spans the cut sites of both sgRNA1 and sgRNA2, it is impossible to tell which sgRNA to 'assign' it to. Does that make sense? If you have some better way to assign mutations to specific guides, I'm happy to talk about it.

  3. Some workarounds

  • You may consider looking at the CRISPResso output 'Modification_count_vectors.txt' file at your cut sites. This will show you the number of insertions, deletions, and substitutions that overlap with that site. Note that this may include some deletions that could have arisen from editing by the neighboring guide, so be careful of double counting.

  • You could also consider trying to filter out reads with any modification (insertion or deletion) at the neighboring guide, and then count the remaining reads with indels at the primary guide.

  • I wrote this handy dandy script https://github.com/pinellolab/CRISPResso2/blob/master/scripts/count_sgRNA_specific_edits.py to try to tease out this sort of information. You can run it on your CRISPResso output folder and it will tell you how many reads were edited at one or both of your guides (based on the quantification window size for each guide).
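The filtering idea in the second workaround can be sketched on toy data as follows. This is not the linked script's implementation: the function name, window positions, and reads are all hypothetical, and each read is simply represented by the set of reference positions it edits.

```python
from collections import Counter


def classify_read(edited: set[int], window1: set[int], window2: set[int]) -> str:
    """Label a read by which guide's quantification window its edits touch."""
    hit1 = bool(edited & window1)
    hit2 = bool(edited & window2)
    if hit1 and hit2:
        return "both"
    if hit1:
        return "sgRNA1_only"
    if hit2:
        return "sgRNA2_only"
    return "unmodified"


# Hypothetical per-guide windows for cut sites 4 bases apart.
window1 = {21, 22}
window2 = {25, 26}

# Toy reads, each described by the reference positions it edits.
reads = [
    set(),               # unedited
    {20, 21, 22},        # short deletion at sgRNA1
    {25, 26, 27},        # short deletion at sgRNA2
    set(range(21, 27)),  # deletion spanning both cut sites
    {3},                 # substitution far from either cut site
]

counts = Counter(classify_read(r, window1, window2) for r in reads)
print(counts)
```

Reads labelled `sgRNA1_only` are the ones left after filtering out anything modified at the neighboring guide. The `both` category is the ambiguous case described above: a spanning deletion cannot be attributed to either guide from its coordinates alone.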

Do any of those seem like they would help you with this problem? If not, feel free to reach out to me at [email protected] if you'd like to talk about this more or come up with better methods to assign edits to single guides.

Good luck!
