Differences between allele frequency table and allele frequency table around gRNA #472
-
Hi @DemonofLaplace,
Thanks for the question and for using CRISPResso. Could you try running with the
Thanks,
-
Hi @DemonofLaplace,
Thanks for using CRISPResso. Yes, you are correct: the Alleles_frequency_table collapses reads that have the same sequence across the entire amplicon, while Alleles_frequency_table_around_sgRNA_XXX collapses reads that have the same sequence within a short window around the cut site, ignoring any differences that fall outside that window.

There are usually a small number of alleles with apparent mutations (e.g. due to sequencing errors) far from the cut site. In the Alleles_frequency_table, the reads carrying these apparent mutations appear on their own rows, because their full-amplicon sequence is different. In the Alleles_frequency_table_around_sgRNA_XXX, they are collapsed into the corresponding alleles without the apparent mutations, so those alleles have higher counts than in the Alleles_frequency_table (more reads are collapsed into each sequence).

This also explains why increasing the plot window size from 20 to 30 bp decreases the counts: the wider window covers more positions, so a small number of reads with mutations near the ends of the window now have a different allele sequence and are moved to their own row.

Does that make sense?
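To make the collapsing behaviour concrete, here is a toy sketch in Python. It is not CRISPResso2's actual code, and it rests on several assumptions: reads are already aligned to the amplicon and have equal length (with '-' marking deleted bases), the allele key inside a window is simply the slice of `flank` bases on each side of the cut site (CRISPResso2's own window definition may differ), and the helpers `with_error`, `window` and `collapse_alleles` are hypothetical names for illustration only.

```python
from collections import Counter

# Hypothetical helper: apply one substitution (e.g. a sequencing error) at pos.
SUBSTITUTE = {"A": "C", "C": "G", "G": "T", "T": "A"}


def with_error(seq, pos):
    return seq[:pos] + SUBSTITUTE[seq[pos]] + seq[pos + 1:]


def window(seq, cut_site, flank):
    """Return the part of an aligned read used as its allele key.

    flank=None keeps the whole read (like a full-amplicon table); otherwise
    only `flank` bases on each side of cut_site are kept.
    """
    if flank is None:
        return seq
    return seq[max(0, cut_site - flank):cut_site + flank]


def collapse_alleles(reads, cut_site, flank=None):
    """Count reads per allele key; reads with identical keys share one row."""
    return Counter(window(seq, cut_site, flank) for seq in reads)


ref = "ACGT" * 25                      # 100-bp toy amplicon
cut_site = 50
edited = ref[:48] + "--" + ref[50:]    # 2-bp deletion at the cut site ('-' = gap)

reads = (
    [ref] * 50
    + [edited] * 30
    + [with_error(edited, 5)] * 5      # error 45 bp from the cut site
    + [with_error(edited, 25)] * 4     # error 25 bp from the cut site
    + [with_error(ref, 90)] * 2        # error 40 bp from the cut site
)

for label, flank in [("whole amplicon", None), ("flank 30", 30), ("flank 20", 20)]:
    table = collapse_alleles(reads, cut_site, flank)
    n_edited = table[window(edited, cut_site, flank)]
    print(f"{label}: {len(table)} rows, edited allele = {n_edited} reads, "
          f"total = {sum(table.values())}")
# whole amplicon: 5 rows, edited allele = 30 reads, total = 91
# flank 30: 3 rows, edited allele = 35 reads, total = 91
# flank 20: 2 rows, edited allele = 39 reads, total = 91
```

The total stays at 91 reads in every table; only the grouping changes. The error 25 bp upstream of the cut site is ignored with a 20-bp flank but lands inside a 30-bp flank, which is why the edited allele's count drops from 39 to 35 when the window widens, and drops again to 30 when every distal error is taken into account.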
-
Thanks @Colelyman! @DemonofLaplace, you can also check out this discussion for an example of how --expand_allele_plots_by_quantification works.
-
Hi,
I’m encountering a discrepancy in my paired NGS data analysis with CRISPResso2 and would appreciate some insights from more experienced users.
After running CRISPResso2 on my paired amplicon sequencing file, I have two output files: "Alleles_frequency_table" (as I understand it, this provides allele frequency data for the entire amplicon) and "Alleles_frequency_table_around_sgRNA_XXXX" (as I understand it, this provides allele frequency data for a region around the sgRNA, with a plot window size of 20 bp in my settings).
Both tables list the same alleles (or variants) in the same order, but the read counts differ. While I understand that differences might arise from the window size or the scope of the analysis, it seems counterintuitive that the read counts in "Alleles_frequency_table" are lower than those in "Alleles_frequency_table_around_sgRNA_XXXX." Shouldn't the read counts in the zoomed-in window around the sgRNA be lower than those for the entire amplicon?
Additionally, I've noticed that increasing the plot window size from 20 bp to 30 bp decreases the read counts for the matched variants/alleles: the counts in the 30 bp window are lower than those in the 20 bp window for the same variants/alleles.
I would have expected the reverse.
Does anyone have insights into why this discrepancy occurs?
Thank you!