From 123ffe093181e514894d74c4e9f0a5f72125157e Mon Sep 17 00:00:00 2001
From: Cole Lyman <Cole@colelyman.com>
Date: Thu, 4 Apr 2024 13:15:41 -0600
Subject: [PATCH] Fix case sensitivity in Prime Editing mode (#54)

* Move read filtering to after merging in CRISPResso (#39)

* Move read filtering to after merging

This is in an effort to be consistent with the behavior and results of
CRISPRessoPooled.

* Properly assign the correct file names for read filtering

* Add space around operators

* GitHub actions on pr (#51)

* Run integration tests on pull_request

* Run pytest on pull_request

* Run pylint on pull_request

* Run tests on PR only when opening PR (#53)

* Update reports (#52)

* Update report changes

* Switch branch of integration test repo

* Remove extraneous `crispresso_data_path`

* Point integration tests back to master

* Make all amplicons in amplicon_seq_arr uppercase

This fixes https://github.com/pinellolab/CRISPResso2/issues/396

* Allow RNA values to be provided for prime_editing_pegRNA_scaffold_seq

* Fix 'Prime-edited' key not found (#32)

* Move 'Prime-edited' amplicon name check

By moving this, it will check if there is an amplicon named
'Prime-edited' (which is a reserved name) even if the
`prime_editing_pegRNA_extension_seq` parameter is empty.

* Only search for scaffold integration when pegRNA extension seq is provided

* Remove spaces at the end of lines

* Docker size (#49)

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <cole@colelyman.com>

---------

Co-authored-by: Cole Lyman <cole@colelyman.com>

* GitHub actions integration tests (#48)

* GitHub actions clean (#40)

* Create pytest.yml

* Create pylint.yml

* Create .pylintrc

* Create test_env.yml

* Full path

* Remove conda install

* Replace path

* Pytest tests

* pip -e

* Create integration_tests.yml

* Simplify name

* CRISPRESSO2_DIR environment variable

* Up one dir

* ls workspace

* Install CRISPResso and ydiff

* Clone repo instead of checkout

* submodule

* ls

* CRISPResso2_copy

* ls

* Update env

* Simplify

* Pull from githubactions branch

* Pull githubactions repo

* Checkout githubactions

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <cole@colelyman.com>

* Run tests individually

* Pin plotly version

* Run all tests even if one fails

* Test on another branch

* Switch branch with token

* Update integration_tests.yml

* Introduce pandas sorting in CRISPRessoCompare (#47)

* New makefile commands

* Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42)

* Extract out split_interleaved_fastq function to CRISPRessoShared

* Implement splitting interleaved fastq files in CRISPRessoPooled

* Suppress split_interleaved_input from CRISPRessoWGS parameters

* Suppress other parameters in CRISPRessoWGS

* Move where interleaved fastq files are split to be trimmed properly

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <cole@colelyman.com>

---------

Co-authored-by: Cole Lyman <cole@colelyman.com>

* On push no branches

* On push no branches

* All in one file

* Fix yml errors

* Rename jobs

* Remove old workflow files

* Remove paths

* Run jobs in parallel

---------

Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com>
Co-authored-by: Cole Lyman <cole@colelyman.com>

* 3.4->2.08

* Put ttf-mscorefonts-installer back above apt-get clean

* restore slash, replace fastp with trimmomatic and flash, add autoremove step

---------

Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com>
Co-authored-by: Cole Lyman <cole@colelyman.com>

* Guardrails clean history (#34)

* Include guardrail functions

* Add CRISPRessoReports subtree

* Refactor to use CRISPRessoReports module

* Include guardrail functions

* Functional guardrails, needs reports update

* Add guardrail partial

* fix guardrials partial

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <cole@colelyman.com>

---------

Co-authored-by: Cole Lyman <cole@colelyman.com>

* GitHub actions integration tests (#48)

* GitHub actions clean (#40)

* Create pytest.yml

* Create pylint.yml

* Create .pylintrc

* Create test_env.yml

* Full path

* Remove conda install

* Replace path

* Pytest tests

* pip -e

* Create integration_tests.yml

* Simplify name

* CRISPRESSO2_DIR environment variable

* Up one dir

* ls workspace

* Install CRISPResso and ydiff

* Clone repo instead of checkout

* submodule

* ls

* CRISPResso2_copy

* ls

* Update env

* Simplify

* Pull from githubactions branch

* Pull githubactions repo

* Checkout githubactions

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <cole@colelyman.com>

* Run tests individually

* Pin plotly version

* Run all tests even if one fails

* Test on another branch

* Switch branch with token

* Update integration_tests.yml

* Introduce pandas sorting in CRISPRessoCompare (#47)

* New makefile commands

* Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42)

* Extract out split_interleaved_fastq function to CRISPRessoShared

* Implement splitting interleaved fastq files in CRISPRessoPooled

* Suppress split_interleaved_input from CRISPRessoWGS parameters

* Suppress other parameters in CRISPRessoWGS

* Move where interleaved fastq files are split to be trimmed properly

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <cole@colelyman.com>

---------

Co-authored-by: Cole Lyman <cole@colelyman.com>

* On push no branches

* On push no branches

* All in one file

* Fix yml errors

* Rename jobs

* Remove old workflow files

* Remove paths

* Run jobs in parallel

---------

Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com>
Co-authored-by: Cole Lyman <cole@colelyman.com>

* Update C cythonized files

* Add exact numbers to guardrails printouts

* Remove extraneous whitespace from CRISPRessoCOREResources.pyx

* Fix calculation of `total_mods` from being negative

The issue was that `all_deletion_coordinates` just tells you how many deletions
were present, but not how long the deletion is.

* Changes to message

* Remove old tag

* Point tests at guardrails

* Restore C2 pro check

* Save message with guardrail name

* Point tests repo at master

---------

Co-authored-by: Cole Lyman <cole@colelyman.com>
Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com>

---------

Co-authored-by: Samuel Nichols <Snic9004@gmail.com>
Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com>
Co-authored-by: trevormartinj7 <trevormartinj7@gmail.com>
---
 CRISPResso2/CRISPRessoCORE.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/CRISPResso2/CRISPRessoCORE.py b/CRISPResso2/CRISPRessoCORE.py
index fc3e1ca3..a88b63a3 100644
--- a/CRISPResso2/CRISPRessoCORE.py
+++ b/CRISPResso2/CRISPRessoCORE.py
@@ -261,7 +261,7 @@ def get_pe_scaffold_search(prime_edited_ref_sequence, prime_editing_pegRNA_exten
     """
     info('Processing pegRNA scaffold sequence...')
     #first, define the sequence we are looking for (extension plus the first base(s) of the scaffold)
-    scaffold_dna = CRISPRessoShared.reverse_complement(prime_editing_pegRNA_scaffold_seq)
+    scaffold_dna = CRISPRessoShared.reverse_complement(prime_editing_pegRNA_scaffold_seq.upper().replace('U','T'))
 
     extension_seq_dna_top_strand = prime_editing_pegRNA_extension_seq.upper().replace('U', 'T')
     prime_editing_extension_seq_dna = CRISPRessoShared.reverse_complement(extension_seq_dna_top_strand)
@@ -1560,7 +1560,7 @@ def rreplace(s, old, new):
                 )
 
         else: #not auto
-            amplicon_seq_arr = args.amplicon_seq.split(",")
+            amplicon_seq_arr = list(map(lambda x: x.upper(), args.amplicon_seq.split(",")))
             amplicon_name_arr = args.amplicon_name.split(",")
             #split on commas, only accept empty values
             amplicon_min_alignment_score_arr = [float(x) for x in args.amplicon_min_alignment_score.split(",") if x]
@@ -4711,7 +4711,7 @@ def count_alternate_alleles(sub_base_vectors, ref_name, ref_sequence, ref_total_
             scaffold_insertion_sizes_filename = ""
             if args.prime_editing_pegRNA_scaffold_seq != "":
                 #first, define the sequence we are looking for (extension plus the first base(s) of the scaffold)
-                scaffold_dna_seq = CRISPRessoShared.reverse_complement(args.prime_editing_pegRNA_scaffold_seq)
+                scaffold_dna_seq = CRISPRessoShared.reverse_complement(args.prime_editing_pegRNA_scaffold_seq.upper().replace('U','T'))
                 pe_seq = refs['Prime-edited']['sequence']
                 pe_scaffold_dna_info = get_pe_scaffold_search(pe_seq, args.prime_editing_pegRNA_extension_seq, args.prime_editing_pegRNA_scaffold_seq, args.prime_editing_pegRNA_scaffold_min_match_length)