Commit

Decrease Docker image size and fix PE naming and parameter behavior (#404)

* Fix 'Prime-edited' key not found (#32)

* Move 'Prime-edited' amplicon name check

With the check moved, CRISPResso rejects an amplicon named
'Prime-edited' (a reserved name) even when the
`prime_editing_pegRNA_extension_seq` parameter is empty.
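
The point of the move is ordering: the reserved-name check now runs before the `if args.prime_editing_pegRNA_extension_seq != "":` branch instead of inside it. The following is a minimal sketch of that ordering, not CRISPResso2's code; `BadParameterException` is a local stand-in for `CRISPRessoShared.BadParameterException`, and `check_reserved_amplicon_names` and `setup_amplicons` are hypothetical helpers.

```python
class BadParameterException(Exception):
    """Local stand-in for CRISPRessoShared.BadParameterException (assumption)."""


def check_reserved_amplicon_names(amplicon_names):
    """Reject user-supplied amplicons that reuse the reserved 'Prime-edited' name."""
    if "Prime-edited" in amplicon_names:
        raise BadParameterException("An amplicon named 'Prime-edited' must not be provided.")


def setup_amplicons(amplicon_names, extension_seq):
    """Hypothetical helper: validate names first, then optionally add 'Prime-edited'.

    Because validation happens before the extension-sequence branch, a reserved
    name is rejected even when extension_seq is empty.
    """
    check_reserved_amplicon_names(amplicon_names)
    names = list(amplicon_names)  # copy so the caller's list is not mutated
    if extension_seq != "":
        # CRISPResso2 also builds the edited amplicon sequence here; omitted in this sketch.
        names.append("Prime-edited")
    return names
```

Before this commit the equivalent check sat inside the extension-sequence branch, so a call like `setup_amplicons(["Prime-edited"], "")` would have passed silently.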

* Only search for scaffold integration when pegRNA extension seq is provided
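
The 'Prime-edited' reference is only added inside the `if args.prime_editing_pegRNA_extension_seq != "":` branch, so a scaffold search triggered by the scaffold sequence alone could look up `refs['Prime-edited']` when it does not exist (compare the 'Prime-edited' key-not-found fix in #32). A minimal sketch of the combined guard, using a hypothetical helper name and a toy `refs` dict shaped like the `refs[name]['sequence']` lookups in the diff:

```python
def prime_edited_seq_for_scaffold_search(refs, scaffold_seq, extension_seq):
    """Hypothetical helper: return the sequence to search for scaffold incorporation, or None.

    Both pegRNA sequences must be non-empty, mirroring the new condition in the diff.
    refs['Prime-edited'] only exists when extension_seq was given, so checking
    scaffold_seq alone could raise KeyError.
    """
    if scaffold_seq != "" and extension_seq != "":
        return refs["Prime-edited"]["sequence"]
    return None


# Toy references: no 'Prime-edited' entry because no extension sequence was supplied.
refs = {"Reference": {"sequence": "ACGTACGT"}}
result = prime_edited_seq_for_scaffold_search(refs, "GTTTTAGAGC", "")  # no KeyError
```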

* Remove spaces at the end of lines

* Docker size (#49)

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns
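
A minimal pandas sketch of the idea behind `to_numeric_ignore_columns`, assuming a `(df, ignore_columns)` signature (the commit names the function but does not show it) and a toy DataFrame with made-up column names. Known string columns are skipped explicitly instead of passing `errors='ignore'` (which pandas 2.2 deprecates), so a failed conversion in any other column raises `ValueError` rather than being swallowed. The last line shows a positional slice written with `.iloc[]`.

```python
import pandas as pd


def to_numeric_ignore_columns(df, ignore_columns):
    """Convert every column of df to numeric, except the columns in ignore_columns.

    Known string columns are skipped explicitly instead of passing errors='ignore',
    so a conversion failure in any other column raises ValueError.
    """
    for col in df.columns:
        if col not in ignore_columns:
            df[col] = pd.to_numeric(df[col])
    return df


# Toy data; column names are illustrative, not CRISPResso2's.
df = pd.DataFrame({"Amplicon": ["Reference", "Prime-edited"], "Reads": ["10", "3"]})
df = to_numeric_ignore_columns(df, {"Amplicon"})

# Positional slice via .iloc[] rather than a bare integer slice on the frame.
first_rows = df.iloc[0:1]
```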

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* GitHub actions integration tests (#48)

* GitHub actions clean (#40)

* Create pytest.yml

* Create pylint.yml

* Create .pylintrc

* Create test_env.yml

* Full path

* Remove conda install

* Replace path

* Pytest tests

* pip -e

* Create integration_tests.yml

* Simplify name

* CRISPRESSO2_DIR environment variable

* Up one dir

* ls workspace

* Install CRISPResso and ydiff

* Clone repo instead of checkout

* submodule

* ls

* CRISPResso2_copy

* ls

* Update env

* Simplify

* Pull from githubactions branch

* Pull githubactions repo

* Checkout githubactions

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

* Run tests individually

* Pin plotly version

* Run all tests even if one fails

* Test on another branch

* Switch branch with token

* Update integration_tests.yml

* Introduce pandas sorting in CRISPRessoCompare (#47)

* New makefile commands

* Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42)

* Extract out split_interleaved_fastq function to CRISPRessoShared

* Implement splitting interleaved fastq files in CRISPRessoPooled

* Suppress split_interleaved_input from CRISPRessoWGS parameters

* Suppress other parameters in CRISPRessoWGS

* Move where interleaved fastq files are split to be trimmed properly
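
The commit moves a `split_interleaved_fastq` function into CRISPRessoShared but does not show its signature or implementation. As an assumption-labeled sketch only, the in-memory helper below (hypothetical name `split_interleaved_fastq_lines`) shows the usual interleaved layout it has to undo: 4-line FASTQ records alternating R1, R2, R1, R2.

```python
def split_interleaved_fastq_lines(lines):
    """Hypothetical helper: split interleaved FASTQ lines into R1 and R2 line lists.

    Assumes 4-line records that alternate R1, R2, R1, R2, ...
    Returns (r1_lines, r2_lines).
    """
    if len(lines) % 8 != 0:
        raise ValueError("Interleaved FASTQ must contain complete R1/R2 record pairs")
    r1, r2 = [], []
    for i in range(0, len(lines), 8):
        r1.extend(lines[i:i + 4])      # first record of the pair -> R1
        r2.extend(lines[i + 4:i + 8])  # second record of the pair -> R2
    return r1, r2
```

A production version would stream records from files rather than hold every line in memory.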

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some
columns are strings (and can therefore not be converted to numeric types). Now
if an error does occur when converting the dfs to numeric types it won't be
swallowed up.

* Add documentation to to_numeric_ignore_columns

---------

Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Cole Lyman <[email protected]>

* On push no branches

* On push no branches

* All in one file

* Fix yml errors

* Rename jobs

* Remove old workflow files

* Remove paths

* Run jobs in parallel

---------

Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Cole Lyman <[email protected]>

* 3.4->2.08

* Put ttf-mscorefonts-installer back above apt-get clean

* restore slash, replace fastp with trimmomatic and flash, add autoremove step

---------

Co-authored-by: mbowcut2 <[email protected]>
Co-authored-by: Cole Lyman <[email protected]>

---------

Co-authored-by: Samuel Nichols <[email protected]>
Co-authored-by: mbowcut2 <[email protected]>
3 people authored Mar 28, 2024
1 parent b2cfb91 commit fa03d16
Showing 2 changed files with 10 additions and 9 deletions.
14 changes: 7 additions & 7 deletions CRISPResso2/CRISPRessoCORE.py
@@ -480,7 +480,7 @@ def process_fastq(fastq_filename, variantCache, ref_names, refs, args):
     aln_matrix = CRISPResso2Align.read_matrix(aln_matrix_loc)

     pe_scaffold_dna_info = (0, None) #scaffold start loc, scaffold seq to search
-    if args.prime_editing_pegRNA_scaffold_seq != "":
+    if args.prime_editing_pegRNA_scaffold_seq != "" and args.prime_editing_pegRNA_extension_seq != "":
         pe_scaffold_dna_info = get_pe_scaffold_search(refs['Prime-edited']['sequence'], args.prime_editing_pegRNA_extension_seq, args.prime_editing_pegRNA_scaffold_seq, args.prime_editing_pegRNA_scaffold_min_match_length)

     not_aln = {} #cache for reads that don't align
@@ -555,7 +555,7 @@ def process_bam(bam_filename, bam_chr_loc, output_bam, variantCache, ref_names,
     aln_matrix = CRISPResso2Align.read_matrix(aln_matrix_loc)

     pe_scaffold_dna_info = (0, None) #scaffold start loc, scaffold sequence
-    if args.prime_editing_pegRNA_scaffold_seq != "":
+    if args.prime_editing_pegRNA_scaffold_seq != "" and args.prime_editing_pegRNA_extension_seq != "":
         pe_scaffold_dna_info = get_pe_scaffold_search(refs['Prime-edited']['sequence'], args.prime_editing_pegRNA_extension_seq, args.prime_editing_pegRNA_scaffold_seq, args.prime_editing_pegRNA_scaffold_min_match_length)

     not_aln = {} #cache for reads that don't align
@@ -694,7 +694,7 @@ def process_fastq_write_out(fastq_input, fastq_output, variantCache, ref_names,
     aln_matrix = CRISPResso2Align.read_matrix(aln_matrix_loc)

     pe_scaffold_dna_info = (0, None) #scaffold start loc, scaffold sequence
-    if args.prime_editing_pegRNA_scaffold_seq != "":
+    if args.prime_editing_pegRNA_scaffold_seq != "" and args.prime_editing_pegRNA_extension_seq != "":
         pe_scaffold_dna_info = get_pe_scaffold_search(refs['Prime-edited']['sequence'], args.prime_editing_pegRNA_extension_seq, args.prime_editing_pegRNA_scaffold_seq, args.prime_editing_pegRNA_scaffold_min_match_length)
     not_aln = {} #cache for reads that don't align
     not_aln[''] = "" #add empty sequence to the not_aln in case the fastq has an extra newline at the end
@@ -823,7 +823,7 @@ def process_single_fastq_write_bam_out(fastq_input, bam_output, bam_header, vari
     aln_matrix = CRISPResso2Align.read_matrix(aln_matrix_loc)

     pe_scaffold_dna_info = (0, None) # scaffold start loc, scaffold sequence
-    if args.prime_editing_pegRNA_scaffold_seq != "":
+    if args.prime_editing_pegRNA_scaffold_seq != "" and args.prime_editing_pegRNA_extension_seq != "":
         pe_scaffold_dna_info = get_pe_scaffold_search(refs['Prime-edited']['sequence'], args.prime_editing_pegRNA_extension_seq, args.prime_editing_pegRNA_scaffold_seq, args.prime_editing_pegRNA_scaffold_min_match_length)
     not_aln = {} # cache for reads that don't align
     not_aln[''] = "" # add empty sequence to the not_aln in case the fastq has an extra newline at the end
@@ -1428,6 +1428,8 @@ def rreplace(s, old, new):


         #Prime editing
+        if 'Prime-edited' in amplicon_name_arr:
+            raise CRISPRessoShared.BadParameterException("An amplicon named 'Prime-edited' must not be provided.")
         prime_editing_extension_seq_dna = "" #global var for the editing extension sequence for the scaffold quantification below
         prime_editing_edited_amp_seq = ""
         if args.prime_editing_pegRNA_extension_seq != "":
@@ -1489,8 +1491,6 @@ def rreplace(s, old, new):
             if new_ref in amplicon_seq_arr:
                 raise CRISPRessoShared.BadParameterException('The calculated prime-edited amplicon is the same as the reference sequence.')
             amplicon_seq_arr.append(new_ref)
-            if 'Prime-edited' in amplicon_name_arr:
-                raise CRISPRessoShared.BadParameterException("An amplicon named 'Prime-edited' must not be provided.")
             amplicon_name_arr.append('Prime-edited')
             amplicon_quant_window_coordinates_arr.append('')
             prime_editing_edited_amp_seq = new_ref
@@ -2380,7 +2380,7 @@ def get_prime_editing_guides(this_amp_seq, this_amp_name, ref0_seq, prime_edited

         info('Done!', {'percent_complete': 20})

-        if args.prime_editing_pegRNA_scaffold_seq != "":
+        if args.prime_editing_pegRNA_scaffold_seq != "" and args.prime_editing_pegRNA_extension_seq != "":
             #introduce a new ref (that we didn't align to) called 'Scaffold Incorporated' -- copy it from the ref called 'prime-edited'
             new_ref = deepcopy(refs['Prime-edited'])
             new_ref['name'] = "Scaffold-incorporated"
5 changes: 3 additions & 2 deletions Dockerfile
@@ -10,10 +10,11 @@ MAINTAINER Kendell Clement
 RUN apt-get update && apt-get install gcc g++ bowtie2 samtools libsys-hostname-long-perl \
   -y --no-install-recommends \
   && apt-get clean \
+  && apt-get autoremove -y \
   && rm -rf /var/lib/apt/lists/* \
   && rm -rf /usr/share/man/* \
   && rm -rf /usr/share/doc/* \
-  && conda install -c defaults -c conda-forge -c bioconda -y -n base --debug -c bioconda trimmomatic flash numpy cython jinja2 tbb=2020.2 pyparsing=2.3.1 scipy matplotlib pandas plotly\
+  && conda install -c defaults -c conda-forge -c bioconda -y -n base --debug trimmomatic flash numpy cython jinja2 tbb=2020.2 pyparsing=2.3.1 scipy matplotlib-base pandas plotly\
   && conda clean --all --yes

 #install ms fonts
@@ -40,4 +41,4 @@ RUN python setup.py install \
   && CRISPRessoCompare -h


-ENTRYPOINT ["python","/CRISPResso2/CRISPResso2_router.py"]
+ENTRYPOINT ["python","/CRISPResso2/CRISPResso2_router.py"]
