ZipFile: set allowZip64=True to write larger allele frequency tables #42

ronaldhause · 2020-05-04T05:50:45Z

Fixes the fatal `ERROR: Filesize would require ZIP64 extensions`, which terminates the run when writing compressed allele frequency tables larger than 2 GB.
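
For illustration, here is a minimal sketch of the kind of call this change affects, using only the standard-library `zipfile` module. The helper name `write_zipped_table`, the member name, and the column names are hypothetical and are not taken from the CRISPResso2 code. Passing `allowZip64=True` lets `ZipFile` emit ZIP64 records once a member exceeds the classic format's limit, instead of raising `zipfile.LargeZipFile`.

```python
import os
import tempfile
import zipfile


def write_zipped_table(zip_path, member_name, rows):
    """Write tab-separated rows as one member of a new zip archive.

    Hypothetical helper for illustration only; not the project's function.
    """
    # allowZip64=True permits archives/members beyond the classic ZIP limit.
    # With allowZip64=False, oversized output raises zipfile.LargeZipFile
    # ("Filesize would require ZIP64 extensions").
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED, allowZip64=True) as zf:
        payload = "\n".join("\t".join(map(str, row)) for row in rows) + "\n"
        zf.writestr(member_name, payload)
    return zip_path


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    # Column names below are placeholders, not the real table schema.
    table = [("Aligned_Sequence", "n_Reads"), ("ACGT", 10)]
    path = write_zipped_table(os.path.join(out_dir, "table.zip"), "table.txt", table)
    with zipfile.ZipFile(path) as zf:
        print(zf.read("table.txt").decode())
```

The flag has no effect on small archives, so enabling it unconditionally should be safe for tables of any size; very old unzip tools without ZIP64 support may be unable to open archives that actually use the extension.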

* Change CRISPResso_status.txt format to JSON (#46) * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. * Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * add json read for status file * changed Formatter to json format * fixed json access variable name: message * changed perentage_complete to numeric * changed status file to .json * Create integration_tests.yml * Simplify name * CRISPRESSO2_DIR environment variable * Up one dir * ls workspace * Install CRISPResso and ydiff * Clone repo instead of checkout * submodule * ls * CRISPResso2_copy * ls * Update env * Simplify * Pull from githubactions branch * Pull githubactions repo * Checkout githubactions * Run tests individually * Pin plotly version * Run all tests even if one fails * Test on another branch * Switch branch with token * Update integration_tests.yml * New makefile commands * changed file to .json * changed status to json file * Make JSON human readable by adding new lines * GitHub actions integration tests (#48) * GitHub actions clean (#40) * Create pytest.yml * Create pylint.yml * Create .pylintrc * Create test_env.yml * Full path * Remove conda install * Replace path * Pytest tests * pip -e * Create integration_tests.yml * Simplify name * CRISPRESSO2_DIR environment variable * Up one dir * ls workspace * Install CRISPResso and ydiff 
* Clone repo instead of checkout * submodule * ls * CRISPResso2_copy * ls * Update env * Simplify * Pull from githubactions branch * Pull githubactions repo * Checkout githubactions * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. * Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> * Run tests individually * Pin plotly version * Run all tests even if one fails * Test on another branch * Switch branch with token * Update integration_tests.yml * Introduce pandas sorting in CRISPRessoCompare (#47) * New makefile commands * Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42) * Extract out split_interleaved_fastq function to CRISPRessoShared * Implement splitting interleaved fastq files in CRISPRessoPooled * Suppress split_interleaved_input from CRISPRessoWGS parameters * Suppress other parameters in CRISPRessoWGS * Move where interleaved fastq files are split to be trimmed properly * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). 
Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. * Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * On push no branches * On push no branches * All in one file * Fix yml errors * Rename jobs * Remove old workflow files * Remove paths * Run jobs in parallel --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Move read filtering to after merging in CRISPResso (#39) * Move read filtering to after merging This is in an effort to be consistent with the behavior and results of CRISPRessoPooled. * Properly assign the correct file names for read filtering * Add space around operators * GitHub actions on pr (#51) * Run integration tests on pull_request * Run pytest on pull_request * Run pylint on pull_request * Run tests on PR only when opening PR (#53) * Update reports (#52) * Update report changes * Switch branch of integration test repo * Remove extraneous `crispresso_data_path` * Point integration tests back to master * point to test branch * pointed CI config to testing branch * Update integration_tests.yml point to master --------- Co-authored-by: Cole Lyman <[email protected]> Co-authored-by: Samuel Nichols <[email protected]> * Trevor/fastp integration (#50) * Update check_program to check versions and create check_fastq function * Update fastq arg, implement fastp in get_most_frequent_reads * Bump version to 2.3.0 * Deprecate Flash and Trimmomatic parameters, and update fastp params * Update guess_amplicons and guess_guides to remove max_paired_end_reads_overlap * Implement trimming of single end reads * Merge (and trim) reads in CRISPRessoCORE with fastp * Modify error handling to account for fastp errors * Replace flash and trimmomatic with fastp in Docker dependencies * Update LICENSE.txt with fastp info * Remove min and max amplicon length (no longer 
needed) * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. * Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> * Implement trimming with fastp in CRISPRessoPooled * Implemend merging (and trimming) with fastp in CRISPRessoPooled * Fixed minor fastp errors * Move read filtering to after merging in CRISPResso (#39) * Move read filtering to after merging This is in an effort to be consistent with the behavior and results of CRISPRessoPooled. * Properly assign the correct file names for read filtering * Add space around operators * GitHub actions on pr (#51) * Run integration tests on pull_request * Run pytest on pull_request * Run pylint on pull_request * Run tests on PR only when opening PR (#53) * Update reports (#52) * Update report changes * Switch branch of integration test repo * Remove extraneous `crispresso_data_path` * Point integration tests back to master * Update where the test point to * Fix 'Prime-edited' key not found (#32) * Move 'Prime-edited' amplicon name check By moving this, it will check if there is an amplicon named 'Prime-edited' (which is a reserved name) even if the `prime_editing_pegRNA_extension_seq` parameter is empty. 
* Only search for scaffold integration when pegRNA extension seq is provided * Remove spaces at the end of lines * Docker size (#49) * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. * Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * GitHub actions integration tests (#48) * GitHub actions clean (#40) * Create pytest.yml * Create pylint.yml * Create .pylintrc * Create test_env.yml * Full path * Remove conda install * Replace path * Pytest tests * pip -e * Create integration_tests.yml * Simplify name * CRISPRESSO2_DIR environment variable * Up one dir * ls workspace * Install CRISPResso and ydiff * Clone repo instead of checkout * submodule * ls * CRISPResso2_copy * ls * Update env * Simplify * Pull from githubactions branch * Pull githubactions repo * Checkout githubactions * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> * Run tests individually * Pin plotly version * Run all tests even if one fails * Test on another branch * Switch branch with token * Update integration_tests.yml * Introduce pandas sorting in CRISPRessoCompare (#47) * New makefile commands * Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42) * Extract out split_interleaved_fastq function to CRISPRessoShared * Implement splitting interleaved fastq files in CRISPRessoPooled * Suppress split_interleaved_input from CRISPRessoWGS parameters * Suppress other parameters in CRISPRessoWGS * Move where interleaved fastq files are split to be trimmed properly * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * On push no branches * On push no branches * All in one file * Fix yml errors * Rename jobs * Remove old workflow files * Remove paths * Run jobs in parallel --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * 3.4->2.08 * Put ttf-mscorefonts-installer back above apt-get clean * restore slash, replace fastp with trimmomatic and flash, add autoremove step --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * initial readme modifications * Updated readme to remove deprecated commands, updated help text to reflect new version and fastp * Pointing test branch back at master --------- Co-authored-by: Cole Lyman <[email protected]> Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Samuel Nichols <[email protected]> * Guardrails clean history (#34) * Include guardrail functions * Add CRISPRessoReports subtree * Refactor to use CRISPRessoReports module * Include guardrail functions * Functional guardrails, needs reports update * Add guardrail partial * fix guardrials partial * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * GitHub actions integration tests (#48) * GitHub actions clean (#40) * Create pytest.yml * Create pylint.yml * Create .pylintrc * Create test_env.yml * Full path * Remove conda install * Replace path * Pytest tests * pip -e * Create integration_tests.yml * Simplify name * CRISPRESSO2_DIR environment variable * Up one dir * ls workspace * Install CRISPResso and ydiff * Clone repo instead of checkout * submodule * ls * CRISPResso2_copy * ls * Update env * Simplify * Pull from githubactions branch * Pull githubactions repo * Checkout githubactions * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> * Run tests individually * Pin plotly version * Run all tests even if one fails * Test on another branch * Switch branch with token * Update integration_tests.yml * Introduce pandas sorting in CRISPRessoCompare (#47) * New makefile commands * Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42) * Extract out split_interleaved_fastq function to CRISPRessoShared * Implement splitting interleaved fastq files in CRISPRessoPooled * Suppress split_interleaved_input from CRISPRessoWGS parameters * Suppress other parameters in CRISPRessoWGS * Move where interleaved fastq files are split to be trimmed properly * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * On push no branches * On push no branches * All in one file * Fix yml errors * Rename jobs * Remove old workflow files * Remove paths * Run jobs in parallel --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Update C cythonized files * Add exact numbers to guardrails printouts * Remove extraneous whitespace from CRISPRessoCOREResources.pyx * Fix calculation of `total_mods` from being negative The issue was that `all_deletion_coordinates` just tells you how many deletions were present, but not how long the deletion is. * Changes to message * Remove old tag * Point tests at guardrails * Restore C2 pro check * Save message with guardrail name * Point tests repo at master --------- Co-authored-by: Cole Lyman <[email protected]> Co-authored-by: mbowcut2 <[email protected]> * Fix case sensitivity in Prime Editing mode (#54) * Move read filtering to after merging in CRISPResso (#39) * Move read filtering to after merging This is in an effort to be consistent with the behavior and results of CRISPRessoPooled. 
* Properly assign the correct file names for read filtering * Add space around operators * GitHub actions on pr (#51) * Run integration tests on pull_request * Run pytest on pull_request * Run pylint on pull_request * Run tests on PR only when opening PR (#53) * Update reports (#52) * Update report changes * Switch branch of integration test repo * Remove extraneous `crispresso_data_path` * Point integration tests back to master * Make all amplicons in amplicon_seq_arr uppercase This fixes https://github.com/pinellolab/CRISPResso2/issues/396 * Allow RNA values to be provided for prime_editing_pegRNA_scaffold_seq * Fix 'Prime-edited' key not found (#32) * Move 'Prime-edited' amplicon name check By moving this, it will check if there is an amplicon named 'Prime-edited' (which is a reserved name) even if the `prime_editing_pegRNA_extension_seq` parameter is empty. * Only search for scaffold integration when pegRNA extension seq is provided * Remove spaces at the end of lines * Docker size (#49) * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * GitHub actions integration tests (#48) * GitHub actions clean (#40) * Create pytest.yml * Create pylint.yml * Create .pylintrc * Create test_env.yml * Full path * Remove conda install * Replace path * Pytest tests * pip -e * Create integration_tests.yml * Simplify name * CRISPRESSO2_DIR environment variable * Up one dir * ls workspace * Install CRISPResso and ydiff * Clone repo instead of checkout * submodule * ls * CRISPResso2_copy * ls * Update env * Simplify * Pull from githubactions branch * Pull githubactions repo * Checkout githubactions * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> * Run tests individually * Pin plotly version * Run all tests even if one fails * Test on another branch * Switch branch with token * Update integration_tests.yml * Introduce pandas sorting in CRISPRessoCompare (#47) * New makefile commands * Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42) * Extract out split_interleaved_fastq function to CRISPRessoShared * Implement splitting interleaved fastq files in CRISPRessoPooled * Suppress split_interleaved_input from CRISPRessoWGS parameters * Suppress other parameters in CRISPRessoWGS * Move where interleaved fastq files are split to be trimmed properly * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * On push no branches * On push no branches * All in one file * Fix yml errors * Rename jobs * Remove old workflow files * Remove paths * Run jobs in parallel --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * 3.4->2.08 * Put ttf-mscorefonts-installer back above apt-get clean * restore slash, replace fastp with trimmomatic and flash, add autoremove step --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Guardrails clean history (#34) * Include guardrail functions * Add CRISPRessoReports subtree * Refactor to use CRISPRessoReports module * Include guardrail functions * Functional guardrails, needs reports update * Add guardrail partial * fix guardrials partial * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * GitHub actions integration tests (#48) * GitHub actions clean (#40) * Create pytest.yml * Create pylint.yml * Create .pylintrc * Create test_env.yml * Full path * Remove conda install * Replace path * Pytest tests * pip -e * Create integration_tests.yml * Simplify name * CRISPRESSO2_DIR environment variable * Up one dir * ls workspace * Install CRISPResso and ydiff * Clone repo instead of checkout * submodule * ls * CRISPResso2_copy * ls * Update env * Simplify * Pull from githubactions branch * Pull githubactions repo * Checkout githubactions * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * On push no branches * On push no branches * All in one file * Fix yml errors * Rename jobs * Remove old workflow files * Remove paths * Run jobs in parallel --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Update C cythonized files * Add exact numbers to guardrails printouts * Remove extraneous whitespace from CRISPRessoCOREResources.pyx * Fix calculation of `total_mods` from being negative The issue was that `all_deletion_coordinates` just tells you how many deletions were present, but not how long the deletion is. * Changes to message * Remove old tag * Point tests at guardrails * Restore C2 pro check * Save message with guardrail name * Point tests repo at master --------- Co-authored-by: Cole Lyman <[email protected]> Co-authored-by: mbowcut2 <[email protected]> --------- Co-authored-by: Samuel Nichols <[email protected]> Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: trevormartinj7 <[email protected]> * Batch d3 clean (#55) * imports C2Pro plots if available * added --use_matplotlib flag * added C2Pro matched api funciton signatures * added api args for plotly * added **kwargs * renamed config to custom_config, more specificity * added backend flag for plotly kaleido * added pro_installed boolean for templates, added plotly dependency to report templates * Squashed commit of the following: commit c909ea3b34e87ce637e00dac075d2bb2f8bfb954 Author: McKay <[email protected]> Date: Thu Feb 15 15:55:23 2024 -0700 added plotly dependency for pro commit 76b3601f6a0144f100266153f1c999e0c5de65de Author: Samuel Nichols <[email protected]> Date: Fri Jan 12 09:56:19 2024 -0700 Squashed commit of the following: commit 603f2eff9d1aa21ae95f3e134da303b8018d3a33 Author: Samuel Nichols <[email protected]> Date: Fri Jan 12 09:48:20 2024 -0700 fix 
guardrials partial commit 22fc03183a8070c30dfb74d5c23575ac19019855 Author: Samuel Nichols <[email protected]> Date: Fri Jan 12 08:54:01 2024 -0700 Add guardrail partial commit e55f6b21972b578261bc5a864ce1d653d98f9e34 Author: Samuel Nichols <[email protected]> Date: Mon Jan 8 07:50:59 2024 -0700 Functional guardrails, needs reports update commit 6e968e9699ed59a47d88191d03768e042d8b60a4 Merge: 32b49685 e948ce10 Author: Samuel Nichols <[email protected]> Date: Mon Dec 18 13:34:36 2023 -0700 Merge branch 'guardrails-clean-history' of https://github.com/edilytics/CRISPResso2 into guardrails-clean-history commit 32b49685da320501dad2b0ebbb57887b66220ba8 Author: Samuel Nichols <[email protected]> Date: Fri Dec 15 15:27:04 2023 -0700 Include guardrail functions commit 4e309cf6f732565d635de3d4c5d074ada3027e2d Author: Cole Lyman <[email protected]> Date: Mon Dec 18 10:51:55 2023 -0700 Refactor to use CRISPRessoReports module commit e648dc087c0055bc5d2fca13c64071a371dea941 Author: Cole Lyman <[email protected]> Date: Mon Dec 18 10:51:11 2023 -0700 Add CRISPRessoReports subtree commit e948ce107ebb0d1d99010ed12e937f34b5e607d4 Author: Samuel Nichols <[email protected]> Date: Fri Dec 15 15:27:04 2023 -0700 Include guardrail functions commit d33c748871a625facfe8d792e29c77ab9779138f Author: Kendell Clement <[email protected]> Date: Tue Nov 7 16:31:06 2023 -0700 Include parameter --assign_ambiguous_alignments_to_first_reference in readme commit a1435f7f491a6a61434f3051e39f39a4c9bf1edc Author: Kendell Clement <[email protected]> Date: Wed Oct 11 17:17:30 2023 -0600 Enable quantification by sgRNA (#348) This PR includes: - storing the sgRNA-specific editing locations in the crispresso2_info object. Previously, each amplicon would record the indices of quantification windows across the guide, but not for individual guides. 
This stores the information for each guide in crispresso2_info['results']['refs'][reference_name]['sgRNA_include_idxs'] - a script (count_sgRNA_specific_edits.py) to parse through an allele table output from a completed CRISPResso run (`--write_detailed_allele_table` flag required) to count edits in each sgRNA separately. I don't have a good double-edited sample handy, but it can be run on the demo HDR data [hdr.fastq.gz](http://crispresso.pinellolab.org/static/demo/hdr.fastq.gz) using the command: ``` CRISPResso -r1 hdr.fastq.gz -a acatttgcttctgacacaactgtgttcactagcaacctcaaacagacaccatggtgcatctgactcctgTggagaagtctgccgttactgccctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcaggttggtatcaaggtta -e acatttgcttctgacacaactgtgttcactagcaacctcaaacagacaccatggtgcaCctgactccGgaggagaagtctgccgttactgcGctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcaggttggtatcaaggtta -c atggtgcatctgactcctgTggagaagtctgccgttactgccctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcag -g TGCACCATGGTGTCTGTTTG,GATGAAGTTGGTGGTGAGGCCC --write_detailed_allele_table -n hdr3 -p max -gn guide1,guide2 ``` ``` python CRISPResso2/scripts/count_sgRNA_specific_edits.py -f CRISPResso_on_hdr3 ``` This produces: ``` Processed 25000 alleles Reference: Reference (2391/23415 modified reads) UNMODIFIED: 21024 MODIFIED guide1: 2359 MODIFIED guide2: 32 Reference: HDR (856/1577 modified reads) UNMODIFIED: 721 MODIFIED guide1: 854 MODIFIED guide1 + guide2: 1 MODIFIED guide2: 1 ``` commit 2e3da02fdbed2fa8ae02a277763d65a502459827 Author: Cole Lyman <[email protected]> Date: Tue Oct 10 15:29:08 2023 -0600 changed tuple to list for matplotlib change (#31) (#346) Co-authored-by: mbowcut2 <[email protected]> commit cd3c332135fe4db0f9218e3d87263d5c65838ed9 Author: Kendell Clement <[email protected]> Date: Sun Oct 1 01:54:46 2023 -0600 rename script to camel case commit 7c719d65fb36ac7654db9040f226564ea28fcab9 Author: Kendell Clement <[email protected]> Date: Sun Oct 1 01:53:44 2023 -0600 Add new script for counting high quality bases 
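The per-guide counting described for #348 above can be sketched roughly as follows. This is a minimal illustration, not the code in `count_sgRNA_specific_edits.py`: the function names, the `(modified_positions, n_reads)` allele representation, and the rule that edits outside every guide window count as unmodified are all assumptions. Only the idea of one set of included indices per guide (as stored under `sgRNA_include_idxs`) comes from the commit message.

```python
from collections import Counter


def classify_allele(modified_positions, guide_windows):
    """Label an allele by which guide windows contain a modified position.

    modified_positions: set of reference indices that differ from the reference.
    guide_windows: dict mapping guide name -> set of indices in that guide's window.
    (Both structures are hypothetical stand-ins for the real allele table.)
    """
    hit = [name for name, idxs in guide_windows.items()
           if not modified_positions.isdisjoint(idxs)]
    if not hit:
        # Assumption: edits outside every guide window are treated as unmodified.
        return "UNMODIFIED"
    return "MODIFIED " + " + ".join(sorted(hit))


def count_edits(alleles, guide_windows):
    """Sum read counts per label over (modified_positions, n_reads) pairs."""
    counts = Counter()
    for modified_positions, n_reads in alleles:
        counts[classify_allele(set(modified_positions), guide_windows)] += n_reads
    return counts


# Toy data: two guides with non-overlapping windows.
guide_windows = {"guide1": {10, 11, 12}, "guide2": {40, 41, 42}}
alleles = [([], 5), ([11], 3), ([41, 42], 2), ([12, 40], 1), ([25], 4)]
counts = count_edits(alleles, guide_windows)
for label in sorted(counts):
    print(label, counts[label])
```

The labels mirror the `UNMODIFIED` / `MODIFIED guide1 + guide2` layout of the output quoted above, so a doubly edited allele is counted once under the combined label rather than once per guide.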
commit f97cd2795e89464bcc9321ccfdbca3e6af2bcb4f Author: Kendell Clement <[email protected]> Date: Thu Sep 14 15:15:30 2023 -0600 Prime editing alignment params (#336) Adds two parameters to control alignment of pegRNA components: --prime_editing_gap_open_penalty and --prime_editing_gap_extend_penalty. CRISPResso checks to see whether the pegRNA spacer and extension sequence are in the correct orientation, but sometimes they could align in the incorrect orientation with a higher score (e.g. via insertion of multiple gaps, whereas a single long gap would be preferred). Introducing these two parameters allows users to adjust the alignment parameters specifically for these prime-editing checks without adjusting the global alignment parameters which will be applied to reads that are aligned to the WT reference/prime-editing reference sequences. The new prime_editing_gap_open_penalty is set to -50, a higher gap open penalty than the default needleman_wunsch_gap_open penalty (-20). This commit breaks backward-reproducibility, but mostly in the checking of pegRNA component orientation - so previously some CRISPResso runs would have failed and produced an error, but now they will (hopefully) succeed. To achieve complete backward reproducibility, add the flag --prime_editing_gap_open_penalty -20 to runs. commit 64cbf36dae85cffa2c15e73f2a7ee8aa1077d917 Author: Cole Lyman <[email protected]> Date: Thu Sep 7 16:43:30 2023 -0600 Fix samtools piping (#325) * Remove samtools pipe stderr to stdout Sometimes some of the libraries that samtools depends on don't have the correct version information, and as such samtools will report this to stderr when run. Because we pipe the output of samtools, we expect it to be valid SAM format, but when these library version messages are reported, it breaks CRISPRessoWGS. 
* Remove extra spacing at end of lines and add missing comma in WGS * Log stderr from samtools in CRISPRessoWGS commit 8feff4101f27406d9d88ace97d31a518276bff3f Author: Cole Lyman <[email protected]> Date: Fri Sep 1 09:43:56 2023 -0600 Replace link to CRISPResso schematic with raw URL in README (#329) * Replace link to CRISPResso schematic with raw URL * Add new lines to the beginning of unordered lists commit 2e9e6bff5bcc536d5e2ba1440d1ab96d9d47efd6 Author: Kendell Clement <[email protected]> Date: Thu Aug 10 00:52:12 2023 -0600 Try to unbreak CircleCI commit ae5b95246cb0f6d66c4cbfb50cf8f5a9626b0827 Author: Kendell Clement <[email protected]> Date: Thu Aug 10 00:17:27 2023 -0600 Center command line text messages commit 4d9c71ecf2248c9bb1e10430178dc318b6621c8b Author: Kendell Clement <[email protected]> Date: Thu Aug 10 00:17:07 2023 -0600 Fix bug in prime-editing scaffold-incorporation plotting If read is too short, scaffold incorporation detection will fail because it will check beyond the length of the read. 
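The short-read failure described in the scaffold-incorporation fix above is an out-of-range index. A minimal sketch of the guard, assuming a hypothetical helper `count_scaffold_bases` (the name and signature are illustrative, not CRISPResso's own):

```python
def count_scaffold_bases(read_seq, start_idx, scaffold_seq):
    """Count leading scaffold bases that match the read from start_idx onward.

    Stops at the first mismatch or at the end of the read, so a read that is
    shorter than start_idx + len(scaffold_seq) never triggers an IndexError.
    """
    matched = 0
    for offset, scaffold_base in enumerate(scaffold_seq):
        read_idx = start_idx + offset
        if read_idx >= len(read_seq):
            # The read ended before the scaffold did: stop instead of indexing past it.
            break
        if read_seq[read_idx] != scaffold_base:
            break
        matched += 1
    return matched


full_match = count_scaffold_bases("ACGTGGCCA", 4, "GGCC")   # whole scaffold fits in the read
truncated = count_scaffold_bases("ACGTGG", 4, "GGCC")       # read ends mid-scaffold
past_end = count_scaffold_bases("ACGT", 10, "GG")           # start index beyond the read
```

Without the length check, the `truncated` and `past_end` cases would index beyond the read, which is the failure mode the commit message describes.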
commit 2b36a1a5c35e8a93516ce8baf464595615e0f402 Author: Kendell Clement <[email protected]> Date: Wed Aug 9 15:29:48 2023 -0600 CRISPRessoPooled --compile_postrun_references bug fixes commit 3e04d1d402bcf95edd39fc7c8c9af61bb380f9db Author: Kendell Clement <[email protected]> Date: Tue Aug 8 23:30:15 2023 -0600 Fix missing ' in Pooled --demultiplex_only_at_amplicons commit 06af527f9e2020c5cf251e7f1cec0b1eca1c1664 Author: Cole Lyman <[email protected]> Date: Mon Jul 24 10:47:46 2023 -0600 Sort pandas dataframes by # of reads and sequences so that the order is consistent (#316) * Make sorting stable * Including c files * Sort by #Reads instead of %Reads to avoid floating point errors --------- Co-authored-by: Samuel Nichols <[email protected]> commit de05533b3511a84f3b6b14fc2ef64db041613261 Author: Cole Lyman <[email protected]> Date: Thu Jul 6 13:54:45 2023 -0600 Fix multiprocessing lambda pickling (#311) * Fix running plots in parallel The reason the plots were running slower before this change is because I was calling the plot function, not passing it to `submit`. So it was essentially running in serial, but worse because it was still spinning up/down the processes. * Fix multiprocessing lambda pickling (#20) * Refactor process_futures to be a dict This makes debugging much easier because you can associate the arguments to the future with the results. * Fix the pickling error when running in multiprocessing Only top-level functions (not lambdas) can be pickled to use in multiprocessing pools, thus the lambdas are converted to a regular function. * Further fixes to pickling multiprocessing error (#21) * Refactor process_futures to be a dict This makes debugging much easier because you can associate the arguments to the future with the results. * Fix the pickling error when running in multiprocessing Only top-level functions (not lambdas) can be pickled to use in multiprocessing pools, thus the lambdas are converted to a regular function. 
* Use Counter instead of defaultdict in CRISPRessoCORE * Update process_futures to dict in Batch and Aggregate commit ebb016dff46c280dce8c3c09e8ac0e0cc25d4d74 Author: Kendell Clement <[email protected]> Date: Mon Jul 3 17:12:09 2023 -0600 Enable CRISPRessoPooled multiprocessing when os allows multi-thread file append commit 7285da0e987b77b72c8885bb35940e0f50c146bd Author: Kendell Clement <[email protected]> Date: Fri Jun 23 16:50:33 2023 -0600 Fix print bug for invalid fastq commit 9acdeac67441f9a1d55ac94b153bcb68fb89b92c Author: kclem <[email protected]> Date: Wed Jun 21 16:03:48 2023 -0600 Slugify before creating filename - replaces invalid characters in batch names with _ commit f97e29c67de4c80b8d6b9cf334f363be4b514ade Author: Cole Lyman <[email protected]> Date: Wed Jun 21 14:43:43 2023 -0600 Add verbosity argument to CRISPRessoAggregate (#18) fixes #306 (#307) * Add verbosity argument to CRISPRessoAggregate (#18) * Allow for amplicon and guide seqs to be some variant of NA in batch (#19) This was discovered when attempting to infer amplicon sequences in batch mode on the web interface, NAs were supplied for the amplicon sequences to the sub CRISPResso commands. commit 32e1e9797da5c3033cdc588e92f06b8813961953 Author: Mark Clement <[email protected]> Date: Wed Jun 21 14:01:00 2023 -0600 Allow for interrogation of overlapping sgRNA sites commit 7248ba8c4deee125ad1ec12fdf1294a84d5f6f93 Author: Kendell Clement <[email protected]> Date: Mon Jun 12 12:16:47 2023 -0600 Check input fastq file format Asserts input format of fastq files - including if gzipped files are missing the gz suffix. 
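One way to perform the kind of input check described above (including the "gzipped but missing the gz suffix" case) is to look at the gzip magic bytes before parsing the first record. This is a sketch under assumptions: `check_fastq` and its error messages are hypothetical, and it validates only the first record rather than whatever the real check covers.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def check_fastq(path):
    """Validate the first FASTQ record; return True if the file is gzipped."""
    with open(path, "rb") as handle:
        is_gzipped = handle.read(2) == GZIP_MAGIC
    if is_gzipped and not path.endswith(".gz"):
        raise ValueError(f"{path} appears to be gzipped but is missing the .gz suffix")
    if not is_gzipped and path.endswith(".gz"):
        raise ValueError(f"{path} has a .gz suffix but is not gzipped")

    opener = gzip.open if is_gzipped else open
    with opener(path, "rt") as handle:
        header, seq, plus, qual = (handle.readline().rstrip("\n") for _ in range(4))
    if not header.startswith("@") or not plus.startswith("+"):
        raise ValueError(f"{path} does not look like a FASTQ file")
    if len(seq) != len(qual):
        raise ValueError(f"{path}: sequence and quality lengths differ")
    return is_gzipped


# Small demonstration with temporary files.
record = "@read1\nACGT\n+\nIIII\n"
tmp_dir = tempfile.mkdtemp()

plain_path = os.path.join(tmp_dir, "reads.fastq")
with open(plain_path, "w") as handle:
    handle.write(record)

mislabeled_path = os.path.join(tmp_dir, "reads_gz.fastq")  # gzipped content, no .gz suffix
with gzip.open(mislabeled_path, "wt") as handle:
    handle.write(record)

plain_is_gzipped = check_fastq(plain_path)
try:
    check_fastq(mislabeled_path)
    mislabeled_error = None
except ValueError as err:
    mislabeled_error = str(err)
```

Sniffing the magic bytes rather than trusting the filename is what lets the check report a mislabeled file instead of failing later with a decoding error.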
commit 83c8ab8f462e7d8c1d04c08c1a398b874f517251 Author: Kendell Clement <[email protected]> Date: Mon Jun 5 13:41:55 2023 -0600 Fix CRISPRessoArgParser commit 14a2c8577f566e1b72d5f4e72cd6cd22079610be Author: Kendell Clement <[email protected]> Date: Mon Jun 5 13:29:31 2023 -0600 Cosmetic updates for command-line use - version bump to 2.2.13 - If no args are provided, the command line version will print out an abbreviated help message - parameters can be excluded from CRISPRessoArgParser commit 1cd54bc1d03360c3d8121ba9e66b3589fe1cf252 Author: Cole Lyman <[email protected]> Date: Thu May 11 14:31:47 2023 -0600 Fix multiprocessing error, don't start pool when only using single thread (#302) * Update README to have consistent use of `--base_editor_output` (#16) * Add files via upload * Only start process pools when using multiple processes This is mainly to solve the issue when running on AWS Lambda, but this should improve single core performance overall. --------- Co-authored-by: Kendell Clement <[email protected]> commit 92a705c939b370373a70cf6ae9f1616de33288b9 Author: Cole Lyman <[email protected]> Date: Thu May 11 14:31:06 2023 -0600 Update `base_editor` parameters in README and add Plot Harness (#301) * Update README to have consistent use of `--base_editor_output` (#16) * Add files via upload --------- Co-authored-by: Kendell Clement <[email protected]> commit 7d46c4490235df45c5546b1b470e4e6a99727031 Author: Cole Lyman <[email protected]> Date: Wed May 10 15:41:33 2023 -0600 Clarify CRISPRessoWGS intended use (#303) * Update README to have consistent use of `--base_editor_output` (#16) * Add sample plotting jupyter notebook * Add clarifying info to CRISPRessoWGS description Clarify WGS usage commit 833a701787bb47674b3e921c38cac6189c775cf7 Author: Kendell Clement <[email protected]> Date: Thu May 4 17:02:46 2023 -0400 Remove debug print statements commit 712eb2a11825e8d36f2870deb12b35486bd633fb Author: Kendell Clement <[email protected]> Date: Thu May 4 16:40:07 
2023 -0400 Allow dashes in filenames resolve #73 commit a439f094745b2b5e7f032f0777d4c67e6d6f93c5 Author: Kendell Clement <[email protected]> Date: Sat Apr 22 23:41:58 2023 -0400 Raise exceptions from within futures in plot_pool commit 7e807a60de2a9d18bccd034b87106ceaf7153338 Author: Kendell Clement <[email protected]> Date: Sat Apr 22 23:38:56 2023 -0400 Fix future pandas indexing warning Pandas error was "FutureWarning: Calling float on a single element Series is deprecated and will raise a TypeError in the future. Use float(ser.iloc[0]) instead" commit 304a92aa7a7ef8c705cb070dce25d9a2e5745ba9 Author: Cole Lyman <[email protected]> Date: Thu Apr 20 13:59:27 2023 -0600 Remove debug print statements fixes #295 (#297) The format string option used here is only available in Python version >=3.8. commit 478c06f784603e96d20f96e91993fdcc4ac35c8a Author: Kendell Clement <[email protected]> Date: Thu Apr 13 12:09:26 2023 -0400 Update plotCustomAllelePlot.py script for #292 (#293) Update type of 'max_rows' param to int Fix location of 'args' in crispresso2_info object commit bcdae39e05d530f4a4e78738c3b30f7664981919 Author: Kendell Clement <[email protected]> Date: Mon Mar 27 13:18:34 2023 -0400 Update pooled parameter format commit 546446e36e7e68b527767d6c31ec341a49df2059 Author: Kendell Clement <[email protected]> Date: Tue Feb 14 16:26:23 2023 -0500 Fix running plots in parallel (#286) The reason the plots were running slower before this change is because I was calling the plot function, not passing it to `submit`. So it was essentially running in serial, but worse because it was still spinning up/down the processes. Co-authored-by: Cole Lyman <[email protected]> commit d75f32a2eb5aeaaee866c09e5655a3e27af8b1a1 Author: kclem <[email protected]> Date: Fri Feb 10 15:45:15 2023 -0500 Fix #283 to avoid filename collisions Previously, amplicon names longer than 21bp were truncated, but the check for uniqueness wasn't working, so it would overwrite some plot files. 
This fixes the filename collision and enforces uniqueness in reference filename prefixes. Thanks @mbiokyle29 commit e577318006cd17b2725bd028e5e56634c6eb829a Author: kclem Date: Mon Feb 6 16:37:25 2023 -0500 Case-insensitive headers accepted in CRISPRessoPooled commit d34927620a4a6126a9988b3041e76f60728abbfe Author: Kendell Clement Date: Tue Jan 31 13:48:33 2023 -0500 Fix print statement in CORE commit ee88b7ed89c395f68225a50dea44a2ad69d5e9a5 Author: Kendell Clement Date: Tue Jan 31 13:22:51 2023 -0500 Version bump to 2.2.12 commit 1d4679c72d0c8b4154317c9aff5179217198e2d7 Author: Kendell Clement Date: Tue Jan 31 13:01:31 2023 -0500 Status Updates + Pooled Mixed Mode Update (#279) * Implement logging handler to overwrite the latest log status to file * Add StatusHandler to CRISPRessoCORE log This will take the latest log output and write it to a file (`status.txt`), the catch being that with each log the file is overwritten so that one can easily tell where CRISPResso currently is and what the error is (if any). These changes include some slight refactoring in order to accommodate any potential parameter exceptions. * Add StatusHandler to CRISPRessoBatch and refactor `logger.warn` to `warn` * Add StatusHandler to CRISPRessoPooled and a little refactoring * Implement `percent_complete` to the status log * Add StatusHandler to CRISPRessoAggregate log * Add StatusHandler to CRISPRessoCompare log * Add StatusHandler to CRISPRessoPooledWGSCompare log * Add StatusHandler to CRISPRessoWGS log * Rename `status.txt` to `CRISPResso_status.txt` * Modify status log names to match the tool they are generated from * Add percent_complete stages to CRISPRessoCORE These also include log statements of each plot that is being generated as well as fixing some variable name collisions with `ind`. 
* Format the percentage in the log to be 2 decimal places * Change all plotting logs from `info` to `debug` and simplify progress This refactors how the progress of the plots is calculated, making it much simpler. Before this change we would have had to keep track of the number of times `percent_complete` was output, but now it simply updates the percent complete after each amplicon is finished processing. Hopefully this will make things easier to maintain even though it will be a little less "accurate" (not sure how accurate the original implementation was...). * Implemented shared console log handler across all CRISPResso* calls This allows for easy changes to logging formatting, which was inspired by having to change the default logging level. The default logging level needs to be set at `logging.DEBUG` in order for the debug log statements to not be ignored for the running and status logs. * Add ability to set the verbosity level to each CRISPResso* tool This allows users to set a verbosity level between 1 and 4 using the `-v`/`--verbosity` CLI parameter. If the `--debug` flag is present, then the level will default to 4, being the most verbose. * Implement showing the last seen `percent_complete` when none is provided * Keep track of and log when multiple parallel runs are completed These changes modify `CRISPRessoMultiProcessing.run_crispresso_cmds` such that we can now display when a run is completed. This potentially breaks how signals and interrupts are handled with multiple runs happening, but this needs to be reviewed. 
* Add debug and percentage complete to CRISPRessoBatch * Add percent complete to CRISPRessoPooled * Add debug and percent_complete message to CRISPRessoAggregate * Add `percent_complete` to CRISPRessoCompare * Add `percent_complete` to CRISPRessoPooledWGSCompare * Add status and `percent_complete` to CRISPRessoMeta * Add `verbosity` arguments to CRISPRessoCompare and CRISPRessoPooledWGSCompare * Fixing documentation to match pooled headers * Header removal bug fix change documentation to guide_seq * Update documentation and help feature for CRISPRessoPooled * Remove extra newlines from CRISPRessoPooled -h * Make variable names as clear as my firstborn child's name * Update one more variable name * Fix bug to flow CRISPRessoPooled options to sub command * Make amplicon file args variable name clear * Update how parameters are set and retrieved from parameter object The refactor in the previous commit changed the type of the arguments to a dictionary which doesn't have the parameters as attributes, and this commit fixes that error. * Add note in output header for change in default CRISPRessoPooled In the next release (2.3.0) the `--demultiplex_only_at_amplicons` will be the default when running in mixed-mode. This is to allow for inexact alignments of the reads and the amplicons to the genome. 
For more context, see this issue https://github.com/pinellolab/CRISPResso2/issues/276 * Clarify the verbosity parameter help message * Separate out parameters to `normalize_name` in CRISPRessoCORE * Separate out parameters to `normalize_name` in CRISPRessoWGS * Separate out parameters to `normalize_name` in CRISPRessoPooled * Separate out parameters to `normalize_name` in CRISPRessoCompare * Fix bug in CRISPRessoPooled by replacing `database_id` with `normalize_name` * Refactor `run_crispresso_cmds` to not require a `logger` This commit implements the functionality to make the `logger` object optional by seeing which module called the `run_crispresso_cmds` function and obtaining the correct object from that module name. The function also immediately returns when no commands are passed to it. * Add amplicon name to plotting debug statements in CRISPRessoCORE --------- Co-authored-by: Cole Lyman <[email protected]> Co-authored-by: Cole Lyman <[email protected]> Co-authored-by: Cole Lyman <[email protected]> Co-authored-by: Samuel Nichols <[email protected]> commit ff7eca76e6a3a08af4ac18ac4e88d20f2a06b1f9 Author: Kendell Clement <[email protected]> Date: Thu Jan 26 15:27:27 2023 -0500 CRISPRessoPooled custom header fix (#278) * Fixing documentation to match pooled headers * Header removal bug fix change documentation to guide_seq * Update documentation and help feature for CRISPRessoPooled * Remove extra newlines from CRISPRessoPooled -h * Make variable names as clear as my firstborn child's name * Update one more variable name Co-authored-by: Samuel Nichols <[email protected]> commit 104866e1080c973bb025d1a5ba59b19dca1658af Author: Cole Lyman <[email protected]> Date: Thu Jan 5 14:00:26 2023 -0700 Fix deprecated numpy type names (fixes #269) (#270) In the most recent version of numpy (1.24) some of the types have been deprecated. This commit fixes these errors. 
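For readers hitting the same numpy error as in #269: to my knowledge the aliases `np.int`, `np.float`, `np.bool` and `np.object` were deprecated in numpy 1.20 and stopped working in 1.24. Which of them CRISPResso actually used is not stated above, so the snippet below is only a generic sketch of the replacement pattern, with made-up variable names.

```python
import numpy as np

# Before (fails on numpy >= 1.24):  np.zeros(5, dtype=np.int)
# After: use the Python builtin, or an explicitly sized numpy type.
counts = np.zeros(5, dtype=int)            # platform default integer
freqs = np.zeros(5, dtype=float)           # float64
mask = np.zeros(5, dtype=bool)             # numpy boolean
labels = np.empty(2, dtype=object)         # array of arbitrary Python objects
explicit = np.zeros(5, dtype=np.int64)     # when the width matters

print(counts.dtype, freqs.dtype, mask.dtype, labels.dtype, explicit.dtype)
```

The builtins behave the same as the old aliases did, so the change is mechanical; the sized types (`np.int64`, `np.float64`) are the safer choice wherever the exact width matters.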
commit 58a8e42df88b66fad6b4f6ad04a5b9d9d43d01b4 Author: Cole Lyman <[email protected]> Date: Thu Jan 5 06:49:35 2023 -0700 Add snippet about installing CRISPResso2 via bioconda on Apple silicon (#274) I have suffered enough trying to debug my installation, so hopefully this helps someone else. Co-authored-by: Cole Lyman <[email protected]> commit b9851e98104602eb78c2b384105267624295e9d3 Author: Cole Lyman <[email protected]> Date: Thu Dec 22 13:30:23 2022 -0700 Fix bug when pooled bam is input (#265) This change checks to see if a bam file was input, and if so it doesn't try to remove any intermediate files because there aren't any. Co-authored-by: Cole Lyman <[email protected]> commit b822612642043e75a19042941f69b457ce51f517 Author: Kendell Clement <[email protected]> Date: Mon Dec 19 15:26:45 2022 -0500 Delete vscode settings commit b99aa624dec68ef7d19264340ce0cafa829625f4 Author: Kendell Clement <[email protected]> Date: Mon Dec 19 13:29:14 2022 -0500 Clarify input param help for pooled bam commit 3fae1e8b821ec6b1890bff6561fa8fa67dc49a04 Author: Kendell Clement <[email protected]> Date: Mon Dec 19 13:28:54 2022 -0500 Fix #235 - Cigar string is * if read unaligned Previously, the bam would set the cigar string to 0 if the read was unaligned. This breaks the sam->bam conversion and causes the errors in #235. 
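The fix for #235 above comes down to writing `*` in the CIGAR column when a read has no alignment. A minimal sketch of a SAM line writer that does this is shown below; `sam_record` is a hypothetical helper, not CRISPResso's function, and the field values for the unaligned case (flag 4, `RNAME` `*`, `POS` 0) follow my reading of the SAM specification.

```python
def sam_record(read_name, seq, qual, ref_name=None, pos=None, cigar=None, mapq=255):
    """Build one tab-separated SAM line with the 11 mandatory fields.

    If ref_name is None the read is treated as unaligned: flag 4 (unmapped),
    reference '*', position 0, and CIGAR '*' rather than 0.
    """
    if ref_name is None:
        flag, rname, pos_field, mapq_field, cigar_field = 4, "*", 0, 0, "*"
    else:
        flag, rname, pos_field, mapq_field, cigar_field = 0, ref_name, pos, mapq, cigar
    fields = [read_name, flag, rname, pos_field, mapq_field, cigar_field,
              "*", 0, 0, seq, qual]  # RNEXT, PNEXT, TLEN left unset
    return "\t".join(str(field) for field in fields)


unaligned_line = sam_record("read1", "ACGT", "IIII")
aligned_line = sam_record("read2", "ACGT", "IIII", ref_name="amplicon1", pos=1, cigar="4M")
print(unaligned_line)
print(aligned_line)
```

A CIGAR of `0` is not a valid CIGAR string, which is presumably why the sam->bam conversion mentioned above rejected it.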
commit c65ba07dc5a983453cdf7bb1e27005230dac6f1b Author: Cole Lyman <[email protected]> Date: Thu Dec 8 13:48:17 2022 -0700 Add deprecation notice (#260) * Add FLASh and Trimmomatic deprecation notice to CLI output * Add Edilytics email address to CLI output commit 2a30e5a45f5350ee7c6435bce1cd4edc4d31668a Author: Kendell Clement <[email protected]> Date: Tue Dec 6 12:16:19 2022 -0500 Format filterReadsOnSequencePresence script commit 9d764414edd88a46ad5e4f496e4f1c8d5d60ce3e Author: Kendell Clement <[email protected]> Date: Fri Dec 2 22:12:54 2022 -0500 Clarify default CRISPRessoPooled settings for use_legacy_bowtie2_options_string commit 9ddea40f7f02b546941ddaa4c71fc5283075051a Author: kclem <[email protected]> Date: Mon Nov 14 10:33:04 2022 -0500 Add check for prime editing extension sequence in prime edited sequence if the user specifies the prime_editing_override_prime_edited_ref_seq, it could not contain the extension seq (if they don't provide the extension seq in the appropriate orientation), so check that here. Extension sequence should be provided reverse-complement to the prime edited sequence. commit 152f2dd5001da7090641ee8a1326bde9f7e8104e Author: kclem <[email protected]> Date: Wed Nov 9 11:53:41 2022 -0500 Version bump to 2.2.11a commit 9ed356e3a0c6c316d0860d121772f80ddca6de1d Author: kclem <[email protected]> Date: Wed Nov 9 11:47:30 2022 -0500 Add param to override prime editing sequence checks CRISPResso checks that prime editing guides are provided in the proper orientation (e.g. pegRNA 3'->5', spacer sequence 5'->3') and checks these orientations by alignment. Sometimes, the alignment can be better in the opposite direction, and this parameter allows these checks to be overridden. Otherwise, these checks would halt the program and produce the output 'The prime editing pegRNA spacer sequence appears to be given in the 3\'->5\' order. 
The prime editing pegRNA spacer sequence (--prime_editing_pegRNA_spacer_seq) must be given in the RNA 5\'->3\' order.' commit 39dd80afb98a22b7edb6f801c363d86bb77eeb5b Author: kclem <[email protected]> Date: Wed Nov 9 10:06:51 2022 -0500 Update filterReadsOnSequencePresence.py commit fe55526927e3fb6e17c9a8a6f59c7057bc1e14eb Author: Kendell Clement <[email protected]> Date: Mon Nov 7 22:25:16 2022 -0500 Add script to filter input based on sequence presence commit 713e57a19c35180035ca35e11a5820065eda0198 Author: Kendell Clement <[email protected]> Date: Tue Oct 18 16:02:26 2022 -0400 Allow spaces in read names for CRISPRessoWGS commit 39ce008bdddccdd8229c0ba185dce78bc2f66968 Author: Cole Lyman <[email protected]> Date: Sat Oct 8 21:09:58 2022 -0600 Fix typo of CRISPResssoPlot when plotting nucleotide quilt (#250) commit 6a2b342c8503b7327c0a2414edfbd16912d60ca5 Author: Kendell Clement <[email protected]> Date: Sat Oct 8 23:08:47 2022 -0400 Batch amplicon plots (#251) * Error out if HDR amplicon matches existing amplicon * Add check for amplicon sequence uniqueness * Fix bug with bam_input not having bam_output * Test for no returned lines in auto mode, version bump to 2.2.11 * Fix pandas deprecation of df.append commit 726b2b93d6e419a1b0aa6a968c97edc55b4cc5a8 Author: Kendell Clement <[email protected]> Date: Thu Oct 6 16:32:02 2022 -0400 Fix CRISPRessoBatch plot pool bug when plots are suppressed commit 7e5049c4dfb88cbc87c91935a91d1f51120a10c2 Author: Cole Lyman <[email protected]> Date: Wed Sep 21 21:04:51 2022 -0600 Fix batch quilt plot name (#249) This fixes an incorrectly named allele quilt plot input in CRISPRessoBatch. 
commit 1821ca5029c5a1485733f13ab3f2048b4f1fa04e
Author: Kendell Clement <[email protected]>
Date: Thu Sep 15 15:49:08 2022 -0400

    Version bump to 2.2.10

commit c5f79aebfc1ae209f4ee320df250eed89a02787c
Author: Cole Lyman <[email protected]>
Date: Wed Sep 14 14:24:55 2022 -0600

    Parallel plot refactor (#247)

    * Fix duplicate plotting in CRISPRessoBatch aggregate
    * Refactor multiprocessing plots in CRISPRessoBatch
    * Refactor multiprocessing plots in CRISPRessoCORE
    * Refactor multiprocessing plots for CRISPRessoAggregate

commit 4ed5e24e6cc1dd8068e2391573ae2438acd32db2
Author: Kendell Clement <[email protected]>
Date: Tue Sep 13 14:12:11 2022 -0400

    print files in curr dir if Aggregate can't find files

commit ce25bc06f29988e7a10afd0b6a09ba0caf0950e0
Author: Kendell Clement <[email protected]>
Date: Mon Sep 12 10:32:57 2022 -0400

    Spelling typo

commit c15f01c75083403f17c58c121b2afe97e9f2a1ec
Author: Kendell Clement <[email protected]>
Date: Tue Sep 6 17:49:52 2022 -0400

    Add helper function to create alignment scoring matrix

    New scoring matrix can be created using CRISPResso2Align.make_matrix()

commit c80f82838c5a228b79ad4484092877cfee08e02c
Author: Cole Lyman <[email protected]>
Date: Mon Aug 22 18:28:33 2022 -0600

    Add `zip_output` (#240)

    * Making zip of results
    * Zip command added, if zip is true place_report_in_output_folder is also true, zip removes all files while zipping
    * Adding --zip to compare and pooled/wgs compare
    * Add more formatting changes to CRISPRessoShared
    * Refactoring propagate_crispress_options so only one version exists
    * Zip added to arguments_to_ignore and warning added when changing arguments
    * Restore styling
    * Update README to include --zip
    * Rename --zip to --zip_output
    * Change --zip to --zip_output in CompareCORE and PooledWGSCompareCORE
    * Bug fix arg to args

    Co-authored-by: Samuel Nichols <[email protected]>

commit 5de3d7286d8e33c7cf4d3615fce715806e72f511
Author: Kendell Clement <[email protected]>
Date: Thu Aug 11 21:42:34 2022 -0400

    Fix fix to aggregate for CRISPRessoWGS

commit a2294c266f43b14969a5d6474076f31a77a57173
Author: Kendell Clement <[email protected]>
Date: Thu Aug 11 21:40:50 2022 -0400

    Fix bug in aggregate for WGS

commit 7ce3eb4abe4b8ceac933272ac9cb16a8bedf26a3
Author: Kendell Clement <[email protected]>
Date: Mon Aug 8 21:53:45 2022 -0400

    Update CRISPRessoWGS to allow non-word characters in region names

commit 040ac0033d6e250f4e3a412101874cf5e914e08a
Author: kclem <[email protected]>
Date: Mon Aug 8 16:04:59 2022 -0400

    Enable processing of cram files by CRISPRessoWGS

    Adds --reference to samtools view when viewing cram files

commit cf112a0caba8789e28530cc09171285ec6ea9b4c
Author: kclem <[email protected]>
Date: Mon Aug 8 14:55:46 2022 -0400

    Auto amplicon detection for interleaved input

    Enables processing of interleaved fastq files for guess_guides and guess_amplicons, as well as get_most_frequent_reads. When interleaved input is present, the input is first separated into R1/R2 files, then processing is performed.

commit 4ba524dc7b947feca8a0f743837844f9febc2171
Author: Cole Lyman <[email protected]>
Date: Thu Aug 4 11:32:11 2022 -0600

    Potential fix for aggregate plots in Batch mode (#237)

commit 6097a8a104d3f156ef7c08e196ac37e32bf04c71
Author: Kendell Clement <[email protected]>
Date: Thu Jul 21 22:45:48 2022 -0400

    Fix pct_vectors in crispresso2_info json object

commit 65a079d86d6f386793397398f839c46014b54543
Author: Kendell Clement <[email protected]>
Date: Wed Jul 20 23:46:37 2022 -0400

    Fix more readme spelling bugs

commit e817376ecd54cdea1f29e303ca25b9e7d1d38333
Author: Kendell Clement <[email protected]>
Date: Wed Jul 20 23:42:23 2022 -0400

    Fix bug in readme spelling

commit 49740ba1d66ed6d13a9e154b8b17bc8b5186581d
Author: Kendell Clement <[email protected]>
Date: Wed Jul 20 16:10:09 2022 -0400

    Fix loading of crispresso info from WGS and Pooled

commit b68a43271115251b18e8955e285ccc18f549e8cd
Author: Kendell Clement <[email protected]>
Date: Thu Jul 14 14:11:04 2022 -0400

    Add plotly to dockerfile

commit b0b7d41d697304d0d5fc93e3346c9de1b98ba41d
Author: Kendell Clement <[email protected]>
Date: Thu Jul 14 14:10:00 2022 -0400

    Fix #231 Allow N's in bam output (Try 2)

commit c460b3e73fd06a230dbac2e37c86b833144ebf94
Author: Kendell Clement <[email protected]>
Date: Thu Jul 14 14:09:10 2022 -0400

    Revert "Fix #231 Allow N's in bam output"

    This reverts commit 2f6ad1dbe05210af9ccc1b1f17783cd212a888d3.

commit 2f6ad1dbe05210af9ccc1b1f17783cd212a888d3
Author: Kendell Clement <[email protected]>
Date: Thu Jul 14 13:52:37 2022 -0400

    Fix #231 Allow N's in bam output

commit 0a2419e518dc9b3520058c3927f98b31cd51347e
Author: Cole Lyman <[email protected]>
Date: Fri Jul 8 21:10:01 2022 -0600

    Fix bug when name is provided instead of amplicon_name in pooled input file (#229)

    Also, raise an exception (instead of incorrectly executing) when there are not enough matched parameters in the pooled input file.

commit cb58212379803788c04ca5793baaa760cbbeaa81
Author: Cole Lyman <[email protected]>
Date: Fri Jul 8 21:09:49 2022 -0600

    Fix bug…
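
The interleaved-input commit in the log above says the input is first separated into R1/R2 files before processing. A minimal sketch of that separation step follows, assuming an uncompressed FASTQ with exactly four lines per record and mates stored in alternating order. The function name `split_interleaved_sketch` and its signature are hypothetical and are not taken from the CRISPResso2 code base.

```python
import itertools


def split_interleaved_sketch(interleaved_path, r1_path, r2_path):
    """Hypothetical helper: write alternating FASTQ records to R1 and R2 files.

    Assumes plain-text FASTQ with 4 lines per record. Returns the number of
    read pairs written.
    """
    n_pairs = 0
    with open(interleaved_path) as fin, \
            open(r1_path, "w") as r1, \
            open(r2_path, "w") as r2:
        while True:
            # First mate of the pair: the next four lines.
            record1 = list(itertools.islice(fin, 4))
            if not record1:
                break  # clean end of file
            # Second mate of the pair: the four lines after that.
            record2 = list(itertools.islice(fin, 4))
            if len(record1) != 4 or len(record2) != 4:
                raise ValueError("interleaved FASTQ ended in the middle of a pair")
            r1.writelines(record1)
            r2.writelines(record2)
            n_pairs += 1
    return n_pairs
```

Gzipped input, which is common for sequencing data, would need `gzip.open(path, "rt")` in place of `open`; that case is left out here to keep the sketch short.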

* Fix CRISPRessoAggregate bug and other improvements (#95) * D3-Enhancements (#78) * Sam/try plots (#71) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- * Try catch all futures * Fix test fail plots * Point test to try-plots * Fix d3 not showing and plotly mixing with matplotlib * Use logger for warnings and debug statements * Point tests back at master --------- * Sam/fix plots (#72) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. 
* Fix pandas future warning in CRISPRessoWGS --------- * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- * Try catch all futures * Fix test fail plots * Fix d3 not showing and plotly mixing with matplotlib --------- * Remove token from integration tests file * Provide sgRNA_sequences to plot_nucleotide_quilt plots * Passing sgRNA_sequences to plot * Refactor check for determining when to use CRISPREssoPro or matplotlib for Batch plots * Add max-height to Batch report samples * Change testing branch * Fix wrong check for large Batch plots * Fix typo and move flexiguide to debug (#77) * Change flexiguide output to debug level * Fix typo in fastp merged output file name * Adding id tags for d3 script enhancements * pointing to test branch * Add amplicon_name parameter to allele heatmap and line plots * Add function to extract 
quantification window regions from include_idxs * Scale the quantification window according to the coordinates of the sgRNA plot * added c2pro check, added space in args.json * Correct the quantification window indexes for multiple guides * Fix name of nucleotide conversion plot when guides are not the same * Fix jinja variables that aren't found * Fix multiple guide errors where the wrong sgRNA sequence was associated in d3 plot * Remove unneeded variable and extra whitespace * Switch test branch to master --------- * Add amplicon_name to plot functions * Add sgRNA sequences to nucleotide quilt parameters in Aggregate * Add custom_colors to Aggregate plot functions * Update Aggregate and make_aggregate_report to have logger and tool * Write command_used to Aggregate .json info file * Point to new test branch and add Aggregate run * Make the order of Aggregate runs explicit * Sort all instances of crispresso2_folder_info in Aggregate * Sort df_summary_quantification df in Aggregate * Try sorting with a list of single column * Update to correct test branch * Move to master test branch --------- * Squashed commit of the following: commit 6ec98a05ee70f85b5aa0ac15ab6094b7f1f20d08 Author: mbowcut2 <[email protected]> Date: Tue Aug 13 16:44:39 2024 -0600 dict key changes commit 7cfd5acf06da4eb6f49453144ee1fed1e1488a7a Author: mbowcut2 <[email protected]> Date: Thu Aug 8 15:30:31 2024 -0600 added C2PRO install check back commit bfb0003329ea61b5c79c7e1df8d9a73ec5a508db Author: mbowcut2 <[email protected]> Date: Fri Aug 2 13:08:12 2024 -0600 fixed key error conditionals commit 84444e7480605206cb3efa4a0db675c55e717304 Author: mbowcut2 <[email protected]> Date: Fri Aug 2 09:22:44 2024 -0600 use local jinja_paritals file commit 71dd12786fec6c4aba0170a3bfd9022b06f5eede Author: mbowcut2 <[email protected]> Date: Wed Jul 31 14:10:29 2024 -0600 Squashed commit of the following: commit 5e3b30515c4bc437127e7fb21f53cb0bd511c4ca Author: Trevor Martin <[email protected]> Date: Mon Jul 
22 09:31:44 2024 -0600 D3-Enhancements (#78) * Sam/try plots (#71) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman <[email protected]> * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam <[email protected]> * Try catch all futures * Fix test fail plots * Point test to try-plots * Fix d3 not showing and plotly mixing with matplotlib * Use logger for warnings and debug statements * Point tests back at master --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Sam/fix plots (#72) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. 
* Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman <[email protected]> * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam <[email protected]> * Try catch all futures * Fix test fail plots * Fix d3 not showing and plotly mixing with matplotlib --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Remove token from integration tests file * Provide sgRNA_sequences to plot_nucleotide_quilt plots * Passing sgRNA_sequences to plot * Refactor check for determining when to use CRISPREssoPro or matplotlib for Batch plots * Add max-height to Batch report samples * Change testing branch * Fix wrong check for large Batch plots * Fix typo and move flexiguide to debug (#77) * Change flexiguide output to debug level * Fix typo in fastp merged 
output file name * Adding id tags for d3 script enhancements * pointing to test branch * Add amplicon_name parameter to allele heatmap and line plots * Add function to extract quantification window regions from include_idxs * Scale the quantification window according to the coordinates of the sgRNA plot * added c2pro check, added space in args.json * Correct the quantification window indexes for multiple guides * Fix name of nucleotide conversion plot when guides are not the same * Fix jinja variables that aren't found * Fix multiple guide errors where the wrong sgRNA sequence was associated in d3 plot * Remove unneeded variable and extra whitespace * Switch test branch to master --------- Co-authored-by: Samuel Nichols <[email protected]> Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> commit 09e5d9720ad21e44fc7916d71bde3fd7a9dfa7ef Author: Kendell Clement <[email protected]> Date: Thu Jul 18 14:31:54 2024 -0600 Asymmetrical cut point (#457) * add cut_point_ind to plot_alleles_heatmap for asymmetrical plotting * Cole asymmetrical cut point (#453) * Pin versions of numpy and matplotlib in CI environment (#84) (#452) * Reduce duplication and implement cut_point_ind in plot_alleles_heatmap_hist --------- Co-authored-by: Cole Lyman <[email protected]> commit 8d92972694ddff629dad844a6ad100459f69751d Author: Cole Lyman <[email protected]> Date: Thu Jul 18 14:29:40 2024 -0600 Cole/update args (#85) (#456) commit 44f692ecabf5e2eb96ee0cfd7bae62343da7810c Author: Cole Lyman <[email protected]> Date: Mon Jul 15 16:17:29 2024 -0600 Implement new pooled mixed-mode default behavior (#454) * changes for pooled mixed-mode default (#83) * changes for pooled mixed-mode default * deprecated old arg * added integration tests for mixed mode * fixed test target * updated test name * pinned numpy * Fix integration tests yml * pinning matplotlib * added print to CI tests * changed mixed mode info string * Remove 
pooled-mixed-mode-align-to-genome step from Github Actions * Update demultiplex_genome_wide parameter and help * Convert args.json to unix line endings * Add Pooled mixed mode demux run * Update the name of the argument in Pooled * Point integration tests back to master --------- Co-authored-by: Cole Lyman <[email protected]> * Revert change to pooled mixed mode info statement (#86) --------- Co-authored-by: mbowcut2 <[email protected]> commit 79b482b55a0e8edbc03ec22bd2714bade1e90323 Author: Cole Lyman <[email protected]> Date: Tue Jul 9 12:53:23 2024 -0600 Pin versions of numpy and matplotlib in CI environment (#84) (#452) commit 80dc1bdd72d50f989717bfc5f8156bc3495c45f4 Author: Kendell Clement <[email protected]> Date: Thu May 30 14:07:42 2024 -0600 Add padding to image commit 381755daf0939aaf2745df0a802c809633aff47d Author: Kendell Clement <[email protected]> Date: Thu May 30 13:59:57 2024 -0600 White background for schematic for dark mode commit d649db71e610bd8840fbb8d46fadb07789b67390 Author: Cole Lyman <[email protected]> Date: Fri May 24 12:45:53 2024 -0600 Fix typo and move flexiguide to debug (#77) (#438) * Change flexiguide output to debug level * Fix typo in fastp merged output file name commit 71181f50ef2b39015523b1a71d9fd1bf0dce14eb Author: Cole Lyman <[email protected]> Date: Mon May 13 13:34:00 2024 -0600 Prefix the release Docker tag with a `v` (#434) commit d2c2be18a6bb64b0e742cc24c4665980a24324bc Author: Cole Lyman <[email protected]> Date: Mon May 13 09:41:32 2024 -0600 Showing sgRNA sequences on hover in CRISPRessoPro (#432) * Passing sgRNA sequences to regular and Batch D3 plots (#73) * Sam/try plots (#71) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. 
* Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman <[email protected]> * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam <[email protected]> * Try catch all futures * Fix test fail plots * Point test to try-plots * Fix d3 not showing and plotly mixing with matplotlib * Use logger for warnings and debug statements * Point tests back at master --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Sam/fix plots (#72) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. 
* Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman <[email protected]> * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam <[email protected]> * Try catch all futures * Fix test fail plots * Fix d3 not showing and plotly mixing with matplotlib --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Remove token from integration tests file * Provide sgRNA_sequences to plot_nucleotide_quilt plots * Passing sgRNA_sequences to plot * Refactor check for determining when to use CRISPREssoPro or matplotlib for Batch plots * Add max-height to Batch report samples * Change testing branch * Fix wrong check for large Batch plots * Update integration_tests.yml to point back at master --------- Co-authored-by: Samuel Nichols <[email protected]> 
Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Push new releases to ECR (#74) * Create aws_ecr.yml (#1) * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * us-east-1 * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Fix d3 sgRNA sequences (#76) * Pass correct sgRNA_sequences to d3 plot * Pass correct sgRNA sequence to prime editor plot for d3 * Resize plotly (#75) * Sam/try plots (#71) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman <[email protected]> * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. 
* Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam <[email protected]> * Try catch all futures * Fix test fail plots * Point test to try-plots * Fix d3 not showing and plotly mixing with matplotlib * Use logger for warnings and debug statements * Point tests back at master --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Sam/fix plots (#72) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman <[email protected]> * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. 
* Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam <[email protected]> * Try catch all futures * Fix test fail plots * Fix d3 not showing and plotly mixing with matplotlib --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Remove token from integration tests file * Pass div id for plotly * Remove debug --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Trevor Martin <[email protected]> Co-authored-by: Samuel Nichols <[email protected]> Co-authored-by: mbowcut2 <[email protected]> commit 1c504274818b6b17fb60620d48fd92cb2e50566d Author: Cole Lyman <[email protected]> Date: Thu May 9 14:16:25 2024 -0600 Fix plots and improve plot error handling (#431) * Sam/try plots (#71) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman <[email protected]> * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. 
* Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam <[email protected]> * Try catch all futures * Fix test fail plots * Point test to try-plots * Fix d3 not showing and plotly mixing with matplotlib * Use logger for warnings and debug statements * Point tests back at master --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Sam/fix plots (#72) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman <[email protected]> * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. 
* Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam <[email protected]> * Try catch all futures * Fix test fail plots * Fix d3 not showing and plotly mixing with matplotlib --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Remove token from integration tests file --------- Co-authored-by: Samuel Nichols <[email protected]> Co-authored-by: mbowcut2 <[email protected]> commit acb2ea8e26dff4cd11f71301b344f81b1cec9040 Author: Kendell Clement <[email protected]> Date: Thu May 2 13:49:33 2024 -0600 Use recent docker image for CircleCI testing that includes updated pandas commit 38fd76dbd7ce2087468f9f454b548777de959a68 Author: Cole Lyman <[email protected]> Date: Wed May 1 16:42:28 2024 -0600 Cole/fix status file name (#69) (#430) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. 
* Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam <[email protected]> commit 3ec22e5fd09e432c9997d30e5f9ee51a2cc00d7b Author: Kendell Clement <[email protected]> Date: Wed May 1 13:08:11 2024 -0600 Remove linked space in readme commit 340a4e16795a5e500411e11572ec267525985009 Author: Cole Lyman <[email protected]> Date: Wed May 1 13:07:14 2024 -0600 Fix batch mode pandas warning. (#70) (#429) * refactor to call method on DataFrame, rather than Series. Removes warning. 
* Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: mbowcut2 <[email protected]> commit 1bc9e906f0ded81f80761d1ec375ee50a4f882a9 Author: Cole Lyman <[email protected]> Date: Fri Apr 26 16:26:27 2024 -0600 Bump version to 2.3.1 and change default CRISPRessoPooled behavior to change in 2.3.2 (#428) commit 5638a1f6ffa973231f23422e9c757fa8cd4af7cc Author: Kendell Clement <[email protected]> Date: Wed Apr 24 18:00:43 2024 -0600 Spelling fixes commit d6011f29db16d8fc1c1e7222457b7f9a1f671de6 Author: Cole Lyman <[email protected]> Date: Wed Apr 24 09:33:53 2024 -0600 Extract `jinja_partials` and fix CRISPRessoPooled fastp errors (#425) * Updated README (#64) * Updating README to fix argument, email, and formatting * removing superfluous files * Add link to CRISPRessoPro, move CRISPRessoPro section to end, and fix JSON formatting * Remove link to CRISPRessoPro * Replace Docker badge with link to tags * Add bullet points to Guardrails section and improve formatting * Fix typo and removed colons from guardrails --------- Co-authored-by: Cole Lyman <[email protected]> * Extract jinja_partials (#65) * Extract jinja_partials code * Remove Plotly dependency from setup.py * Fix CRISPRessoPooled flash errors (#68) * Fix replacing flash intermediate files with fastp intermediate files This also moves where the files are added to `files_to_remove` up to near where they are created. 
* Update to run test branch with paired end Pooled test * Add pooled-paired-sim test to integration tests * Replace flash and trimmomatic with fastp and remove plotly from GitHub Actions environment * Change test branch back to master --------- Co-authored-by: Trevor Martin <[email protected]> commit f4858a30c43374f54058b3ad9c1e965e1ab7fb46 Author: Cole Lyman <[email protected]> Date: Tue Apr 23 17:00:28 2024 -0600 Updated README (#64) (#424) * Updating README to fix argument, email, and formatting * removing superfluous files * Add link to CRISPRessoPro, move CRISPRessoPro section to end, and fix JSON formatting * Remove link to CRISPRessoPro * Replace Docker badge with link to tags * Add bullet points to Guardrails section and improve formatting * Fix typo and removed colons from guardrails --------- Co-authored-by: Trevor Martin <[email protected]> commit c3dbff0fccd44b0b1a9c246dd2aa629ddc515787 Author: Kendell Clement <[email protected]> Date: Mon Apr 22 11:24:59 2024 -0600 Update CRISPRessoPooledCORE.py (#423) Fix bug in error reporting if duplicate names are present commit 20903c14877e5166b1b8a7b50b8fcab450ea3ca6 Author: Cole Lyman <[email protected]> Date: Thu Apr 18 16:55:39 2024 -0600 Remove extra imports from CRISPRessoCore (#67) (#422) commit 4aae57e5be475cd717792265bee36a71a99425de Author: Cole Lyman <[email protected]> Date: Thu Apr 18 10:00:19 2024 -0600 Cole/refactor jinja undefined (#66) (#421) * Replace Jinja2 PackageLoader with FileSystemLoader The PackageLoader doesn't work with a fairly recent version of Jinja2 (3.0.1) and Python 3.9. Replacing it with FileSystemLoader works with both the older version and the latest version. * Fix undefined variable `amplicon_name` in report template * Refactor logging Jinja2 undefined variable warnings * Revert plot_11a update * Update integration test branch * Update jinja to warn on undefined but not fail. 
Fix all undefined warnings * Fix GitHub integration tests ref * One more undefined variable --------- Co-authored-by: Samuel Nichols <[email protected]> commit 768c3c05bf1786a2a32e135b6e145cd6503c3db1 Author: Cole Lyman <[email protected]> Date: Tue Apr 9 17:30:10 2024 -0600 Fix Jinja2 undefined variables (#63) (#417) * Replace Jinja2 PackageLoader with FileSystemLoader The PackageLoader doesn't work with a fairly recent version of Jinja2 (3.0.1) and Python 3.9. Replacing it with FileSystemLoader works with both the older version and the latest version. * Fix undefined variable `amplicon_name` in report template * Revert plot_11a update * Update integration test branch * Update branch for integration tests commit 7e18f08cc1ac5f247a0fd1bbb394ccd9b0a07c2e Author: Han Dai <[email protected]> Date: Fri Apr 5 18:36:41 2024 -0400 fix: change all U+00A0 to U+0020 (#400) commit 235dc29c0cd0fcca2e999148d4660acf00b07221 Author: Cole Lyman <[email protected]> Date: Fri Apr 5 16:36:16 2024 -0600 Fastp, args as data, guardrails, and PE fix (#415) * Change CRISPResso_status.txt format to JSON (#46) * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * add json read for status file * changed Formatter to json format * fixed json access variable name: message * changed perentage_complete to numeric * changed status file to .json * Create integration_tests.yml * Simplify name * CRISPRESSO2_DIR environment variable * Up one dir * ls workspace * Install CRISPResso and ydiff * Clone repo instead of checkout * submodule * ls * CRISPResso2_copy * ls * Update env * Simplify * Pull from githubactions branch * Pull githubactions repo * Checkout githubactions * Run tests individually * Pin plotly version * Run all tests even if one fails * Test on another branch * Switch branch with token * Update integration_tests.yml * New makefile commands * changed file to .json * changed status to json file * Make JSON human readable by adding new lines * GitHub actions integration tests (#48) * GitHub actions clean (#40) * Create pytest.yml * Create pylint.yml * Create .pylintrc * Create test_env.yml * Full path * Remove conda install * Replace path * Pytest tests * pip -e * Create integration_tests.yml * Simplify name * CRISPRESSO2_DIR environment variable * Up one dir * ls workspace * Install CRISPResso and ydiff * Clone repo instead of checkout * submodule * ls * CRISPResso2_copy * ls * Update env * Simplify * Pull from githubactions branch * Pull githubactions repo * Checkout githubactions * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> * Run tests individually * Pin plotly version * Run all tests even if one fails * Test on another branch * Switch branch with token * Update integration_tests.yml * Introduce pandas sorting in CRISPRessoCompare (#47) * New makefile commands * Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42) * Extract out split_interleaved_fastq function to CRISPRessoShared * Implement splitting interleaved fastq files in CRISPRessoPooled * Suppress split_interleaved_input from CRISPRessoWGS parameters * Suppress other parameters in CRISPRessoWGS * Move where interleaved fastq files are split to be trimmed properly * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. * Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * On push no branches * On push no branches * All in one file * Fix yml errors * Rename jobs * Remove old workflow files * Remove paths * Run jobs in parallel --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Move read filtering to after merging in CRISPResso (#39) * Move read filtering to after merging This is in an effort to be consistent with the behavior and results of CRISPRessoPooled. 
* Properly assign the correct file names for read filtering * Add space around operators * GitHub actions on pr (#51) * Run integration tests on pull_request * Run pytest on pull_request * Run pylint on pull_request * Run tests on PR only when opening PR (#53) * Update reports (#52) * Update report changes * Switch branch of integration test repo * Remove extraneous `crispresso_data_path` * Point integration tests back to master * point to test branch * pointed CI config to testing branch * Update integration_tests.yml point to master --------- Co-authored-by: Cole Lyman <[email protected]> Co-authored-by: Samuel Nichols <[email protected]> * Trevor/fastp integration (#50) * Update check_program to check versions and create check_fastq function * Update fastq arg, implement fastp in get_most_frequent_reads * Bump version to 2.3.0 * Deprecate Flash and Trimmomatic parameters, and update fastp params * Update guess_amplicons and guess_guides to remove max_paired_end_reads_overlap * Implement trimming of single end reads * Merge (and trim) reads in CRISPRessoCORE with fastp * Modify error handling to account for fastp errors * Replace flash and trimmomatic with fastp in Docker dependencies * Update LICENSE.txt with fastp info * Remove min and max amplicon length (no longer needed) * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> * Implement trimming with fastp in CRISPRessoPooled * Implement merging (and trimming) with fastp in CRISPRessoPooled * Fixed minor fastp errors * Move read filtering to after merging in CRISPResso (#39) * Move read filtering to after merging This is in an effort to be consistent with the behavior and results of CRISPRessoPooled. * Properly assign the correct file names for read filtering * Add space around operators * GitHub actions on pr (#51) * Run integration tests on pull_request * Run pytest on pull_request * Run pylint on pull_request * Run tests on PR only when opening PR (#53) * Update reports (#52) * Update report changes * Switch branch of integration test repo * Remove extraneous `crispresso_data_path` * Point integration tests back to master * Update where the test point to * Fix 'Prime-edited' key not found (#32) * Move 'Prime-edited' amplicon name check By moving this, it will check if there is an amplicon named 'Prime-edited' (which is a reserved name) even if the `prime_editing_pegRNA_extension_seq` parameter is empty. * Only search for scaffold integration when pegRNA extension seq is provided * Remove spaces at the end of lines * Docker size (#49) * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * GitHub actions integration tests (#48) * GitHub actions clean (#40) * Create pytest.yml * Create pylint.yml * Create .pylintrc * Create test_env.yml * Full path * Remove conda install * Replace path * Pytest tests * pip -e * Create integration_tests.yml * Simplify name * CRISPRESSO2_DIR environment variable * Up one dir * ls workspace * Install CRISPResso and ydiff * Clone repo instead of checkout * submodule * ls * CRISPResso2_copy * ls * Update env * Simplify * Pull from githubactions branch * Pull githubactions repo * Checkout githubactions * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> * Run tests individually * Pin plotly version * Run all tests even if one fails * Test on another branch * Switch branch with token * Update integration_tests.yml * Introduce pandas sorting in CRISPRessoCompare (#47) * New makefile commands * Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42) * Extract out split_interleaved_fastq function to CRISPRessoShared * Implement splitting interleaved fastq files in CRISPRessoPooled * Suppress split_interleaved_input from CRISPRessoWGS parameters * Suppress other parameters in CRISPRessoWGS * Move where interleaved fastq files are split to be trimmed properly * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * On push no branches * On push no branches * All in one file * Fix yml errors * Rename jobs * Remove old workflow files * Remove paths * Run jobs in parallel --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * 3.4->2.08 * Put ttf-mscorefonts-installer back above apt-get clean * restore slash, replace fastp with trimmomatic and flash, add autoremove step --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * initial readme modifications * Updated readme to remove deprecated commands, updated help text to reflect new version and fastp * Pointing test branch back at master --------- Co-authored-by: Cole Lyman <[email protected]> Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Samuel Nichols <[email protected]> * Guardrails clean history (#34) * Include guardrail functions * Add CRISPRessoReports subtree * Refactor to use CRISPRessoReports module * Include guardrail functions * Functional guardrails, needs reports update * Add guardrail partial * fix guardrails partial * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * GitHub actions integration tests (#48) * GitHub actions clean (#40) * Create pytest.yml * Create pylint.yml * Create .pylintrc * Create test_env.yml * Full path * Remove conda install * Replace path * Pytest tests * pip -e * Create integration_tests.yml * Simplify name * CRISPRESSO2_DIR environment variable * Up one dir * ls workspace * Install CRISPResso and ydiff * Clone repo instead of checkout * submodule * ls * CRISPResso2_copy * ls * Update env * Simplify * Pull from githubactions branch * Pull githubactions repo * Checkout githubactions * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> * Run tests individually * Pin plotly version * Run all tests even if one fails * Test on another branch * Switch branch with token * Update integration_tests.yml * Introduce pandas sorting in CRISPRessoCompare (#47) * New makefile commands * Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42) * Extract out split_interleaved_fastq function to CRISPRessoShared * Implement splitting interleaved fastq files in CRISPRessoPooled * Suppress split_interleaved_input from CRISPRessoWGS parameters * Suppress other parameters in CRISPRessoWGS * Move where interleaved fastq files are split to be trimmed properly * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * On push no branches * On push no branches * All in one file * Fix yml errors * Rename jobs * Remove old workflow files * Remove paths * Run jobs in parallel --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Update C cythonized files * Add exact numbers to guardrails printouts * Remove extraneous whitespace from CRISPRessoCOREResources.pyx * Fix calculation of `total_mods` from being negative The issue was that `all_deletion_coordinates` just tells you how many deletions were present, but not how long the deletion is. * Changes to message * Remove old tag * Point tests at guardrails * Restore C2 pro check * Save message with guardrail name * Point tests repo at master --------- Co-authored-by: Cole Lyman <[email protected]> Co-authored-by: mbowcut2 <[email protected]> * Fix case sensitivity in Prime Editing mode (#54) * Move read filtering to after merging in CRISPResso (#39) * Move read filtering to after merging This is in an effort to be consistent with the behavior and results of CRISPRessoPooled. 
* Properly assign the correct file names for read filtering * Add space around operators * GitHub actions on pr (#51) * Run integration tests on pull_request * Run pytest on pull_request * Run pylint on pull_request * Run tests on PR only when opening PR (#53) * Update reports (#52) * Update report changes * Switch branch of integration test repo * Remove extraneous `crispresso_data_path` * Point integration tests back to master * Make all amplicons in amplicon_seq_arr uppercase This fixes https://github.com/pinellolab/CRISPResso2/issues/396 * Allow RNA values to be provided for prime_editing_pegRNA_scaffold_seq * Fix 'Prime-edited' key not found (#32) * Move 'Prime-edited' amplicon name check By moving this, it will check if there is an amplicon named 'Prime-edited' (which is a reserved name) even if the `prime_editing_pegRNA_extension_seq` parameter is empty. * Only search for scaffold integration when pegRNA extension seq is provided * Remove spaces at the end of lines * Docker size (#49) * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * GitHub actions integration tests (#48) * GitHub actions clean (#40) * Create pytest.yml * Create pylint.yml * Create .pylintrc * Create test_env.yml * Full path * Remove conda install * Replace path * Pytest tests * pip -e * Create integration_tests.yml * Simplify name * CRISPRESSO2_DIR environment variable * Up one dir * ls workspace * Install CRISPResso and ydiff * Clone repo instead of checkout * submodule * ls * CRISPResso2_copy * ls * Update env * Simplify * Pull from githubactions branch * Pull githubactions repo * Checkout githubactions * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. * Add documentation to to_numeric_ignore_columns ----… Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Trevor Martin <[email protected]> Co-authored-by: Samuel Nichols <[email protected]>

* Mckay/c2pro reports test (#99) * Fix CRISPRessoAggregate bug and other improvements (#95) * D3-Enhancements (#78) * Sam/try plots (#71) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- * Try catch all futures * Fix test fail plots * Point test to try-plots * Fix d3 not showing and plotly mixing with matplotlib * Use logger for warnings and debug statements * Point tests back at master --------- * Sam/fix plots (#72) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. 
* Fix pandas future warning in CRISPRessoWGS --------- * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- * Try catch all futures * Fix test fail plots * Fix d3 not showing and plotly mixing with matplotlib --------- * Remove token from integration tests file * Provide sgRNA_sequences to plot_nucleotide_quilt plots * Passing sgRNA_sequences to plot * Refactor check for determining when to use CRISPREssoPro or matplotlib for Batch plots * Add max-height to Batch report samples * Change testing branch * Fix wrong check for large Batch plots * Fix typo and move flexiguide to debug (#77) * Change flexiguide output to debug level * Fix typo in fastp merged output file name * Adding id tags for d3 script enhancements * pointing to test branch * Add amplicon_name parameter to allele heatmap and line plots * Add function to extract 
quantification window regions from include_idxs * Scale the quantification window according to the coordinates of the sgRNA plot * added c2pro check, added space in args.json * Correct the quantification window indexes for multiple guides * Fix name of nucleotide conversion plot when guides are not the same * Fix jinja variables that aren't found * Fix multiple guide errors where the wrong sgRNA sequence was associated in d3 plot * Remove unneeded variable and extra whitespace * Switch test branch to master --------- * Add amplicon_name to plot functions * Add sgRNA sequences to nucleotide quilt parameters in Aggregate * Add custom_colors to Aggregate plot functions * Update Aggregate and make_aggregate_report to have logger and tool * Write command_used to Aggregate .json info file * Point to new test branch and add Aggregate run * Make the order of Aggregate runs explicit * Sort all instances of crispresso2_folder_info in Aggregate * Sort df_summary_quantification df in Aggregate * Try sorting with a list of single column * Update to correct test branch * Move to master test branch --------- * Squashed commit of the following: commit 6ec98a05ee70f85b5aa0ac15ab6094b7f1f20d08 Author: mbowcut2 <[email protected]> Date: Tue Aug 13 16:44:39 2024 -0600 dict key changes commit 7cfd5acf06da4eb6f49453144ee1fed1e1488a7a Author: mbowcut2 <[email protected]> Date: Thu Aug 8 15:30:31 2024 -0600 added C2PRO install check back commit bfb0003329ea61b5c79c7e1df8d9a73ec5a508db Author: mbowcut2 <[email protected]> Date: Fri Aug 2 13:08:12 2024 -0600 fixed key error conditionals commit 84444e7480605206cb3efa4a0db675c55e717304 Author: mbowcut2 <[email protected]> Date: Fri Aug 2 09:22:44 2024 -0600 use local jinja_partials file commit 71dd12786fec6c4aba0170a3bfd9022b06f5eede Author: mbowcut2 <[email protected]> Date: Wed Jul 31 14:10:29 2024 -0600 Squashed commit of the following: commit 5e3b30515c4bc437127e7fb21f53cb0bd511c4ca Author: Trevor Martin <[email protected]> Date: Mon Jul 
22 09:31:44 2024 -0600 D3-Enhancements (#78) * Sam/try plots (#71) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman <[email protected]> * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam <[email protected]> * Try catch all futures * Fix test fail plots * Point test to try-plots * Fix d3 not showing and plotly mixing with matplotlib * Use logger for warnings and debug statements * Point tests back at master --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Sam/fix plots (#72) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. 
* Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman <[email protected]> * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam <[email protected]> * Try catch all futures * Fix test fail plots * Fix d3 not showing and plotly mixing with matplotlib --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Remove token from integration tests file * Provide sgRNA_sequences to plot_nucleotide_quilt plots * Passing sgRNA_sequences to plot * Refactor check for determining when to use CRISPREssoPro or matplotlib for Batch plots * Add max-height to Batch report samples * Change testing branch * Fix wrong check for large Batch plots * Fix typo and move flexiguide to debug (#77) * Change flexiguide output to debug level * Fix typo in fastp merged 
output file name * Adding id tags for d3 script enhancements * pointing to test branch * Add amplicon_name parameter to allele heatmap and line plots * Add function to extract quantification window regions from include_idxs * Scale the quantification window according to the coordinates of the sgRNA plot * added c2pro check, added space in args.json * Correct the quantification window indexes for multiple guides * Fix name of nucleotide conversion plot when guides are not the same * Fix jinja variables that aren't found * Fix multiple guide errors where the wrong sgRNA sequence was associated in d3 plot * Remove unneeded variable and extra whitespace * Switch test branch to master --------- Co-authored-by: Samuel Nichols <[email protected]> Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> commit 09e5d9720ad21e44fc7916d71bde3fd7a9dfa7ef Author: Kendell Clement <[email protected]> Date: Thu Jul 18 14:31:54 2024 -0600 Asymmetrical cut point (#457) * add cut_point_ind to plot_alleles_heatmap for asymmetrical plotting * Cole asymmetrical cut point (#453) * Pin versions of numpy and matplotlib in CI environment (#84) (#452) * Reduce duplication and implement cut_point_ind in plot_alleles_heatmap_hist --------- Co-authored-by: Cole Lyman <[email protected]> commit 8d92972694ddff629dad844a6ad100459f69751d Author: Cole Lyman <[email protected]> Date: Thu Jul 18 14:29:40 2024 -0600 Cole/update args (#85) (#456) commit 44f692ecabf5e2eb96ee0cfd7bae62343da7810c Author: Cole Lyman <[email protected]> Date: Mon Jul 15 16:17:29 2024 -0600 Implement new pooled mixed-mode default behavior (#454) * changes for pooled mixed-mode default (#83) * changes for pooled mixed-mode default * deprecated old arg * added integration tests for mixed mode * fixed test target * updated test name * pinned numpy * Fix integration tests yml * pinning matplotlib * added print to CI tests * changed mixed mode info string * Remove 
pooled-mixed-mode-align-to-genome step from Github Actions * Update demultiplex_genome_wide parameter and help * Convert args.json to unix line endings * Add Pooled mixed mode demux run * Update the name of the argument in Pooled * Point integration tests back to master --------- Co-authored-by: Cole Lyman <[email protected]> * Revert change to pooled mixed mode info statement (#86) --------- Co-authored-by: mbowcut2 <[email protected]> commit 79b482b55a0e8edbc03ec22bd2714bade1e90323 Author: Cole Lyman <[email protected]> Date: Tue Jul 9 12:53:23 2024 -0600 Pin versions of numpy and matplotlib in CI environment (#84) (#452) commit 80dc1bdd72d50f989717bfc5f8156bc3495c45f4 Author: Kendell Clement <[email protected]> Date: Thu May 30 14:07:42 2024 -0600 Add padding to image commit 381755daf0939aaf2745df0a802c809633aff47d Author: Kendell Clement <[email protected]> Date: Thu May 30 13:59:57 2024 -0600 White background for schematic for dark mode commit d649db71e610bd8840fbb8d46fadb07789b67390 Author: Cole Lyman <[email protected]> Date: Fri May 24 12:45:53 2024 -0600 Fix typo and move flexiguide to debug (#77) (#438) * Change flexiguide output to debug level * Fix typo in fastp merged output file name commit 71181f50ef2b39015523b1a71d9fd1bf0dce14eb Author: Cole Lyman <[email protected]> Date: Mon May 13 13:34:00 2024 -0600 Prefix the release Docker tag with a `v` (#434) commit d2c2be18a6bb64b0e742cc24c4665980a24324bc Author: Cole Lyman <[email protected]> Date: Mon May 13 09:41:32 2024 -0600 Showing sgRNA sequences on hover in CRISPRessoPro (#432) * Passing sgRNA sequences to regular and Batch D3 plots (#73) * Sam/try plots (#71) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. 
* Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman <[email protected]> * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam <[email protected]> * Try catch all futures * Fix test fail plots * Point test to try-plots * Fix d3 not showing and plotly mixing with matplotlib * Use logger for warnings and debug statements * Point tests back at master --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Sam/fix plots (#72) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. 
* Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman <[email protected]> * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam <[email protected]> * Try catch all futures * Fix test fail plots * Fix d3 not showing and plotly mixing with matplotlib --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Remove token from integration tests file * Provide sgRNA_sequences to plot_nucleotide_quilt plots * Passing sgRNA_sequences to plot * Refactor check for determining when to use CRISPRessoPro or matplotlib for Batch plots * Add max-height to Batch report samples * Change testing branch * Fix wrong check for large Batch plots * Update integration_tests.yml to point back at master --------- Co-authored-by: Samuel Nichols <[email protected]> 
Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Push new releases to ECR (#74) * Create aws_ecr.yml (#1) * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * us-east-1 * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Fix d3 sgRNA sequences (#76) * Pass correct sgRNA_sequences to d3 plot * Pass correct sgRNA sequence to prime editor plot for d3 * Resize plotly (#75) * Sam/try plots (#71) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman <[email protected]> * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. 
* Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam <[email protected]> * Try catch all futures * Fix test fail plots * Point test to try-plots * Fix d3 not showing and plotly mixing with matplotlib * Use logger for warnings and debug statements * Point tests back at master --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Sam/fix plots (#72) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman <[email protected]> * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. 
* Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam <[email protected]> * Try catch all futures * Fix test fail plots * Fix d3 not showing and plotly mixing with matplotlib --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Remove token from integration tests file * Pass div id for plotly * Remove debug --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Trevor Martin <[email protected]> Co-authored-by: Samuel Nichols <[email protected]> Co-authored-by: mbowcut2 <[email protected]> commit 1c504274818b6b17fb60620d48fd92cb2e50566d Author: Cole Lyman <[email protected]> Date: Thu May 9 14:16:25 2024 -0600 Fix plots and improve plot error handling (#431) * Sam/try plots (#71) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman <[email protected]> * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. 
* Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam <[email protected]> * Try catch all futures * Fix test fail plots * Point test to try-plots * Fix d3 not showing and plotly mixing with matplotlib * Use logger for warnings and debug statements * Point tests back at master --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Sam/fix plots (#72) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman <[email protected]> * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. 
* Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam <[email protected]> * Try catch all futures * Fix test fail plots * Fix d3 not showing and plotly mixing with matplotlib --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Remove token from integration tests file --------- Co-authored-by: Samuel Nichols <[email protected]> Co-authored-by: mbowcut2 <[email protected]> commit acb2ea8e26dff4cd11f71301b344f81b1cec9040 Author: Kendell Clement <[email protected]> Date: Thu May 2 13:49:33 2024 -0600 Use recent docker image for CircleCI testing that includes updated pandas commit 38fd76dbd7ce2087468f9f454b548777de959a68 Author: Cole Lyman <[email protected]> Date: Wed May 1 16:42:28 2024 -0600 Cole/fix status file name (#69) (#430) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. 
* Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam <[email protected]> commit 3ec22e5fd09e432c9997d30e5f9ee51a2cc00d7b Author: Kendell Clement <[email protected]> Date: Wed May 1 13:08:11 2024 -0600 Remove linked space in readme commit 340a4e16795a5e500411e11572ec267525985009 Author: Cole Lyman <[email protected]> Date: Wed May 1 13:07:14 2024 -0600 Fix batch mode pandas warning. (#70) (#429) * refactor to call method on DataFrame, rather than Series. Removes warning. 
* Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: mbowcut2 <[email protected]> commit 1bc9e906f0ded81f80761d1ec375ee50a4f882a9 Author: Cole Lyman <[email protected]> Date: Fri Apr 26 16:26:27 2024 -0600 Bump version to 2.3.1 and change default CRISPRessoPooled behavior to change in 2.3.2 (#428) commit 5638a1f6ffa973231f23422e9c757fa8cd4af7cc Author: Kendell Clement <[email protected]> Date: Wed Apr 24 18:00:43 2024 -0600 Spelling fixes commit d6011f29db16d8fc1c1e7222457b7f9a1f671de6 Author: Cole Lyman <[email protected]> Date: Wed Apr 24 09:33:53 2024 -0600 Extract `jinja_partials` and fix CRISPRessoPooled fastp errors (#425) * Updated README (#64) * Updating README to fix argument, email, and formatting * removing superfluous files * Add link to CRISPRessoPro, move CRISPRessoPro section to end, and fix JSON formatting * Remove link to CRISPRessoPro * Replace Docker badge with link to tags * Add bullet points to Guardrails section and improve formatting * Fix typo and removed colons from guardrails --------- Co-authored-by: Cole Lyman <[email protected]> * Extract jinja_partials (#65) * Extract jinja_partials code * Remove Plotly dependency from setup.py * Fix CRISPRessoPooled flash errors (#68) * Fix replacing flash intermediate files with fastp intermediate files This also moves where the files are added to `files_to_remove` up to near where they are created. 
* Update to run test branch with paired end Pooled test * Add pooled-paired-sim test to integration tests * Replace flash and trimmomatic with fastp and remove plotly from Github Actions environment * Change test branch back to master --------- Co-authored-by: Trevor Martin <[email protected]> commit f4858a30c43374f54058b3ad9c1e965e1ab7fb46 Author: Cole Lyman <[email protected]> Date: Tue Apr 23 17:00:28 2024 -0600 Updated README (#64) (#424) * Updating README to fix argument, email, and formatting * removing superfluous files * Add link to CRISPRessoPro, move CRISPRessoPro section to end, and fix JSON formatting * Remove link to CRISPRessoPro * Replace Docker badge with link to tags * Add bullet points to Guardrails section and improve formatting * Fix typo and removed colons from guardrails --------- Co-authored-by: Trevor Martin <[email protected]> commit c3dbff0fccd44b0b1a9c246dd2aa629ddc515787 Author: Kendell Clement <[email protected]> Date: Mon Apr 22 11:24:59 2024 -0600 Update CRISPRessoPooledCORE.py (#423) Fix bug in error reporting if duplicate names are present commit 20903c14877e5166b1b8a7b50b8fcab450ea3ca6 Author: Cole Lyman <[email protected]> Date: Thu Apr 18 16:55:39 2024 -0600 Remove extra imports from CRISPRessoCore (#67) (#422) commit 4aae57e5be475cd717792265bee36a71a99425de Author: Cole Lyman <[email protected]> Date: Thu Apr 18 10:00:19 2024 -0600 Cole/refactor jinja undefined (#66) (#421) * Replace Jinja2 PackageLoader with FileSystemLoader The PackageLoader doesn't work with a fairly recent version of Jinja2 (3.0.1) and Python 3.9. Replacing with FileSystemLoader works with the older version and the latest version. * Fix undefined variable `amplicon_name` in report template * Refactor logging Jinja2 undefined variable warnings * Revert plot_11a update * Update integration test branch * Update jinja to warn on undefined but not fail. 
Fix all undefined warnings * Fix github integration tests ref * One more undefined variable --------- Co-authored-by: Samuel Nichols <[email protected]> commit 768c3c05bf1786a2a32e135b6e145cd6503c3db1 Author: Cole Lyman <[email protected]> Date: Tue Apr 9 17:30:10 2024 -0600 Fix Jinja2 undefined variables (#63) (#417) * Replace Jinja2 PackageLoader with FileSystemLoader The PackageLoader doesn't work with a fairly recent version of Jinja2 (3.0.1) and Python 3.9. Replacing with FileSystemLoader works with the older version and the latest version. * Fix undefined variable `amplicon_name` in report template * Revert plot_11a update * Update integration test branch * Update branch for integration tests commit 7e18f08cc1ac5f247a0fd1bbb394ccd9b0a07c2e Author: Han Dai <[email protected]> Date: Fri Apr 5 18:36:41 2024 -0400 fix: change all U+00A0 to U+0020 (#400) commit 235dc29c0cd0fcca2e999148d4660acf00b07221 Author: Cole Lyman <[email protected]> Date: Fri Apr 5 16:36:16 2024 -0600 Fastp, args as data, guardrails, and PE fix (#415) * Change CRISPResso_status.txt format to JSON (#46) * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * add json read for status file * changed Formatter to json format * fixed json access variable name: message * changed perentage_complete to numeric * changed status file to .json * Create integration_tests.yml * Simplify name * CRISPRESSO2_DIR environment variable * Up one dir * ls workspace * Install CRISPResso and ydiff * Clone repo instead of checkout * submodule * ls * CRISPResso2_copy * ls * Update env * Simplify * Pull from githubactions branch * Pull githubactions repo * Checkout githubactions * Run tests individually * Pin plotly version * Run all tests even if one fails * Test on another branch * Switch branch with token * Update integration_tests.yml * New makefile commands * changed file to .json * changed status to json file * Make JSON human readable by adding new lines * GitHub actions integration tests (#48) * GitHub actions clean (#40) * Create pytest.yml * Create pylint.yml * Create .pylintrc * Create test_env.yml * Full path * Remove conda install * Replace path * Pytest tests * pip -e * Create integration_tests.yml * Simplify name * CRISPRESSO2_DIR environment variable * Up one dir * ls workspace * Install CRISPResso and ydiff * Clone repo instead of checkout * submodule * ls * CRISPResso2_copy * ls * Update env * Simplify * Pull from githubactions branch * Pull githubactions repo * Checkout githubactions * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> * Run tests individually * Pin plotly version * Run all tests even if one fails * Test on another branch * Switch branch with token * Update integration_tests.yml * Introduce pandas sorting in CRISPRessoCompare (#47) * New makefile commands * Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42) * Extract out split_interleaved_fastq function to CRISPRessoShared * Implement splitting interleaved fastq files in CRISPRessoPooled * Suppress split_interleaved_input from CRISPRessoWGS parameters * Suppress other parameters in CRISPRessoWGS * Move where interleaved fastq files are split to be trimmed properly * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. * Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * On push no branches * On push no branches * All in one file * Fix yml errors * Rename jobs * Remove old workflow files * Remove paths * Run jobs in parallel --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Move read filtering to after merging in CRISPResso (#39) * Move read filtering to after merging This is in an effort to be consistent with the behavior and results of CRISPRessoPooled. 
* Properly assign the correct file names for read filtering * Add space around operators * GitHub actions on pr (#51) * Run integration tests on pull_request * Run pytest on pull_request * Run pylint on pull_request * Run tests on PR only when opening PR (#53) * Update reports (#52) * Update report changes * Switch branch of integration test repo * Remove extraneous `crispresso_data_path` * Point integration tests back to master * point to test branch * pointed CI config to testing branch * Update integration_tests.yml point to master --------- Co-authored-by: Cole Lyman <[email protected]> Co-authored-by: Samuel Nichols <[email protected]> * Trevor/fastp integration (#50) * Update check_program to check versions and create check_fastq function * Update fastq arg, implement fastp in get_most_frequent_reads * Bump version to 2.3.0 * Deprecate Flash and Trimmomatic parameters, and update fastp params * Update guess_amplicons and guess_guides to remove max_paired_end_reads_overlap * Implement trimming of single end reads * Merge (and trim) reads in CRISPRessoCORE with fastp * Modify error handling to account for fastp errors * Replace flash and trimmomatic with fastp in Docker dependencies * Update LICENSE.txt with fastp info * Remove min and max amplicon length (no longer needed) * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> * Implement trimming with fastp in CRISPRessoPooled * Implement merging (and trimming) with fastp in CRISPRessoPooled * Fixed minor fastp errors * Move read filtering to after merging in CRISPResso (#39) * Move read filtering to after merging This is in an effort to be consistent with the behavior and results of CRISPRessoPooled. * Properly assign the correct file names for read filtering * Add space around operators * GitHub actions on pr (#51) * Run integration tests on pull_request * Run pytest on pull_request * Run pylint on pull_request * Run tests on PR only when opening PR (#53) * Update reports (#52) * Update report changes * Switch branch of integration test repo * Remove extraneous `crispresso_data_path` * Point integration tests back to master * Update where the test point to * Fix 'Prime-edited' key not found (#32) * Move 'Prime-edited' amplicon name check By moving this, it will check if there is an amplicon named 'Prime-edited' (which is a reserved name) even if the `prime_editing_pegRNA_extension_seq` parameter is empty. * Only search for scaffold integration when pegRNA extension seq is provided * Remove spaces at the end of lines * Docker size (#49) * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * GitHub actions integration tests (#48) * GitHub actions clean (#40) * Create pytest.yml * Create pylint.yml * Create .pylintrc * Create test_env.yml * Full path * Remove conda install * Replace path * Pytest tests * pip -e * Create integration_tests.yml * Simplify name * CRISPRESSO2_DIR environment variable * Up one dir * ls workspace * Install CRISPResso and ydiff * Clone repo instead of checkout * submodule * ls * CRISPResso2_copy * ls * Update env * Simplify * Pull from githubactions branch * Pull githubactions repo * Checkout githubactions * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> * Run tests individually * Pin plotly version * Run all tests even if one fails * Test on another branch * Switch branch with token * Update integration_tests.yml * Introduce pandas sorting in CRISPRessoCompare (#47) * New makefile commands * Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42) * Extract out split_interleaved_fastq function to CRISPRessoShared * Implement splitting interleaved fastq files in CRISPRessoPooled * Suppress split_interleaved_input from CRISPRessoWGS parameters * Suppress other parameters in CRISPRessoWGS * Move where interleaved fastq files are split to be trimmed properly * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * On push no branches * On push no branches * All in one file * Fix yml errors * Rename jobs * Remove old workflow files * Remove paths * Run jobs in parallel --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * 3.4->2.08 * Put ttf-mscorefonts-installer back above apt-get clean * restore slash, replace fastp with trimmomatic and flash, add autoremove step --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * initial readme modifications * Updated readme to remove deprecated commands, updated help text to reflect new version and fastp * Pointing test branch back at master --------- Co-authored-by: Cole Lyman <[email protected]> Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Samuel Nichols <[email protected]> * Guardrails clean history (#34) * Include guardrail functions * Add CRISPRessoReports subtree * Refactor to use CRISPRessoReports module * Include guardrail functions * Functional guardrails, needs reports update * Add guardrail partial * fix guardrials partial * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * GitHub actions integration tests (#48) * GitHub actions clean (#40) * Create pytest.yml * Create pylint.yml * Create .pylintrc * Create test_env.yml * Full path * Remove conda install * Replace path * Pytest tests * pip -e * Create integration_tests.yml * Simplify name * CRISPRESSO2_DIR environment variable * Up one dir * ls workspace * Install CRISPResso and ydiff * Clone repo instead of checkout * submodule * ls * CRISPResso2_copy * ls * Update env * Simplify * Pull from githubactions branch * Pull githubactions repo * Checkout githubactions * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> * Run tests individually * Pin plotly version * Run all tests even if one fails * Test on another branch * Switch branch with token * Update integration_tests.yml * Introduce pandas sorting in CRISPRessoCompare (#47) * New makefile commands * Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42) * Extract out split_interleaved_fastq function to CRISPRessoShared * Implement splitting interleaved fastq files in CRISPRessoPooled * Suppress split_interleaved_input from CRISPRessoWGS parameters * Suppress other parameters in CRISPRessoWGS * Move where interleaved fastq files are split to be trimmed properly * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * On push no branches * On push no branches * All in one file * Fix yml errors * Rename jobs * Remove old workflow files * Remove paths * Run jobs in parallel --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Update C cythonized files * Add exact numbers to guardrails printouts * Remove extraneous whitespace from CRISPRessoCOREResources.pyx * Fix calculation of `total_mods` from being negative The issue was that `all_deletion_coordinates` just tells you how many deletions were present, but not how long the deletion is. * Changes to message * Remove old tag * Point tests at guardrails * Restore C2 pro check * Save message with guardrail name * Point tests repo at master --------- Co-authored-by: Cole Lyman <[email protected]> Co-authored-by: mbowcut2 <[email protected]> * Fix case sensitivity in Prime Editing mode (#54) * Move read filtering to after merging in CRISPResso (#39) * Move read filtering to after merging This is in an effort to be consistent with the behavior and results of CRISPRessoPooled. 
* Properly assign the correct file names for read filtering * Add space around operators * GitHub actions on pr (#51) * Run integration tests on pull_request * Run pytest on pull_request * Run pylint on pull_request * Run tests on PR only when opening PR (#53) * Update reports (#52) * Update report changes * Switch branch of integration test repo * Remove extraneous `crispresso_data_path` * Point integration tests back to master * Make all amplicons in amplicon_seq_arr uppercase This fixes https://github.com/pinellolab/CRISPResso2/issues/396 * Allow RNA values to be provided for prime_editing_pegRNA_scaffold_seq * Fix 'Prime-edited' key not found (#32) * Move 'Prime-edited' amplicon name check By moving this, it will check if there is an amplicon named 'Prime-edited' (which is a reserved name) even if the `prime_editing_pegRNA_extension_seq` parameter is empty. * Only search for scaffold integration when pegRNA extension seq is provided * Remove spaces at the end of lines * Docker size (#49) * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * GitHub actions integration tests (#48) * GitHub actions clean (#40) * Create pytest.yml * Create pylint.yml * Create .pylintrc * Create test_env.yml * Full path * Remove conda install * Replace path * Pytest tests * pip -e * Create integration_tests.yml * Simplify name * CRISPRESSO2_DIR environment variable * Up one dir * ls workspace * Install CRISPResso and ydiff * Clone repo instead of checkout * submodule * ls * CRISPResso2_copy * ls * Update env * Simplify * Pull from githubactions branch * Pull githubactions repo * Checkout githubactions * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. * Add documentation to to_nu… Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Trevor Martin <[email protected]> Co-authored-by: Samuel Nichols <[email protected]>
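
The `to_numeric_ignore_columns` refactor described in the commit message above (#45) can be illustrated with a minimal sketch. The signature, the copy-then-convert behavior, and the example column names below are assumptions made for illustration and may not match the CRISPResso2 source. The point is the one the message makes: string columns are skipped explicitly rather than relying on `errors='ignore'`, so a genuine conversion failure in any other column raises instead of being swallowed.

```python
import pandas as pd


def to_numeric_ignore_columns(df, ignore_columns):
    """Return a copy of df with every column not in ignore_columns converted to numeric.

    Hypothetical sketch: the name comes from the commit message, the
    signature and behavior are assumed.
    """
    converted = df.copy()
    for col in converted.columns:
        if col not in ignore_columns:
            # No errors='ignore' here: a value that cannot be parsed
            # raises ValueError instead of being silently left as-is.
            converted[col] = pd.to_numeric(converted[col])
    return converted


# Illustrative data only; 'Batch' is a string column that must stay untouched.
example = pd.DataFrame(
    {"Batch": ["s1", "s2"], "Reads": ["10", "25"], "Modified%": ["1.5", "3.0"]}
)
result = to_numeric_ignore_columns(example, {"Batch"})
```

Passing the known string columns explicitly keeps the conversion strict everywhere else, which is what lets unexpected non-numeric data surface as an error.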

* Mckay/halt on plot fail (#103) * Mckay/c2pro reports test (#99) * Fix CRISPRessoAggregate bug and other improvements (#95) * D3-Enhancements (#78) * Sam/try plots (#71) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- * Try catch all futures * Fix test fail plots * Point test to try-plots * Fix d3 not showing and plotly mixing with matplotlib * Use logger for warnings and debug statements * Point tests back at master --------- * Sam/fix plots (#72) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. 
* Fix pandas future warning in CRISPRessoWGS --------- * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- * Try catch all futures * Fix test fail plots * Fix d3 not showing and plotly mixing with matplotlib --------- * Remove token from integration tests file * Provide sgRNA_sequences to plot_nucleotide_quilt plots * Passing sgRNA_sequences to plot * Refactor check for determining when to use CRISPREssoPro or matplotlib for Batch plots * Add max-height to Batch report samples * Change testing branch * Fix wrong check for large Batch plots * Fix typo and move flexiguide to debug (#77) * Change flexiguide output to debug level * Fix typo in fastp merged output file name * Adding id tags for d3 script enhancements * pointing to test branch * Add amplicon_name parameter to allele heatmap and line plots * Add function to extract 
quantification window regions from include_idxs * Scale the quantification window according to the coordinates of the sgRNA plot * added c2pro check, added space in args.json * Correct the quantification window indexes for multiple guides * Fix name of nucleotide conversion plot when guides are not the same * Fix jinja variables that aren't found * Fix multiple guide errors where the wrong sgRNA sequence was associated in d3 plot * Remove unneeded variable and extra whitespace * Switch test branch to master --------- * Add amplicon_name to plot functions * Add sgRNA sequences to nucleotide quilt parameters in Aggregate * Add custom_colors to Aggregate plot functions * Update Aggregate and make_aggregate_report to have logger and tool * Write command_used to Aggregate .json info file * Point to new test branch and add Aggregate run * Make the order of Aggregate runs explicit * Sort all instances of crispresso2_folder_info in Aggregate * Sort df_summary_quantification df in Aggregate * Try sorting with a list of single column * Update to correct test branch * Move to master test branch --------- * Squashed commit of the following: commit 6ec98a05ee70f85b5aa0ac15ab6094b7f1f20d08 Author: mbowcut2 <[email protected]> Date: Tue Aug 13 16:44:39 2024 -0600 dict key changes commit 7cfd5acf06da4eb6f49453144ee1fed1e1488a7a Author: mbowcut2 <[email protected]> Date: Thu Aug 8 15:30:31 2024 -0600 added C2PRO install check back commit bfb0003329ea61b5c79c7e1df8d9a73ec5a508db Author: mbowcut2 <[email protected]> Date: Fri Aug 2 13:08:12 2024 -0600 fixed key error conditionals commit 84444e7480605206cb3efa4a0db675c55e717304 Author: mbowcut2 <[email protected]> Date: Fri Aug 2 09:22:44 2024 -0600 use local jinja_paritals file commit 71dd12786fec6c4aba0170a3bfd9022b06f5eede Author: mbowcut2 <[email protected]> Date: Wed Jul 31 14:10:29 2024 -0600 Squashed commit of the following: commit 5e3b30515c4bc437127e7fb21f53cb0bd511c4ca Author: Trevor Martin <[email protected]> Date: Mon Jul 
22 09:31:44 2024 -0600 D3-Enhancements (#78) * Sam/try plots (#71) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman <[email protected]> * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam <[email protected]> * Try catch all futures * Fix test fail plots * Point test to try-plots * Fix d3 not showing and plotly mixing with matplotlib * Use logger for warnings and debug statements * Point tests back at master --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Sam/fix plots (#72) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. 
* Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman <[email protected]> * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam <[email protected]> * Try catch all futures * Fix test fail plots * Fix d3 not showing and plotly mixing with matplotlib --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Remove token from integration tests file * Provide sgRNA_sequences to plot_nucleotide_quilt plots * Passing sgRNA_sequences to plot * Refactor check for determining when to use CRISPREssoPro or matplotlib for Batch plots * Add max-height to Batch report samples * Change testing branch * Fix wrong check for large Batch plots * Fix typo and move flexiguide to debug (#77) * Change flexiguide output to debug level * Fix typo in fastp merged 
output file name * Adding id tags for d3 script enhancements * pointing to test branch * Add amplicon_name parameter to allele heatmap and line plots * Add function to extract quantification window regions from include_idxs * Scale the quantification window according to the coordinates of the sgRNA plot * added c2pro check, added space in args.json * Correct the quantification window indexes for multiple guides * Fix name of nucleotide conversion plot when guides are not the same * Fix jinja variables that aren't found * Fix multiple guide errors where the wrong sgRNA sequence was associated in d3 plot * Remove unneeded variable and extra whitespace * Switch test branch to master --------- Co-authored-by: Samuel Nichols <[email protected]> Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> commit 09e5d9720ad21e44fc7916d71bde3fd7a9dfa7ef Author: Kendell Clement <[email protected]> Date: Thu Jul 18 14:31:54 2024 -0600 Asymmetrical cut point (#457) * add cut_point_ind to plot_alleles_heatmap for asymmetrical plotting * Cole asymmetrical cut point (#453) * Pin versions of numpy and matplotlib in CI environment (#84) (#452) * Reduce duplication and implement cut_point_ind in plot_alleles_heatmap_hist --------- Co-authored-by: Cole Lyman <[email protected]> commit 8d92972694ddff629dad844a6ad100459f69751d Author: Cole Lyman <[email protected]> Date: Thu Jul 18 14:29:40 2024 -0600 Cole/update args (#85) (#456) commit 44f692ecabf5e2eb96ee0cfd7bae62343da7810c Author: Cole Lyman <[email protected]> Date: Mon Jul 15 16:17:29 2024 -0600 Implement new pooled mixed-mode default behavior (#454) * changes for pooled mixed-mode default (#83) * changes for pooled mixed-mode default * deprecated old arg * added integration tests for mixed mode * fixed test target * updated test name * pinned numpy * Fix integration tests yml * pinning matplotlib * added print to CI tests * changed mixed mode info string * Remove 
pooled-mixed-mode-align-to-genome step from Github Actions * Update demultiplex_genome_wide parameter and help * Convert args.json to unix line endings * Add Pooled mixed mode demux run * Update the name of the argument in Pooled * Point integration tests back to master --------- Co-authored-by: Cole Lyman <[email protected]> * Revert change to pooled mixed mode info statement (#86) --------- Co-authored-by: mbowcut2 <[email protected]> commit 79b482b55a0e8edbc03ec22bd2714bade1e90323 Author: Cole Lyman <[email protected]> Date: Tue Jul 9 12:53:23 2024 -0600 Pin versions of numpy and matplotlib in CI environment (#84) (#452) commit 80dc1bdd72d50f989717bfc5f8156bc3495c45f4 Author: Kendell Clement <[email protected]> Date: Thu May 30 14:07:42 2024 -0600 Add padding to image commit 381755daf0939aaf2745df0a802c809633aff47d Author: Kendell Clement <[email protected]> Date: Thu May 30 13:59:57 2024 -0600 White background for schematic for dark mode commit d649db71e610bd8840fbb8d46fadb07789b67390 Author: Cole Lyman <[email protected]> Date: Fri May 24 12:45:53 2024 -0600 Fix typo and move flexiguide to debug (#77) (#438) * Change flexiguide output to debug level * Fix typo in fastp merged output file name commit 71181f50ef2b39015523b1a71d9fd1bf0dce14eb Author: Cole Lyman <[email protected]> Date: Mon May 13 13:34:00 2024 -0600 Prefix the release Docker tag with a `v` (#434) commit d2c2be18a6bb64b0e742cc24c4665980a24324bc Author: Cole Lyman <[email protected]> Date: Mon May 13 09:41:32 2024 -0600 Showing sgRNA sequences on hover in CRISPRessoPro (#432) * Passing sgRNA sequences to regular and Batch D3 plots (#73) * Sam/try plots (#71) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. 
* Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman <[email protected]> * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam <[email protected]> * Try catch all futures * Fix test fail plots * Point test to try-plots * Fix d3 not showing and plotly mixing with matplotlib * Use logger for warnings and debug statements * Point tests back at master --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Sam/fix plots (#72) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. 
* Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman <[email protected]> * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam <[email protected]> * Try catch all futures * Fix test fail plots * Fix d3 not showing and plotly mixing with matplotlib --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Remove token from integration tests file * Provide sgRNA_sequences to plot_nucleotide_quilt plots * Passing sgRNA_sequences to plot * Refactor check for determining when to use CRISPREssoPro or matplotlib for Batch plots * Add max-height to Batch report samples * Change testing branch * Fix wrong check for large Batch plots * Update integration_tests.yml to point back at master --------- Co-authored-by: Samuel Nichols <[email protected]> 
Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Push new releases to ECR (#74) * Create aws_ecr.yml (#1) * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * us-east-1 * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Fix d3 sgRNA sequences (#76) * Pass correct sgRNA_sequences to d3 plot * Pass correct sgRNA sequence to prime editor plot for d3 * Resize plotly (#75) * Sam/try plots (#71) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman <[email protected]> * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. 
* Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam <[email protected]> * Try catch all futures * Fix test fail plots * Point test to try-plots * Fix d3 not showing and plotly mixing with matplotlib * Use logger for warnings and debug statements * Point tests back at master --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Sam/fix plots (#72) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman <[email protected]> * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. 
* Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam <[email protected]> * Try catch all futures * Fix test fail plots * Fix d3 not showing and plotly mixing with matplotlib --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Remove token from integration tests file * Pass div id for plotly * Remove debug --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Trevor Martin <[email protected]> Co-authored-by: Samuel Nichols <[email protected]> Co-authored-by: mbowcut2 <[email protected]> commit 1c504274818b6b17fb60620d48fd92cb2e50566d Author: Cole Lyman <[email protected]> Date: Thu May 9 14:16:25 2024 -0600 Fix plots and improve plot error handling (#431) * Sam/try plots (#71) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman <[email protected]> * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. 
* Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam <[email protected]> * Try catch all futures * Fix test fail plots * Point test to try-plots * Fix d3 not showing and plotly mixing with matplotlib * Use logger for warnings and debug statements * Point tests back at master --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Sam/fix plots (#72) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman <[email protected]> * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. 
* Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam <[email protected]> * Try catch all futures * Fix test fail plots * Fix d3 not showing and plotly mixing with matplotlib --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Remove token from integration tests file --------- Co-authored-by: Samuel Nichols <[email protected]> Co-authored-by: mbowcut2 <[email protected]> commit acb2ea8e26dff4cd11f71301b344f81b1cec9040 Author: Kendell Clement <[email protected]> Date: Thu May 2 13:49:33 2024 -0600 Use recent docker image for CircleCI testing that includes updated pandas commit 38fd76dbd7ce2087468f9f454b548777de959a68 Author: Cole Lyman <[email protected]> Date: Wed May 1 16:42:28 2024 -0600 Cole/fix status file name (#69) (#430) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. 
* Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam <[email protected]> commit 3ec22e5fd09e432c9997d30e5f9ee51a2cc00d7b Author: Kendell Clement <[email protected]> Date: Wed May 1 13:08:11 2024 -0600 Remove linked space in readme commit 340a4e16795a5e500411e11572ec267525985009 Author: Cole Lyman <[email protected]> Date: Wed May 1 13:07:14 2024 -0600 Fix batch mode pandas warning. (#70) (#429) * refactor to call method on DataFrame, rather than Series. Removes warning. 
* Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: mbowcut2 <[email protected]> commit 1bc9e906f0ded81f80761d1ec375ee50a4f882a9 Author: Cole Lyman <[email protected]> Date: Fri Apr 26 16:26:27 2024 -0600 Bump version to 2.3.1 and change default CRISPRessoPooled behavior to change in 2.3.2 (#428) commit 5638a1f6ffa973231f23422e9c757fa8cd4af7cc Author: Kendell Clement <[email protected]> Date: Wed Apr 24 18:00:43 2024 -0600 Spelling fixes commit d6011f29db16d8fc1c1e7222457b7f9a1f671de6 Author: Cole Lyman <[email protected]> Date: Wed Apr 24 09:33:53 2024 -0600 Extract `jinja_partials` and fix CRISPRessoPooled fastp errors (#425) * Updated README (#64) * Updating README to fix argument, email, and formatting * removing superfluous files * Add link to CRISPRessoPro, move CRISPRessoPro section to end, and fix JSON formatting * Remove link to CRISPRessoPro * Replace Docker badge with link to tags * Add bullet points to Guardrails section and improve formatting * Fix typo and removed colons from guardrails --------- Co-authored-by: Cole Lyman <[email protected]> * Extract jinja_partials (#65) * Extract jinja_partials code * Remove Plotly dependency from setup.py * Fix CRISPRessoPooled flash errors (#68) * Fix replacing flash intermediate files with fastp intermediate files This also moves where the files are added to `files_to_remove` up to near where they are created. 
* Update to run test branch with paired end Pooled test * Add pooled-paired-sim test to integration tests * Replace flash and trimmomatic with fastp and remove plotly from Github Actions environment * Change test branch back to master --------- Co-authored-by: Trevor Martin <[email protected]> commit f4858a30c43374f54058b3ad9c1e965e1ab7fb46 Author: Cole Lyman <[email protected]> Date: Tue Apr 23 17:00:28 2024 -0600 Updated README (#64) (#424) * Updating README to fix argument, email, and formatting * removing superfluous files * Add link to CRISPRessoPro, move CRISPRessoPro section to end, and fix JSON formatting * Remove link to CRISPRessoPro * Replace Docker badge with link to tags * Add bullet points to Guardrails section and improve formatting * Fix typo and removed colons from guardrails --------- Co-authored-by: Trevor Martin <[email protected]> commit c3dbff0fccd44b0b1a9c246dd2aa629ddc515787 Author: Kendell Clement <[email protected]> Date: Mon Apr 22 11:24:59 2024 -0600 Update CRISPRessoPooledCORE.py (#423) Fix bug in error reporting if duplicate names are present commit 20903c14877e5166b1b8a7b50b8fcab450ea3ca6 Author: Cole Lyman <[email protected]> Date: Thu Apr 18 16:55:39 2024 -0600 Remove extra imports from CRISPRessoCore (#67) (#422) commit 4aae57e5be475cd717792265bee36a71a99425de Author: Cole Lyman <[email protected]> Date: Thu Apr 18 10:00:19 2024 -0600 Cole/refactor jinja undefined (#66) (#421) * Replace Jinja2 PackageLoader with FileSystemLoader The PackageLoader doesn't work with a fairly recent version of Jinja2 (3.0.1) and Python 3.9. Replacing with FileSystemLoader work with the older version and the latest version. * Fix undefined variable `amplicon_name` in report template * Refactor logging Jinja2 undefined variable warnings * Revert plot_11a update * Update intedration test branch * Update jinja to warn on undefined but not fail. 
Fix all undefined warnings * Fix github integration tests ref * One more undefined variable --------- Co-authored-by: Samuel Nichols <[email protected]> commit 768c3c05bf1786a2a32e135b6e145cd6503c3db1 Author: Cole Lyman <[email protected]> Date: Tue Apr 9 17:30:10 2024 -0600 Fix Jinja2 undefined variables (#63) (#417) * Replace Jinja2 PackageLoader with FileSystemLoader The PackageLoader doesn't work with a fairly recent version of Jinja2 (3.0.1) and Python 3.9. Replacing with FileSystemLoader work with the older version and the latest version. * Fix undefined variable `amplicon_name` in report template * Revert plot_11a update * Update intedration test branch * Update branch for integration tests commit 7e18f08cc1ac5f247a0fd1bbb394ccd9b0a07c2e Author: Han Dai <[email protected]> Date: Fri Apr 5 18:36:41 2024 -0400 fix: change all U+00A0 to U+0020 (#400) commit 235dc29c0cd0fcca2e999148d4660acf00b07221 Author: Cole Lyman <[email protected]> Date: Fri Apr 5 16:36:16 2024 -0600 Fastp, args as data, guardrails, and PE fix (#415) * Change CRISPResso_status.txt format to JSON (#46) * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * add json read for status file * changed Formatter to json format * fixed json access variable name: message * changed perentage_complete to numeric * changed status file to .json * Create integration_tests.yml * Simplify name * CRISPRESSO2_DIR environment variable * Up one dir * ls workspace * Install CRISPResso and ydiff * Clone repo instead of checkout * submodule * ls * CRISPResso2_copy * ls * Update env * Simplify * Pull from githubactions branch * Pull githubactions repo * Checkout githubactions * Run tests individually * Pin plotly version * Run all tests even if one fails * Test on another branch * Switch branch with token * Update integration_tests.yml * New makefile commands * changed file to .json * changed status to json file * Make JSON human readable by adding new lines * GitHub actions integration tests (#48) * GitHub actions clean (#40) * Create pytest.yml * Create pylint.yml * Create .pylintrc * Create test_env.yml * Full path * Remove conda install * Replace path * Pytest tests * pip -e * Create integration_tests.yml * Simplify name * CRISPRESSO2_DIR environment variable * Up one dir * ls workspace * Install CRISPResso and ydiff * Clone repo instead of checkout * submodule * ls * CRISPResso2_copy * ls * Update env * Simplify * Pull from githubactions branch * Pull githubactions repo * Checkout githubactions * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> * Run tests individually * Pin plotly version * Run all tests even if one fails * Test on another branch * Switch branch with token * Update integration_tests.yml * Introduce pandas sorting in CRISPRessoCompare (#47) * New makefile commands * Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42) * Extract out split_interleaved_fastq function to CRISPRessoShared * Implement splitting interleaved fastq files in CRISPRessoPooled * Suppress split_interleaved_input from CRISPRessoWGS parameters * Suppress other parameters in CRISPRessoWGS * Move where interleaved fastq files are split to be trimmed properly * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. * Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * On push no branches * On push no branches * All in one file * Fix yml errors * Rename jobs * Remove old workflow files * Remove paths * Run jobs in parallel --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Move read filtering to after merging in CRISPResso (#39) * Move read filtering to after merging This is in an effort to be consistent with the behavior and results of CRISPRessoPooled. 
* Properly assign the correct file names for read filtering * Add space around operators * GitHub actions on pr (#51) * Run integration tests on pull_request * Run pytest on pull_request * Run pylint on pull_request * Run tests on PR only when opening PR (#53) * Update reports (#52) * Update report changes * Switch branch of integration test repo * Remove extraneous `crispresso_data_path` * Point integration tests back to master * point to test branch * pointed CI config to testing branch * Update integration_tests.yml point to master --------- Co-authored-by: Cole Lyman <[email protected]> Co-authored-by: Samuel Nichols <[email protected]> * Trevor/fastp integration (#50) * Update check_program to check versions and create check_fastq function * Update fastq arg, implement fastp in get_most_frequent_reads * Bump version to 2.3.0 * Deprecate Flash and Trimmomatic parameters, and update fastp params * Update guess_amplicons and guess_guides to remove max_paired_end_reads_overlap * Implement trimming of single end reads * Merge (and trim) reads in CRISPRessoCORE with fastp * Modify error handling to account for fastp errors * Replace flash and trimmomatic with fastp in Docker dependencies * Update LICENSE.txt with fastp info * Remove min and max amplicon length (no longer needed) * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> * Implement trimming with fastp in CRISPRessoPooled * Implemend merging (and trimming) with fastp in CRISPRessoPooled * Fixed minor fastp errors * Move read filtering to after merging in CRISPResso (#39) * Move read filtering to after merging This is in an effort to be consistent with the behavior and results of CRISPRessoPooled. * Properly assign the correct file names for read filtering * Add space around operators * GitHub actions on pr (#51) * Run integration tests on pull_request * Run pytest on pull_request * Run pylint on pull_request * Run tests on PR only when opening PR (#53) * Update reports (#52) * Update report changes * Switch branch of integration test repo * Remove extraneous `crispresso_data_path` * Point integration tests back to master * Update where the test point to * Fix 'Prime-edited' key not found (#32) * Move 'Prime-edited' amplicon name check By moving this, it will check if there is an amplicon named 'Prime-edited' (which is a reserved name) even if the `prime_editing_pegRNA_extension_seq` parameter is empty. * Only search for scaffold integration when pegRNA extension seq is provided * Remove spaces at the end of lines * Docker size (#49) * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * GitHub actions integration tests (#48) * GitHub actions clean (#40) * Create pytest.yml * Create pylint.yml * Create .pylintrc * Create test_env.yml * Full path * Remove conda install * Replace path * Pytest tests * pip -e * Create integration_tests.yml * Simplify name * CRISPRESSO2_DIR environment variable * Up one dir * ls workspace * Install CRISPResso and ydiff * Clone repo instead of checkout * submodule * ls * CRISPResso2_copy * ls * Update env * Simplify * Pull from githubactions branch * Pull githubactions repo * Checkout githubactions * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> * Run tests individually * Pin plotly version * Run all tests even if one fails * Test on another branch * Switch branch with token * Update integration_tests.yml * Introduce pandas sorting in CRISPRessoCompare (#47) * New makefile commands * Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42) * Extract out split_interleaved_fastq function to CRISPRessoShared * Implement splitting interleaved fastq files in CRISPRessoPooled * Suppress split_interleaved_input from CRISPRessoWGS parameters * Suppress other parameters in CRISPRessoWGS * Move where interleaved fastq files are split to be trimmed properly * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * On push no branches * On push no branches * All in one file * Fix yml errors * Rename jobs * Remove old workflow files * Remove paths * Run jobs in parallel --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * 3.4->2.08 * Put ttf-mscorefonts-installer back above apt-get clean * restore slash, replace fastp with trimmomatic and flash, add autoremove step --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * initial readme modifications * Updated readme to remove deprecated commands, updated help text to reflect new version and fastp * Pointing test branch back at master --------- Co-authored-by: Cole Lyman <[email protected]> Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Samuel Nichols <[email protected]> * Guardrails clean history (#34) * Include guardrail functions * Add CRISPRessoReports subtree * Refactor to use CRISPRessoReports module * Include guardrail functions * Functional guardrails, needs reports update * Add guardrail partial * fix guardrials partial * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * GitHub actions integration tests (#48) * GitHub actions clean (#40) * Create pytest.yml * Create pylint.yml * Create .pylintrc * Create test_env.yml * Full path * Remove conda install * Replace path * Pytest tests * pip -e * Create integration_tests.yml * Simplify name * CRISPRESSO2_DIR environment variable * Up one dir * ls workspace * Install CRISPResso and ydiff * Clone repo instead of checkout * submodule * ls * CRISPResso2_copy * ls * Update env * Simplify * Pull from githubactions branch * Pull githubactions repo * Checkout githubactions * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> * Run tests individually * Pin plotly version * Run all tests even if one fails * Test on another branch * Switch branch with token * Update integration_tests.yml * Introduce pandas sorting in CRISPRessoCompare (#47) * New makefile commands * Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42) * Extract out split_interleaved_fastq function to CRISPRessoShared * Implement splitting interleaved fastq files in CRISPRessoPooled * Suppress split_interleaved_input from CRISPRessoWGS parameters * Suppress other parameters in CRISPRessoWGS * Move where interleaved fastq files are split to be trimmed properly * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * On push no branches * On push no branches * All in one file * Fix yml errors * Rename jobs * Remove old workflow files * Remove paths * Run jobs in parallel --------- Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Cole Lyman <[email protected]> * Update C cythonized files * Add exact numbers to guardrails printouts * Remove extraneous whitespace from CRISPRessoCOREResources.pyx * Fix calculation of `total_mods` from being negative The issue was that `all_deletion_coordinates` just tells you how many deletions were present, but not how long the deletion is. * Changes to message * Remove old tag * Point tests at guardrails * Restore C2 pro check * Save message with guardrail name * Point tests repo at master --------- Co-authored-by: Cole Lyman <[email protected]> Co-authored-by: mbowcut2 <[email protected]> * Fix case sensitivity in Prime Editing mode (#54) * Move read filtering to after merging in CRISPResso (#39) * Move read filtering to after merging This is in an effort to be consistent with the behavior and results of CRISPRessoPooled. 
* Properly assign the correct file names for read filtering * Add space around operators * GitHub actions on pr (#51) * Run integration tests on pull_request * Run pytest on pull_request * Run pylint on pull_request * Run tests on PR only when opening PR (#53) * Update reports (#52) * Update report changes * Switch branch of integration test repo * Remove extraneous `crispresso_data_path` * Point integration tests back to master * Make all amplicons in amplicon_seq_arr uppercase This fixes https://github.com/pinellolab/CRISPResso2/issues/396 * Allow RNA values to be provided for prime_editing_pegRNA_scaffold_seq * Fix 'Prime-edited' key not found (#32) * Move 'Prime-edited' amplicon name check By moving this, it will check if there is an amplicon named 'Prime-edited' (which is a reserved name) even if the `prime_editing_pegRNA_extension_seq` parameter is empty. * Only search for scaffold integration when pegRNA extension seq is provided * Remove spaces at the end of lines * Docker size (#49) * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman <[email protected]> --------- Co-authored-by: Cole Lyman <[email protected]> * GitHub actions integration tests (#48) * GitHub actions clean (#40) * Create pytest.yml * Create pylint.yml * Create .pylintrc * Create test_env.yml * Full path * Remove conda install * Replace path * Pytest tests * pip -e * Create integration_tests.yml * Simplify name * CRISPRESSO2_DIR environment variable * Up one dir * ls workspace * Install CRISPResso and ydiff * Clone repo instead of checkout * submodule * ls * CRISPResso2_copy * ls * Update env * Simplify * Pull from githubactions branch * Pull githubactions repo * Checkout githubactions * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. … Co-authored-by: mbowcut2 <[email protected]> Co-authored-by: Trevor Martin <[email protected]> Co-authored-by: Samuel Nichols <[email protected]> Co-authored-by: Trevor Martin <[email protected]>

ZipFile: set allowZip64=True to write larger allele frequency tables

7624815

Addresses the terminating error `ERROR: Filesize would require ZIP64 extensions`, raised when writing compressed allele frequency tables larger than 2 GB.
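For context, here is a minimal sketch of the kind of call this change affects. It is not the actual CRISPResso2 code: the helper name `write_zipped_table` and the file names are hypothetical and chosen only for illustration. The relevant part is the `allowZip64=True` argument to the standard-library `zipfile.ZipFile`, which lets the archive use ZIP64 extensions once a member grows past the classic ZIP size limits.

```python
import os
import tempfile
import zipfile


def write_zipped_table(table_path, zip_path):
    """Compress one table file into a zip archive with ZIP64 enabled.

    Hypothetical helper for illustration; not taken from CRISPResso2.
    """
    # allowZip64=True permits ZIP64 extensions, so large members do not
    # abort the write with a "would require ZIP64 extensions" error.
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED, allowZip64=True) as archive:
        archive.write(table_path, os.path.basename(table_path))
    return zip_path


# Tiny demonstration with a stand-in table (a real table would be far larger).
tmp_dir = tempfile.mkdtemp()
table_path = os.path.join(tmp_dir, 'Alleles_frequency_table.txt')
with open(table_path, 'w') as handle:
    handle.write('Aligned_Sequence\t#Reads\nACGT\t10\n')

zip_path = write_zipped_table(table_path, os.path.join(tmp_dir, 'Alleles_frequency_table.zip'))
with zipfile.ZipFile(zip_path) as archive:
    names = archive.namelist()
```

On Python 2.7 `allowZip64` defaults to `False`, so it has to be passed explicitly; since Python 3.4 it defaults to `True`, and passing it is harmless.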

kclem merged commit 8c58471 into pinellolab:master May 10, 2020

Colelyman mentioned this pull request Mar 28, 2024

Fix the assignment of multiple quantification window coordinates #403

Merged
