From 14139c380c2881a8b6fa781e5912f64ab35463d7 Mon Sep 17 00:00:00 2001
From: mbowcut2 <55161542+mbowcut2@users.noreply.github.com>
Date: Wed, 2 Oct 2024 10:52:09 -0600
Subject: [PATCH 1/6] Mckay/halt on plot fail (#103)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Mckay/c2pro reports test (#99)

* Fix CRISPRessoAggregate bug and other improvements (#95)

* D3-Enhancements (#78)

* Sam/try plots (#71)

* Fix batch mode pandas warning. (#70)

* refactor to call method on DataFrame, rather than Series. Removes warning.

* Fix pandas future warning in CRISPRessoWGS

---------

Co-authored-by: Cole Lyman

* Functional

* Cole/fix status file name (#69)

* Update config file logging messages

This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear.

* Fix divide by zero when no amplicons are present in Batch mode

* Don't append file_prefix to status file name

* Place status files in output directories

* Update tests branch for file_prefix addition

* Load D3 and plotly figures with pro with multiple amplicons

* Update batch

* Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix

Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report.
* Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam * Try catch all futures * Fix test fail plots * Point test to try-plots * Fix d3 not showing and plotly mixing with matplotlib * Use logger for warnings and debug statements * Point tests back at master --------- Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Cole Lyman * Sam/fix plots (#72) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. 
* Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam * Try catch all futures * Fix test fail plots * Fix d3 not showing and plotly mixing with matplotlib --------- Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Cole Lyman * Remove token from integration tests file * Provide sgRNA_sequences to plot_nucleotide_quilt plots * Passing sgRNA_sequences to plot * Refactor check for determining when to use CRISPREssoPro or matplotlib for Batch plots * Add max-height to Batch report samples * Change testing branch * Fix wrong check for large Batch plots * Fix typo and move flexiguide to debug (#77) * Change flexiguide output to debug level * Fix typo in fastp merged output file name * Adding id tags for d3 script enhancements * pointing to test branch * Add amplicon_name parameter to allele heatmap and line plots * Add function to extract quantification window regions from include_idxs * Scale the quantification window according to the coordinates of the sgRNA plot * added c2pro check, added space in args.json * Correct the quantification window indexes for multiple guides * Fix name of nucleotide conversion plot when guides are not the same * Fix jinja variables that aren't found * Fix multiple guide errors where the wrong sgRNA sequence was associated in d3 plot * Remove unneeded variable and extra whitespace * Switch test branch to master --------- Co-authored-by: Samuel Nichols Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Cole Lyman * Add amplicon_name to plot functions * Add sgRNA sequences to 
nucleotide quilt parameters in Aggregate * Add custom_colors to Aggregate plot functions * Update Aggregate and make_aggregate_report to have logger and tool * Write command_used to Aggregate .json info file * Point to new test branch and add Aggregate run * Make the order of Aggregate runs explicit * Sort all instances of crispresso2_folder_info in Aggregate * Sort df_summary_quantification df in Aggregate * Try sorting with a list of single column * Update to correct test branch * Move to master test branch --------- Co-authored-by: Trevor Martin <60452953+trevormartinj7@users.noreply.github.com> Co-authored-by: Samuel Nichols Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> * Squashed commit of the following: commit 6ec98a05ee70f85b5aa0ac15ab6094b7f1f20d08 Author: mbowcut2 Date: Tue Aug 13 16:44:39 2024 -0600 dict key changes commit 7cfd5acf06da4eb6f49453144ee1fed1e1488a7a Author: mbowcut2 Date: Thu Aug 8 15:30:31 2024 -0600 added C2PRO install check back commit bfb0003329ea61b5c79c7e1df8d9a73ec5a508db Author: mbowcut2 Date: Fri Aug 2 13:08:12 2024 -0600 fixed key error conditionals commit 84444e7480605206cb3efa4a0db675c55e717304 Author: mbowcut2 Date: Fri Aug 2 09:22:44 2024 -0600 use local jinja_paritals file commit 71dd12786fec6c4aba0170a3bfd9022b06f5eede Author: mbowcut2 Date: Wed Jul 31 14:10:29 2024 -0600 Squashed commit of the following: commit 5e3b30515c4bc437127e7fb21f53cb0bd511c4ca Author: Trevor Martin <60452953+trevormartinj7@users.noreply.github.com> Date: Mon Jul 22 09:31:44 2024 -0600 D3-Enhancements (#78) * Sam/try plots (#71) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. 
* Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam * Try catch all futures * Fix test fail plots * Point test to try-plots * Fix d3 not showing and plotly mixing with matplotlib * Use logger for warnings and debug statements * Point tests back at master --------- Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Cole Lyman * Sam/fix plots (#72) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. 
* Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam * Try catch all futures * Fix test fail plots * Fix d3 not showing and plotly mixing with matplotlib --------- Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Cole Lyman * Remove token from integration tests file * Provide sgRNA_sequences to plot_nucleotide_quilt plots * Passing sgRNA_sequences to plot * Refactor check for determining when to use CRISPREssoPro or matplotlib for Batch plots * Add max-height to Batch report samples * Change testing branch * Fix wrong check for large Batch plots * Fix typo and move flexiguide to debug (#77) * Change flexiguide output to debug level * Fix typo in fastp merged output file name * Adding id tags for 
d3 script enhancements * pointing to test branch * Add amplicon_name parameter to allele heatmap and line plots * Add function to extract quantification window regions from include_idxs * Scale the quantification window according to the coordinates of the sgRNA plot * added c2pro check, added space in args.json * Correct the quantification window indexes for multiple guides * Fix name of nucleotide conversion plot when guides are not the same * Fix jinja variables that aren't found * Fix multiple guide errors where the wrong sgRNA sequence was associated in d3 plot * Remove unneeded variable and extra whitespace * Switch test branch to master --------- Co-authored-by: Samuel Nichols Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Cole Lyman commit 09e5d9720ad21e44fc7916d71bde3fd7a9dfa7ef Author: Kendell Clement Date: Thu Jul 18 14:31:54 2024 -0600 Asymmetrical cut point (#457) * add cut_point_ind to plot_alleles_heatmap for asymmetrical plotting * Cole asymmetrical cut point (#453) * Pin versions of numpy and matplotlib in CI environment (#84) (#452) * Reduce duplication and implement cut_point_ind in plot_alleles_heatmap_hist --------- Co-authored-by: Cole Lyman commit 8d92972694ddff629dad844a6ad100459f69751d Author: Cole Lyman Date: Thu Jul 18 14:29:40 2024 -0600 Cole/update args (#85) (#456) commit 44f692ecabf5e2eb96ee0cfd7bae62343da7810c Author: Cole Lyman Date: Mon Jul 15 16:17:29 2024 -0600 Implement new pooled mixed-mode default behavior (#454) * changes for pooled mixed-mode default (#83) * changes for pooled mixed-mode default * deprecated old arg * added integration tests for mixed mode * fixed test target * updated test name * pinned numpy * Fix integration tests yml * pinning matplotlib * added print to CI tests * changed mixed mode info string * Remove pooled-mixed-mode-align-to-genome step from Github Actions * Update demultiplex_genome_wide parameter and help * Convert args.json to unix line endings * Add Pooled 
mixed mode demux run * Update the name of the argument in Pooled * Point integration tests back to master --------- Co-authored-by: Cole Lyman * Revert change to pooled mixed mode info statement (#86) --------- Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> commit 79b482b55a0e8edbc03ec22bd2714bade1e90323 Author: Cole Lyman Date: Tue Jul 9 12:53:23 2024 -0600 Pin versions of numpy and matplotlib in CI environment (#84) (#452) commit 80dc1bdd72d50f989717bfc5f8156bc3495c45f4 Author: Kendell Clement Date: Thu May 30 14:07:42 2024 -0600 Add padding to image commit 381755daf0939aaf2745df0a802c809633aff47d Author: Kendell Clement Date: Thu May 30 13:59:57 2024 -0600 White background for schematic for dark mode commit d649db71e610bd8840fbb8d46fadb07789b67390 Author: Cole Lyman Date: Fri May 24 12:45:53 2024 -0600 Fix typo and move flexiguide to debug (#77) (#438) * Change flexiguide output to debug level * Fix typo in fastp merged output file name commit 71181f50ef2b39015523b1a71d9fd1bf0dce14eb Author: Cole Lyman Date: Mon May 13 13:34:00 2024 -0600 Prefix the release Docker tag with a `v` (#434) commit d2c2be18a6bb64b0e742cc24c4665980a24324bc Author: Cole Lyman Date: Mon May 13 09:41:32 2024 -0600 Showing sgRNA sequences on hover in CRISPRessoPro (#432) * Passing sgRNA sequences to regular and Batch D3 plots (#73) * Sam/try plots (#71) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. 
* Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam * Try catch all futures * Fix test fail plots * Point test to try-plots * Fix d3 not showing and plotly mixing with matplotlib * Use logger for warnings and debug statements * Point tests back at master --------- Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Cole Lyman * Sam/fix plots (#72) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. 
* Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam * Try catch all futures * Fix test fail plots * Fix d3 not showing and plotly mixing with matplotlib --------- Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Cole Lyman * Remove token from integration tests file * Provide sgRNA_sequences to plot_nucleotide_quilt plots * Passing sgRNA_sequences to plot * Refactor check for determining when to use CRISPREssoPro or matplotlib for Batch plots * Add max-height to Batch report samples * Change testing branch * Fix wrong check for large Batch plots * Update integration_tests.yml to point back at master --------- Co-authored-by: Samuel Nichols Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Cole Lyman * Push new releases to ECR (#74) * Create aws_ecr.yml (#1) * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml 
* Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * us-east-1 * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Fix d3 sgRNA sequences (#76) * Pass correct sgRNA_sequences to d3 plot * Pass correct sgRNA sequence to prime editor plot for d3 * Resize plotly (#75) * Sam/try plots (#71) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. 
* Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam * Try catch all futures * Fix test fail plots * Point test to try-plots * Fix d3 not showing and plotly mixing with matplotlib * Use logger for warnings and debug statements * Point tests back at master --------- Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Cole Lyman * Sam/fix plots (#72) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. 
* Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam * Try catch all futures * Fix test fail plots * Fix d3 not showing and plotly mixing with matplotlib --------- Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Cole Lyman * Remove token from integration tests file * Pass div id for plotly * Remove debug --------- Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Cole Lyman --------- Co-authored-by: Trevor Martin <60452953+trevormartinj7@users.noreply.github.com> Co-authored-by: Samuel Nichols Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> commit 1c504274818b6b17fb60620d48fd92cb2e50566d Author: Cole Lyman Date: Thu May 9 14:16:25 2024 -0600 Fix plots and improve plot error handling (#431) * Sam/try plots (#71) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. 
* Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam * Try catch all futures * Fix test fail plots * Point test to try-plots * Fix d3 not showing and plotly mixing with matplotlib * Use logger for warnings and debug statements * Point tests back at master --------- Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Cole Lyman * Sam/fix plots (#72) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. 
* Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam * Try catch all futures * Fix test fail plots * Fix d3 not showing and plotly mixing with matplotlib --------- Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Cole Lyman * Remove token from integration tests file --------- Co-authored-by: Samuel Nichols Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> commit acb2ea8e26dff4cd11f71301b344f81b1cec9040 Author: Kendell Clement Date: Thu May 2 13:49:33 2024 -0600 Use recent docker image for CircleCI testing that includes updated pandas commit 38fd76dbd7ce2087468f9f454b548777de959a68 Author: Cole Lyman Date: Wed May 1 16:42:28 2024 -0600 Cole/fix status file name (#69) (#430) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. 
* Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam commit 3ec22e5fd09e432c9997d30e5f9ee51a2cc00d7b Author: Kendell Clement Date: Wed May 1 13:08:11 2024 -0600 Remove linked space in readme commit 340a4e16795a5e500411e11572ec267525985009 Author: Cole Lyman Date: Wed May 1 13:07:14 2024 -0600 Fix batch mode pandas warning. (#70) (#429) * refactor to call method on DataFrame, rather than Series. Removes warning. 
* Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> commit 1bc9e906f0ded81f80761d1ec375ee50a4f882a9 Author: Cole Lyman Date: Fri Apr 26 16:26:27 2024 -0600 Bump version to 2.3.1 and change default CRISPRessoPooled behavior to change in 2.3.2 (#428) commit 5638a1f6ffa973231f23422e9c757fa8cd4af7cc Author: Kendell Clement Date: Wed Apr 24 18:00:43 2024 -0600 Spelling fixes commit d6011f29db16d8fc1c1e7222457b7f9a1f671de6 Author: Cole Lyman Date: Wed Apr 24 09:33:53 2024 -0600 Extract `jinja_partials` and fix CRISPRessoPooled fastp errors (#425) * Updated README (#64) * Updating README to fix argument, email, and formatting * removing superfluous files * Add link to CRISPRessoPro, move CRISPRessoPro section to end, and fix JSON formatting * Remove link to CRISPRessoPro * Replace Docker badge with link to tags * Add bullet points to Guardrails section and improve formatting * Fix typo and removed colons from guardrails --------- Co-authored-by: Cole Lyman * Extract jinja_partials (#65) * Extract jinja_partials code * Remove Plotly dependency from setup.py * Fix CRISPRessoPooled flash errors (#68) * Fix replacing flash intermediate files with fastp intermediate files This also moves where the files are added to `files_to_remove` up to near where they are created. 
* Update to run test branch with paired end Pooled test

* Add pooled-paired-sim test to integration tests

* Replace flash and trimmomatic with fastp and remove plotly from GitHub Actions environment

* Change test branch back to master

---------

Co-authored-by: Trevor Martin <60452953+trevormartinj7@users.noreply.github.com>

commit f4858a30c43374f54058b3ad9c1e965e1ab7fb46
Author: Cole Lyman
Date: Tue Apr 23 17:00:28 2024 -0600

Updated README (#64) (#424)

* Updating README to fix argument, email, and formatting

* removing superfluous files

* Add link to CRISPRessoPro, move CRISPRessoPro section to end, and fix JSON formatting

* Remove link to CRISPRessoPro

* Replace Docker badge with link to tags

* Add bullet points to Guardrails section and improve formatting

* Fix typo and removed colons from guardrails

---------

Co-authored-by: Trevor Martin <60452953+trevormartinj7@users.noreply.github.com>

commit c3dbff0fccd44b0b1a9c246dd2aa629ddc515787
Author: Kendell Clement
Date: Mon Apr 22 11:24:59 2024 -0600

Update CRISPRessoPooledCORE.py (#423)

Fix bug in error reporting if duplicate names are present

commit 20903c14877e5166b1b8a7b50b8fcab450ea3ca6
Author: Cole Lyman
Date: Thu Apr 18 16:55:39 2024 -0600

Remove extra imports from CRISPRessoCore (#67) (#422)

commit 4aae57e5be475cd717792265bee36a71a99425de
Author: Cole Lyman
Date: Thu Apr 18 10:00:19 2024 -0600

Cole/refactor jinja undefined (#66) (#421)

* Replace Jinja2 PackageLoader with FileSystemLoader

The PackageLoader doesn't work with a fairly recent version of Jinja2 (3.0.1) and Python 3.9. Replacing it with FileSystemLoader works with both the older version and the latest version.

* Fix undefined variable `amplicon_name` in report template

* Refactor logging Jinja2 undefined variable warnings

* Revert plot_11a update

* Update integration test branch

* Update jinja to warn on undefined but not fail.
Fix all undefined warnings

* Fix GitHub integration tests ref

* One more undefined variable

---------

Co-authored-by: Samuel Nichols

commit 768c3c05bf1786a2a32e135b6e145cd6503c3db1
Author: Cole Lyman
Date: Tue Apr 9 17:30:10 2024 -0600

Fix Jinja2 undefined variables (#63) (#417)

* Replace Jinja2 PackageLoader with FileSystemLoader

The PackageLoader doesn't work with a fairly recent version of Jinja2 (3.0.1) and Python 3.9. Replacing it with FileSystemLoader works with both the older version and the latest version.

* Fix undefined variable `amplicon_name` in report template

* Revert plot_11a update

* Update integration test branch

* Update branch for integration tests

commit 7e18f08cc1ac5f247a0fd1bbb394ccd9b0a07c2e
Author: Han Dai
Date: Fri Apr 5 18:36:41 2024 -0400

fix: change all U+00A0 to U+0020 (#400)

commit 235dc29c0cd0fcca2e999148d4660acf00b07221
Author: Cole Lyman
Date: Fri Apr 5 16:36:16 2024 -0600

Fastp, args as data, guardrails, and PE fix (#415)

* Change CRISPResso_status.txt format to JSON (#46)

* Bug Fix - 367 (#35)

* - Fixed references to ref_names_for_pe

* removed extra tabs

* trying to match empty line, no tabs

* - changed references to ref_names[0]

* Mckay/pd warnings (#45)

* refactor errors='ignore' to try except

* refactored integer slice to iloc[]

* moved to_numeric try except to function

* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns

This change is slightly cleaner because it addresses the root issue that some columns are strings (and therefore cannot be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up.
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman --------- Co-authored-by: Cole Lyman * add json read for status file * changed Formatter to json format * fixed json access variable name: message * changed perentage_complete to numeric * changed status file to .json * Create integration_tests.yml * Simplify name * CRISPRESSO2_DIR environment variable * Up one dir * ls workspace * Install CRISPResso and ydiff * Clone repo instead of checkout * submodule * ls * CRISPResso2_copy * ls * Update env * Simplify * Pull from githubactions branch * Pull githubactions repo * Checkout githubactions * Run tests individually * Pin plotly version * Run all tests even if one fails * Test on another branch * Switch branch with token * Update integration_tests.yml * New makefile commands * changed file to .json * changed status to json file * Make JSON human readable by adding new lines * GitHub actions integration tests (#48) * GitHub actions clean (#40) * Create pytest.yml * Create pylint.yml * Create .pylintrc * Create test_env.yml * Full path * Remove conda install * Replace path * Pytest tests * pip -e * Create integration_tests.yml * Simplify name * CRISPRESSO2_DIR environment variable * Up one dir * ls workspace * Install CRISPResso and ydiff * Clone repo instead of checkout * submodule * ls * CRISPResso2_copy * ls * Update env * Simplify * Pull from githubactions branch * Pull githubactions repo * Checkout githubactions * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman * Run tests individually * Pin plotly version * Run all tests even if one fails * Test on another branch * Switch branch with token * Update integration_tests.yml * Introduce pandas sorting in CRISPRessoCompare (#47) * New makefile commands * Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42) * Extract out split_interleaved_fastq function to CRISPRessoShared * Implement splitting interleaved fastq files in CRISPRessoPooled * Suppress split_interleaved_input from CRISPRessoWGS parameters * Suppress other parameters in CRISPRessoWGS * Move where interleaved fastq files are split to be trimmed properly * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. * Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman --------- Co-authored-by: Cole Lyman * On push no branches * On push no branches * All in one file * Fix yml errors * Rename jobs * Remove old workflow files * Remove paths * Run jobs in parallel --------- Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Cole Lyman * Move read filtering to after merging in CRISPResso (#39) * Move read filtering to after merging This is in an effort to be consistent with the behavior and results of CRISPRessoPooled. 
* Properly assign the correct file names for read filtering * Add space around operators * GitHub actions on pr (#51) * Run integration tests on pull_request * Run pytest on pull_request * Run pylint on pull_request * Run tests on PR only when opening PR (#53) * Update reports (#52) * Update report changes * Switch branch of integration test repo * Remove extraneous `crispresso_data_path` * Point integration tests back to master * point to test branch * pointed CI config to testing branch * Update integration_tests.yml point to master --------- Co-authored-by: Cole Lyman Co-authored-by: Samuel Nichols * Trevor/fastp integration (#50) * Update check_program to check versions and create check_fastq function * Update fastq arg, implement fastp in get_most_frequent_reads * Bump version to 2.3.0 * Deprecate Flash and Trimmomatic parameters, and update fastp params * Update guess_amplicons and guess_guides to remove max_paired_end_reads_overlap * Implement trimming of single end reads * Merge (and trim) reads in CRISPRessoCORE with fastp * Modify error handling to account for fastp errors * Replace flash and trimmomatic with fastp in Docker dependencies * Update LICENSE.txt with fastp info * Remove min and max amplicon length (no longer needed) * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman * Implement trimming with fastp in CRISPRessoPooled * Implement merging (and trimming) with fastp in CRISPRessoPooled * Fixed minor fastp errors * Move read filtering to after merging in CRISPResso (#39) * Move read filtering to after merging This is in an effort to be consistent with the behavior and results of CRISPRessoPooled. * Properly assign the correct file names for read filtering * Add space around operators * GitHub actions on pr (#51) * Run integration tests on pull_request * Run pytest on pull_request * Run pylint on pull_request * Run tests on PR only when opening PR (#53) * Update reports (#52) * Update report changes * Switch branch of integration test repo * Remove extraneous `crispresso_data_path` * Point integration tests back to master * Update where the tests point to * Fix 'Prime-edited' key not found (#32) * Move 'Prime-edited' amplicon name check By moving this, it will check if there is an amplicon named 'Prime-edited' (which is a reserved name) even if the `prime_editing_pegRNA_extension_seq` parameter is empty. * Only search for scaffold integration when pegRNA extension seq is provided * Remove spaces at the end of lines * Docker size (#49) * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman --------- Co-authored-by: Cole Lyman * GitHub actions integration tests (#48) * GitHub actions clean (#40) * Create pytest.yml * Create pylint.yml * Create .pylintrc * Create test_env.yml * Full path * Remove conda install * Replace path * Pytest tests * pip -e * Create integration_tests.yml * Simplify name * CRISPRESSO2_DIR environment variable * Up one dir * ls workspace * Install CRISPResso and ydiff * Clone repo instead of checkout * submodule * ls * CRISPResso2_copy * ls * Update env * Simplify * Pull from githubactions branch * Pull githubactions repo * Checkout githubactions * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman * Run tests individually * Pin plotly version * Run all tests even if one fails * Test on another branch * Switch branch with token * Update integration_tests.yml * Introduce pandas sorting in CRISPRessoCompare (#47) * New makefile commands * Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42) * Extract out split_interleaved_fastq function to CRISPRessoShared * Implement splitting interleaved fastq files in CRISPRessoPooled * Suppress split_interleaved_input from CRISPRessoWGS parameters * Suppress other parameters in CRISPRessoWGS * Move where interleaved fastq files are split to be trimmed properly * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman --------- Co-authored-by: Cole Lyman * On push no branches * On push no branches * All in one file * Fix yml errors * Rename jobs * Remove old workflow files * Remove paths * Run jobs in parallel --------- Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Cole Lyman * 3.4->2.08 * Put ttf-mscorefonts-installer back above apt-get clean * restore slash, replace fastp with trimmomatic and flash, add autoremove step --------- Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Cole Lyman * initial readme modifications * Updated readme to remove deprecated commands, updated help text to reflect new version and fastp * Pointing test branch back at master --------- Co-authored-by: Cole Lyman Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Samuel Nichols * Guardrails clean history (#34) * Include guardrail functions * Add CRISPRessoReports subtree * Refactor to use CRISPRessoReports module * Include guardrail functions * Functional guardrails, needs reports update * Add guardrail partial * fix guardrails partial * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman --------- Co-authored-by: Cole Lyman * GitHub actions integration tests (#48) * GitHub actions clean (#40) * Create pytest.yml * Create pylint.yml * Create .pylintrc * Create test_env.yml * Full path * Remove conda install * Replace path * Pytest tests * pip -e * Create integration_tests.yml * Simplify name * CRISPRESSO2_DIR environment variable * Up one dir * ls workspace * Install CRISPResso and ydiff * Clone repo instead of checkout * submodule * ls * CRISPResso2_copy * ls * Update env * Simplify * Pull from githubactions branch * Pull githubactions repo * Checkout githubactions * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman * Run tests individually * Pin plotly version * Run all tests even if one fails * Test on another branch * Switch branch with token * Update integration_tests.yml * Introduce pandas sorting in CRISPRessoCompare (#47) * New makefile commands * Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42) * Extract out split_interleaved_fastq function to CRISPRessoShared * Implement splitting interleaved fastq files in CRISPRessoPooled * Suppress split_interleaved_input from CRISPRessoWGS parameters * Suppress other parameters in CRISPRessoWGS * Move where interleaved fastq files are split to be trimmed properly * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman --------- Co-authored-by: Cole Lyman * On push no branches * On push no branches * All in one file * Fix yml errors * Rename jobs * Remove old workflow files * Remove paths * Run jobs in parallel --------- Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Cole Lyman * Update C cythonized files * Add exact numbers to guardrails printouts * Remove extraneous whitespace from CRISPRessoCOREResources.pyx * Fix calculation of `total_mods` from being negative The issue was that `all_deletion_coordinates` just tells you how many deletions were present, but not how long the deletion is. * Changes to message * Remove old tag * Point tests at guardrails * Restore C2 pro check * Save message with guardrail name * Point tests repo at master --------- Co-authored-by: Cole Lyman Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> * Fix case sensitivity in Prime Editing mode (#54) * Move read filtering to after merging in CRISPResso (#39) * Move read filtering to after merging This is in an effort to be consistent with the behavior and results of CRISPRessoPooled. 
* Properly assign the correct file names for read filtering * Add space around operators * GitHub actions on pr (#51) * Run integration tests on pull_request * Run pytest on pull_request * Run pylint on pull_request * Run tests on PR only when opening PR (#53) * Update reports (#52) * Update report changes * Switch branch of integration test repo * Remove extraneous `crispresso_data_path` * Point integration tests back to master * Make all amplicons in amplicon_seq_arr uppercase This fixes https://github.com/pinellolab/CRISPResso2/issues/396 * Allow RNA values to be provided for prime_editing_pegRNA_scaffold_seq * Fix 'Prime-edited' key not found (#32) * Move 'Prime-edited' amplicon name check By moving this, it will check if there is an amplicon named 'Prime-edited' (which is a reserved name) even if the `prime_editing_pegRNA_extension_seq` parameter is empty. * Only search for scaffold integration when pegRNA extension seq is provided * Remove spaces at the end of lines * Docker size (#49) * Bug Fix - 367 (#35) * - Fixed references to ref_names_for_pe * removed extra tabs * trying to match empty line, no tabs * - changed references to ref_names[0] * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns --------- Co-authored-by: Cole Lyman --------- Co-authored-by: Cole Lyman * GitHub actions integration tests (#48) * GitHub actions clean (#40) * Create pytest.yml * Create pylint.yml * Create .pylintrc * Create test_env.yml * Full path * Remove conda install * Replace path * Pytest tests * pip -e * Create integration_tests.yml * Simplify name * CRISPRESSO2_DIR environment variable * Up one dir * ls workspace * Install CRISPResso and ydiff * Clone repo instead of checkout * submodule * ls * CRISPResso2_copy * ls * Update env * Simplify * Pull from githubactions branch * Pull githubactions repo * Checkout githubactions * Mckay/pd warnings (#45) * refactor errors='ignore' to try except * refactored integer slice to iloc[] * moved to_numeric try except to function * Refactor to_numeric_ignore_errors to to_numeric_ignore_columns This change is slightly cleaner because it addresses the root issue that some columns are strings (and can therefore not be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up. 
* Add documentation to to_numeric_ignore_columns ----… * added argument to args.json * - add arg to plot partial() - if arg is True, then raise error * add to aggregate * point to test branch * change log message * Read Alignment Parallelization (#98) * Initial parallelization work * Initial parallelization work * Fleshed out process function, added tracking for manager dictionary * lots of debugging of the process function * parallelization achieved * Improved boundary function * Removing prints * removing old code * Failed cache generator function * Created single thread seq_cache generator * initial functioning parallelization * adding data to variantCache and N_ constants * more edits * replacing old aln stats * Fixed return values * changing output file * fixed boundary error * Adding more tracking of timings * optimized stat tracking considerably * removing imports and unnecessary checks * more logging * Unbalancing the processes to allow for better locking interactions * Adding amplicon name to plotting * fixing stat tracking * removed old code * pin numpy * Pin versions of numpy and matplotlib in CI environment (#84) * D3-Enhancements (#78) * Sam/try plots (#71) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. 
* Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam * Try catch all futures * Fix test fail plots * Point test to try-plots * Fix d3 not showing and plotly mixing with matplotlib * Use logger for warnings and debug statements * Point tests back at master --------- Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Cole Lyman * Sam/fix plots (#72) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. 
* Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam * Try catch all futures * Fix test fail plots * Fix d3 not showing and plotly mixing with matplotlib --------- Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Cole Lyman * Remove token from integration tests file * Provide sgRNA_sequences to plot_nucleotide_quilt plots * Passing sgRNA_sequences to plot * Refactor check for determining when to use CRISPREssoPro or matplotlib for Batch plots * Add max-height to Batch report samples * Change testing branch * Fix wrong check for large Batch plots * Fix typo and move flexiguide to debug (#77) * Change flexiguide output to debug level * Fix typo in fastp merged output file name * Adding id tags for d3 script enhancements * pointing to test branch * Add amplicon_name parameter to allele heatmap and line plots * Add function to extract quantification window regions from include_idxs * Scale the quantification window according to the coordinates of the sgRNA plot * added c2pro check, added space in args.json * Correct the quantification window 
indexes for multiple guides * Fix name of nucleotide conversion plot when guides are not the same * Fix jinja variables that aren't found * Fix multiple guide errors where the wrong sgRNA sequence was associated in d3 plot * Remove unneeded variable and extra whitespace * Switch test branch to master --------- Co-authored-by: Samuel Nichols Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Cole Lyman * old changes * removed some superfluous code * removed some superfluous code * reverting to old flow if one process * Enriched aln_stats with aligned read_length * fixed declaration bug * Fixing stats tracking for irregular reads * fixed stat tracking by multiplying variant count * adding some function explanations, cleaning up code, optimizing lock updating * Cleaned up code, potentially breaks tests * Revert "Cleaned up code, potentially breaks tests" This reverts commit e7cfe3d239f6fdfedafa26913363ddcc98c81721. * Revert "adding some function explanations, cleaning up code, optimizing lock updating" This reverts commit 96deee25b1af7e29c40ac360295c30509ede83ae. 
* replaced test branch, reverted to working version, added .update() inside lock * checking old cython branch * reverting breaking changes * pointing back at my test branch, unreverting non breaking changes * timing log change * pin numpy * Pin versions of numpy and matplotlib in CI environment (#84) * pointing to correct test branch * removed debug statement * memory tracking * Replaced seq_cache with variantCache for improved memory * removing print statements * commenting out memory tracking * Removing one extra loop to improve processing time * optimizing info, removing old functions and print statements * removing empty string handling * removed empty string, reworded comments * removing unnecessary files * Literally minding my p's and q's * Reset .c files * Editing function comments, renaming managerCache, removing debug print statement * Initial attempt at creating a temp_variant file for memory optimization * update * reading and writing to tsv to save on memory * Merged in memory optimization branch * Updating pytests for equal boundary changes * Refactored write out fastq, improved tests * Replace zcat (#94) * D3-Enhancements (#78) * Sam/try plots (#71) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. 
* Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam * Try catch all futures * Fix test fail plots * Point test to try-plots * Fix d3 not showing and plotly mixing with matplotlib * Use logger for warnings and debug statements * Point tests back at master --------- Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Cole Lyman * Sam/fix plots (#72) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. 
* Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam * Try catch all futures * Fix test fail plots * Fix d3 not showing and plotly mixing with matplotlib --------- Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Cole Lyman * Remove token from integration tests file * Provide sgRNA_sequences to plot_nucleotide_quilt plots * Passing sgRNA_sequences to plot * Refactor check for determining when to use CRISPREssoPro or matplotlib for Batch plots * Add max-height to Batch report samples * Change testing branch * Fix wrong check for large Batch plots * Fix typo and move flexiguide to debug (#77) * Change flexiguide output to debug level * Fix typo in fastp merged output file name * Adding id tags for d3 script enhancements * pointing to test branch * Add amplicon_name parameter to allele heatmap and line plots * Add function to extract quantification window regions from include_idxs * Scale the quantification window according to the coordinates of the sgRNA plot * added c2pro check, added space in args.json * Correct the quantification window 
indexes for multiple guides * Fix name of nucleotide conversion plot when guides are not the same * Fix jinja variables that aren't found * Fix multiple guide errors where the wrong sgRNA sequence was associated in d3 plot * Remove unneeded variable and extra whitespace * Switch test branch to master --------- Co-authored-by: Samuel Nichols Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Cole Lyman * Replace zcat with gunzip -c in `get_most_frequent_reads` --------- Co-authored-by: Trevor Martin <60452953+trevormartinj7@users.noreply.github.com> Co-authored-by: Samuel Nichols Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> * Limiting plotting processes * Fix CRISPRessoAggregate bug and other improvements (#95) * D3-Enhancements (#78) * Sam/try plots (#71) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. 
* Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam * Try catch all futures * Fix test fail plots * Point test to try-plots * Fix d3 not showing and plotly mixing with matplotlib * Use logger for warnings and debug statements * Point tests back at master --------- Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Cole Lyman * Sam/fix plots (#72) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- Co-authored-by: Cole Lyman * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. 
* Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- Co-authored-by: Sam * Try catch all futures * Fix test fail plots * Fix d3 not showing and plotly mixing with matplotlib --------- Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Cole Lyman * Remove token from integration tests file * Provide sgRNA_sequences to plot_nucleotide_quilt plots * Passing sgRNA_sequences to plot * Refactor check for determining when to use CRISPREssoPro or matplotlib for Batch plots * Add max-height to Batch report samples * Change testing branch * Fix wrong check for large Batch plots * Fix typo and move flexiguide to debug (#77) * Change flexiguide output to debug level * Fix typo in fastp merged output file name * Adding id tags for d3 script enhancements * pointing to test branch * Add amplicon_name parameter to allele heatmap and line plots * Add function to extract quantification window regions from include_idxs * Scale the quantification window according to the coordinates of the sgRNA plot * added c2pro check, added space in args.json * Correct the quantification window indexes for multiple guides * Fix name of nucleotide conversion plot when guides are not the same * Fix jinja variables that aren't found * Fix multiple guide errors where the wrong sgRNA sequence was associated in d3 plot * Remove unneeded variable and extra whitespace * Switch test branch to master --------- Co-authored-by: Samuel Nichols Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Cole Lyman * Add amplicon_name to plot functions * Add sgRNA sequences to 
nucleotide quilt parameters in Aggregate * Add custom_colors to Aggregate plot functions * Update Aggregate and make_aggregate_report to have logger and tool * Write command_used to Aggregate .json info file * Point to new test branch and add Aggregate run * Make the order of Aggregate runs explicit * Sort all instances of crispresso2_folder_info in Aggregate * Sort df_summary_quantification df in Aggregate * Try sorting with a list of single column * Update to correct test branch * Move to master test branch --------- Co-authored-by: Trevor Martin <60452953+trevormartinj7@users.noreply.github.com> Co-authored-by: Samuel Nichols Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> * Add BAM test cases to GitHub Actions integration tests * Add "Run" to step names * Point to trevor/fastq-testing test branch * Add in Aggregate test case to GitHub Actions * parallelizing process_single_fastq_write_bam_out function * wip parallelization for process_bam function * Fixing write out bug * Functional bam input parallelized function * cleaned up code, removed old versions of functions * trying again * Remove duplicate c2_tests * Remove commented out copy c2_tests * Updating file access to close on error, removing unnecessary bam appendings * Add `percent_complete` to info statements (#102) * Fix CRISPRessoAggregate bug and other improvements (#95) (#470) * D3-Enhancements (#78) * Sam/try plots (#71) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. 
* Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- * Try catch all futures * Fix test fail plots * Point test to try-plots * Fix d3 not showing and plotly mixing with matplotlib * Use logger for warnings and debug statements * Point tests back at master --------- * Sam/fix plots (#72) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. 
* Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- * Try catch all futures * Fix test fail plots * Fix d3 not showing and plotly mixing with matplotlib --------- * Remove token from integration tests file * Provide sgRNA_sequences to plot_nucleotide_quilt plots * Passing sgRNA_sequences to plot * Refactor check for determining when to use CRISPREssoPro or matplotlib for Batch plots * Add max-height to Batch report samples * Change testing branch * Fix wrong check for large Batch plots * Fix typo and move flexiguide to debug (#77) * Change flexiguide output to debug level * Fix typo in fastp merged output file name * Adding id tags for d3 script enhancements * pointing to test branch * Add amplicon_name parameter to allele heatmap and line plots * Add function to extract quantification window regions from include_idxs * Scale the quantification window according to the coordinates of the sgRNA plot * added c2pro check, added space in args.json * Correct the quantification window indexes for multiple guides * Fix name of nucleotide conversion plot when guides are not the same * Fix jinja variables 
that aren't found * Fix multiple guide errors where the wrong sgRNA sequence was associated in d3 plot * Remove unneeded variable and extra whitespace * Switch test branch to master --------- * Add amplicon_name to plot functions * Add sgRNA sequences to nucleotide quilt parameters in Aggregate * Add custom_colors to Aggregate plot functions * Update Aggregate and make_aggregate_report to have logger and tool * Write command_used to Aggregate .json info file * Point to new test branch and add Aggregate run * Make the order of Aggregate runs explicit * Sort all instances of crispresso2_folder_info in Aggregate * Sort df_summary_quantification df in Aggregate * Try sorting with a list of single column * Update to correct test branch * Move to master test branch --------- Co-authored-by: Trevor Martin <60452953+trevormartinj7@users.noreply.github.com> Co-authored-by: Samuel Nichols Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> * Display percentages in the CLI output (#88) (#473) * D3-Enhancements (#78) * Sam/try plots (#71) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. 
* Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- * Try catch all futures * Fix test fail plots * Point test to try-plots * Fix d3 not showing and plotly mixing with matplotlib * Use logger for warnings and debug statements * Point tests back at master --------- * Sam/fix plots (#72) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. 
* Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- * Try catch all futures * Fix test fail plots * Fix d3 not showing and plotly mixing with matplotlib --------- * Remove token from integration tests file * Provide sgRNA_sequences to plot_nucleotide_quilt plots * Passing sgRNA_sequences to plot * Refactor check for determining when to use CRISPREssoPro or matplotlib for Batch plots * Add max-height to Batch report samples * Change testing branch * Fix wrong check for large Batch plots * Fix typo and move flexiguide to debug (#77) * Change flexiguide output to debug level * Fix typo in fastp merged output file name * Adding id tags for d3 script enhancements * pointing to test branch * Add amplicon_name parameter to allele heatmap and line plots * Add function to extract quantification window regions from include_idxs * Scale the quantification window according to the coordinates of the sgRNA plot * added c2pro check, added space in args.json * Correct the quantification window indexes for multiple guides * Fix name of nucleotide conversion plot when guides are not the same * Fix jinja variables that aren't found * Fix multiple guide errors where the wrong sgRNA sequence was associated in d3 plot * Remove unneeded variable and extra whitespace * Switch test branch to master --------- * Display percentages in the CLI output --------- Co-authored-by: Trevor Martin <60452953+trevormartinj7@users.noreply.github.com> Co-authored-by: Samuel Nichols Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> * No pool (#79) (#474) * Sam/try plots (#71) * Fix batch mode pandas 
warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- * Try catch all futures * Fix test fail plots * Point test to try-plots * Fix d3 not showing and plotly mixing with matplotlib * Use logger for warnings and debug statements * Point tests back at master --------- * Sam/fix plots (#72) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. 
Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- * Try catch all futures * Fix test fail plots * Fix d3 not showing and plotly mixing with matplotlib --------- * Remove token from integration tests file * Passing sgRNA sequences to regular and Batch D3 plots (#73) * Sam/try plots (#71) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. 
* Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- * Try catch all futures * Fix test fail plots * Point test to try-plots * Fix d3 not showing and plotly mixing with matplotlib * Use logger for warnings and debug statements * Point tests back at master --------- * Sam/fix plots (#72) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. 
* Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. * Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- * Try catch all futures * Fix test fail plots * Fix d3 not showing and plotly mixing with matplotlib --------- * Remove token from integration tests file * Provide sgRNA_sequences to plot_nucleotide_quilt plots * Passing sgRNA_sequences to plot * Refactor check for determining when to use CRISPREssoPro or matplotlib for Batch plots * Add max-height to Batch report samples * Change testing branch * Fix wrong check for large Batch plots * Update integration_tests.yml to point back at master --------- * Push new releases to ECR (#74) * Create aws_ecr.yml (#1) * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * us-east-1 * Update aws_ecr.yml * 
Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Update aws_ecr.yml * Pass div id for plotly * Remove debug * Don't use thread pool with 1 process * Fix logger issue * Catchup * Remove extra print statements * Restrict generation of multiprocessing pool to when n_processes > 1 * Switch test branch to version bump * Fix variable name error * Change test branch back to master * Fix CRISPRessoAggregate bug and other improvements (#95) * D3-Enhancements (#78) * Sam/try plots (#71) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. 
* Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- * Try catch all futures * Fix test fail plots * Point test to try-plots * Fix d3 not showing and plotly mixing with matplotlib * Use logger for warnings and debug statements * Point tests back at master --------- * Sam/fix plots (#72) * Fix batch mode pandas warning. (#70) * refactor to call method on DataFrame, rather than Series. Removes warning. * Fix pandas future warning in CRISPRessoWGS --------- * Functional * Cole/fix status file name (#69) * Update config file logging messages This removes printing the exception (which is essentially a duplicate), and adds a condition if no config file was provided. Also changes `json` to `config` so that it is more clear. * Fix divide by zero when no amplicons are present in Batch mode * Don't append file_prefix to status file name * Place status files in output directories * Update tests branch for file_prefix addition * Load D3 and plotly figures with pro with multiple amplicons * Update batch * Fix bug in CRISPRessoCompare with pointing to report datas with file_prefix Before this fix, when using a file_prefix the second run that was compared would not be displayed as a data in the first figure of the report. 
* Import CRISPRessoPro instead of importing the version When installed via conda, the version is not available * Remove `get_amplicon_output` unused function from CRISPRessoCompare Also remove unused argparse import * Implement `get_matching_allele_files` in CRISPRessoCompare and accompanying unit tests * Allow for matching of multiple guides in the same amplicon * Fix pandas FutureWarning * Change test branch back to master --------- * Try catch all futures * Fix test fail plots * Fix d3 not showing and plotly mixing with matplotlib --------- * Remove token from integration tests file * Provide sgRNA_sequences to plot_nucleotide_quilt plots * Passing sgRNA_sequences to plot * Refactor check for determining when to use CRISPREssoPro or matplotlib for Batch plots * Add max-height to Batch report samples * Change testing branch * Fix wrong check for large Batch plots * Fix typo and move flexiguide to debug (#77) * Change flexiguide output to debug level * Fix typo in fastp merged output file name * Adding id tags for d3 script enhancements * pointing to test branch * Add amplicon_name parameter to allele heatmap and line plots * Add function to extract quantification window regions from include_idxs * Scale the quantification window according to the coordinates of the sgRNA plot * added c2pro check, added space in args.json * Correct the quantification window indexes for multiple guides * Fix name of nucleotide conversion plot when guides are not the same * Fix jinja variables that aren't found * Fix multiple guide errors where the wrong sgRNA sequence was associated in d3 plot * Remove unneeded variable and extra whitespace * Switch test branch to master --------- * Add amplicon_name to plot functions * Add sgRNA sequences to nucleotide quilt parameters in Aggregate * Add custom_colors to Aggregate plot functions * Update Aggregate and make_aggregate_report to have logger and tool * Write command_used to Aggregate .json info file * Point to new test branch and add 
Aggregate run * Make the order of Aggregate runs explicit * Sort all instances of crispresso2_folder_info in Aggregate * Sort df_summary_quantification df in Aggregate * Try sorting with a list of single column * Update to correct test branch * Move to master test branch --------- --------- Co-authored-by: Samuel Nichols Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Trevor Martin <60452953+trevormartinj7@users.noreply.github.com> * Add percent_complete to subprocess alignment * Remove extraneous spaces * Added more percent_complete statements to info blocks --------- Co-authored-by: Trevor Martin <60452953+trevormartinj7@users.noreply.github.com> Co-authored-by: Samuel Nichols Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> * Updated functions with newly merged progress percent from Cole * Minor code cleanup, removal of comments and print statements etc * Pointing test branch back at CRISPResso2_tests master branch * Added checking for failed processes causing not all reads to be counted --------- Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com> Co-authored-by: Cole Lyman Co-authored-by: Samuel Nichols * Implement a custom PlotException when a plot fails * Add flexiguide alignment parameters (#107) * Add flexiguide gap open and gap extend customized arguments * Implement new flexiguide gap extend and gap open parameters * Point to new test branch for integration tests * Bump version of setup-miniconda for pytest * Have flexiguide default penalties match the Needleman-Wunsch defaults * Point test branch back to master * Point to test branch without whitespace * Point integration tests back to master --------- Co-authored-by: Cole Lyman Co-authored-by: Trevor Martin <60452953+trevormartinj7@users.noreply.github.com> Co-authored-by: Samuel Nichols --- CRISPResso2/CRISPRessoAggregateCORE.py | 2 ++ CRISPResso2/CRISPRessoBatchCORE.py | 1 + CRISPResso2/CRISPRessoCORE.py | 5 +++++ 
CRISPResso2/CRISPRessoMultiProcessing.py | 10 +++++++++- CRISPResso2/CRISPRessoPlot.py | 1 - CRISPResso2/CRISPRessoShared.py | 6 ++++++ CRISPResso2/args.json | 6 ++++++ 7 files changed, 29 insertions(+), 2 deletions(-) diff --git a/CRISPResso2/CRISPRessoAggregateCORE.py b/CRISPResso2/CRISPRessoAggregateCORE.py index d319445d..2cd6e890 100644 --- a/CRISPResso2/CRISPRessoAggregateCORE.py +++ b/CRISPResso2/CRISPRessoAggregateCORE.py @@ -71,6 +71,7 @@ def main(): parser.add_argument('--debug', help='Show debug messages', action='store_true') parser.add_argument('-v', '--verbosity', type=int, help='Verbosity level of output to the console (1-4), 4 is the most verbose', default=3) + parser.add_argument('--halt_on_plot_fail', action="store_true", help="Halt execution if a plot fails to generate") # CRISPRessoPro params parser.add_argument('--use_matplotlib', action='store_true', @@ -131,6 +132,7 @@ def main(): num_processes=n_processes, process_pool=process_pool, process_futures=process_futures, + halt_on_plot_fail=args.halt_on_plot_fail, ) #glob returns paths including the original prefix diff --git a/CRISPResso2/CRISPRessoBatchCORE.py b/CRISPResso2/CRISPRessoBatchCORE.py index 871453c4..32fe005f 100644 --- a/CRISPResso2/CRISPRessoBatchCORE.py +++ b/CRISPResso2/CRISPRessoBatchCORE.py @@ -400,6 +400,7 @@ def main(): num_processes=n_processes_for_batch, process_futures=process_futures, process_pool=process_pool, + halt_on_plot_fail=args.halt_on_plot_fail, ) window_nuc_pct_quilt_plot_names = [] diff --git a/CRISPResso2/CRISPRessoCORE.py b/CRISPResso2/CRISPRessoCORE.py index 53a64e58..529bf3bf 100644 --- a/CRISPResso2/CRISPRessoCORE.py +++ b/CRISPResso2/CRISPRessoCORE.py @@ -3746,6 +3746,7 @@ def count_alternate_alleles(sub_base_vectors, ref_name, ref_sequence, ref_total_ num_processes=n_processes, process_pool=process_pool, process_futures=process_futures, + halt_on_plot_fail=args.halt_on_plot_fail, ) 
############################################################################################################################################### ### FIGURE 1: Alignment @@ -5174,6 +5175,10 @@ def get_scaffold_len(row, scaffold_start_loc, scaffold_seq): print_stacktrace_if_debug() error('Filtering error, please check your input.\n\nERROR: %s' % e) sys.exit(13) + except CRISPRessoShared.PlotException as e: + print_stacktrace_if_debug() + error(e) + sys.exit(14) except Exception as e: print_stacktrace_if_debug() error('Unexpected error, please check your input.\n\nERROR: %s' % e) diff --git a/CRISPResso2/CRISPRessoMultiProcessing.py b/CRISPResso2/CRISPRessoMultiProcessing.py index dba4c1ad..e0693b9f 100644 --- a/CRISPResso2/CRISPRessoMultiProcessing.py +++ b/CRISPResso2/CRISPRessoMultiProcessing.py @@ -16,6 +16,9 @@ import pandas as pd import traceback +from CRISPResso2.CRISPRessoShared import PlotException + + def get_max_processes(): return mp.cpu_count() @@ -284,7 +287,7 @@ def run_parallel_commands(commands_arr, n_processes=1, descriptor='CRISPResso2', pool.join() -def run_plot(plot_func, plot_args, num_processes, process_futures, process_pool): +def run_plot(plot_func, plot_args, num_processes, process_futures, process_pool, halt_on_plot_fail): """Run a plot in parallel if num_processes > 1, otherwise in serial. Parameters @@ -299,6 +302,8 @@ def run_plot(plot_func, plot_args, num_processes, process_futures, process_pool) The list of futures that submitting the parallel job will return. process_pool: ProcessPoolExecutor or ThreadPoolExecutor The pool to submit the job to. 
+ halt_on_plot_fail: bool + If True, raise a PlotException when the plot fails instead of logging a warning and continuing Returns ------- @@ -311,5 +316,8 @@ def run_plot(plot_func, plot_args, num_processes, process_futures, process_pool) else: plot_func(**plot_args) except Exception as e: + if halt_on_plot_fail: + logger.critical(f"Plot error {e}, halting execution \n") + raise PlotException(f'There was an error generating plot {plot_func.__name__}.') from e logger.warn(f"Plot error {e}, skipping plot \n") logger.debug(traceback.format_exc()) diff --git a/CRISPResso2/CRISPRessoPlot.py b/CRISPResso2/CRISPRessoPlot.py index 81f69d91..a4d16495 100644 --- a/CRISPResso2/CRISPRessoPlot.py +++ b/CRISPResso2/CRISPRessoPlot.py @@ -2668,7 +2668,6 @@ def prep_alleles_table(df_alleles, reference_seq, MAX_N_ROWS, MIN_FREQUENCY): """ dna_to_numbers={'-':0,'A':1,'T':2,'C':3,'G':4,'N':5} seq_to_numbers= lambda seq: [dna_to_numbers[x] for x in seq] - X=[] annot=[] y_labels=[] diff --git a/CRISPResso2/CRISPRessoShared.py b/CRISPResso2/CRISPRessoShared.py index 78d2e052..15d1b33f 100644 --- a/CRISPResso2/CRISPRessoShared.py +++ b/CRISPResso2/CRISPRessoShared.py @@ -80,9 +80,15 @@ class OutputFolderIncompleteException(Exception): class InstallationException(Exception): pass + class InputFileFormatException(Exception): pass + +class PlotException(Exception): + pass + + ######################################### class StatusFormatter(logging.Formatter): diff --git a/CRISPResso2/args.json b/CRISPResso2/args.json index 02a6d787..4f28b1b6 100644 --- a/CRISPResso2/args.json +++ b/CRISPResso2/args.json @@ -860,6 +860,12 @@ "help": "Use matplotlib for plotting instead of plotly/d3 when CRISPRessoPro is installed", "action": "store_true", "tools": ["Core", "Batch", "Pooled", "WGS", "Compare"] + }, + "halt_on_plot_fail": { + "keys": ["--halt_on_plot_fail"], + "help": "Halt execution if a plot fails to generate", + "action": "store_true", + "tools": ["Core", "Batch", "Pooled", "WGS", "Compare"] } }, "Sections": { From 
ac7acf90e641736fcd62e4e54020d63f678c5bb3 Mon Sep 17 00:00:00 2001 From: Trevor Martin Date: Fri, 11 Oct 2024 14:17:43 -0600 Subject: [PATCH 2/6] Fixing bug that miscounts number of reads if extra newline characters are added or omitted from end of file --- .github/workflows/integration_tests.yml | 1 + CRISPResso2/CRISPRessoCORE.py | 8 +- CRISPResso2/CRISPRessoPooledCORE.py | 15 +-- CRISPResso2/CRISPRessoShared.py | 9 ++ tests/unit_tests/test_CRISPRessoShared.py | 118 ++++++++++++++++++++++ 5 files changed, 135 insertions(+), 16 deletions(-) diff --git a/.github/workflows/integration_tests.yml b/.github/workflows/integration_tests.yml index 64f8f516..e58a16ce 100644 --- a/.github/workflows/integration_tests.yml +++ b/.github/workflows/integration_tests.yml @@ -63,6 +63,7 @@ jobs: uses: actions/checkout@v3 with: repository: edilytics/CRISPResso2_tests + ref: "trevor/get_n_reads_fix" # ref: '' # update to specific branch - name: Run Basic diff --git a/CRISPResso2/CRISPRessoCORE.py b/CRISPResso2/CRISPRessoCORE.py index 529bf3bf..32e62731 100644 --- a/CRISPResso2/CRISPRessoCORE.py +++ b/CRISPResso2/CRISPRessoCORE.py @@ -138,10 +138,6 @@ def get_avg_read_length_fastq(fastq_filename): p = sb.Popen(cmd, shell=True, stdout=sb.PIPE) return int(p.communicate()[0].strip()) -def get_n_reads_fastq(fastq_filename): - p = sb.Popen(('z' if fastq_filename.endswith('.gz') else '' ) +"cat < \"%s\" | wc -l" % fastq_filename, shell=True, stdout=sb.PIPE) - return int(float(p.communicate()[0])/4.0) - def get_n_reads_bam(bam_filename,bam_chr_loc=""): cmd = "samtools view -c " + bam_filename + " " + bam_chr_loc p = sb.Popen(cmd, shell=True, stdout=sb.PIPE) @@ -2458,7 +2454,7 @@ def get_prime_editing_guides(this_amp_seq, this_amp_name, ref0_seq, prime_edited N_READS_INPUT = 0 if args.fastq_r1: - N_READS_INPUT = get_n_reads_fastq(args.fastq_r1) + N_READS_INPUT = CRISPRessoShared.get_n_reads_fastq(args.fastq_r1) elif args.bam_input: N_READS_INPUT = get_n_reads_bam(args.bam_input, 
args.bam_chr_loc) @@ -2620,7 +2616,7 @@ def get_prime_editing_guides(this_amp_seq, this_amp_name, ref0_seq, prime_edited if args.bam_input: N_READS_AFTER_PREPROCESSING = N_READS_INPUT else: - N_READS_AFTER_PREPROCESSING=get_n_reads_fastq(processed_output_filename) + N_READS_AFTER_PREPROCESSING=CRISPRessoShared.get_n_reads_fastq(processed_output_filename) if N_READS_AFTER_PREPROCESSING == 0: raise CRISPRessoShared.NoReadsAfterQualityFilteringException('No reads in input or no reads survived the average or single bp quality filtering.') diff --git a/CRISPResso2/CRISPRessoPooledCORE.py b/CRISPResso2/CRISPRessoPooledCORE.py index 7a5211e2..f328e405 100644 --- a/CRISPResso2/CRISPRessoPooledCORE.py +++ b/CRISPResso2/CRISPRessoPooledCORE.py @@ -145,11 +145,6 @@ def get_read_length_from_cigar(cigar_string): result += int(length) return result -def get_n_reads_fastq(fastq_filename): - p = sb.Popen(('z' if fastq_filename.endswith('.gz') else '' ) +"cat < %s | wc -l" % fastq_filename, shell=True, stdout=sb.PIPE) - n_reads = int(float(p.communicate()[0])/4.0) - return n_reads - def get_n_reads_bam(bam_filename): p = sb.Popen("samtools view -c %s" % bam_filename, shell=True, stdout=sb.PIPE) return int(p.communicate()[0]) @@ -611,10 +606,10 @@ def main(): N_READS_INPUT = get_n_reads_bam(args.aligned_pooled_bam) N_READS_AFTER_PREPROCESSING = N_READS_INPUT else: - N_READS_INPUT = get_n_reads_fastq(args.fastq_r1) + N_READS_INPUT = CRISPRessoShared.get_n_reads_fastq(args.fastq_r1) if args.split_interleaved_input: N_READS_INPUT /= 2 - N_READS_AFTER_PREPROCESSING = get_n_reads_fastq(processed_output_filename) + N_READS_AFTER_PREPROCESSING = CRISPRessoShared.get_n_reads_fastq(processed_output_filename) crispresso2_info['running_info']['finished_steps']['count_input_reads'] = (N_READS_INPUT, N_READS_AFTER_PREPROCESSING) CRISPRessoShared.write_crispresso_info( @@ -854,7 +849,7 @@ def main(): n_reads_aligned_amplicons=[] crispresso_cmds = [] for idx, row in df_template.iterrows(): - 
this_n_reads = get_n_reads_fastq(row['Demultiplexed_fastq.gz_filename']) + this_n_reads = CRISPRessoShared.get_n_reads_fastq(row['Demultiplexed_fastq.gz_filename']) n_reads_aligned_amplicons.append(this_n_reads) info('\n Processing:%s with %d reads'%(idx,this_n_reads)) this_amp_seq = row['amplicon_seq'] @@ -1554,8 +1549,8 @@ def rreplace(s, old, new): #if many reads weren't aligned, print those out for the user if RUNNING_MODE != 'ONLY_GENOME': - #N_READS_INPUT=get_n_reads_fastq(args.fastq_r1) - #N_READS_AFTER_PREPROCESSING=get_n_reads_fastq(processed_output_filename) + #N_READS_INPUT=CRISPRessoShared.get_n_reads_fastq(args.fastq_r1) + #N_READS_AFTER_PREPROCESSING=CRISPRessoShared.get_n_reads_fastq(processed_output_filename) tot_reads_aligned = df_summary_quantification['Reads_aligned'].fillna(0).sum() tot_reads = df_summary_quantification['Reads_total'].sum() diff --git a/CRISPResso2/CRISPRessoShared.py b/CRISPResso2/CRISPRessoShared.py index 15d1b33f..2114f8da 100644 --- a/CRISPResso2/CRISPRessoShared.py +++ b/CRISPResso2/CRISPRessoShared.py @@ -492,6 +492,15 @@ def assert_fastq_format(file_path, max_lines_to_check=100): raise InputFileFormatException('File %s is not in fastq format!' % (file_path)) from e +def get_n_reads_fastq(fastq_filename): + if not os.path.exists(fastq_filename) or os.path.getsize(fastq_filename) == 0: + return 0 + + p = sb.Popen(('z' if fastq_filename.endswith('.gz') else '' ) +"cat < %s | grep -c ." 
% fastq_filename, shell=True, stdout=sb.PIPE) + n_reads = int(float(p.communicate()[0])/4.0) + return n_reads + + def check_output_folder(output_folder): """ Checks to see that the CRISPResso run has completed, and gathers the amplicon info for that run diff --git a/tests/unit_tests/test_CRISPRessoShared.py b/tests/unit_tests/test_CRISPRessoShared.py index 9916f8a2..3d6855d5 100644 --- a/tests/unit_tests/test_CRISPRessoShared.py +++ b/tests/unit_tests/test_CRISPRessoShared.py @@ -1,4 +1,7 @@ from CRISPResso2 import CRISPResso2Align, CRISPRessoShared +import tempfile +import os +import gzip ALN_MATRIX = CRISPResso2Align.read_matrix('./CRISPResso2/EDNAFULL') @@ -28,6 +31,120 @@ def test_get_relative_coordinates(): assert s1inds_gap_right == [0, 1, 2, 3, 4] +def test_get_n_reads_fastq(): + with tempfile.NamedTemporaryFile(mode='w+', delete=False, suffix='.fastq') as f: + f.write("@SEQ_ID\n") + f.write("GATTACA\n") + f.write("+\n") + f.write("AAAAAAA\n") # Ensure the file content is correct and ends with a newline + f.flush() # Flush the content to disk + os.fsync(f.fileno()) # Ensure all internal buffers associated with the file are written to disk + + # Since the file needs to be read by another process, we ensure it's closed and written before passing it + assert CRISPRessoShared.get_n_reads_fastq(f.name) == 1 + + # Clean up: delete the file after the test + os.remove(f.name) + +def test_get_n_reads_fastq_gzip(): + with tempfile.NamedTemporaryFile(mode='w+', delete=False, suffix='.fastq') as f: + f.write("@SEQ_ID\n") + f.write("GATTACA\n") + f.write("+\n") + f.write("AAAAAAA\n") # Ensure the file content is correct and ends with a newline + f.flush() # Flush the content to disk + os.fsync(f.fileno()) # Ensure all internal buffers associated with the file are written to disk + + # gzip file + with open(f.name, 'rb') as f_in, gzip.open(f.name + '.gz', 'wb') as f_out: + f_out.writelines(f_in) + + # Since the file needs to be read by another process, we ensure it's 
closed and written before passing it + assert CRISPRessoShared.get_n_reads_fastq(f.name + '.gz') == 1 + + # Clean up: delete the file after the test + os.remove(f.name) + os.remove(f.name + '.gz') + + +def test_get_n_reads_fastq_three_extra_newlines(): + with tempfile.NamedTemporaryFile(mode='w+', delete=False, suffix='.fastq') as f: + f.write("@SEQ_ID\n") + f.write("GATTACA\n") + f.write("+\n") + f.write("AAAAAAA\n") # Ensure the file content is correct and ends with a newline + f.write("\n\n") # Append two extra blank lines after the final record + f.flush() # Flush the content to disk + os.fsync(f.fileno()) # Ensure all internal buffers associated with the file are written to disk + + # Since the file needs to be read by another process, we ensure it's closed and written before passing it + assert CRISPRessoShared.get_n_reads_fastq(f.name) == 1 + + # Clean up: delete the file after the test + os.remove(f.name) + + +def test_get_n_reads_fastq_four_extra_newlines(): + with tempfile.NamedTemporaryFile(mode='w+', delete=False, suffix='.fastq') as f: + f.write("@SEQ_ID\n") + f.write("GATTACA\n") + f.write("+\n") + f.write("AAAAAAA\n") # Ensure the file content is correct and ends with a newline + f.write("\n\n\n\n\n\n\n\n") # Append eight extra blank lines after the final record + f.flush() # Flush the content to disk + os.fsync(f.fileno()) # Ensure all internal buffers associated with the file are written to disk + + # Since the file needs to be read by another process, we ensure it's closed and written before passing it + assert CRISPRessoShared.get_n_reads_fastq(f.name) == 1 + + # Clean up: delete the file after the test + os.remove(f.name) + + +def test_get_n_reads_fastq_100_reads(): + with tempfile.NamedTemporaryFile(mode='w+', delete=False, suffix='.fastq') as f: + for i in range(100): + f.write("@SEQ_ID\n") + f.write("GATTACA\n") + f.write("+\n") + f.write("AAAAAAA\n") + f.flush() # Flush the content to disk + os.fsync(f.fileno()) # Ensure all
internal buffers associated with the file are written to disk + + # Since the file needs to be read by another process, we ensure it's closed and written before passing it + assert CRISPRessoShared.get_n_reads_fastq(f.name) == 100 + + # Clean up: delete the file after the test + os.remove(f.name) + +def test_get_n_reads_fastq_no_newline(): + with tempfile.NamedTemporaryFile(mode='w+', delete=False, suffix='.fastq') as f: + f.write("@SEQ_ID\n") + f.write("GATTACA\n") + f.write("+\n") + f.write("AAAAAAA") # Deliberately omit the trailing newline on the last line + f.flush() # Flush the content to disk + os.fsync(f.fileno()) # Ensure all internal buffers associated with the file are written to disk + + # Since the file needs to be read by another process, we ensure it's closed and written before passing it + assert CRISPRessoShared.get_n_reads_fastq(f.name) == 1 + + # Clean up: delete the file after the test + os.remove(f.name) + + +def test_get_n_reads_fastq_empty_file(): + with tempfile.NamedTemporaryFile(mode='w+', delete=False, suffix='.fastq') as f: + f.flush() # Flush the content to disk + os.fsync(f.fileno()) # Ensure all internal buffers associated with the file are written to disk + + # Since the file needs to be read by another process, we ensure it's closed and written before passing it + assert CRISPRessoShared.get_n_reads_fastq(f.name) == 0 + + # Clean up: delete the file after the test + os.remove(f.name) + + def test_get_relative_coordinates_to_gap(): # unaligned sequences seq_1 = 'TTCGT' @@ -98,3 +215,4 @@ def test_get_quant_window_ranges_from_include_idxs_single_gap(): def test_get_quant_window_ranges_from_include_idxs_multiple_gaps(): include_idxs = [50, 51, 52, 53, 55, 56, 57, 58, 60] assert CRISPRessoShared.get_quant_window_ranges_from_include_idxs(include_idxs) == [(50, 53), (55, 58), (60, 60)] + From 09e468f5932d68e59e84860e3d8209dd74b48469 Mon Sep 17 00:00:00 2001 From: Cole Lyman Date: Fri, 11 Oct 2024 15:40:16 -0600 Subject: [PATCH 3/6]
Ensure that all demultiplexed fastqs generated have an ending newline --- CRISPResso2/CRISPRessoPooledCORE.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/CRISPResso2/CRISPRessoPooledCORE.py b/CRISPResso2/CRISPRessoPooledCORE.py index f328e405..d1e0f361 100644 --- a/CRISPResso2/CRISPRessoPooledCORE.py +++ b/CRISPResso2/CRISPRessoPooledCORE.py @@ -1127,7 +1127,7 @@ def rreplace(s, old, new): END{ \ if (fastq_filename!="NA") {if (num_records < __MIN_READS__){\ record_log_str = record_log_str chr_id"\t"bpstart"\t"bpend"\t"num_records"\tNA\n"} \ - else{printf("%s",fastq_records)>fastq_filename;close(fastq_filename); system("gzip -f "fastq_filename); record_log_str = record_log_str chr_id"\t"bpstart"\t"bpend"\t"num_records"\t"fastq_filename".gz\n"} \ + else{print(fastq_records)>fastq_filename;close(fastq_filename); system("gzip -f "fastq_filename); record_log_str = record_log_str chr_id"\t"bpstart"\t"bpend"\t"num_records"\t"fastq_filename".gz\n"} \ }\ print record_log_str > "__DEMUX_CHR_LOGFILENAME__" \ }' ''' From f05cc167511d623325a6852e12f8dfe76703b509 Mon Sep 17 00:00:00 2001 From: Cole Lyman Date: Fri, 22 Nov 2024 13:24:37 -0700 Subject: [PATCH 4/6] Add customizable samtools exclude flag (#112) * Add customizable samtools exclude flag This adds a new parameter, `--samtools_exclude_flags` that allows one to customize which reads are excluded from CRISPRessoWGS and CRISPRessoPooled runs.
* Allow for user filtering of reads in Pooled when calculating n_aligned * Add parameter for CRISPRessoCORE to exclude reads * Fix WGS not having access to args * Update pooled samtools exclude flags to be bitwise OR and add unit tests * Point to new test branch * Update help message for samtools_exclude_flags * Move integration test branch back to master --- CRISPResso2/CRISPRessoCORE.py | 4 +- CRISPResso2/CRISPRessoPooledCORE.py | 37 ++++++++++++------- CRISPResso2/CRISPRessoWGSCORE.py | 11 +++--- CRISPResso2/args.json | 14 +++++++ tests/unit_tests/test_CRISPRessoPooledCORE.py | 17 +++++++++ 5 files changed, 63 insertions(+), 20 deletions(-) create mode 100644 tests/unit_tests/test_CRISPRessoPooledCORE.py diff --git a/CRISPResso2/CRISPRessoCORE.py b/CRISPResso2/CRISPRessoCORE.py index 6094d40b..ea66e438 100644 --- a/CRISPResso2/CRISPRessoCORE.py +++ b/CRISPResso2/CRISPRessoCORE.py @@ -819,9 +819,9 @@ def process_bam(bam_filename, bam_chr_loc, output_bam, variantCache, ref_names, crispresso_cmd_to_write = ' '.join(sys.argv) sam_out.write('@PG\tID:crispresso2\tPN:crispresso2\tVN:'+CRISPRessoShared.__version__+'\tCL:"'+crispresso_cmd_to_write+'"\n') if bam_chr_loc != "": - proc = sb.Popen(['samtools', 'view', bam_filename, bam_chr_loc], stdout=sb.PIPE, encoding='utf-8') + proc = sb.Popen(['samtools', 'view', '-F', args.samtools_exclude_flags, bam_filename, bam_chr_loc], stdout=sb.PIPE, encoding='utf-8') else: - proc = sb.Popen(['samtools', 'view', bam_filename], stdout=sb.PIPE, encoding='utf-8') + proc = sb.Popen(['samtools', 'view', '-F', args.samtools_exclude_flags, bam_filename], stdout=sb.PIPE, encoding='utf-8') num_reads = 0 # Reading through the bam file and enriching variantCache as a dictionary with the following: diff --git a/CRISPResso2/CRISPRessoPooledCORE.py b/CRISPResso2/CRISPRessoPooledCORE.py index 4435aa54..03341cfb 100644 --- a/CRISPResso2/CRISPRessoPooledCORE.py +++ b/CRISPResso2/CRISPRessoPooledCORE.py @@ -154,12 +154,22 @@ def 
get_n_reads_bam(bam_filename): p = sb.Popen("samtools view -c %s" % bam_filename, shell=True, stdout=sb.PIPE) return int(p.communicate()[0]) -def get_n_aligned_bam(bam_filename): - p = sb.Popen("samtools view -F 0x904 -c %s" % bam_filename, shell=True, stdout=sb.PIPE) +def calculate_aligned_samtools_exclude_flags(samtools_exclude_flags): + """Calculate the samtools exclude flags for aligned reads. + + This function calculates the samtools exclude flags for aligned reads + by filtering 0x900 (not primary alignment and supplementary alignment) + and also including any other user specified filters. + """ + samtools_exclude_flags = int(samtools_exclude_flags, base=16) + return hex(0x900 | samtools_exclude_flags) + +def get_n_aligned_bam(bam_filename, samtools_exclude_flags): + p = sb.Popen(f"samtools view -F {calculate_aligned_samtools_exclude_flags(samtools_exclude_flags)} -c {bam_filename}", shell=True, stdout=sb.PIPE) return int(p.communicate()[0]) -def get_n_aligned_bam_region(bam_filename, chr_name, chr_start, chr_end): - p = sb.Popen("samtools view -F 0x904 -c %s %s:%d-%d" %(bam_filename, chr_name, chr_start, chr_end), shell=True, stdout=sb.PIPE) +def get_n_aligned_bam_region(bam_filename, chr_name, chr_start, chr_end, samtools_exclude_flags): + p = sb.Popen(f"samtools view -F {calculate_aligned_samtools_exclude_flags(samtools_exclude_flags)} -c {bam_filename} {chr_name}:{chr_start}-{chr_end}", shell=True, stdout=sb.PIPE) return int(p.communicate()[0]) def find_overlapping_genes(row, df_genes): @@ -786,12 +796,13 @@ def main(): info('Alignment command: ' + aligner_command, {'percent_complete': 15}) sb.call(aligner_command, shell=True) - N_READS_ALIGNED = get_n_aligned_bam(bam_filename_amplicons) + N_READS_ALIGNED = get_n_aligned_bam(bam_filename_amplicons, args.samtools_exclude_flags) if args.limit_open_files_for_demux: bam_iter = CRISPRessoShared.get_command_output( - '(samtools sort {bam_file} | samtools view -F 4) 2>> {log_file}'.format( + '(samtools sort 
{bam_file} | samtools view -F {samtools_exclude_flags}) 2>> {log_file}'.format( bam_file=bam_filename_amplicons, + samtools_exclude_flags=args.samtools_exclude_flags, log_file=log_filename, ), ) @@ -824,7 +835,7 @@ def main(): if curr_file is not None: curr_file.close() else: - s1 = r"samtools view -F 4 %s 2>>%s | grep -v ^'@'" % (bam_filename_amplicons,log_filename) + s1 = rf"samtools view -F {args.samtools_exclude_flags} {bam_filename_amplicons} 2>>{log_filename} | grep -v ^'@'" s2 = r'''|awk '{ gzip_filename=sprintf("gzip >> OUTPUTPATH%s.fastq.gz",$3);\ print "@"$1"\n"$10"\n+\n"$11 | gzip_filename;}' ''' @@ -1014,7 +1025,7 @@ def rreplace(s, old, new): info('Index file for input .bam file does not exist. Generating bam index file.') sb.call('samtools index %s' % bam_filename_genome, shell=True) - N_READS_ALIGNED = get_n_aligned_bam(bam_filename_genome) + N_READS_ALIGNED = get_n_aligned_bam(bam_filename_genome, args.samtools_exclude_flags) # save progress up to this point crispresso2_info['running_info']['finished_steps']['n_reads_aligned_genome'] = N_READS_ALIGNED CRISPRessoShared.write_crispresso_info( @@ -1031,7 +1042,7 @@ def rreplace(s, old, new): sb.call('samtools index %s' % bam_filename_genome, shell=True) - N_READS_ALIGNED = get_n_aligned_bam(bam_filename_genome) + N_READS_ALIGNED = get_n_aligned_bam(bam_filename_genome, args.samtools_exclude_flags) # save progress up to this point crispresso2_info['running_info']['finished_steps']['n_reads_aligned_genome'] = N_READS_ALIGNED @@ -1061,7 +1072,7 @@ def rreplace(s, old, new): # if we should only demultiplex where amplicons aligned... 
(as opposed to the whole genome) if RUNNING_MODE=='AMPLICONS_AND_GENOME' and not args.demultiplex_genome_wide: - s1 = r'''samtools view -F 0x0004 %s __REGIONCHR__:__REGIONSTART__-__REGIONEND__ 2>>%s |''' % (bam_filename_genome, log_filename)+\ + s1 = rf'''samtools view -F {args.samtools_exclude_flags} {bam_filename_genome} __REGIONCHR__:__REGIONSTART__-__REGIONEND__ 2>>{log_filename} |''' +\ r'''awk 'BEGIN{OFS="\t";num_records=0;fastq_filename="__OUTPUTPATH__REGION___REGIONCHR_____REGIONSTART_____REGIONEND__.fastq";} \ { \ print "@"$1"\n"$10"\n+\n"$11 > fastq_filename; \ @@ -1093,7 +1104,7 @@ def rreplace(s, old, new): else: # next, create the general demux command # variables like __CHR__ will be subbed out below for each iteration - s1 = r'''samtools view -F 0x0004 %s __CHR____REGION__ 2>>%s |''' % (bam_filename_genome, log_filename) + \ + s1 = rf'''samtools view -F {args.samtools_exclude_flags} {bam_filename_genome} __CHR____REGION__ 2>>{log_filename} |''' + \ r'''awk 'BEGIN {OFS="\t"} {bpstart=$4; bpend=bpstart; split ($6,a,"[MIDNSHP]"); n=0;\ for (i=1; i in a; i++){\ n+=1+length(a[i]);\ @@ -1166,13 +1177,13 @@ def rreplace(s, old, new): curr_end = curr_pos + chr_step_size while curr_end < chr_len: # make sure there aren't any reads at this breakpoint - n_reads_at_end = get_n_aligned_bam_region(bam_filename_genome, chr_str, curr_end-5, curr_end+5) + n_reads_at_end = get_n_aligned_bam_region(bam_filename_genome, chr_str, curr_end-5, curr_end+5, args.samtools_exclude_flags) while n_reads_at_end > 0: curr_end += 500 # look for another place with no reads if curr_end >= chr_len: curr_end = chr_len break - n_reads_at_end = get_n_aligned_bam_region(bam_filename_genome, chr_str, curr_end-5, curr_end+5) + n_reads_at_end = get_n_aligned_bam_region(bam_filename_genome, chr_str, curr_end-5, curr_end+5, args.samtools_exclude_flags) chr_output_filename = _jp('MAPPED_REGIONS/%s_%s_%s.info' % (chr_str, curr_pos, curr_end)) sub_chr_command = chr_cmd.replace("__REGION__", 
":%d-%d "%(curr_pos, curr_end)).replace("__DEMUX_CHR_LOGFILENAME__",chr_output_filename) diff --git a/CRISPResso2/CRISPRessoWGSCORE.py b/CRISPResso2/CRISPRessoWGSCORE.py index a50b1668..4521f4f4 100644 --- a/CRISPResso2/CRISPRessoWGSCORE.py +++ b/CRISPResso2/CRISPRessoWGSCORE.py @@ -208,7 +208,7 @@ def get_n_reads_fastq(fastq_filename): n_reads = int(float(p.communicate()[0])/4.0) return n_reads -def extract_reads(row): +def extract_reads(row, samtools_exclude_flags): if row.sequence: #create place-holder fastq files open(row.fastq_file_trimmed_reads_in_region, 'w+').close() @@ -217,7 +217,7 @@ def extract_reads(row): info('Extracting reads in:%s and creating .bam file: %s' % (region, row.bam_file_with_reads_in_region)) - cmd=r'''samtools view -b -F 4 --reference %s %s %s > %s ''' % (row.reference_file, row.original_bam, region, row.bam_file_with_reads_in_region) + cmd = rf'''samtools view -b -F {samtools_exclude_flags} --reference {row.reference_file} {row.original_bam} {region} > {row.bam_file_with_reads_in_region} ''' sb.call(cmd, shell=True) cmd=r'''samtools index %s ''' % (row.bam_file_with_reads_in_region) @@ -232,10 +232,10 @@ def extract_reads(row): return row -def extract_reads_chunk(df): +def extract_reads_chunk(df, samtools_exclude_flags): new_df = pd.DataFrame(columns=df.columns) for i in range(len(df)): - new_df.loc[i] = extract_reads(df.iloc[i].copy()) + new_df.loc[i] = extract_reads(df.iloc[i].copy(), samtools_exclude_flags) new_df.set_index(df.index,inplace=True) return new_df @@ -577,7 +577,8 @@ def set_filenames(row): else: #run region extraction here - df_regions = CRISPRessoMultiProcessing.run_pandas_apply_parallel(df_regions, extract_reads_chunk, n_processes_for_wgs) + extract_reads_chunk_partial = lambda x: extract_reads_chunk(x, args.samtools_exclude_flags) + df_regions = CRISPRessoMultiProcessing.run_pandas_apply_parallel(df_regions, extract_reads_chunk_partial, n_processes_for_wgs) df_regions.sort_values('region_number', inplace=True) 
cols_to_print = ["chr_id", "bpstart", "bpend", "sgRNA", "Expected_HDR", "Coding_sequence", "sequence", "n_reads", "bam_file_with_reads_in_region", "fastq_file_trimmed_reads_in_region"] if args.gene_annotations: diff --git a/CRISPResso2/args.json b/CRISPResso2/args.json index 4f28b1b6..6bb22d5d 100644 --- a/CRISPResso2/args.json +++ b/CRISPResso2/args.json @@ -255,6 +255,20 @@ "default": "None", "tools": ["Core", "Batch", "Pooled", "WGS"] }, + "samtools_exclude_flags": { + "keys": ["--samtools_exclude_flags"], + "help": "Exclude reads with any of the specified flags set in the SAM/BAM file. Flags can be specified in either base 16 (hex) or base 10. Default is 4 (read unmapped).", + "type": "str", + "default": "4", + "tools": ["Pooled", "WGS"] + }, + "samtools_exclude_flags_core": { + "keys": ["--samtools_exclude_flags"], + "help": "Exclude reads with any of the specified flags set in the SAM/BAM file. Flags can be specified in either base 16 (hex) or base 10. Default is 0 (no reads filtered).", + "type": "str", + "default": "0", + "tools": ["Core"] + }, "stringent_flash_merging": { "keys": ["--stringent_flash_merging"], "help": "DEPRECATED in v2.3.0", diff --git a/tests/unit_tests/test_CRISPRessoPooledCORE.py b/tests/unit_tests/test_CRISPRessoPooledCORE.py new file mode 100644 index 00000000..c7bc36af --- /dev/null +++ b/tests/unit_tests/test_CRISPRessoPooledCORE.py @@ -0,0 +1,17 @@ +from CRISPResso2 import CRISPRessoPooledCORE + + +def test_calculate_aligned_samtools_exclude_flags(): + assert CRISPRessoPooledCORE.calculate_aligned_samtools_exclude_flags('0') == hex(0x900) + + +def test_calculate_aligned_samtools_exclude_flags_4(): + assert CRISPRessoPooledCORE.calculate_aligned_samtools_exclude_flags('4') == hex(0x904) + + +def test_calculate_aligned_samtools_exclude_flags_9(): + assert CRISPRessoPooledCORE.calculate_aligned_samtools_exclude_flags('9') == hex(0x909) + + +def test_calculate_aligned_samtools_exclude_flags_0x100(): + assert 
CRISPRessoPooledCORE.calculate_aligned_samtools_exclude_flags('0x100') == hex(0x900) From 7ddce13b82a6fcb9dcbd0749d34a4228a2ff65d8 Mon Sep 17 00:00:00 2001 From: Cole Lyman Date: Fri, 22 Nov 2024 13:31:14 -0700 Subject: [PATCH 5/6] Remove commented out lines --- CRISPResso2/CRISPRessoPooledCORE.py | 2 -- 1 file changed, 2 deletions(-) diff --git a/CRISPResso2/CRISPRessoPooledCORE.py b/CRISPResso2/CRISPRessoPooledCORE.py index 27fd7c91..797524af 100644 --- a/CRISPResso2/CRISPRessoPooledCORE.py +++ b/CRISPResso2/CRISPRessoPooledCORE.py @@ -1560,8 +1560,6 @@ def rreplace(s, old, new): #if many reads weren't aligned, print those out for the user if RUNNING_MODE != 'ONLY_GENOME': - #N_READS_INPUT=CRISPRessoShared.get_n_reads_fastq(args.fastq_r1) - #N_READS_AFTER_PREPROCESSING=CRISPRessoShared.get_n_reads_fastq(processed_output_filename) tot_reads_aligned = df_summary_quantification['Reads_aligned'].fillna(0).sum() tot_reads = df_summary_quantification['Reads_total'].sum() From 63099e11dbd0fb0368067adc711556a4f00a0a75 Mon Sep 17 00:00:00 2001 From: Cole Lyman Date: Thu, 12 Dec 2024 21:51:00 -0700 Subject: [PATCH 6/6] Move test branch back to master --- .github/workflows/integration_tests.yml | 1 - 1 file changed, 1 deletion(-) diff --git a/.github/workflows/integration_tests.yml b/.github/workflows/integration_tests.yml index e58a16ce..64f8f516 100644 --- a/.github/workflows/integration_tests.yml +++ b/.github/workflows/integration_tests.yml @@ -63,7 +63,6 @@ jobs: uses: actions/checkout@v3 with: repository: edilytics/CRISPResso2_tests - ref: "trevor/get_n_reads_fix" # ref: '' # update to specific branch - name: Run Basic