0.4.1
📝 Documentation
- Added documentation for a new feature in `README.md`: DAJIN2 can now detect complex mutations characteristic of genome editing, such as insertions within deleted regions.
🚀 New Features
- Introduced `cssplits_handler.detect_insertion_within_deletion` to extract insertion sequences located within deletions. Because minimap2 uses local alignment, it may align inserted bases that partially match the reference, so those bases are not reported as insertions. This function recovers such insertion sequences. Commit Detail
- Added `report.insertion_refractor.py` to carry the original insertion information into the consensus of reads mapped to an insertion allele. This makes it possible to list both the insertions and the deletions of an insertion allele in a single HTML file. Commit Detail
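The insertion-within-deletion case can be illustrated with a small heuristic. The sketch below is hypothetical and is not DAJIN2's `detect_insertion_within_deletion`. It assumes cssplits tokens of the form `=A` (match), `-A` (deletion) and `*AG` (substitution, reference `A` to read `G`). It treats matched or substituted bases that sit between deletions, at most `max_gap` in a row, as a candidate insertion sequence.

```python
def query_base(token: str) -> str:
    # Assumed token formats: "=A" match, "*AG" substitution (ref A -> read G), "-A" deletion.
    if token.startswith("="):
        return token[1]
    if token.startswith("*"):
        return token[2]
    return ""  # deletions carry no read base


def find_insertions_within_deletions(cssplits: list[str], max_gap: int = 2) -> list[tuple[int, int, str]]:
    """Hypothetical heuristic: return (start, end, inserted_seq) for spans that begin and end
    with a deletion and contain short runs (<= max_gap tokens) of matched/substituted bases.
    The read bases inside such a span are reported as a candidate insertion sequence."""
    spans = []
    i, n = 0, len(cssplits)
    while i < n:
        if not cssplits[i].startswith("-"):
            i += 1
            continue
        start = last_del = i
        gap = 0
        j = i + 1
        # Extend the span while non-deletion runs stay within max_gap tokens.
        while j < n and gap <= max_gap:
            token = cssplits[j]
            if token.startswith("-"):
                last_del, gap = j, 0
            elif token.startswith(("=", "*")):
                gap += 1
            else:
                break  # insertions, N, etc. end the span in this sketch
            j += 1
        # Collect read bases between the first and last deletion of the span.
        inserted = "".join(query_base(t) for t in cssplits[start:last_del + 1])
        if inserted:
            spans.append((start, last_del, inserted))
        i = last_del + 1
    return spans


print(find_insertions_within_deletions(["=A", "-C", "-G", "=T", "-A", "-C", "=G", "=G", "=G"]))
# [(1, 5, 'T')]
```

The `max_gap` threshold is an assumed parameter. A real implementation has to decide how many reference-matching bases may be absorbed into a deletion before they count as genuine matches.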
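One way to picture how insertion information is carried back into a consensus is to collapse the aligned tokens of the inserted segment into a single insertion token on the following reference base. This is a hypothetical sketch, not the code in `report.insertion_refractor.py`. It assumes cssplits tokens `=A` (match), `-A` (deletion) and `*AG` (substitution). It also assumes an insertion is written as `+T|+T|=G`, meaning the inserted bases followed by the next reference base. The function `reflect_insertion_segment` and its arguments are illustrative names.

```python
def query_base(token: str) -> str:
    # Assumed token formats: "=A" match, "*AG" substitution (ref A -> read G), "-A" deletion.
    if token.startswith("="):
        return token[1]
    if token.startswith("*"):
        return token[2]
    return ""  # deletions carry no read base


def reflect_insertion_segment(cssplits: list[str], ins_start: int, ins_len: int) -> list[str]:
    """Hypothetical: collapse cssplits[ins_start:ins_start + ins_len], which were aligned to the
    inserted segment of an insertion-allele reference, into one insertion token attached to
    the next reference base. Changes outside the segment are kept as they are."""
    if ins_start < 0 or ins_len <= 0 or ins_start + ins_len >= len(cssplits):
        raise ValueError("the inserted segment must be followed by at least one reference base")
    segment = cssplits[ins_start:ins_start + ins_len]
    # Keep the read bases of the insertion; bases deleted inside the insertion are dropped.
    bases = [b for b in (query_base(t) for t in segment) if b]
    before = cssplits[:ins_start]
    after = cssplits[ins_start + ins_len:]
    if bases:
        prefix = "".join(f"+{b}|" for b in bases)
        after = [prefix + after[0]] + after[1:]
    return before + after


# The deletion "-C" stays listed next to the reconstructed insertion "+T|+T|+A|".
print(reflect_insertion_segment(["=A", "-C", "=T", "=T", "*GA", "=G", "=C"], 2, 3))
# ['=A', '-C', '+T|+T|+A|=G', '=C']
```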
🔧 Maintenance
- Updated `insertions_to_fasta.py`. Commit Detail
  - Reduced randomness by replacing `set` or `frozenset` with `list` or `tuple`, and by using `random.sample()` to subset reads.
  - Refactored `call_consensus_insertion_sequence`.
  - Fixed a bug in `extract_score_and_sequence` so that scores are appended correctly for `insertions_merged_subset`.
- Renamed `report` to a more explicit name. Commit Detail
- Updated `utils.report_report_generator`. Commit Detail
  - Capitalized "Allele" (e.g., control) and "Allele type" (e.g., intact).
  - Changed the output format of `read_all` and `read_summary` from CSV to XLSX.
  - Reordered the legend to follow a logical sequence: control first, then sample, then specific insertions.
- Updated `utils.io.read_xlsx` to use openpyxl instead of pandas, because the DeprecationWarning emitted by pandas was cumbersome to deal with. Commit Detail
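The randomness fix in `insertions_to_fasta.py` rests on two Python details. First, the iteration order of a `set` of strings can change between interpreter runs because of hash randomization (`PYTHONHASHSEED`). Second, since Python 3.11, `random.sample()` requires a sequence rather than a set. The minimal sketch below combines order-stable deduplication with seeded subsampling. The function name `subsample_reads` and the fixed `seed` are assumptions, not taken from DAJIN2.

```python
import random


def subsample_reads(reads: list[str], k: int, seed: int = 1) -> list[str]:
    """Hypothetical helper: deterministically pick up to k distinct reads.

    dict.fromkeys() removes duplicates while keeping first-seen order; a set would also
    remove them, but its iteration order for str depends on hash randomization, so the
    sampled subset could differ between runs even with the same seed.
    """
    unique_reads = list(dict.fromkeys(reads))
    if len(unique_reads) <= k:
        return unique_reads
    rng = random.Random(seed)  # local generator: reproducible and leaves global state alone
    return rng.sample(unique_reads, k)
```

Using a dedicated `random.Random` instance keeps the result reproducible without reseeding the module-level generator that other code might rely on.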
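Reading and writing XLSX with openpyxl alone, without pandas, might look like the sketch below. The helper names `xlsx_bytes` and `read_xlsx_rows` are hypothetical and do not reproduce `utils.io.read_xlsx`. The sketch assumes the first row of the sheet holds the column names.

```python
import io

from openpyxl import Workbook, load_workbook


def xlsx_bytes(header: list, rows: list) -> bytes:
    """Hypothetical writer: build an XLSX file in memory from a header and data rows."""
    wb = Workbook()
    ws = wb.active
    ws.append(list(header))
    for row in rows:
        ws.append(list(row))
    buffer = io.BytesIO()
    wb.save(buffer)
    return buffer.getvalue()


def read_xlsx_rows(source) -> list[dict]:
    """Hypothetical reader: return the active sheet as a list of dicts keyed by the header row.
    `source` may be a file path or a binary file-like object."""
    wb = load_workbook(source, read_only=True)
    try:
        rows = wb.active.iter_rows(values_only=True)
        header = next(rows, None)
        if header is None:
            return []
        return [dict(zip(header, row)) for row in rows]
    finally:
        wb.close()  # read-only workbooks keep the file open until closed
```

Opening the workbook with `read_only=True` streams rows instead of loading the whole sheet into memory, which is why the workbook is closed explicitly.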
🐛 Bug Fixes
- Added `=` to the prefix so that the cstag is recognized as valid when an inversion contains an `n`. Commit Detail
- Modified the `io.load_from_csv` function to strip leading and trailing spaces from each field, fixing an error caused by spaces in `batch.csv`. Commit Detail
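Stripping whitespace around each field, as described for `io.load_from_csv`, can be sketched with the standard `csv` module. The functions `parse_csv_stripped` and `load_csv_stripped` are hypothetical stand-ins, not DAJIN2's code. `csv.reader(..., skipinitialspace=True)` would only drop spaces that follow a delimiter, so trailing spaces still need `str.strip()`.

```python
import csv


def parse_csv_stripped(lines) -> list[list[str]]:
    """Parse CSV lines, removing leading and trailing whitespace from every field."""
    return [[field.strip() for field in row] for row in csv.reader(lines)]


def load_csv_stripped(path: str) -> list[list[str]]:
    """Hypothetical loader for a batch file such as batch.csv."""
    with open(path, newline="") as f:
        return parse_csv_stripped(f)
```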
⛔️ Deprecated
- Removed `reads_all.csv`. This CSV file, which listed the allele assigned to each read, is no longer reported because it was of limited use and the same information can be obtained from the BAM file. Commit Detail