Release 0.5.0 · akikuno/DAJIN2

📝 Documentation

Update the issue template from md to yml and modify it to make it easier for users to fill out each item. [Commit Detail]

Extremely low-frequency alleles (less than 0.05%) are considered Nanopore sequence errors and are not clustered #36.
- Configure clustering.extract_labels so that alleles with a low number of reads (0.05% or fewer or 5 reads or fewer) are not clustered. [Commit Detail]
- Change clustering.clustering to stop if the minimum value of the elements in the cluster is 0.5% or less. [Commit Detail]
- Add consensus.remove_minor_alleles to remove minor alleles with fewer than 5 reads or less than 0.5% [Commit Detail]
Save a subsetted FASTQ of the control sample if its read count is too large (> 10,000 reads). The control is capped at 10,000 reads to avoid excessive computational load. [Commit Detail]
If the read length is 500 bases or less, change the mappy preset to sr. [Commit Detail]
Update extract_best_preset to prioritize map-ont and remove splice preset if inversion is observed. [Commit Detail]
Update the algorithm of cssplits_handler.reallocate_insertion_within_deletion to automate change point detection by incorporating temporal changes. [Commit Detail]
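
The minor-allele rule described for consensus.remove_minor_alleles above (fewer than 5 reads or less than 0.5%) can be illustrated with a minimal sketch. This is not the DAJIN2 implementation: the function name `remove_minor_labels`, its parameters, and the list-of-labels input are assumptions made for illustration only.

```python
from collections import Counter


def remove_minor_labels(labels, min_reads=5, min_percent=0.5):
    """Drop reads whose allele label is too rare (hypothetical helper).

    A label is kept only if it has at least `min_reads` reads AND
    accounts for at least `min_percent` percent of all reads.
    """
    counts = Counter(labels)
    total = len(labels)
    keep = {
        label
        for label, n in counts.items()
        if n >= min_reads and n / total * 100 >= min_percent
    }
    # Preserve the original read order while filtering.
    return [label for label in labels if label in keep]


# Label 2 has 6 reads (0.6%) and is kept; label 3 has 4 reads and is removed.
filtered = remove_minor_labels([1] * 990 + [2] * 6 + [3] * 4)
```

Both thresholds are applied together here, so a label with enough reads can still be dropped if its share of the sample is too small.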

Update deploy_pypi.yml to use the latest version of Actions. Refer to the latest official YAML for guidance. [Commit Detail]
Replace setup.py with pyproject.toml, integrating requirements.txt and MANIFEST.in into it. [Commit Detail]
Record the DAJIN2 execution command in the log file. [Commit Detail]
Add a test to check if the version in test_version.sh matches the version in pyproject.toml and utils.config [Commit Detail]
Rename consensus.subset_clust to consensus.downsample_by_label to clarify the function's purpose. [Commit Detail]
Update extract_unique_insertions to merge highly similar extracted insertion sequences. [Commit Detail]
- Fix extract_unique_insertions: removing a key twice from fasta_insertions_unique caused the index and key to become misaligned in enumerate(distances) when i != key. Keys are now removed from fasta_insertions_unique all at once at the end. [Commit Detail]
Add control characters for fastx_handler.sanitize_filename as forbidden chars. [Commit Detail]
Change the naming convention for the temporary directory: <sample_name>/<process_content>/<allele_name>/(<label_name>)/file_name. Example: flox/consensus/control/1/mutation_loci.pickle. [Commit Detail]
Move the sanitize_name function from utils.fastx_handler to utils.io. [Commit Detail]
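
The temporary directory convention above can be sketched with pathlib. This is a minimal illustration, not DAJIN2 code: the helper name `build_temp_path` and its parameters are hypothetical, and the optional label is modeled as an argument that may be omitted.

```python
from pathlib import Path


def build_temp_path(sample_name, process_content, allele_name, file_name, label_name=None):
    """Build <sample_name>/<process_content>/<allele_name>/(<label_name>)/file_name.

    Hypothetical helper: each field is its own directory level, so a
    name containing "_" never has to be split on underscores.
    """
    parts = [sample_name, process_content, allele_name]
    if label_name is not None:
        parts.append(str(label_name))
    return Path(*parts, file_name)


# Reproduces the example from the release note.
path = build_temp_path("flox", "consensus", "control", "mutation_loci.pickle", label_name=1)
print(path.as_posix())
```

Because every field occupies a separate path component, an allele such as `my_allele` stays intact when the path is read back with `Path.parts`.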

Remove sam_handler.remove_overlapped_reads to prevent unnecessary trimming of reads. [Commit Detail]
Fix preprocess.insertions_to_fasta.remove_minor_groups to delete the keys (insertion loci) when insertions are removed and result in an empty dict. This prevents errors when accessing non-existent keys in subset_insertions. [Commit Detail]
Fix the bug in cssplits_handler.convert_cssplits_to_cstag where an insertion's cs tag was not merged with the next cs tag when both had the same operator (e.g., for +A|+A|=T, =T: before: +aa=T=T, after: +aa=TT). [Commit Detail]
Separate intermediate files using a directory structure instead of underscores (_), so that no errors occur even if users use allele names containing underscores. [Commit Detail]
- Thank you @geedrn for reporting the issue #39!
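
The cs tag merge described in the convert_cssplits_to_cstag fix above can be sketched as follows. This is a simplified illustration, not the DAJIN2 function: the name `cssplits_to_cstag_sketch` is hypothetical, and only the `=`, `+`, and `-` operators are handled (substitutions and other operators are assumed out of scope).

```python
from itertools import groupby


def cssplits_to_cstag_sketch(cssplits: str) -> str:
    """Merge consecutive units that share an operator (hypothetical sketch)."""
    # Flatten "+A|+A|=T,=T" into ["+A", "+A", "=T", "=T"].
    units = [u for token in cssplits.split(",") for u in token.split("|")]
    parts = []
    # groupby joins runs of adjacent units with the same leading operator,
    # so the "=T" that ends an insertion token merges with the next "=T".
    for op, group in groupby(units, key=lambda u: u[0]):
        bases = "".join(u[1:] for u in group)
        if op == "=":
            parts.append("=" + bases.upper())
        elif op in ("+", "-"):
            parts.append(op + bases.lower())
        else:
            raise ValueError(f"operator not covered by this sketch: {op!r}")
    return "".join(parts)


print(cssplits_to_cstag_sketch("+A|+A|=T,=T"))  # +aa=TT
```

Grouping across token boundaries, rather than converting each comma-separated token on its own, is what avoids the unmerged `+aa=T=T` output.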