Merge pull request #14 from akikuno/develop-v0.3.1
Develop v0.3.1
akikuno authored Aug 27, 2023
2 parents a1515bb + 4a962ea commit 29a2e09
Showing 76 changed files with 2,614 additions and 2,032 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/pytest.yml
@@ -37,4 +37,4 @@ jobs:
- name: Test with pytest
run: |
export PYTHONPATH=./src
python -m pytest tests/ -p no:warnings
python -m pytest tests/ -p no:warnings -vv
3 changes: 3 additions & 0 deletions MANIFEST.in
@@ -1,3 +1,6 @@
include requirements.txt
include src/DAJIN2/template_igvjs.html

graft src/DAJIN2/templates
graft src/DAJIN2/static
graft src/DAJIN2/utils
80 changes: 50 additions & 30 deletions README.md
@@ -1,47 +1,65 @@
[![License](https://img.shields.io/badge/License-MIT-9cf.svg?style=flat-square)](https://choosealicense.com/licenses/mit/)
[![PyPI](https://img.shields.io/pypi/v/DAJIN2.svg?label=PyPI&color=orange&style=flat-square)](https://pypi.org/project/DAJIN2/)
[![Test](https://img.shields.io/github/actions/workflow/status/akikuno/dajin2/pytest.yml?branch=main&label=Test&color=brightgreen&style=flat-square)](https://github.com/akikuno/dajin2/actions)
[![Python](https://img.shields.io/pypi/pyversions/DAJIN2.svg?label=Python&color=blue&style=flat-square)](https://pypi.org/project/DAJIN2/)
[![PyPI](https://img.shields.io/pypi/v/DAJIN2.svg?label=PyPI&color=orange&style=flat-square)](https://pypi.org/project/DAJIN2/)
[![Bioconda](https://img.shields.io/conda/v/bioconda/dajin2?label=Bioconda&color=orange&style=flat-square)](https://anaconda.org/bioconda/dajin2)


<p align="center">
<img src="https://user-images.githubusercontent.com/15861316/261833016-7f356960-88cf-4574-87e2-36162b174340.png" width="90%">
</p>

DAJIN2 is a genotyping software designed for organisms that have undergone genome editing, utilizing nanopore sequencing technology.

The name DAJIN is inspired by the term 一網**打尽** (Ichimou **DAJIN** or Yīwǎng **Dǎjìn**), which signifies capturing everything in a single net.

## Disclaimer

DAJIN2 is still in the development phase.
Basic tests covering point mutations, deletions, and insertion designs have been conducted.
If you encounter any bugs or issues, please report them via [Issues](https://github.com/akikuno/DAJIN2/issues).


⚠️ DAJIN2 is currently under development ⚠️

The stable version is expected to be available in August 2023 🤞
## Installation

## Installation (alpha-version)
From [PyPI](https://pypi.org/project/DAJIN2/):

```bash
pip install DAJIN2
```

## Usage
From [Bioconda](https://anaconda.org/bioconda/DAJIN2):

```bash
conda install -c bioconda DAJIN2
```

### Basics

You can run DAJIN2 for a single sample (one sample vs. one control).
## Usage

### Single Sample Analysis

DAJIN2 allows for the analysis of single samples (one sample vs one control).

```bash
DAJIN2 [-h] [-s SAMPLE] [-c CONTROL] [-a ALLELE] [-n NAME] [-g GENOME] [-t THREADS] [-v]
DAJIN2 <-s|--sample> <-c|--control> <-a|--allele> <-n|--name> [-g|--genome] [-t|--threads] [-h|--help] [-v|--version]

options:
-h, --help show this help message and exit
-s SAMPLE, --sample SAMPLE
Full path to a sample FASTQ file
-c CONTROL, --control CONTROL
Full path to a control FASTQ file
-a ALLELE, --allele ALLELE
Full path to a FASTA file
-n NAME, --name NAME Output directory name
-g GENOME, --genome GENOME
Reference genome ID (e.g., hg38, mm10) [default: '']
-t THREADS, --threads THREADS
Number of threads [default: 1]
-v, --version show program's version number and exit
-s, --sample Path to a sample FASTQ file
-c, --control Path to a control FASTQ file
-a, --allele Path to a FASTA file
-n, --name Output directory name
-g, --genome (Optional) Reference genome ID (e.g., hg38, mm39) [default: '']
-t, --threads (Optional) Number of threads [default: 1]
-h, --help show this help message and exit
-v, --version show the version number and exit
```

#### Example

```bash
# Download example dataset
# Download the example dataset
wget https://github.com/akikuno/DAJIN2/raw/main/examples/example-single.tar.gz
tar -xf example-single.tar.gz

@@ -68,24 +86,24 @@ DAJIN2 \
# 🎉 Finished! Open DAJINResults/stx2-deletion to see the report.
```

### Batch handling
### Batch Processing

DAJIN2 can also handle multiple FASTQ files using the `batch` subcommand.

DAJIN2 can handle many FASTQ files using the `batch` subcommand.

```bash
DAJIN2 batch [-h] -f FILE [-t THREADS]
DAJIN2 batch <-f|--file> [-t|--threads] [-h]

options:
-h, --help Show this help message and exit
-f FILE, --file FILE CSV or Excel file
-t THREADS, --threads THREADS
Number of threads [default: 1]
-f, --file Path to a CSV or Excel file
-t, --threads (Optional) Number of threads [default: 1]
-h, --help Show this help message and exit
```

#### Example

```bash
# Download example dataset
# Download the example dataset
wget https://github.com/akikuno/DAJIN2/raw/main/examples/example-batch.tar.gz
tar -xf example-batch.tar.gz

@@ -122,4 +140,6 @@ DAJIN2 batch --file example-batch/batch.csv --threads 3

## References

For more information, please refer to the following publication:

[Kuno A, et al. (2022) DAJIN enables multiplex genotyping to simultaneously validate intended and unintended target genome editing outcomes. *PLoS Biology* 20(1): e3001507.](https://doi.org/10.1371/journal.pbio.3001507)
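The README help text in this diff lists the long flags for both modes. As a minimal sketch, assuming the `DAJIN2` console script is installed on `PATH`, the two invocations can be assembled and launched from Python with `subprocess`. The helper names `single_sample_argv`, `batch_argv`, and `run` are hypothetical, and only flags that appear in the help output are used.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def single_sample_argv(
    sample: Path,
    control: Path,
    allele: Path,
    name: str,
    genome: str | None = None,
    threads: int = 1,
) -> list[str]:
    """Build argv for single-sample mode (hypothetical helper)."""
    argv = [
        "DAJIN2",
        "--sample", str(sample),
        "--control", str(control),
        "--allele", str(allele),
        "--name", name,
        "--threads", str(threads),
    ]
    if genome:  # --genome is optional, e.g. "hg38" or "mm39"
        argv += ["--genome", genome]
    return argv


def batch_argv(file: Path, threads: int = 1) -> list[str]:
    """Build argv for the `batch` subcommand (hypothetical helper)."""
    return ["DAJIN2", "batch", "--file", str(file), "--threads", str(threads)]


def run(argv: list[str]) -> None:
    """Run DAJIN2, failing early if the entry point is missing."""
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError(f"{argv[0]} not found on PATH")
    subprocess.run(argv, check=True)  # raises CalledProcessError on non-zero exit
```

For instance, `run(batch_argv(Path("example-batch/batch.csv"), threads=3))` corresponds to the `DAJIN2 batch --file example-batch/batch.csv --threads 3` command shown in the hunk header. Passing a list rather than a shell string avoids quoting problems with paths that contain spaces.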
8 changes: 6 additions & 2 deletions requirements.txt
@@ -1,17 +1,21 @@
numpy >= 1.20.0
scipy >= 1.6.0
pandas >= 1.0.0
openpyxl >= 3.0.0
rapidfuzz >=3.0.0
statsmodels >= 0.13.5
scikit-learn >= 1.0.0

mappy >= 2.24
pysam >= 0.19.0
openpyxl >= 3.0.0

Flask >= 2.2.0
waitress >= 2.1.0
Jinja2 >= 3.1.0

plotly >= 5.0.0
kaleido >= 0.2.0

cstag == 0.4.1
midsv >= 0.10.1
wslPath >=0.3.0
rapidfuzz >=3.0.0
5 changes: 3 additions & 2 deletions setup.py
@@ -9,7 +9,7 @@

setuptools.setup(
name="DAJIN2",
version="0.3.0",
version="0.3.1b4",
author="Akihiro Kuno",
author_email="[email protected]",
description="One-step genotyping tools for targeted long-read sequencing",
@@ -24,7 +24,8 @@
entry_points={"console_scripts": ["DAJIN2=DAJIN2.main:execute"]},
include_package_data=True,
classifiers=[
"Development Status :: 3 - Alpha",
"Development Status :: 4 - Beta",
"Environment :: Console",
"Programming Language :: Python :: 3",
"License :: OSI Approved :: MIT License",
"Operating System :: POSIX",
2 changes: 1 addition & 1 deletion src/DAJIN2/core/classification/__init__.py
@@ -1 +1 @@
from DAJIN2.core.classification.classify import classify_alleles
from DAJIN2.core.classification.classifier import classify_alleles
49 changes: 49 additions & 0 deletions src/DAJIN2/core/classification/classifier.py
@@ -0,0 +1,49 @@
from __future__ import annotations

import midsv
from pathlib import Path
from itertools import groupby


def _calc_match(CSSPLIT: str) -> float:
match_score = CSSPLIT.count("=")
match_score -= CSSPLIT.count("+") # insertion
match_score -= sum(cs.islower() for cs in CSSPLIT) # inversion
cssplit = CSSPLIT.split(",")

return match_score / len(cssplit)


def _score_allele(TEMPDIR: Path, allele: str, SAMPLE_NAME: str) -> list[dict]:
midsv_sample = midsv.read_jsonl(Path(TEMPDIR, SAMPLE_NAME, "midsv", f"{allele}.json"))
scored_alleles = []

for dict_midsv in midsv_sample:
score = _calc_match(dict_midsv["CSSPLIT"])
dict_midsv.update({"SCORE": score, "ALLELE": allele})
scored_alleles.append(dict_midsv)

return scored_alleles


def _extract_alleles_with_max_score(score_of_each_alleles: list[dict]) -> list[dict]:
alleles_with_max_score = []
score_of_each_alleles.sort(key=lambda x: x["QNAME"])
for _, group in groupby(score_of_each_alleles, key=lambda x: x["QNAME"]):
max_read = max(group, key=lambda x: x["SCORE"])
del max_read["SCORE"]
alleles_with_max_score.append(max_read)
return alleles_with_max_score


##########################################################
# main
##########################################################


def classify_alleles(TEMPDIR: Path, FASTA_ALLELES: dict, SAMPLE_NAME: str) -> list[dict]:
score_of_each_alleles = []
for allele in FASTA_ALLELES:
score_of_each_alleles.extend(_score_allele(TEMPDIR, allele, SAMPLE_NAME))

return _extract_alleles_with_max_score(score_of_each_alleles)
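To illustrate what `classify_alleles` does per read, here is a standalone sketch with renamed copies of `_calc_match` and `_extract_alleles_with_max_score`, run on toy records instead of the `midsv/*.json` files. The CSSPLIT strings assume midsv-style tokens (`=` match, `*` substitution, `+…|` insertion, lowercase for inverted bases); the read and allele names are made up. Unlike the original, this copy neither sorts the input in place nor deletes `SCORE` from the input dicts.

```python
from __future__ import annotations

from itertools import groupby


def calc_match(cssplit: str) -> float:
    """Same arithmetic as _calc_match: (#'=' - #'+' - #lowercase) / #tokens."""
    score = cssplit.count("=")
    score -= cssplit.count("+")                 # -1 per '+' (inserted base)
    score -= sum(c.islower() for c in cssplit)  # -1 per lowercase char (inversion)
    return score / len(cssplit.split(","))


def keep_best_allele(records: list[dict]) -> list[dict]:
    """For each QNAME, keep the record with the highest SCORE (SCORE dropped)."""
    best = []
    ordered = sorted(records, key=lambda r: r["QNAME"])  # groupby needs sorted keys
    for _, group in groupby(ordered, key=lambda r: r["QNAME"]):
        top = max(group, key=lambda r: r["SCORE"])  # first maximum wins ties
        best.append({k: v for k, v in top.items() if k != "SCORE"})
    return best


# Toy reads, each aligned to two hypothetical alleles.
reads = [
    {"QNAME": "read1", "ALLELE": "control", "CSSPLIT": "=A,=C,*GT,=T"},
    {"QNAME": "read1", "ALLELE": "deletion", "CSSPLIT": "=A,=C,=G,=T"},
    {"QNAME": "read2", "ALLELE": "control", "CSSPLIT": "+G|=A,=C,=G,=T"},
    {"QNAME": "read2", "ALLELE": "deletion", "CSSPLIT": "=a,=c,=G,=T"},
]
for r in reads:
    r["SCORE"] = calc_match(r["CSSPLIT"])

best = keep_best_allele(reads)
```

Here read1 scores 0.75 against `control` and 1.0 against `deletion`, so it is assigned to `deletion`; read2 scores 0.75 (one inserted base) against `control` and 0.5 (two inverted bases) against `deletion`, so it stays with `control`.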
42 changes: 0 additions & 42 deletions src/DAJIN2/core/classification/classify.py

This file was deleted.

16 changes: 5 additions & 11 deletions src/DAJIN2/core/clustering/clustering.py
@@ -1,6 +1,5 @@
from __future__ import annotations

import json
import pickle
import midsv
import random
@@ -12,6 +11,7 @@
from DAJIN2.core.clustering.make_kmer import generate_mutation_kmers
from DAJIN2.core.clustering.make_score import make_score
from DAJIN2.core.clustering.return_labels import return_labels
from DAJIN2.utils import io


def annotate_score(path_sample, mutation_score, mutation_loci, is_control=False) -> Generator[list[float]]:
@@ -41,12 +41,6 @@ def reorder_labels(labels: list[int], start: int = 0) -> list[int]:
return labels_ordered


def write_json(filepath: Path | str, data: Generator) -> None:
with open(filepath, "w") as f:
for line in data:
f.write(json.dumps(line) + "\n")


###########################################################
# main
###########################################################
@@ -63,7 +57,7 @@ def is_strand_bias(path_control) -> bool:
return True


def add_labels(classif_sample, TEMPDIR, SAMPLE_NAME, CONTROL_NAME, THREADS: int = 1) -> list[dict[str]]:
def add_labels(classif_sample, TEMPDIR, SAMPLE_NAME, CONTROL_NAME) -> list[dict[str]]:
labels_all = []
max_label = 0
strand_bias = is_strand_bias(Path(TEMPDIR, CONTROL_NAME, "midsv", "control.json"))
@@ -86,14 +80,14 @@ def add_labels(classif_sample, TEMPDIR, SAMPLE_NAME, CONTROL_NAME, THREADS: int
continue
path_sample = Path(TEMPDIR, SAMPLE_NAME, "clustering", f"{allele}_{RANDOM_NUM}.json")
path_control = Path(TEMPDIR, CONTROL_NAME, "midsv", f"{allele}.json")
write_json(path_sample, group)
io.write_jsonl(data=group, path=path_sample)
mutation_score: list[dict[str, float]] = make_score(path_sample, path_control, mutation_loci, knockin_loci)
scores_sample = annotate_score(path_sample, mutation_score, mutation_loci)
scores_control = annotate_score(path_control, mutation_score, mutation_loci, is_control=True)
path_score_sample = Path(TEMPDIR, SAMPLE_NAME, "clustering", f"{allele}_score_{RANDOM_NUM}.json")
path_score_control = Path(TEMPDIR, CONTROL_NAME, "clustering", f"{allele}_score_{RANDOM_NUM}.json")
write_json(path_score_sample, scores_sample)
write_json(path_score_control, scores_control)
io.write_jsonl(data=scores_sample, path=path_score_sample)
io.write_jsonl(data=scores_control, path=path_score_control)
labels = return_labels(path_score_sample, path_score_control, path_sample, strand_bias)
labels_reorder = reorder_labels(labels, start=max_label)
max_label = max(labels_reorder)
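This hunk drops the module-local `write_json` in favour of `io.write_jsonl(data=..., path=...)` from `DAJIN2.utils`, whose body is not part of this diff. A hypothetical stand-in that keeps the keyword names used at the call sites and mirrors the removed helper (one `json.dumps` per line) might look like this; the `read_jsonl` reader is likewise illustrative. Note that records may be dicts (the read groups) or lists of floats (the score rows from `annotate_score`).

```python
from __future__ import annotations

import json
import tempfile
from collections.abc import Iterable, Iterator
from pathlib import Path
from typing import Any


def write_jsonl(data: Iterable[Any], path: Path | str) -> None:
    """Hypothetical stand-in: write one JSON value per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in data:  # data may be a generator; it is consumed once
            f.write(json.dumps(record) + "\n")


def read_jsonl(path: Path | str) -> Iterator[Any]:
    """Illustrative reader: lazily yield one decoded value per non-empty line."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


# Usage: round-trip a read record and a score row through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp, "allele_score.json")
    write_jsonl(data=iter([{"QNAME": "read1", "CSSPLIT": "=A,=C"}, [0.0, 1.0]]), path=demo_path)
    roundtrip = list(read_jsonl(demo_path))
```

Because the new call sites pass both arguments by keyword, the parameter order (`data` before `path`, the reverse of the old `write_json(filepath, data)`) does not affect them.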
9 changes: 4 additions & 5 deletions src/DAJIN2/core/consensus/__init__.py
@@ -1,6 +1,5 @@
from DAJIN2.core.consensus.consensus import call_consensus
from DAJIN2.core.consensus.consensus import call_allele_name
from DAJIN2.core.consensus.consensus import update_key_by_allele_name
from DAJIN2.core.consensus.consensus import add_key_by_allele_name
from DAJIN2.core.consensus.subset import subset_clust
from DAJIN2.core.consensus.extract_mutation_loci_by_labels import extract_mutation_loci_by_labels
from DAJIN2.core.consensus.name_handler import call_allele_name
from DAJIN2.core.consensus.name_handler import update_key_by_allele_name
from DAJIN2.core.consensus.name_handler import add_key_by_allele_name
from DAJIN2.core.consensus.clust_subsetter import subset_clust
File renamed without changes.