Remove reads_all.csv: We used to output a CSV file reads_all.csv that showed the allele for each read, but we have decided not to report it anymore due to its limited usefulness and because the same information can be obtained from the BAM file.
akikuno committed Feb 13, 2024
1 parent 821f06f commit 76e3eae
Showing 3 changed files with 16 additions and 19 deletions.
17 changes: 8 additions & 9 deletions README.md
@@ -19,6 +19,7 @@ The name DAJIN is derived from the phrase 一網**打尽** (Ichimou **DAJIN** or
  ## 🌟 Features

  + **Comprehensive Mutation Detection**: Equipped with the capability to detect genome editing events over a wide range, it can identify a broad spectrum of mutations, from small changes to large structural variations.
+ + DAJIN2 can also detect complex mutations characteristic of genome editing, such as "insertions occurring in regions where deletions have occurred."
  + **Intuitive Visualization**: The outcomes of genome editing are visualized intuitively, allowing for the rapid and easy identification and analysis of mutations.
  + **Multi-Sample Compatibility**: Accommodates a variety of samples, enabling simultaneous processing of multiple samples. This facilitates efficient progression of large-scale experiments and comparative studies.

@@ -253,10 +254,9 @@ DAJIN_Results/tyr-substitution
  │ ├── tyr_c230gt_01%.csv
  │ ├── tyr_c230gt_10%.csv
  │ └── tyr_c230gt_50%.csv
- ├── read_all.csv
  ├── read_plot.html
  ├── read_plot.pdf
- └── read_summary.csv
+ └── read_summary.xlsx
  ```

  ### 1. BAM
@@ -285,23 +285,22 @@ An example of a Tyr point mutation is described by its position on the chromosom
  ### 4. read_plot.html and read_plot.pdf

  Both read_plot.html and read_plot.pdf illustrate the proportions of each allele.
- The chart's **Allele type** indicates the type of allele, and **% of reads** shows the proportion of reads for that allele.
+ The chart's **Allele type** indicates the type of allele, and **Percent of reads** shows the proportion of reads for that allele.

  Additionally, the types of **Allele type** include:
- - **intact**: Alleles that perfectly match the input FASTA allele.
- - **indels**: Substitutions, deletions, insertions, or inversions within 50 bases.
- - **sv**: Substitutions, deletions, insertions, or inversions beyond 50 bases.
+ - **Intact**: Alleles that perfectly match the input FASTA allele.
+ - **Indels**: Substitutions, deletions, insertions, or inversions within 50 bases.
+ - **SV**: Substitutions, deletions, insertions, or inversions beyond 50 bases.

  <img src="https://user-images.githubusercontent.com/15861316/274521067-4d217251-4c62-4dc9-9c05-7f5377dd3025.png" width="75%">

  > [!WARNING]
  > In PCR amplicon sequencing, the % of reads might not match the actual allele proportions due to amplification bias.
  > Especially when large deletions are present, the deletion alleles might be significantly amplified, potentially not reflecting the actual allele proportions.
- ### 5. read_all.csv and read_summary.csv
+ ### 5. read_summary.xlsx

- - read_all.csv: Records which allele each read is classified under.
- - read_summary.csv: Describes the number of reads and presence proportion for each allele.
+ - read_summary.xlsx: Describes the number of reads and presence proportion for each allele.

  ## 📣Feedback and Support

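
The README.md diff describes read_summary.xlsx as holding the number of reads and the proportion for each allele. As a rough illustration of that kind of summary, the sketch below tallies per-read allele labels into counts and percentages. The function name `summarize_reads`, the keys `allele`, `n_reads`, and `percent_of_reads`, and the sample labels are all hypothetical; none of them are taken from DAJIN2's actual output format.

```python
from collections import Counter


def summarize_reads(allele_per_read: list[str]) -> list[dict[str, object]]:
    """Count reads per allele and compute each allele's share of all reads.

    Hypothetical helper for illustration only; not DAJIN2's implementation.
    """
    counts = Counter(allele_per_read)
    total = sum(counts.values())
    summary = []
    # most_common() orders alleles from the most reads to the fewest
    for allele, n_reads in counts.most_common():
        percent = round(n_reads / total * 100, 2)
        summary.append(
            {"allele": allele, "n_reads": n_reads, "percent_of_reads": percent}
        )
    return summary


if __name__ == "__main__":
    # Made-up labels standing in for per-read allele assignments
    reads = ["control_intact"] * 6 + ["tyr_indels"] * 3 + ["tyr_sv"]
    for row in summarize_reads(reads):
        print(row)
```

An empty input yields an empty summary, because the loop never runs and no division takes place.
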
15 changes: 7 additions & 8 deletions docs/README_JP.md
@@ -15,6 +15,7 @@ DAJIN2は、ナノポアシーアターゲットシーケンシングを用い
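  ## 🌟 Features

  + **Comprehensive mutation detection**: Equipped with the ability to detect genome editing events over a wide range, it can identify a broad spectrum of mutations, from small changes to large structural variations.
+ + It can also detect complex mutations characteristic of genome editing, such as "insertions occurring in regions where deletions have occurred."
  + **Intuitive visualization**: The results of genome editing are visualized intuitively, so mutations can be identified and analyzed quickly and easily.
  + **Multi-sample support**: Supports a wide variety of samples and can process multiple samples simultaneously. This allows large-scale experiments and comparative studies to proceed efficiently.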

@@ -256,10 +257,9 @@ DAJIN_Results/tyr-substitution
  │ ├── tyr_c230gt_01%.csv
  │ ├── tyr_c230gt_10%.csv
  │ └── tyr_c230gt_50%.csv
- ├── read_all.csv
  ├── read_plot.html
  ├── read_plot.pdf
- └── read_summary.csv
+ └── read_summary.xlsx
  ```

  ### 1. BAM
@@ -293,13 +293,13 @@ Tyr点変異の例を以下に示します:
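  ### 4. read_plot.html / read_plot.pdf

  Both read_plot.html and read_plot.pdf illustrate the proportion of each allele.
- In the figure, **Allele type** indicates the type of allele, and **% of reads** indicates the proportion of reads for that allele.
+ In the figure, **Allele type** indicates the type of allele, and **Percent of reads** indicates the proportion of reads for that allele.

  The types of **Allele type** are as follows:

- - **intact**: Alleles that perfectly match the input FASTA allele
- - **indels**: Substitutions, deletions, insertions, or inversions within 50 bases
- - **sv**: Substitutions, deletions, insertions, or inversions of 50 bases or more
+ - **Intact**: Alleles that perfectly match the input FASTA allele
+ - **Indels**: Alleles containing substitutions, deletions, insertions, or inversions within 50 bases
+ - **SV**: Alleles containing substitutions, deletions, insertions, or inversions of 50 bases or more


  <img src="https://user-images.githubusercontent.com/15861316/274521067-4d217251-4c62-4dc9-9c05-7f5377dd3025.png" width="75%">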
@@ -308,9 +308,8 @@ read_plot.html および read_plot.pdf は、各アレルの割合を図示し
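  > In targeted sequencing using PCR amplicons, **% of reads** may not match the actual allele proportions because of amplification bias.
  > In particular, when large deletions are present, the deletion alleles are markedly amplified, making it more likely that the results do not reflect the actual allele proportions.
- ### 5. read_all.csv / read_summary.csv
+ ### 5. read_summary.xlsx

- - read_all.csv: Records which allele each read was classified as.
  - read_summary.csv: Describes the number of reads and the proportion for each allele.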


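
The **Allele type** labels in the README diffs (Intact, Indels, SV) are separated by a 50-base threshold. The sketch below is one possible reading of that rule, not DAJIN2's actual classifier. The function `classify_allele_type`, its `mutation_lengths` input, and the choice to treat exactly 50 bases as Indels are assumptions, since the README wording leaves the boundary open.

```python
SV_THRESHOLD = 50  # bases; treating exactly 50 as "Indels" is an assumption


def classify_allele_type(mutation_lengths: list[int]) -> str:
    """Label an allele from the lengths (in bases) of its mutations.

    Hypothetical helper: an allele with no mutations is "Intact", one whose
    largest mutation is at most SV_THRESHOLD bases is "Indels", and anything
    larger is "SV".
    """
    if not mutation_lengths:
        return "Intact"
    if max(mutation_lengths) <= SV_THRESHOLD:
        return "Indels"
    return "SV"


if __name__ == "__main__":
    print(classify_allele_type([]))        # no mutations
    print(classify_allele_type([1, 12]))   # small changes only
    print(classify_allele_type([3, 250]))  # includes a large change
```
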
3 changes: 1 addition & 2 deletions src/DAJIN2/utils/report_generator.py
@@ -116,7 +116,7 @@ def output_plot(results_summary: list[dict[str, str]], report_directory: Path):
      # if kaleido is installed, output a pdf
      try:
          fig.write_image(f"{output_filename}.pdf")
-     except ValueError:
+     except Exception:
          pass


@@ -129,7 +129,6 @@ def report(NAME: str) -> None:
      results_summary = summarize_info(results_all)

      # Write to Excel
-     io.write_xlsx(results_all, Path(report_directory, "read_all.xlsx"))
      io.write_xlsx(results_summary, Path(report_directory, "read_summary.xlsx"))

      # Write to plot as HTML and PDF
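
The change from `except ValueError` to `except Exception` in `output_plot` makes the PDF export best-effort, so a missing or failing PDF backend does not abort the report. Below is a minimal sketch of that pattern. It uses a hypothetical `try_export` helper and stand-in writer functions rather than Plotly's `fig.write_image`, and it logs the failure, which the original code does not do.

```python
import logging
from pathlib import Path
from typing import Callable

logger = logging.getLogger(__name__)


def try_export(write: Callable[[Path], None], path: Path) -> bool:
    """Run an optional export step; return True on success, False on any failure."""
    try:
        write(path)
    except Exception as exc:  # broad on purpose: the export is optional
        logger.warning("Skipping %s: %s", path.name, exc)
        return False
    return True


def failing_writer(path: Path) -> None:
    # Stand-in for an exporter whose optional backend is missing
    raise RuntimeError("PDF backend is not available")


def working_writer(path: Path) -> None:
    # Stand-in for an exporter that succeeds (writes nothing to disk)
    return None


if __name__ == "__main__":
    print(try_export(failing_writer, Path("read_plot.pdf")))
    print(try_export(working_writer, Path("read_plot.pdf")))
```

Catching `Exception` rather than a single exception type trades precision for robustness. Logging the skipped step, as sketched here, keeps the failure visible, whereas a bare `pass` hides it.
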