v1.0 #21

Merged
merged 113 commits into from
Mar 31, 2024
c858819
Change the default behavior of bean-qc to not remove the replicates e…
jykr Mar 25, 2024
bfdf121
allow & test feeding no --allele-df-key for tiling run
jykr Mar 25, 2024
cdfad7a
udpate readme to include read structure schematic
jykr Mar 25, 2024
83c3e38
add testing for bean-create-samples
jykr Mar 25, 2024
fb5aaf6
add testing for bean-create-samples
jykr Mar 25, 2024
94954c6
Allow R2 adapter
jykr Mar 26, 2024
5217e84
Debug barcode start site
jykr Mar 26, 2024
3f85e25
debug negative strand gene translation
jykr Mar 27, 2024
755816a
require column for tiling screens only
jykr Mar 27, 2024
657d595
require chrom column for tiling screens only
jykr Mar 27, 2024
87a3523
fix translation steps
jykr Mar 28, 2024
97d464e
fix allele-df-key for added indel fitlering
jykr Mar 28, 2024
25c0a1b
Unify control condition arguments, add sanity check for control condi…
jykr Mar 28, 2024
e938d8c
debug argument name
jykr Mar 28, 2024
82773ad
change arg default value
jykr Mar 28, 2024
1f64961
fix documentation of time
jykr Mar 28, 2024
368ba55
change to replicate
jykr Mar 29, 2024
a93bc83
debug survival negctrl model
jykr Mar 29, 2024
c4df38a
update documentation
jykr Mar 30, 2024
ce22f2d
test doc
jykr Mar 30, 2024
4f11b82
test doc
jykr Mar 30, 2024
d7eaa05
move files
jykr Mar 30, 2024
2bf56a7
add Sphinx dependencies
jykr Mar 30, 2024
f5ee597
add Sphinx dependencies
jykr Mar 30, 2024
4be0796
fix main parser
jykr Mar 30, 2024
9982896
work under docs/
jykr Mar 30, 2024
3cf9777
work under docs/
jykr Mar 30, 2024
995faed
move requirements
jykr Mar 30, 2024
e7e22c7
update test arguments
jykr Mar 30, 2024
932dba4
try sphinx doc
jykr Mar 30, 2024
4338d5e
try sphinx doc edit
jykr Mar 30, 2024
c488dd4
fix doc
jykr Mar 30, 2024
426a4f8
fix command
jykr Mar 30, 2024
232a25b
Update documentation.yml
jykr Mar 30, 2024
085fee2
Update documentation.yml
jykr Mar 30, 2024
6dd11e1
Update documentation.yml
jykr Mar 30, 2024
882271a
changed
jykr Mar 30, 2024
7996d2b
try torch 2
jykr Mar 30, 2024
31a79f7
try torch 2
jykr Mar 30, 2024
2d983fc
move parser to not require import
jykr Mar 30, 2024
22858cd
remove bean dependency
jykr Mar 30, 2024
f026a07
update dependency
jykr Mar 30, 2024
90683be
update dependency
jykr Mar 30, 2024
b529340
update dependency
jykr Mar 30, 2024
920d937
remove dependency of distutils
jykr Mar 30, 2024
e241fa9
remove 3.8 requirement
jykr Mar 30, 2024
59dc74e
deploy
jykr Mar 30, 2024
affb805
change deploy dif
jykr Mar 30, 2024
c778ac0
update directory
jykr Mar 30, 2024
5d722a1
require pyro version that supports Torch2
jykr Mar 30, 2024
b5a7fb4
try different working directory
jykr Mar 30, 2024
0abcb2f
try different working directory
jykr Mar 30, 2024
98b748f
try different working directory
jykr Mar 30, 2024
3a292d7
try push
jykr Mar 30, 2024
e8a6df1
Require Pyro-ppl >= 1.8.5
jykr Mar 30, 2024
4949246
fix push
jykr Mar 30, 2024
3c88eb1
change scope of checkout
jykr Mar 30, 2024
86aa671
Documentation generated
your-username Mar 30, 2024
b954a94
remove incorrect import
jykr Mar 30, 2024
2519ccf
deploy artifact
jykr Mar 30, 2024
661bb43
Merge branch 'doc' of https://github.com/pinellolab/crispr-bean into doc
jykr Mar 30, 2024
a3fcae3
add permission
jykr Mar 30, 2024
15c7cc2
change name of the artifact
jykr Mar 30, 2024
45e5ebb
unify version
jykr Mar 30, 2024
4e9d6ba
add permission
jykr Mar 30, 2024
be458a2
edit
jykr Mar 30, 2024
4ba1960
use correct page artifact
jykr Mar 30, 2024
c7beb55
roll back to deploy-pages v3
jykr Mar 30, 2024
d29be47
return state instead of paramStore
jykr Mar 30, 2024
f075ba0
back to v4
jykr Mar 30, 2024
78295c3
remove build file
jykr Mar 30, 2024
65ad2cb
remove duplicate index
jykr Mar 30, 2024
8314104
use separate save and param dict
jykr Mar 30, 2024
86e910a
fix key for paramStore writing
jykr Mar 30, 2024
1dc0593
add images to docs
jykr Mar 30, 2024
fb92a65
add negctrl as separate arg
jykr Mar 30, 2024
cde6b27
add doc
jykr Mar 30, 2024
8fec7ea
add doc
jykr Mar 30, 2024
076679e
change title
jykr Mar 30, 2024
a51d1ba
fix unbounderror
jykr Mar 30, 2024
d242944
Update README.md
jykr Mar 30, 2024
360acb3
add data structure
jykr Mar 30, 2024
620a855
add data structure
jykr Mar 30, 2024
1dcc3ca
Merge branch 'doc' of https://github.com/pinellolab/crispr-bean into doc
jykr Mar 30, 2024
148957a
change images
jykr Mar 30, 2024
8ea81a2
update docs
jykr Mar 30, 2024
16e3b5d
add tutorial for survival screen
jykr Mar 30, 2024
1174c70
remove copying img directory
jykr Mar 30, 2024
217a317
remove copying img directory
jykr Mar 30, 2024
78193f7
add copy again
jykr Mar 30, 2024
d9445ab
change copy dir
jykr Mar 30, 2024
2ac37dc
add missing files
jykr Mar 30, 2024
03d093e
prioritize rst
jykr Mar 30, 2024
a099341
disamgibuagte names
jykr Mar 30, 2024
d29e73d
remove old files
jykr Mar 30, 2024
c3cee14
remove old files
jykr Mar 30, 2024
4bb61aa
add intro
jykr Mar 30, 2024
749df5f
add image
jykr Mar 30, 2024
0140a47
add test data
jykr Mar 30, 2024
05899a8
fix filtering qc
jykr Mar 30, 2024
f52937c
Fix QC notebook
jykr Mar 30, 2024
4d4791d
change intro message
jykr Mar 30, 2024
6efb8d3
fix qc error message
jykr Mar 30, 2024
2ecc9b5
fix typo
jykr Mar 30, 2024
24ad4a4
Merge pull request #20 from pinellolab/doc
jykr Mar 30, 2024
51e2351
commit before merge
jykr Mar 30, 2024
b976a09
Merge branch 'main' into dev
jykr Mar 30, 2024
39f3286
commit before pull
jykr Mar 30, 2024
7bb40e1
resolve merge conflict
jykr Mar 30, 2024
2ae0192
add docs
jykr Mar 30, 2024
d89191a
add files
jykr Mar 30, 2024
1f9a879
add files before pull
jykr Mar 30, 2024
f3a6932
remove unnecessary files
jykr Mar 30, 2024
4 changes: 2 additions & 2 deletions .github/workflows/CI.yml
100644 → 100755
@@ -23,11 +23,11 @@ jobs:
     - name: Set up Python
       uses: actions/setup-python@v3
       with:
-        python-version: '3.8'
+        python-version: '3.x'
     - name: Install dependencies
       run: |
         python -m pip install --upgrade pip
-        pip install torch==1.12.1+cpu torchvision==0.13.1+cpu torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cpu
+        pip install torch torchvision torchaudio
         pip install -r requirements.txt
         pip install -e .
     - name: Test with pytest
41 changes: 41 additions & 0 deletions .github/workflows/documentation.yml
@@ -0,0 +1,41 @@
+name: "Sphinx: Render docs"
+
+on: push
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python
+        uses: actions/setup-python@v3
+        with:
+          python-version: '3.x'
+
+      - name: Install Sphinx & Dependencies
+        run: |
+          pip install sphinx sphinx_markdown_builder sphinx_rtd_theme sphinx-argparse m2r pandas bio
+          sudo apt-get install python3-distutils
+      - name: Build Documentation
+        working-directory: docs
+        run: sphinx-build . _build
+      - name: copy image files
+        run: cp -r docs/assets docs/_build/
+      - uses: actions/upload-pages-artifact@v3
+        with:
+          name: github-pages
+          path: docs/_build/
+
+  deploy:
+    needs: build
+    permissions:
+      id-token: write
+      pages: write
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+    runs-on: ubuntu-latest
+    steps:
+      - name: Deploy to GitHub Pages
+        id: deployment
+        uses: actions/deploy-pages@v4
1 change: 1 addition & 0 deletions .gitignore
@@ -1 +1,2 @@
 *.pyc
+*.fastq
Empty file modified LICENSE
100644 → 100755
Empty file.
Empty file modified MANIFEST.in
100644 → 100755
Empty file.
452 changes: 37 additions & 415 deletions README.md
100644 → 100755

Large diffs are not rendered by default.

Empty file modified bean/__init__.py
100644 → 100755
Empty file.
Empty file modified bean/annotate/__init__.py
100644 → 100755
Empty file.
17 changes: 10 additions & 7 deletions bean/annotate/_supporting_fn.py
100644 → 100755
@@ -1,5 +1,5 @@
 from copy import deepcopy
-from typing import List, Tuple
+from typing import List, Tuple, Union
 from tqdm.auto import tqdm
 from ..framework.Edit import Edit, Allele
 from ..framework.AminoAcidEdit import CodingNoncodingAllele
@@ -43,14 +43,18 @@ def filter_allele_by_pos(
def filter_allele_by_base(
allele: Allele,
allowed_base_changes: List[Tuple] = None,
allowed_ref_base: str = None,
allowed_alt_base: str = None,
allowed_ref_base: Union[List, str] = None,
allowed_alt_base: Union[List, str] = None,
):
"""
Filter alleles based on position and return the filtered allele and
number of filtered edits.
"""
filtered_edits = 0
if isinstance(allowed_ref_base, str):
allowed_ref_base = [allowed_ref_base]
if isinstance(allowed_alt_base, str):
allowed_alt_base = [allowed_alt_base]
if (
not (allowed_ref_base is None and allowed_alt_base is None)
+ (allowed_base_changes is None)
@@ -64,15 +68,15 @@ def filter_allele_by_base(
allele.edits.remove(edit)
elif not allowed_ref_base is None:
for edit in allele.edits.copy():
if edit.ref_base != allowed_ref_base:
if edit.ref_base not in allowed_ref_base:
filtered_edits += 1
allele.edits.remove(edit)
elif not allowed_alt_base is None and edit.alt_base != allowed_alt_base:
elif not allowed_alt_base is None and edit.alt_base not in allowed_alt_base:
filtered_edits += 1
allele.edits.remove(edit)
else:
for edit in allele.edits.copy():
if edit.alt_base != allowed_alt_base:
if edit.alt_base not in allowed_alt_base:
filtered_edits += 1
allele.edits.remove(edit)
return (allele, filtered_edits)
@@ -145,7 +149,6 @@ def _map_alleles_to_filtered(
raw_allele_counts.groupby("guide"),
desc="Mapping alleles to closest filtered alleles",
):

guide_filtered_allele_counts = filtered_allele_counts.loc[
filtered_allele_counts.guide == guide, :
].set_index(allele_col)
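The `filter_allele_by_base` change above lets `allowed_ref_base` and `allowed_alt_base` be either a single base or a list of bases. A minimal sketch of that normalization follows. `normalize_bases` and `keep_edit` are hypothetical helper names used only for illustration (they are not functions in `bean`), and edits are represented here as plain base strings rather than `Edit` objects.

```python
from typing import List, Optional, Union


def normalize_bases(bases: Union[List[str], str, None]) -> Optional[List[str]]:
    """Wrap a single base in a list; pass lists and None through unchanged."""
    if isinstance(bases, str):
        return [bases]
    return bases


def keep_edit(
    ref_base: str,
    alt_base: str,
    allowed_ref: Union[List[str], str, None] = None,
    allowed_alt: Union[List[str], str, None] = None,
) -> bool:
    """Return True if the edit passes both (optional) base filters."""
    allowed_ref = normalize_bases(allowed_ref)
    allowed_alt = normalize_bases(allowed_alt)
    # Membership tests work the same way whether the caller passed "A" or ["A"].
    if allowed_ref is not None and ref_base not in allowed_ref:
        return False
    if allowed_alt is not None and alt_base not in allowed_alt:
        return False
    return True


# A single base and a list of bases are both accepted.
print(keep_edit("A", "G", allowed_ref="A", allowed_alt="G"))
print(keep_edit("C", "T", allowed_ref=["A", "C"], allowed_alt=["G", "T"]))
print(keep_edit("T", "G", allowed_ref=["A", "C"]))
```

Normalizing to a list up front means the `not in` checks never fall back to substring matching on a bare string.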
Empty file modified bean/annotate/filter_alleles.py
100644 → 100755
Empty file.
Empty file modified bean/annotate/ldlr_exons.fa
100644 → 100755
Empty file.
97 changes: 72 additions & 25 deletions bean/annotate/translate_allele.py
100644 → 100755
@@ -1,5 +1,6 @@
+from doctest import set_unittest_reportflags
 import os
-from typing import List, Iterable, Dict, Tuple, Collection
+from typing import List, Iterable, Dict, Tuple, Collection, Sequence
 from copy import deepcopy
 import numpy as np
 import pandas as pd
@@ -169,24 +170,24 @@ def get_cds_seq_pos_from_fasta(fasta_file_name: str) -> Tuple[List[str], List[in
return (exon_chrom, translated_seq, genomic_pos, strand)


def _translate_single_codon(nt_seq_string: str, aa_pos: int) -> str:
def _translate_single_codon(
codon: List[str],
) -> str: # nt_seq_string: str, aa_pos: int) -> str:
"""Translate `aa_pos`-th codon of `nt_seq_string`."""
codon = "".join(nt_seq_string[aa_pos * 3 : (aa_pos * 3 + 3)])
if len(codon) != 3:
print("reached the end of CDS, frameshift.")
return "/"
raise ValueError("reached the end of CDS, frameshift.")
return ">"
try:
codon = "".join(codon)
return codon_map[codon]
except KeyError:
if codon[-1] == "N" and codon[0] in BASE_SET and codon[1] in BASE_SET:
aa_set = {codon_map[codon[:2] + N] for N in BASE_SET}
if len(aa_set) == 1:
return next(iter(aa_set))
print(f"warning: no matching aa with codon {codon}")
raise ValueError(f"warning: no matching aa with codon {codon}")
else:
print(f"Cannot translate codon due to ambiguity: {codon}")

return "_"
raise ValueError(f"Cannot translate codon due to ambiguity: {codon}")


class CDS:
@@ -196,14 +197,22 @@ class CDS:
def __init__(self):
self.edited_aa_pos = set()
self.edits_noncoding = set()
self.edited_nt: List[str] = []
self.nt: List[str] = []
self.strand: int = 1
self.gene_name: str = ""
self.chrom: str = ""
self.translated_seq: List[str] = []
self.pos: np.ndarray = None
self.genomic_pos: Sequence = []

@classmethod
def from_fasta(cls, fasta_file_name, gene_name, suppressMessage=True):
cds = cls()
if fasta_file_name is None:
if not suppressMessage:
print("No fasta file provided as reference: using LDLR")
fasta_file_name = os.path.dirname(be.__file__) + "/annotate/ldlr_exons.fa"
# if fasta_file_name is not None:
# if not suppressMessage:
# raise ValueError("No fasta file provided as reference: using LDLR")
# fasta_file_name = os.path.dirname(be.__file__) + "/annotate/ldlr_exons.fa"
if gene_name not in cls.gene_info_dict:
chrom, translated_seq, genomic_pos, strand = get_cds_seq_pos_from_fasta(
fasta_file_name
@@ -242,28 +251,36 @@ def from_gene_name(cls, gene_name, ref_version: str = "GRCh38"):
cds.chrom = cls.gene_info_dict[gene_name]["chrom"]
cds.translated_seq = deepcopy(cls.gene_info_dict[gene_name]["translated_seq"])
cds.genomic_pos = cls.gene_info_dict[gene_name]["genomic_pos"]
cds.nt = cds.gene_info_dict[gene_name]["translated_seq"]
cds.nt = cds.gene_info_dict[gene_name]["translated_seq"] # in sense strand
# print(cds.gene_name + ":" + "".join(cds.nt))
# if cds.strand == -1:
# cds.nt = revcomp(cds.nt)
cds.pos = np.array(cds.genomic_pos)
cds.edited_nt = deepcopy(cds.nt)
return cds

def translate(self):
if self.strand == -1:
self.edited_nt = revcomp(self.edited_nt)
self.aa = _translate(self.edited_nt, codon_map)
# def translate(self):
# if self.strand == -1:
# self.edited_nt = revcomp(self.edited_nt)
# self.aa = _translate(self.edited_nt, codon_map)

def _get_relative_nt_pos(self, absolute_pos):
"""0-based relative position"""
nt_relative_pos = np.where(self.pos == absolute_pos)[0]
assert len(nt_relative_pos) <= 1, nt_relative_pos
return nt_relative_pos.astype(int).item() if nt_relative_pos else -1

def _edit_pos_to_aa_pos(self, edit_pos):
"""0-based nt position. Adds in sense direction, needs to be reversed for antisense gene"""
nt_relative_pos = self._get_relative_nt_pos(edit_pos)
if nt_relative_pos != -1:
self.edited_aa_pos.add(nt_relative_pos // 3)

return nt_relative_pos

def edit_single(self, edit_str):
"""Add a mutation induced by a single `edit_str` to CDS.
For the negative CDS, nt and edited_nt are antisense."""
edit = Edit.from_str(edit_str)
rel_pos = self._edit_pos_to_aa_pos(edit.pos)
if edit.strand == "-":
@@ -285,13 +302,16 @@ def edit_single(self, edit_str):
)
)
raise RefBaseMismatchException(
f"{self.gene_name + ';' if hasattr(self, 'gene_name') else ''}ref:{self.nt[rel_pos]} at pos {rel_pos}, got edit {edit}"
f"{self.gene_name + ';' if hasattr(self, 'gene_name') else ''}ref:{self.nt[rel_pos]} at pos {rel_pos}, got edit {edit}."
)
else:
self.edited_nt[rel_pos] = alt_base
if alt_base == "-": # frameshift
# self.edited_nt.pop(rel_pos)
self.edited_aa_pos.update(list(range(rel_pos, len(self.nt) // 3)))
if self.strand == 1:
self.edited_aa_pos.update(list(range(rel_pos, len(self.nt) // 3)))
else:
self.edited_aa_pos.update(list(range(0, rel_pos)))

def edit_allele(self, allele_str):
if isinstance(allele_str, Allele):
@@ -301,7 +321,13 @@ def edit_allele(self, allele_str):
for edit_str in edit_strs:
self.edit_single(edit_str)
if "-" in self.edited_nt:
self.edited_nt.remove("-")
self.edited_nt = [nt for nt in self.edited_nt if nt != "-"]
if self.strand == -1:
# Reverse logged edited positions as it was in the sense direction.
# rev_pos = self.edited_aa_pos[::-1]
self.edited_aa_pos = {len(self.nt) // 3 - 1 - r for r in self.edited_aa_pos}
self.nt = revcomp(self.nt)
self.edited_nt = revcomp(self.edited_nt)

def get_aa_change(
self, allele_str, include_synonymous=True
@@ -311,10 +337,31 @@ def get_aa_change(
mutations = CodingNoncodingAllele()
mutations.nt_allele.update(self.edits_noncoding)
for edited_aa_pos in self.edited_aa_pos:
ref_aa = _translate_single_codon(self.nt, edited_aa_pos)
mt_aa = _translate_single_codon(self.edited_nt, edited_aa_pos)
if mt_aa == "_":
return "translation error"
try:
# print("".join(self.nt))
# print((3 * edited_aa_pos), (3 * edited_aa_pos + 3))
ref_aa = _translate_single_codon(
self.nt[(3 * edited_aa_pos) : (3 * edited_aa_pos + 3)]
)
except ValueError as e:
print(
f"Translation mismatch in translating ref for {allele_str}: {e}. Check the input .fasta or genome version used for the reporter. Ignoring this allele."
)
continue
try:
# print("".join(self.nt))
# print(self.edited_nt[(3 * edited_aa_pos) : (3 * edited_aa_pos + 3)])
mt_aa = _translate_single_codon(
self.edited_nt[(3 * edited_aa_pos) : (3 * edited_aa_pos + 3)]
)
except ValueError as e:
print(
f"Translation mismatch in translating mutated gene for {allele_str}: {e}. Check the input .fasta or genome version used for the reporter. Ignoring this allele."
)
continue
except IndexError as e:
print(f"End of gene reached by frameshift {allele_str}: {e}")
mt_aa = ">"
if not include_synonymous and ref_aa == mt_aa:
continue
mutations.aa_allele.add(
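The `translate_allele.py` hunks above change two behaviors: single-codon translation now raises `ValueError` instead of printing and returning a sentinel, and edited amino-acid positions recorded on a negative-strand gene are mirrored before the sequence is reverse-complemented. The sketch below illustrates both ideas under stated assumptions. `CODON_MAP` is built here from the standard genetic code because the PR's own `codon_map` is not shown, and `translate_codon` and `mirror_aa_positions` are hypothetical names, not the functions in `bean`.

```python
from itertools import product
from typing import List, Set

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (assumption: the
# PR's own codon_map is equivalent; it is not shown in the diff).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_MAP = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_codon(codon: List[str]) -> str:
    """Translate one codon, resolving a trailing N only when it is unambiguous."""
    if len(codon) != 3:
        raise ValueError("reached the end of CDS, frameshift.")
    codon_str = "".join(codon)
    if codon_str in CODON_MAP:
        return CODON_MAP[codon_str]
    # A wobble-position N is acceptable if all four completions agree.
    if codon_str[2] == "N" and codon_str[0] in BASES and codon_str[1] in BASES:
        aa_set = {CODON_MAP[codon_str[:2] + base] for base in BASES}
        if len(aa_set) == 1:
            return aa_set.pop()
    raise ValueError(f"Cannot translate codon due to ambiguity: {codon_str}")


def mirror_aa_positions(positions: Set[int], n_codons: int) -> Set[int]:
    """Map 0-based codon indices counted from one end to indices from the other."""
    return {n_codons - 1 - pos for pos in positions}


print(translate_codon(list("ATG")))
print(translate_codon(list("GCN")))
print(mirror_aa_positions({0, 2}, n_codons=10))
```

Raising instead of returning a placeholder lets the caller decide per allele whether to skip it, which is what the `try`/`except ValueError` blocks in `get_aa_change` do.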
40 changes: 20 additions & 20 deletions bean/annotate/utils.py
100644 → 100755
@@ -20,7 +20,7 @@
 debug = logging.debug
 info = logging.info
 
-complement_base = {"A": "T", "T": "A", "C": "G", "G": "C"}
+complement_base = {"A": "T", "T": "A", "C": "G", "G": "C", "-": "-"}
 
 
 def revcomp(nt_list: List[str]):
@@ -97,7 +97,9 @@ def get_mane_transcript_id(gene_name: str):
return mane_transcript_id, id_version


def get_exons_from_transcript_id(transcript_id: str, id_version: int, ref_version: str = "GRCh38"):
def get_exons_from_transcript_id(
transcript_id: str, id_version: int, ref_version: str = "GRCh38"
):
"""
Retrieves the exons and the start position of the coding sequence (CDS) for a given transcript ID and version.

@@ -114,7 +116,9 @@ def get_exons_from_transcript_id(transcript_id: str, id_version: int, ref_versio
if transcript_json["count"] != 1:
if transcript_json["count"] > 1:
api_url = f"http://tark.ensembl.org/api/transcript/?stable_id={transcript_id}&stable_id_version={id_version}&assembly_name={ref_version}&expand=exons"
response = requests.get(api_url, headers={"Content-Type": "application/json"})
response = requests.get(
api_url, headers={"Content-Type": "application/json"}
)
transcript_json = response.json()
if transcript_json["count"] != 1:
raise ValueError(
@@ -214,22 +218,12 @@ def get_cds_seq_pos_from_gene_name(gene_name: str, ref_version: str = "GRCh38"):
return cds_chrom, cds_seq, cds_pos, strand


def parse_args():
"""Get the input arguments"""
print(
r"""
_ _
/ \ '\ __ _ _ _
| \ \ / _(_) | |_ ___ _ _
\ \ | | _| | | _/ -_) '_|
`.__|/ |_| |_|_|\__\___|_|
"""
)
print("bean-filter: filter alleles")
parser = argparse.ArgumentParser(
prog="allele_filter",
description="Filter alleles based on edit position in spacer and frequency across samples.",
)
def parse_args(parser=None):
if parser is None:
parser = argparse.ArgumentParser(
prog="allele_filter",
description="Filter alleles based on edit position in spacer and frequency across samples.",
)
parser.add_argument(
"bdata_path",
type=str,
@@ -276,6 +270,12 @@ def parse_args():
help="Only consider edit within window provided by (edit-start-pos, edit-end-pos). If this flag is not provided, `--edit-start-pos` and `--edit-end-pos` flags are ignored.",
action="store_true",
)
parser.add_argument(
"--keep-indels",
"-i",
help="Include indels.",
action="store_true",
)
parser.add_argument(
"--filter-target-basechange",
"-b",
@@ -339,7 +339,7 @@ def parse_args():
action="store_true",
help="Load temporary file and work from there.",
)
return parser.parse_args()
return parser


def check_args(args):
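The `parse_args` change above stops parsing inside the function: it now accepts an optional parser, adds its arguments, and returns the parser. One plausible reason, sketched below, is that the same function can then populate either a standalone parser or a subcommand of a shared top-level parser. The top-level `bean` parser, the `filter` subcommand name, the `add_filter_args` name, and the file name are assumptions for illustration; only `prog="allele_filter"`, `bdata_path`, and `--keep-indels`/`-i` come from the diff.

```python
import argparse


def add_filter_args(parser=None):
    """Add the filter arguments to `parser`, creating a standalone one if needed."""
    if parser is None:
        parser = argparse.ArgumentParser(
            prog="allele_filter",
            description="Filter alleles based on edit position in spacer "
            "and frequency across samples.",
        )
    parser.add_argument("bdata_path", type=str, help="Input data path.")
    parser.add_argument(
        "--keep-indels", "-i", action="store_true", help="Include indels."
    )
    # Return the parser, not parsed args, so callers choose when to parse.
    return parser


# Standalone use: build the parser and parse right away.
standalone_args = add_filter_args().parse_args(["sample.h5ad"])

# Subcommand use (hypothetical layout): attach the same arguments to a subparser.
main_parser = argparse.ArgumentParser(prog="bean")
subparsers = main_parser.add_subparsers(dest="command")
add_filter_args(subparsers.add_parser("filter"))
sub_args = main_parser.parse_args(["filter", "sample.h5ad", "--keep-indels"])

print(standalone_args.keep_indels, sub_args.command, sub_args.keep_indels)
```

Returning the parser also lets documentation tools that introspect `argparse` objects build the option reference without running the command.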
Empty file added bean/cli/__init__.py
Empty file.