diff --git a/.github/workflows/documentation.yml b/.github/workflows/documentation.yml index d8b6316..c648a97 100755 --- a/.github/workflows/documentation.yml +++ b/.github/workflows/documentation.yml @@ -18,7 +18,7 @@ jobs: sudo apt-get install python3-distutils - name: Build Documentation working-directory: docs - run: sphinx-build . _build + run: sphinx-build -W --keep-going . _build - name: copy image files run: cp -r docs/assets docs/_build/ - uses: actions/upload-pages-artifact@v3 diff --git a/README.md b/README.md index 54337f3..e41abc9 100755 --- a/README.md +++ b/README.md @@ -4,7 +4,7 @@ [![PyPI version](https://img.shields.io/pypi/v/crispr-bean)](https://pypi.org/project/crispr-bean/) [![Code style](https://img.shields.io/badge/code%20style-black-black)](https://github.com/psf/black) -`bean` unconfounds variant effect from variable editing outcome of CRISPR screens by considering genotypic outcome from *reporter* sequence. +`bean` improves CRISPR pooled screen analysis by 1) unconfounding variable per-guide editing outcome by considering genotypic outcome from *reporter* sequence and 2) through accurate modeling of screen procedure. diff --git a/docs/ReporterScreen_api.rst b/docs/ReporterScreen_api.rst index d489ffc..54604cf 100755 --- a/docs/ReporterScreen_api.rst +++ b/docs/ReporterScreen_api.rst @@ -1195,7 +1195,7 @@ LFC calculation & Addition sns.pairplot(lfcs) -.. image:: ../imgs/output_20_2.png +.. image:: assets/output_20_2.png LFC can be aggregated for biological replicates. @@ -1599,7 +1599,7 @@ Getting edit rates from allele counts plt.show() -.. image:: ../imgs/output_34_1.png +.. image:: assets/output_34_1.png diff --git a/docs/_count.md b/docs/_count.md index 7400cba..b65b98e 100755 --- a/docs/_count.md +++ b/docs/_count.md @@ -3,7 +3,7 @@ -```python +```bash bean count-samples \ --input sample_list.csv `# sample with lines 'R1_filepath,R2_filepath,sample_name\n'` \ -b A `# base that is being edited (A/G)` \ @@ -13,7 +13,7 @@ bean count-samples \ -t 12 `# number of threads` \ --name my_sorting_screen `# name of this sample run` \ ``` -```python +```bash bean count --R1 R1.fq --R2 R2.fq -b A -f sgRNA_info_table.csv -r ``` By default, `bean count[-samples]` assume R1 and R2 are trimmed off of the adapter sequence. You may need to adjust the command arguments according to your read structure. diff --git a/docs/_guild/.doctrees/ReporterScreen_api.doctree b/docs/_guild/.doctrees/ReporterScreen_api.doctree deleted file mode 100644 index 0ecb4df..0000000 Binary files a/docs/_guild/.doctrees/ReporterScreen_api.doctree and /dev/null differ diff --git a/docs/_index.md b/docs/_index.md deleted file mode 100755 index 45e2414..0000000 --- a/docs/_index.md +++ /dev/null @@ -1,4 +0,0 @@ ---- -layout: default -title: CRISPR-BEAN ---- diff --git a/docs/_input.md b/docs/_input.md index 21368b8..0a4b6c6 100755 --- a/docs/_input.md +++ b/docs/_input.md @@ -4,7 +4,7 @@ File should contain following columns. * `name`: gRNA ID column * `sequence`: gRNA sequence * `barcode`: R2 barcode to help match reporter to gRNA, written in the sense direction (as in R1) -* In order to use accessibility in the [variant effect quantification](#bean-run-quantify-variant-effects), provide accessibility information in one of two options. (For non-targeting guides, provide NA values (empty cell).) +* In order to use accessibility in the variant effect quantification downstream (in [`bean run`](https://pinellolab.github.io/crispr-bean/run.html)), provide accessibility information in one of two options. (For non-targeting guides, provide NA values (empty cell).) * Option 1: `chrom` & `genomic_pos`: Chromosome (ex. `chr19`) and genomic position of guide sequence. You will have to provide the path to the bigwig file with matching reference version in `bean run`. * Option 2: `accessibility_signal`: ATAC-seq signal value of the target loci of each guide. * For variant library (gRNAs are designed to target specific variants and ignores bystander edits) @@ -16,7 +16,7 @@ File should contain following columns. * `chrom`: Chromosome of gRNA targeted locus. * `start_pos`: gRNA starting position in the genome. Required when you provide `strand` column. Should specify the smaller coordinate value among start and end position regardless of gRNA strandedness. -Also see examples for [variant library](tests/data/test_guide_info.csv) and [tiling library](tests/data/test_guide_info_tiling.csv). +Also see examples for [variant library](https://github.com/pinellolab/crispr-bean/blob/main/tests/data/test_guide_info.csv) and [tiling library](https://github.com/pinellolab/crispr-bean/blob/main/tests/data/test_guide_info_tiling_chrom.csv). ## sample_list.csv File should contain following columns with header. @@ -34,4 +34,4 @@ For proliferation / survival screens: * `time`: Numeric time following the base editing of each sample. -Also see examples for [FACS sorting screen](tests/data/sample_list.csv) and [proliferation / survival screen](tests/data/sample_list_survival.csv). \ No newline at end of file +Also see examples for [FACS sorting screen](https://github.com/pinellolab/crispr-bean/blob/main/tests/data/sample_list.csv) and [proliferation / survival screen](https://github.com/pinellolab/crispr-bean/blob/main/tests/data/sample_list_survival.csv). \ No newline at end of file diff --git a/docs/_ldl_cds.md b/docs/_ldl_cds.md index 90740ed..3913879 100755 --- a/docs/_ldl_cds.md +++ b/docs/_ldl_cds.md @@ -51,10 +51,11 @@ bean run sorting tiling \ --scale-by-acc \ --accessibility-col accessibility ``` + See more details below. ## 1. Count gRNA & reporter (:ref:`count_samples`) -``` +```bash screen_id=my_sorting_tiling_screen working_dir=my_workdir @@ -67,11 +68,13 @@ bean count-samples \ -n ${screen_id} `# ID of the screen` \ --tiling ``` -Make sure you follow the [input file format](../../README#input-file-format) for seamless downstream steps. This will produce `./bean_count_${screen_id}.h5ad`. + +Make sure you follow the [input file format](https://pinellolab.github.io/crispr-bean/input.html) for seamless downstream steps. This will produce `./bean_count_${screen_id}.h5ad`. ## 2. QC (:ref:`qc`) -Base editing data will include QC about editing efficiency. As QC uses predefined column names and values, beware to follow the [input file guideline](../../README#input-file-format), but you can change the parameters with the full argument list of [`bean qc`](../../README#bean qc-qc-of-reporter-screen-data). (Common factors you may want to tweak is `--ctrl-cond=bulk` and `--lfc-conds=top,bot` if you have different sample condition labels.) -``` +Base editing data will include QC about editing efficiency. As QC uses predefined column names and values, beware to follow the [input file guideline](https://pinellolab.github.io/crispr-bean/input.html), but you can change the parameters with the full argument list of [bean qc](https://pinellolab.github.io/crispr-bean/qc.html). (Common factors you may want to tweak is `--ctrl-cond=bulk` and `--lfc-conds=top,bot` if you have different sample condition labels.) + +```bash bean qc \ ${working_dir}/bean_count_${screen_id}.h5ad `# Input ReporterScreen .h5ad file path` \ -o ${working_dir}/bean_count_${screen_id}_masked.h5ad `# Output ReporterScreen .h5ad file path` \ @@ -87,13 +90,15 @@ If the data does not include reporter editing data, you can provide `--no-editin As tiling library doesn't have designated per-gRNA target variant, any base edit observed in reporter may be the candidate variant, while having too many variants with very low editing rate significantly decreases the power. Variants are filtered based on multiple criteria in `bean fitler`. If the screen targets coding sequence, it's beneficial to translate edits into coding varaints whenever possible for better power. For translation, provide `--translate` and one of the following: -``` + +```bash [ --translate-gene-name GENE_SYMBOL OR --translate-genes-list path_to_gene_names_file.txt OR --translate-fasta gene_exon.fa, OR --translate-fastas-csv gene_exon_fas.csv] ``` -where `path_to_gene_names_file.txt` has one gene symbol per line, and gene symbol uses its MANE transcript (hg38) coordinates of exons. In order to use other reference versions or transcript ID, you'll need to feed in fasta file. See detailed formatting of fasta file [here](../../README#translating-alleles). + +where `path_to_gene_names_file.txt` has one gene symbol per line, and gene symbol uses its MANE transcript (hg38) coordinates of exons. In order to use other reference versions or transcript ID, you'll need to feed in fasta file. See [detailed formatting of fasta file](https://pinellolab.github.io/crispr-bean/filter.html#translating-alleles). Example allele filtering given we're translating based on MANE transcript exons of multiple gene symbols: @@ -106,15 +111,16 @@ bean filter ${working_dir}/bean_count_${screen_id}_masked.h5ad \ --translate --translate-genes-list ${working_dir}/gene_symbols.txt ``` -Ouptut file `` shows number of alleles per guide and number of guides per variant, where we want high enough values for the latter. See the typical output for dataset with good editing coverage & filtering result [here](../example_filtering_ouptut/). +Ouptut file `` shows number of alleles per guide and number of guides per variant, where we want high enough values for the latter. See the [typical output](https://github.com/pinellolab/crispr-bean/tree/main/docs/example_filtering_ouptut/) for dataset with good editing coverage & filtering result. ## 4. Quantify variant effect (:ref:`run`) -By default, `bean run [sorting,survival] tiling` uses most filtered allele counts table for variant identification and quantification of their effects. **Check [allele filtering output](../example_filtering_ouptut/)** and choose alternative filtered allele counts table if necessary. +By default, `bean run [sorting,survival] tiling` uses most filtered allele counts table for variant identification and quantification of their effects. Check [allele filtering output](https://github.com/pinellolab/crispr-bean/tree/main/docs/example_filtering_ouptut/) and choose alternative filtered allele counts table if necessary. `bean run` can take 3 run options to quantify editing rate: 1. From **reporter + accessibility** - 1-1. If your gRNA metadata table (`${working_dir}/test_guide_info.csv` above) included per-gRNA accessibility score, - ``` + 1-1. If your gRNA metadata table (`${working_dir}/test_guide_info.csv` above) included per-gRNA accessibility score, + + ```bash bean run sorting tiling \ ${working_dir}/bean_count_${screen_id}_alleleFiltered.h5ad \ -o $working_dir \ @@ -122,8 +128,10 @@ By default, `bean run [sorting,survival] tiling` uses most filtered allele count --scale-by-acc \ --accessibility-col accessibility ``` - 1-2. If your gRNA metadata table (`${working_dir}/test_guide_info.csv` above) included per-gRNA chromosome & position and you have bigWig file with accessibility signal, - ``` + + 1-2. If your gRNA metadata table (`${working_dir}/test_guide_info.csv` above) included per-gRNA chromosome & position and you have bigWig file with accessibility signal, + + ```bash bean run sorting tiling \ ${working_dir}/bean_count_${screen_id}_alleleFiltered.h5ad \ -o $working_dir \ @@ -133,15 +141,18 @@ By default, `bean run [sorting,survival] tiling` uses most filtered allele count ``` 2. From **reporter** - ``` + + ```bash bean run sorting tiling \ ${working_dir}/bean_count_${screen_id}_alleleFiltered.h5ad \ -o $working_dir \ --fit-negctrl ``` + 3. No reporter information, assume the same editing efficiency of all gRNAs. - Use this option if your data don't have editing rate information. - ``` + Use this option if your data don't have editing rate information. + + ```bash bean run sorting tiling \ ${working_dir}/bean_count_${screen_id}_alleleFiltered.h5ad \ -o $working_dir \ diff --git a/docs/_ldl_var.md b/docs/_ldl_var.md index 7eb1af7..f81f714 100755 --- a/docs/_ldl_var.md +++ b/docs/_ldl_var.md @@ -43,6 +43,7 @@ bean-run sorting variant \ --scale-by-acc \ --accessibility-col accessibility ``` + See more details below. ## 1. Count gRNA & reporter (:ref:`count_samples`) @@ -58,11 +59,13 @@ bean-count-samples \ -r `# Quantify reporter edits` \ -n ${screen_id} `# ID of the screen to be counted` ``` -Make sure you follow the [input file format](../../README#input-file-format) for seamless downstream steps. This will produce `./bean_count_${screen_id}.h5ad`. + +Make sure you follow the [input file format](https://pinellolab.github.io/crispr-bean/input.html) for seamless downstream steps. This will produce `./bean_count_${screen_id}.h5ad`. ## 2. QC samples & guides (:ref:`qc`) -Base editing data will include QC about editing efficiency. As QC uses predefined column names and values, beware to follow the [input file guideline](../../README#input-file-format), but you can change the parameters with the full argument list of [`bean-qc`](../../README#bean-qc-qc-of-reporter-screen-data). (Common factors you may want to tweak is `--ctrl-cond=bulk` and `--lfc-conds=top,bot` if you have different sample condition labels.) -``` +Base editing data will include QC about editing efficiency. As QC uses predefined column names and values, beware to follow the [input file guideline](https://pinellolab.github.io/crispr-bean/input.html), but you can change the parameters with the full argument list of [bean qc](https://pinellolab.github.io/crispr-bean/qc.html). (Common factors you may want to tweak is `--ctrl-cond=bulk` and `--lfc-conds=top,bot` if you have different sample condition labels.) + +```bash bean-qc \ bean_count_${screen_id}.h5ad `# Input ReporterScreen .h5ad file path` \ -o bean_count_${screen_id}_masked.h5ad `# Output ReporterScreen .h5ad file path` \ @@ -78,8 +81,8 @@ If the data does not include reporter editing data, you can provide `--no-editin `bean-run` can take 3 run options to quantify editing rate: 1. From **reporter + accessibility** - If your gRNA metadata table (`${working_dir}/test_guide_info.csv` above) included per-gRNA accessibility score, - ``` + If your gRNA metadata table (`${working_dir}/test_guide_info.csv` above) included per-gRNA accessibility score, + ```bash bean-run sorting variant \ ${working_dir}/bean_count_${screen_id}_masked.h5ad \ -o ${working_dir}/ \ @@ -87,8 +90,10 @@ If the data does not include reporter editing data, you can provide `--no-editin --scale-by-acc \ --accessibility-col accessibility ``` - If your gRNA metadata table (`${working_dir}/test_guide_info.csv` above) included per-gRNA chromosome & position and you have bigWig file with accessibility signal, - ``` + + If your gRNA metadata table (`${working_dir}/test_guide_info.csv` above) included per-gRNA chromosome & position and you have bigWig file with accessibility signal, + + ```bash bean-run sorting variant \ ${working_dir}/bean_count_${screen_id}_masked.h5ad \ -o ${working_dir}/ \ @@ -99,16 +104,19 @@ If the data does not include reporter editing data, you can provide `--no-editin 2. From **reporter**, without accessibility - This assumes the all target sites have the uniform chromatin accessibility. - ``` + This assumes the all target sites have the uniform chromatin accessibility. + + ```bash bean-run sorting variant \ ${working_dir}/bean_count_${screen_id}_masked.h5ad \ -o ${working_dir}/ \ --fit-negctrl ``` + 3. No reporter information, assume the same editing efficiency of all gRNAs. - Use this option if your data don't have editing outcome information. - ``` + Use this option if your data don't have editing outcome information. + + ```bash bean-run sorting variant \ ${working_dir}/bean_count_${screen_id}_masked.h5ad \ -o ${working_dir}/ \ diff --git a/docs/_profile.md b/docs/_profile.md index cd40cb4..47806a2 100755 --- a/docs/_profile.md +++ b/docs/_profile.md @@ -1,8 +1,10 @@ # `bean profile`: Profile editing patterns + ```bash bean profile my_sorting_screen.h5ad -o output_prefix `# Prefix for editing profile report` ``` + # Output Above command produces `prefix_editing_preference.[html,ipynb]` as editing preferences ([see example](../notebooks/profile_editing_preference.ipynb)). - \ No newline at end of file + \ No newline at end of file diff --git a/docs/_prolif_gwas.md b/docs/_prolif_gwas.md index 9af45ee..cf1ded9 100644 --- a/docs/_prolif_gwas.md +++ b/docs/_prolif_gwas.md @@ -8,7 +8,7 @@ GWAS variant screen with per-variant gRNA tiling design, selected based on FACS