fix broken documentation links and improve sidebar

pinellolab · Apr 1, 2024 · 8753d31 · 8753d31
1 parent ebd5345
commit 8753d31
Show file tree

Hide file tree

Showing 28 changed files with 145 additions and 132 deletions.
diff --git a/.github/workflows/documentation.yml b/.github/workflows/documentation.yml
@@ -18,7 +18,7 @@ jobs:
         sudo apt-get install python3-distutils
     - name: Build Documentation
       working-directory: docs
-      run: sphinx-build . _build
+      run: sphinx-build -W --keep-going . _build
     - name: copy image files
       run: cp -r docs/assets docs/_build/
     - uses: actions/upload-pages-artifact@v3

diff --git a/README.md b/README.md
@@ -4,7 +4,7 @@
 [![PyPI version](https://img.shields.io/pypi/v/crispr-bean)](https://pypi.org/project/crispr-bean/)
 [![Code style](https://img.shields.io/badge/code%20style-black-black)](https://github.com/psf/black)
 
-`bean` unconfounds variant effect from variable editing outcome of CRISPR screens by considering genotypic outcome from *reporter* sequence. 
+`bean` improves CRISPR pooled screen analysis by 1) unconfounding variable per-guide editing outcome by considering genotypic outcome from *reporter* sequence and 2) through accurate modeling of screen procedure.
 
 
 <img src="docs/assets/summary.png" alt="Reporter construct" width="700"/>

diff --git a/docs/ReporterScreen_api.rst b/docs/ReporterScreen_api.rst
@@ -1195,7 +1195,7 @@ LFC calculation & Addition
     sns.pairplot(lfcs)
 
 
-.. image:: ../imgs/output_20_2.png
+.. image:: assets/output_20_2.png
 
 
 LFC can be aggregated for biological replicates.
@@ -1599,7 +1599,7 @@ Getting edit rates from allele counts
     plt.show()
 
 
-.. image:: ../imgs/output_34_1.png
+.. image:: assets/output_34_1.png
 
 
 

diff --git a/docs/_count.md b/docs/_count.md
@@ -3,7 +3,7 @@
 
 
 
-```python
+```bash
 bean count-samples \
   --input sample_list.csv   `# sample with lines 'R1_filepath,R2_filepath,sample_name\n'` \
   -b A                      `# base that is being edited (A/G)` \
@@ -13,7 +13,7 @@ bean count-samples \
   -t 12                     `# number of threads` \
   --name my_sorting_screen  `# name of this sample run` \
 ```
-```python
+```bash
 bean count --R1 R1.fq --R2 R2.fq -b A -f sgRNA_info_table.csv -r
 ```
 By default, `bean count[-samples]` assume R1 and R2 are trimmed off of the adapter sequence. You may need to adjust the command arguments according to your read structure. 

diff --git a/docs/_guild/.doctrees/ReporterScreen_api.doctree b/docs/_guild/.doctrees/ReporterScreen_api.doctree
diff --git a/docs/_index.md b/docs/_index.md
diff --git a/docs/_input.md b/docs/_input.md
@@ -4,7 +4,7 @@ File should contain following columns.
 * `name`: gRNA ID column
 * `sequence`: gRNA sequence
 * `barcode`: R2 barcode to help match reporter to gRNA, written in the sense direction (as in R1)
-* In order to use accessibility in the [variant effect quantification](#bean-run-quantify-variant-effects), provide accessibility information in one of two options. (For non-targeting guides, provide NA values (empty cell).)   
+* In order to use accessibility in the variant effect quantification downstream (in [`bean run`](https://pinellolab.github.io/crispr-bean/run.html)), provide accessibility information in one of two options. (For non-targeting guides, provide NA values (empty cell).)   
   * Option 1: `chrom` & `genomic_pos`: Chromosome (ex. `chr19`) and genomic position of guide sequence. You will have to provide the path to the bigwig file with matching reference version in `bean run`. 
   * Option 2: `accessibility_signal`: ATAC-seq signal value of the target loci of each guide.  
 * For variant library (gRNAs are designed to target specific variants and ignores bystander edits)
@@ -16,7 +16,7 @@ File should contain following columns.
   * `chrom`: Chromosome of gRNA targeted locus.
   * `start_pos`: gRNA starting position in the genome. Required when you provide `strand` column. Should specify the smaller coordinate value among start and end position regardless of gRNA strandedness.
 
-Also see examples for [variant library](tests/data/test_guide_info.csv) and [tiling library](tests/data/test_guide_info_tiling.csv).
+Also see examples for [variant library](https://github.com/pinellolab/crispr-bean/blob/main/tests/data/test_guide_info.csv) and [tiling library](https://github.com/pinellolab/crispr-bean/blob/main/tests/data/test_guide_info_tiling_chrom.csv).
 
 ## sample_list.csv
 File should contain following columns with header.
@@ -34,4 +34,4 @@ For proliferation / survival screens:
 * `time`: Numeric time following the base editing of each sample.
 
 
-Also see examples for [FACS sorting screen](tests/data/sample_list.csv) and [proliferation / survival screen](tests/data/sample_list_survival.csv).
+Also see examples for [FACS sorting screen](https://github.com/pinellolab/crispr-bean/blob/main/tests/data/sample_list.csv) and [proliferation / survival screen](https://github.com/pinellolab/crispr-bean/blob/main/tests/data/sample_list_survival.csv).
diff --git a/docs/_ldl_cds.md b/docs/_ldl_cds.md
@@ -51,10 +51,11 @@ bean run sorting tiling \
     --scale-by-acc \
     --accessibility-col accessibility
 ```
+
 See more details below.
 
 ## 1. Count gRNA & reporter (:ref:`count_samples`)
-```
+```bash
 screen_id=my_sorting_tiling_screen
 working_dir=my_workdir
 
@@ -67,11 +68,13 @@ bean count-samples \
 -n ${screen_id}                                       `# ID of the screen` \
 --tiling
 ```
-Make sure you follow the [input file format](../../README#input-file-format) for seamless downstream steps. This will produce `./bean_count_${screen_id}.h5ad`. 
+
+Make sure you follow the [input file format](https://pinellolab.github.io/crispr-bean/input.html) for seamless downstream steps. This will produce `./bean_count_${screen_id}.h5ad`. 
 
 ## 2. QC (:ref:`qc`)
-Base editing data will include QC about editing efficiency. As QC uses predefined column names and values, beware to follow the [input file guideline](../../README#input-file-format), but you can change the parameters with the full argument list of [`bean qc`](../../README#bean qc-qc-of-reporter-screen-data). (Common factors you may want to tweak is `--ctrl-cond=bulk` and `--lfc-conds=top,bot` if you have different sample condition labels.)
-```
+Base editing data will include QC about editing efficiency. As QC uses predefined column names and values, beware to follow the [input file guideline](https://pinellolab.github.io/crispr-bean/input.html), but you can change the parameters with the full argument list of [bean qc](https://pinellolab.github.io/crispr-bean/qc.html). (Common factors you may want to tweak is `--ctrl-cond=bulk` and `--lfc-conds=top,bot` if you have different sample condition labels.)
+
+```bash
 bean qc \
   ${working_dir}/bean_count_${screen_id}.h5ad           `# Input ReporterScreen .h5ad file path` \
   -o ${working_dir}/bean_count_${screen_id}_masked.h5ad `# Output ReporterScreen .h5ad file path` \
@@ -87,13 +90,15 @@ If the data does not include reporter editing data, you can provide `--no-editin
 As tiling library doesn't have designated per-gRNA target variant, any base edit observed in reporter may be the candidate variant, while having too many variants with very low editing rate significantly decreases the power. Variants are filtered based on multiple criteria in `bean fitler`.  
 
 If the screen targets coding sequence, it's beneficial to translate edits into coding varaints whenever possible for better power. For translation, provide `--translate` and one of the following:
-```
+
+```bash
 [ --translate-gene-name GENE_SYMBOL OR
   --translate-genes-list path_to_gene_names_file.txt OR
   --translate-fasta gene_exon.fa, OR
   --translate-fastas-csv gene_exon_fas.csv]
 ```
-where `path_to_gene_names_file.txt` has one gene symbol per line, and gene symbol uses its MANE transcript (hg38) coordinates of exons. In order to use other reference versions or transcript ID, you'll need to feed in fasta file. See detailed formatting of fasta file [here](../../README#translating-alleles).
+
+where `path_to_gene_names_file.txt` has one gene symbol per line, and gene symbol uses its MANE transcript (hg38) coordinates of exons. In order to use other reference versions or transcript ID, you'll need to feed in fasta file. See [detailed formatting of fasta file](https://pinellolab.github.io/crispr-bean/filter.html#translating-alleles).
 
 Example allele filtering given we're translating based on MANE transcript exons of multiple gene symbols:
 
@@ -106,24 +111,27 @@ bean filter ${working_dir}/bean_count_${screen_id}_masked.h5ad \
 --translate --translate-genes-list ${working_dir}/gene_symbols.txt
 ```
 
-Ouptut file `` shows number of alleles per guide and number of guides per variant, where we want high enough values for the latter. See the typical output for dataset with good editing coverage & filtering result [here](../example_filtering_ouptut/).
+Ouptut file `` shows number of alleles per guide and number of guides per variant, where we want high enough values for the latter. See the [typical output](https://github.com/pinellolab/crispr-bean/tree/main/docs/example_filtering_ouptut/) for dataset with good editing coverage & filtering result.
 
 ## 4. Quantify variant effect (:ref:`run`)
-By default, `bean run [sorting,survival] tiling` uses most filtered allele counts table for variant identification and quantification of their effects. **Check [allele filtering output](../example_filtering_ouptut/)** and choose alternative filtered allele counts table if necessary.   
+By default, `bean run [sorting,survival] tiling` uses most filtered allele counts table for variant identification and quantification of their effects. Check [allele filtering output](https://github.com/pinellolab/crispr-bean/tree/main/docs/example_filtering_ouptut/) and choose alternative filtered allele counts table if necessary.   
 
 `bean run` can take 3 run options to quantify editing rate:  
 1. From **reporter + accessibility**  
-    1-1. If your gRNA metadata table (`${working_dir}/test_guide_info.csv` above) included per-gRNA accessibility score, 
-    ```
+  1-1. If your gRNA metadata table (`${working_dir}/test_guide_info.csv` above) included per-gRNA accessibility score, 
+
+    ```bash
     bean run sorting tiling \
     ${working_dir}/bean_count_${screen_id}_alleleFiltered.h5ad \
     -o $working_dir \
     --fit-negctrl \
     --scale-by-acc \
     --accessibility-col accessibility
     ```
-    1-2. If your gRNA metadata table (`${working_dir}/test_guide_info.csv` above) included per-gRNA chromosome & position and you have bigWig file with accessibility signal, 
-    ```
+
+  1-2. If your gRNA metadata table (`${working_dir}/test_guide_info.csv` above) included per-gRNA chromosome & position and you have bigWig file with accessibility signal, 
+
+    ```bash
     bean run sorting tiling \
     ${working_dir}/bean_count_${screen_id}_alleleFiltered.h5ad \
     -o $working_dir \
@@ -133,15 +141,18 @@ By default, `bean run [sorting,survival] tiling` uses most filtered allele count
     ```
 
 2. From **reporter**
-    ```
+
+    ```bash
     bean run sorting tiling \
     ${working_dir}/bean_count_${screen_id}_alleleFiltered.h5ad \
     -o $working_dir \
     --fit-negctrl 
     ```
+
 3. No reporter information, assume the same editing efficiency of all gRNAs.  
-    Use this option if your data don't have editing rate information.
-    ```
+  Use this option if your data don't have editing rate information.
+
+    ```bash
     bean run sorting tiling \
     ${working_dir}/bean_count_${screen_id}_alleleFiltered.h5ad \
     -o $working_dir \

diff --git a/docs/_ldl_var.md b/docs/_ldl_var.md
@@ -43,6 +43,7 @@ bean-run sorting variant \
     --scale-by-acc \
     --accessibility-col accessibility
 ```
+
 See more details below.
 
 ## 1. Count gRNA & reporter (:ref:`count_samples`)
@@ -58,11 +59,13 @@ bean-count-samples \
 -r                                    `# Quantify reporter edits` \
 -n ${screen_id}                          `# ID of the screen to be counted`   
 ```
-Make sure you follow the [input file format](../../README#input-file-format) for seamless downstream steps. This will produce `./bean_count_${screen_id}.h5ad`. 
+
+Make sure you follow the [input file format](https://pinellolab.github.io/crispr-bean/input.html) for seamless downstream steps. This will produce `./bean_count_${screen_id}.h5ad`. 
 
 ## 2. QC samples & guides (:ref:`qc`)
-Base editing data will include QC about editing efficiency. As QC uses predefined column names and values, beware to follow the [input file guideline](../../README#input-file-format), but you can change the parameters with the full argument list of [`bean-qc`](../../README#bean-qc-qc-of-reporter-screen-data). (Common factors you may want to tweak is `--ctrl-cond=bulk` and `--lfc-conds=top,bot` if you have different sample condition labels.)
-```
+Base editing data will include QC about editing efficiency. As QC uses predefined column names and values, beware to follow the [input file guideline](https://pinellolab.github.io/crispr-bean/input.html), but you can change the parameters with the full argument list of [bean qc](https://pinellolab.github.io/crispr-bean/qc.html). (Common factors you may want to tweak is `--ctrl-cond=bulk` and `--lfc-conds=top,bot` if you have different sample condition labels.)
+
+```bash
 bean-qc \
   bean_count_${screen_id}.h5ad    `# Input ReporterScreen .h5ad file path` \
   -o bean_count_${screen_id}_masked.h5ad   `# Output ReporterScreen .h5ad file path` \
@@ -78,17 +81,19 @@ If the data does not include reporter editing data, you can provide `--no-editin
 
 `bean-run` can take 3 run options to quantify editing rate:  
 1. From **reporter + accessibility**  
-    If your gRNA metadata table (`${working_dir}/test_guide_info.csv` above) included per-gRNA accessibility score, 
-    ```
+  If your gRNA metadata table (`${working_dir}/test_guide_info.csv` above) included per-gRNA accessibility score, 
+    ```bash
     bean-run sorting variant \
     ${working_dir}/bean_count_${screen_id}_masked.h5ad \
     -o ${working_dir}/ \
     --fit-negctrl \
     --scale-by-acc \
     --accessibility-col accessibility
     ```
-    If your gRNA metadata table (`${working_dir}/test_guide_info.csv` above) included per-gRNA chromosome & position and you have bigWig file with accessibility signal, 
-    ```
+
+  If your gRNA metadata table (`${working_dir}/test_guide_info.csv` above) included per-gRNA chromosome & position and you have bigWig file with accessibility signal, 
+
+    ```bash
     bean-run sorting variant \
     ${working_dir}/bean_count_${screen_id}_masked.h5ad \
     -o ${working_dir}/ \
@@ -99,16 +104,19 @@ If the data does not include reporter editing data, you can provide `--no-editin
 
 2. From **reporter**, without accessibility
 
-    This assumes the all target sites have the uniform chromatin accessibility.
-    ```
+  This assumes the all target sites have the uniform chromatin accessibility.
+
+    ```bash
     bean-run sorting variant \
     ${working_dir}/bean_count_${screen_id}_masked.h5ad \
     -o ${working_dir}/ \
     --fit-negctrl 
     ```
+
 3. No reporter information, assume the same editing efficiency of all gRNAs.  
-    Use this option if your data don't have editing outcome information.
-    ```
+  Use this option if your data don't have editing outcome information.
+
+    ```bash
     bean-run sorting variant \
     ${working_dir}/bean_count_${screen_id}_masked.h5ad \
     -o ${working_dir}/ \

diff --git a/docs/_profile.md b/docs/_profile.md
@@ -1,8 +1,10 @@
 # `bean profile`: Profile editing patterns
+
 ```bash
 bean profile my_sorting_screen.h5ad -o output_prefix `# Prefix for editing profile report` 
 ```
+
 # Output
 Above command produces `prefix_editing_preference.[html,ipynb]` as editing preferences ([see example](../notebooks/profile_editing_preference.ipynb)).  
 
-<img src="/crispr-bean/assets/profile_output.png" alt="Allele translation" width="700" />  
+<img src="/crispr-bean/assets/profile_output.png" alt="Editing profiles" width="700" />  
diff --git a/docs/_prolif_gwas.md b/docs/_prolif_gwas.md
@@ -8,7 +8,7 @@ GWAS variant screen with per-variant gRNA tiling design, selected based on FACS
   </tr>
   <tr>
     <th>Selection</th>
-    <td>Cells are sorted based on FACS signal quantiles  <br>  <img src="/crispr-bean/assets/proliferation.png" alt="variant library design" width="300"/></td>
+    <td>Cells are grown and will be selected based on their fitness. Cells are sampled in multiple timepoints.  <br>  <img src="/crispr-bean/assets/proliferation.png" alt="variant library design" width="300"/></td>
   </tr>
 </table>
 
@@ -56,11 +56,11 @@ bean count-samples \
 -r                                    `# Quantify reporter edits` \
 -n ${screen_id}                          `# ID of the screen to be counted`  
 ```
-Make sure you follow the [input file format](../../README#input-file-format) for seamless downstream steps. This will produce `./bean_count_${screen_id}.h5ad`. 
+Make sure you follow the [input file format](https://pinellolab.github.io/crispr-bean/input.html) for seamless downstream steps. This will produce `./bean_count_${screen_id}.h5ad`. 
 
 ## 2. QC samples & guides (:ref:`qc`)
-Base editing data will include QC about editing efficiency. As QC uses predefined column names and values, beware to follow the [input file guideline](../../README#input-file-format), but you can change the parameters with the full argument list of [`bean qc`](../../README#bean qc-qc-of-reporter-screen-data). (Common factors you may want to tweak is `--ctrl-cond=bulk` and `--lfc-conds=top,bot` if you have different sample condition labels.)
-```
+Base editing data will include QC about editing efficiency. As QC uses predefined column names and values, beware to follow the [input file guideline](https://pinellolab.github.io/crispr-bean/input.html), but you can change the parameters with the full argument list of [bean qc](https://pinellolab.github.io/crispr-bean/qc.html). (Common factors you may want to tweak is `--ctrl-cond=bulk` and `--lfc-conds=top,bot` if you have different sample condition labels.)
+```bash
 bean qc \
   ${working_dir}/bean_count_${screen_id}.h5ad             `# Input ReporterScreen .h5ad file path` \
   -o ${working_dir}/bean_count_${screen_id}_masked.h5ad   `# Output ReporterScreen .h5ad file path` \
@@ -79,17 +79,17 @@ If the data does not include reporter editing data, you can provide `--no-editin
 
 `bean run` can take 3 run options to quantify editing rate:  
 1. From **reporter + accessibility**  
-    If your gRNA metadata table (`${working_dir}/test_guide_info.csv` above) included per-gRNA accessibility score, 
-    ```
+  If your gRNA metadata table (`${working_dir}/test_guide_info.csv` above) included per-gRNA accessibility score, 
+    ```bash
     bean run sorting variant \
     ${working_dir}/bean_count_${screen_id}_masked.h5ad \
     -o $working_dir \
     --fit-negctrl \
     --scale-by-acc \
     --accessibility-col accessibility
     ```
-    If your gRNA metadata table (`${working_dir}/test_guide_info.csv` above) included per-gRNA chromosome & position and you have bigWig file with accessibility signal, 
-    ```
+  If your gRNA metadata table (`${working_dir}/test_guide_info.csv` above) included per-gRNA chromosome & position and you have bigWig file with accessibility signal, 
+    ```bash
     bean run sorting variant \
     ${working_dir}/bean_count_${screen_id}_masked.h5ad \
     -o $working_dir \
@@ -100,16 +100,17 @@ If the data does not include reporter editing data, you can provide `--no-editin
 
 2. From **reporter**, without accessibility
 
-    This assumes the all target sites have the uniform chromatin accessibility.
-    ```
+  This assumes the all target sites have the uniform chromatin accessibility.
+    ```bash
     bean run sorting variant \
     ${working_dir}/bean_count_${screen_id}_masked.h5ad \
     -o $working_dir \
     --fit-negctrl 
     ```
 3. No reporter information, assume the same editing efficiency of all gRNAs.  
-    Use this option if your data don't have editing outcome information.
-    ```
+  Use this option if your data don't have editing outcome information.
+  
+    ```bash
     bean run sorting variant \
     ${working_dir}/bean_count_${screen_id}_masked.h5ad \
     -o $working_dir \