Merge pull request #43 from pinellolab/dev

add docs for sgRNA output
pinellolab · Jun 25, 2024 · b0a2d1b
2 parents b864328 + 5214932
commit b0a2d1b
Showing 3 changed files with 39 additions and 15 deletions.
diff --git a/README.md b/README.md
@@ -22,7 +22,7 @@
 2. [`profile`](https://pinellolab.github.io/crispr-bean/profile.html): Profile editing preferences of your editor.  
 3. [`qc`](https://pinellolab.github.io/crispr-bean/qc.html): Quality control report and filtering out / masking of aberrant samples and guides  
 4. [`filter`](https://pinellolab.github.io/crispr-bean/filter.html): Filter reporter alleles; essential for `tiling` mode that allows for all alleles generated from gRNA.
-5. [`run`](https://pinellolab.github.io/crispr-bean/run.html): Quantify targeted variants' effect sizes from screen data.  **See more about the model in the link**.
+5. [`run`](https://pinellolab.github.io/crispr-bean/run.html): Quantify targeted variants' effect sizes from screen data. **See more about the [model](https://pinellolab.github.io/crispr-bean/model.html) & [output](https://github.com/pinellolab/crispr-bean/tree/main/docs/example_run_output)**
 * Screen data is saved as [`ReporterScreen` object](https://pinellolab.github.io/crispr-bean/reporterscreen.html) in the pipeline.
 BEAN stores mapped gRNA and allele counts in a `ReporterScreen` object, which is compatible with [AnnData](https://anndata.readthedocs.io/en/latest/index.html). 
 

diff --git a/docs/_run.md b/docs/_run.md
@@ -48,19 +48,42 @@ See full list of parameters [below](#full-parameters).
 <img src="/crispr-bean/assets/model_output.png" alt="model" width="700"/>
 
 Above command produces
-* `output_prefix/bean_element_result.[model_type].csv` with following columns:
-  * Estimated variant effect sizes
-    * `mu` (Effect size): Mean of variant phenotype, given the wild type has standard normal phenotype distribution of `mu = 0, sd = 1`.
-    * `mu_sd`: Mean of variant phenotype `mu` is modeled as normal distribution. The column shows fitted standard deviation of `mu` that quantify the uncertainty of the variant effect.
-    * `mu_z`: z-score of `mu`
-    * `sd`: Standard deviation of variant phenotype, given the wild type has standard normal phenotype distribution of `mu = 0, sd = 1`.
-    * `CI[0.025`, `0.975]`: Credible interval of `mu`
-    * **When negative control is provided, above columns with `_adj` suffix are provided, which are the corresponding values adjusted for negative control.**  
-  * Metrics on per-variant evidence provided in input (provided in `tiling` mode)
-    * `effective_edit_rate`: Sum of per-variant editing rates over all alleles observed in the input. Allele-level editing rate is divided by the number of variants observed in the allele prior to summing up.
-    * `n_guides`: # of guides covering the variant.
-    * `n_coocc`: # of cooccurring variants with a given variant in any alleles observed in the input.
-* `output_prefix/bean_sgRNA_result.[model_type].csv`: 
-  * `edit_rate`: Estimated editing rate at the target loci.
+## `bean_element_result.[model_type].csv`
+- Variant ID / grouping
+  - `edit`: Variant ID.
+  - `group`: The grouping of the coding variants, assigned as one of nonsense/missense/synonymous.
+  - `int_pos`: The integer position of the noncoding variants.
+  - `chrom`: The chromosome of the variant.
+  - `pos`: The position of the variant. For a coding variant, the value starts with `A` and the position is the 1-based amino acid position. For a noncoding variant, it is the numeric genomic position.
+  - `ref`: The reference base/amino acid of the variant.
+  - `alt`: The alternative base/amino acid of the variant.
+  - `coding`: A flag indicating whether the element is a coding variant.
+
+- Per-variant summary of variant-producing guides (`tiling` mode)
+  - `guide_target_group`: Aggregated `target_group` column of the input sgRNA_info.csv file. Lists all unique values among the guides that produced (filtered) edited alleles that include this variant.
+  - `effective_edit_rate`: The effective editing rate of the element. Calculated as `sum_over_guides(sum_over_alleles(per_guide_allele_editing_rate / # variants in the allele))`.
+  - `editing_guides`: List of guides that edited the variant.
+  - `per_guide_editing_rates`: The per-guide editing rates of the variant.
+  - `n_guides`: The number of guides that edited the variant.
+  - `n_coocc`: The number of unique co-occurring variants that appeared together with the variant in any allele that contains it.
+  
+- Variant effect size: Use `mu_z_adj` whenever available, otherwise `mu_z_scaled`, otherwise `mu_z`.
+  - `mu`: The mean value of the variant effect size.
+  - `mu_sd`: The standard deviation of the mean value of the variant effect size.
+  - `mu_z`: The z-score of the mean value of the variant effect size.
+  - `sd`: The standard deviation of the phenotype induced by the variant.
+  - `CI[0.025,0.975]`: The 95% credible interval of the mean value of the variant effect size. Corresponds to `mu_z_adj` when available, otherwise `mu_z_scaled`, otherwise `mu_z`. 
+  - `[]_scaled`: Above values scaled by negative control variants.
+  - `[]_adj`: Above values scaled by synonymous variants.
+  
+## `bean_sgRNA_result.[model_type].csv`
+- `name`: sgRNA ID provided in the `name` column of the input.
+- `edit_rate`: Effective editing rates
+- `accessibility`: (Only if you have used `--scale-by-acc`) Accessibility signal that is used for scaling of the editing rate.
+- `scaled_edit_rate`: (Only if you have used `--scale-by-acc`) Endogenous editing rate used for modeling, estimated by scaling the reporter editing rate by the accessibility signal.
+- `[cond1]_[cond2].median_lfc`: Raw log fold change (LFC), computed with the pseudocount given by the `--guide-lfc-pseudocount` argument (default 5).
+- For `tiling` mode
+  - `variants`: Variants generated by this gRNA
+  - `variant_edit_rates`: Editing rate of this gRNA for each variant it creates.
 
 See the full output file description and example output [here](https://github.com/pinellolab/crispr-bean/tree/main/docs/example_run_output).
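To make the `effective_edit_rate` formula in the hunk above concrete, here is a minimal Python sketch. It assumes the sum runs only over alleles that contain the variant of interest, and it uses a hypothetical data layout (a dict from guide name to a list of `(allele editing rate, variants in the allele)` pairs). The function name, layout, and variant IDs are made up for illustration and are not BEAN's internal representation.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: guide name -> list of (allele editing rate, variants in that allele).
AlleleTable = Dict[str, List[Tuple[float, List[str]]]]


def effective_edit_rate(variant: str, alleles_by_guide: AlleleTable) -> float:
    """Sum, over guides and alleles, the allele rate divided by the allele's variant count."""
    total = 0.0
    for alleles in alleles_by_guide.values():
        for rate, variants in alleles:
            # Assumption: only alleles carrying the variant contribute to its rate.
            if variant in variants:
                total += rate / len(variants)
    return total


# Made-up numbers, for illustration only.
example: AlleleTable = {
    "g1": [(0.30, ["v1"]), (0.10, ["v1", "v2"])],
    "g2": [(0.20, ["v2"])],
}
print(effective_edit_rate("v1", example))
```

Dividing by the number of variants in each allele splits the allele's editing rate evenly among the variants it carries, so an allele with two variants contributes half of its rate to each.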
diff --git a/docs/example_run_output/tiling/README.md b/docs/example_run_output/tiling/README.md
@@ -20,6 +20,7 @@ These are the example output of [`bean run`](https://pinellolab.github.io/crispr
   - `per_guide_editing_rates`: The per-guide editing rates of the variant.
   - `n_guides`: The number of guides that edited the variant.
   - `n_coocc`: The number of unique co-occurring variants that appeared together with the variant in any allele that contains it.
+
 - Variant effect size: Use `mu_z_adj` whenever available, otherwise `mu_z_scaled`, otherwise `mu_z`.
   - `mu`: The mean value of the variant effect size.
   - `mu_sd`: The standard deviation of the mean value of the variant effect size.
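
The column precedence stated above (`mu_z_adj`, then `mu_z_scaled`, then `mu_z`) can be applied when reading a result file. The following is a minimal sketch using only the Python standard library; the helper name `pick_effect_column` and the in-memory CSV rows are hypothetical, and only the column names come from the documentation.

```python
import csv
import io

# Order of preference as described in the docs: adjusted, then scaled, then raw.
PREFERENCE = ("mu_z_adj", "mu_z_scaled", "mu_z")


def pick_effect_column(columns):
    """Return the first preferred z-score column that is present."""
    for name in PREFERENCE:
        if name in columns:
            return name
    raise KeyError("none of %s found in columns" % (PREFERENCE,))


# Made-up rows standing in for a bean_element_result CSV file.
demo_csv = "edit,mu_z,mu_z_scaled\nv1,1.5,1.2\nv2,-0.3,-0.4\n"
reader = csv.DictReader(io.StringIO(demo_csv))
column = pick_effect_column(reader.fieldnames)
scores = {row["edit"]: float(row[column]) for row in reader}
print(column, scores)
```

For a real run, the `io.StringIO` object would be replaced by an open handle to the result file.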