diff --git a/README.md b/README.md index ef03512..eead4ed 100644 --- a/README.md +++ b/README.md @@ -25,8 +25,17 @@ This is an analysis toolkit for the pooled CRISPR reporter or sensor data. The r ### Screen data is saved as *ReporterScreen* object in the pipeline. BEAN stores mapped gRNA and allele counts in `ReporterScreen` object which is compatible with [AnnData](https://anndata.readthedocs.io/en/latest/index.html). See [Data Structure](#data-structure) section for more information. -### Examples -We provide example scripts in `tests/`. Running `pytest --sparse-ordering` generates example input/output files from running 1 and 2-4 sequentially. +

+ +## Examples +| Tutorial link | [Library design](#pipeline-run-options-by-library-design) | Selection | Reporter | +|---------------| -------------- | --------- | -------- | +| [LDL-C GWAS](docs/tutorials/ldl_var.md) | GWAS variant library | FACS sorting | Yes/No | +| [LDL-C LDLR CDS](docs/tutorials/ldl_cds.md) | Coding sequence tiling libarary | FACS sorting | Yes/No | +| TKO simulated (Coming soon!) | GWAS variant library | Survival / Proliferation | Yes/No | +| TKO simulated, tiling (Coming soon!) | Coding sequence tiling libarary | Survival / Proliferation | Yes/No | + +We also provide example scripts in `tests/`. Running `pytest --sparse-ordering` will generate example input/output files. ### Pipeline run options by library design The `bean-filter` and `bean-run` steps depend on the type of gRNA library design, where BEAN supports two modes of running. @@ -83,8 +92,8 @@ File should contain following columns. * Option 2: `accessibility_signal`: ATAC-seq signal value of the target loci of each guide. * For variant library (gRNAs are designed to target specific variants and ignores bystander edits) * `target`: This column denotes which target variant/element of each gRNA. This is not used in `bean-count[-samples]` but required to run `bean-run` in later steps. - * `target_group [Optional]`: If negative/positive control gRNA will be considered in `bean-qc` and/or `bean-run`, specify as "NegCtrl"/"PosCtrl" in this column. - * `target_pos [Optional]`: If `--match_target_pos` flag is used, input file needs `target_pos` which specifies 0-based relative position of targeted base within Reporter sequence. + * `target_group`: If negative/positive control gRNA will be considered in `bean-qc` and/or `bean-run`, specify as "NegCtrl"/"PosCtrl" in this column. + * `target_pos`: If `--match_target_pos` flag is used, input file needs `target_pos` which specifies 0-based relative position of targeted base within Reporter sequence. * For tiling library (gRNAs tile coding / noncoding sequences) * `strand`: Specifies gRNA strand information relative to the reference genome. * `chrom`: Chromosome of gRNA targeted locus. @@ -97,12 +106,18 @@ File should contain following columns with header. * `R1_filepath`: Path to read 1 `.fastq[.gz]` file * `R2_filepath`: Path to read 1 `.fastq[.gz]` file * `sample_id`: ID of sequencing sample -* `rep [Optional]`: Replicate # of this sample (Should NOT contain `.`) -* `bin [Optional]`: Name of the sorting bin -* `upper_quantile [Optional]`: FACS sorting upper quantile -* `lower_quantile [Optional]`: FACS sorting lower quantile +* `rep`: Replicate # of this sample (Should NOT contain `.`) +* `condition`: Name of the sorting bin (ex. `top`, `bot`), or label of timepoint (ex. `D5`, `D18`) + +For FACS sorting screens: +* `upper_quantile`: FACS sorting upper quantile +* `lower_quantile`: FACS sorting lower quantile -Optional columns are not required but can be provided for compatibility with `bean-qc` and `bean-run`. See [example](tests/data/sample_list.csv). +For proliferation / survival screens: +* `time`: FACS sorting upper quantile + + +Also see examples for [FACS sorting screen](tests/data/sample_list.csv). ### Output file format `count` or `count-samples` produces `.h5ad` and `.xlsx` file with guide and per-guide allele counts. @@ -111,35 +126,45 @@ Optional columns are not required but can be provided for compatibility with `be ### Parameters * `-b`, `--edited-base` (`str`, default: `None`): For base editors, the base that should be ignored when matching the gRNA sequence +* `-i`, `--input`: List of fastq and sample ids. See above for the format. * `-f`, `--sgRNA-filename` (`str`, default: `None`): sgRNA description file. (See above) -* `--guide-start-seq`: Guide starts after this sequence in R1 (default: '') -* `--guide-end-seq`: Guide ends after this sequence in R1 (default: '') -* `-r`, `--count-reporter` (default: `False`): Count edited alleles in reporters. -* `-q`, `--min-average-read-quality` (default: `30`): Minimum average quality score (phred33) to keep a read -* `-s`, `--min-single-bp-quality` (default: `0`): Minimum single bp score (phred33) to keep a read (default: 0) + +Output formats * `-n`, `--name`: Name of the output file will be `bean_count_{name}.h5ad`. * `-o`, `--output-folder`: Output folder +* `--tiling`: Specify that the guide library is tiling library +* `-r`, `--count-reporter` (default: `False`): Count edited alleles in reporters. +* `--string-allele` (default: `False`): Store allele as quality filtered string instead of Allele object +* `-g`, `--count-guide-edits` (default: `False`): count the self editing of guides (default: False) +* `-m`, `--count-guide-reporter-alleles`: count the matched allele of guide and reporter edit +* `--match-target-pos` (default: `False`): Count the edit in the exact target position. +* `--target-pos-col` (default: `target_pos`): Column name specifying the relative target position within *reporter* sequence. +* `--offset` (default: `False`): Guide file has `offest` column that will be added to the relative position of reporters. +* `--align-fasta` (default: ''): gRNA is aligned to this sequence to infer the offset. Can be used when the exact offset is not provided. + + +Read structure +* `--guide-start-seq`: Guide starts after this sequence in R1 (default: '') +* `--guide-end-seq`: Guide ends after this sequence in R1 (default: '') +* `--guide-start-seqs-file` (default: `None`): CSV file path with per-sample `guide_start_seq` to be used, if provided. Formatted as `sample_id, guide_start_seq` +* `--guide-end-seqs-file` (default: `None`): CSV file path with per-sample `guide_end_seq` to be used, if provided. Formatted as `sample_id,guide_end_seq` * `-l`, `--reporter-length` (default: `32`): Length of the reporter sequence. +* `--gstart-reporter` (default: `6`): Start position of the guide sequence in the reporter +* `--guide-bc` (default: `True`): Construct has guide barcode +* `--guide-bc-len` (default: `4`): Guide barcode sequence length at the beginning of the R2 + +Mapping quality filters +* `-q`, `--min-average-read-quality` (default: `30`): Minimum average quality score +(phred33) to keep a read * `--keep-intermediate` (default: `False`): Keep all intermediate files generated from filtering. +* `-s`, `--min-single-bp-quality` (default: `0`): Minimum single bp score (phred33) to keep a read (default: 0) * `--qstart-R1` (default: `0`): Start position of the read when filtering for quality score of the read 1 * `--qend-R1` (default: `47`): End position of the read when filtering for quality score of the read 1 * `--qstart-R2` (default: `0`): Start position of the read when filtering for quality score of the read 2 * `--qend-R2` (default: `36`): End position of the read when filtering for quality score of the read 2 -* `--gstart-reporter` (default: `6`): Start position of the guide sequence in the reporter -* `--match-target-pos` (default: `False`): Count the edit in the exact target position. -* `--target-pos-col` (default: `target_pos`): Column name specifying the relative target position within *reporter* sequence. -* `--guide-bc` (default: `True`): Construct has guide barcode -* `--guide-bc-len` (default: `4`): Guide barcode sequence length at the beginning of the R2 -* `--offset` (default: `False`): Guide file has `offest` column that will be added to the relative position of reporters. -* `--align-fasta` (default: ''): gRNA is aligned to this sequence to infer the offset. Can be used when the exact offset is not provided. -* `--string-allele` (default: `False`): Store allele as quality filtered string instead of Allele object -* `-g`, `--count-guide-edits` (default: `False`): count the self editing of guides (default: False) -* `-m`, `--count-guide-reporter-alleles`: count the matched allele of guide and reporter edit -* `--tiling`: Specify that the guide library is tiling library -* `-i`, `--input`: List of fastq and sample ids. See above for the format. + +Run options * `-t`, `--threads` (default: `10`): Number of threads to use -* `--guide-start-seqs-file` (default: `None`): CSV file path with per-sample `guide_start_seq` to be used, if provided. Formatted as `sample_id, guide_start_seq` -* `--guide-end-seqs-file` (default: `None`): CSV file path with per-sample `guide_end_seq` to be used, if provided. Formatted as `sample_id,guide_end_seq` * `--rerun` (default: `False`): Recount each sample. If `False`, existing count for each sample is taken. @@ -206,20 +231,33 @@ Above command produces #### Additional Parameters * `--tiling` (default: `None`): If set as `True` or `False`, it sets the screen object to be tiling (`True`) or variant (`False`)-targeting screen when calculating editing rate. * `--replicate-label` (default: `"rep"`): Label of column in `bdata.samples` that describes replicate ID. -* `--condition-label` (default: `"bin"`)": Label of column in `bdata.samples` that describes experimental condition. (sorting bin, time, etc.). +* `--condition-label` (default: `"condition"`)": Label of column in `bdata.samples` that describes experimental condition. (sorting bin, time, etc.). * `--sample-covariates` (default: `None`): Comma-separated list of column names in `bdata.samples` that describes non-selective experimental condition (drug treatment, etc.). The values in the `bdata.samples` should NOT contain `.`. * `--no-editing` (default: `False`): Ignore QC about editing. Can be used for QC of other editing modalities. -* `--target-pos-col` (default: `"target_pos"`): Target position column in `bdata.guides` specifying target edit position in reporter. -* `--rel-pos-is-reporter` (default: `False`): Specifies whether `edit_start_pos` and `edit_end_pos` are relative to reporter position. If `False`, those are relative to spacer position. -* `--edit-start-pos` (default: `2`): Edit start position to quantify editing rate on, 0-based inclusive. -* `--edit-end-pos` (default: `7`): Edit end position to quantify editing rate on, 0-based exclusive. -* `--count-correlation-thres` (default: `0.8`): Threshold of guide count correlation to mask out. -* `--edit-rate-thres` (default: `0.1`): Median editing rate threshold per sample to mask out. + +Editing rate quantification +* `--ctrl-cond` (default: `"bulk"`): Value in of column in `ReporterScreen.samples[condition_label]` where guide-level editing rate to be calculated + + Editing rate is calculated with following parameters in variant screens: + * `--target-pos-col` (default: `"target_pos"`): Target position column in `bdata.guides` specifying target edit position in reporter. + + For tiling screens: + * `--rel-pos-is-reporter` (default: `False`): Specifies whether `edit_start_pos` and `edit_end_pos` are relative to reporter position. If `False`, those are relative to spacer position. + * `--edit-start-pos` (default: `2`): Edit start position to quantify editing rate on, 0-based inclusive. + * `--edit-end-pos` (default: `7`): Edit end position to quantify editing rate on, 0-based exclusive. + +LFC of positive controls * `--posctrl-col` (default: `group`): Column name in .h5ad.guides DataFrame that specifies guide category. * `--posctrl-val` (default: `PosCtrl`): Value in .h5ad.guides[`posctrl_col`] that specifies guide will be used as the positive control in calculating log fold change. -* `--lfc-thres` (default: `0.1`): Positive guides' correlation threshold to filter out. * `--lfc-conds` (default: `"top,bot"`): Values in of column in `ReporterScreen.samples[condition_label]` for LFC will be calculated between, delimited by comma -* `--ctrl-cond` (default: `"bulk"`): Value in of column in `ReporterScreen.samples[condition_label]` where guide-level editing rate to be calculated + + +Sample filtering thresholds +* `--count-correlation-thres` (default: `0.7`): Threshold of guide count correlation to mask out. +* `--edit-rate-thres` (default: `0.1`): Mean editing rate threshold per sample to mask out. +* `--lfc-thres` (default: `0.1`): Positive guides' correlation threshold to filter out. + +Other * `--recalculate-edits` (default: `False`): Even when `ReporterScreen.layers['edit_count']` exists, recalculate the edit counts from `ReporterScreen.uns['allele_count']`."

@@ -237,7 +275,7 @@ Above command produces * `my_sorting_screen_filtered.h5ad` with filtered alleles stored in `.uns`, * `my_sorting_screen_filtered.filtered_allele_stats.pdf`, and `my_sorting_screen_filtered.filter_log.txt` that report allele count stats in each filtering step. -You may want to adjust the flitering parameters to obtain optimal balance between # guides per variant & # variants that are scored. See example outputs of filtering step [here](docs/example_filtering_outputs/). +You may want to adjust the flitering parameters to obtain optimal balance between # guides per variant & # variants that are scored. See example outputs of filtering step [here](docs/example_filtering_output/). ### Translating alleles @@ -367,7 +405,7 @@ Above command produces * `--target-column` (default: `target`): Column key in `bdata.guides` that describes the target element of each guide. * `--guide-activity-col`: Column in `bdata.guides` DataFrame showing the editing rate estimated via external tools. * Sample annotations (`bdata.samples` column keys) - * `--condition-column` (default: `bin`): Column key in `bdata.samples` that describes experimental condition. + * `--condition-column` (default: `condition`): Column key in `bdata.samples` that describes experimental condition. * `-uq`, `--sorting-bin-upper-quantile-column` (default: `upper_quantile`): Column name with upper quantile values of each sorting bin in bdata.samples * `-lq`, `--sorting-bin-lower-quantile-column` (default: `lower_quantile`): Column name with lower quantile values of each sorting bin in bdata.samples diff --git a/bean/annotate/translate_allele.py b/bean/annotate/translate_allele.py index 2e81a89..a8fe178 100644 --- a/bean/annotate/translate_allele.py +++ b/bean/annotate/translate_allele.py @@ -595,7 +595,8 @@ def strsplit_edit(edit_str): def annotate_edit( edit_info: pd.DataFrame, - edit_col="edit", + edit_col: str = "edit", + control_tag: str = "CONTROL", splice_sites: Collection[ int ] = None, # TODO: may be needed to extended into multi-chromosome case @@ -604,6 +605,7 @@ def annotate_edit( Args edit_info: pd.DataFrame with at least 1 column of 'edit_col', which has 'Edit' format. + control_tag: String tag identifying non-targeting control guides so their variant signal are not aggregated. splice_sites: Collection of integer splice site positions. If the edit position matches the positions, it will be annotated as 'splicing'. """ @@ -619,8 +621,13 @@ def annotate_edit( edit_info.loc[ edit_info.pos.map(lambda s: not s.startswith("A")), "coding" ] = "noncoding" - edit_info.loc[edit_info.pos.map(lambda s: "CONTROL" in s), "group"] = "negctrl" - edit_info.loc[edit_info.pos.map(lambda s: "CONTROL" in s), "coding"] = "negctrl" + if control_tag is not None: + edit_info.loc[ + edit_info.pos.map(lambda s: control_tag in s), "group" + ] = "negctrl" + edit_info.loc[ + edit_info.pos.map(lambda s: control_tag in s), "coding" + ] = "negctrl" edit_info.loc[ (edit_info.coding == "noncoding") & (edit_info.group != "negctrl"), "int_pos" ] = edit_info.loc[ diff --git a/bean/model/model.py b/bean/model/model.py index 96514ce..779cfc1 100644 --- a/bean/model/model.py +++ b/bean/model/model.py @@ -206,18 +206,23 @@ def ControlNormalModel(data, mask_thres=10, use_bcmatch=True): data.sample_mask, data.a0_bcmatch, ) - with poutine.mask( - mask=torch.logical_and( - data.X_bcmatch_masked.permute(0, 2, 1).sum(axis=-1) - > mask_thres, - data.repguide_mask, - ) - ): - pyro.sample( - "guide_bcmatch_counts", - dist.DirichletMultinomial(a_bcmatch, validate_args=False), - obs=data.X_bcmatch_masked.permute(0, 2, 1), - ) + try: + with poutine.mask( + mask=torch.logical_and( + data.X_bcmatch_masked.permute(0, 2, 1).sum(axis=-1) + > mask_thres, + data.repguide_mask, + ) + ): + pyro.sample( + "guide_bcmatch_counts", + dist.DirichletMultinomial(a_bcmatch, validate_args=False), + obs=data.X_bcmatch_masked.permute(0, 2, 1), + ) + except RuntimeError: + print(data.X_bcmatch_masked.shape) + print(data.repguide_mask.shape) + exit(1) return alleles_p_bin diff --git a/bean/model/run.py b/bean/model/run.py index 0cd7808..e09dcd5 100644 --- a/bean/model/run.py +++ b/bean/model/run.py @@ -102,7 +102,7 @@ def parse_args(): ) parser.add_argument( "--condition-col", - default="bin", + default="condition", type=str, help="Column key in `bdata.samples` that describes experimental condition.", ) @@ -281,8 +281,21 @@ def check_args(args, bdata): pass elif args.library_design == "tiling": if args.allele_df_key is None: + key_to_use = "allele_counts" + n_alleles = len(bdata.uns[key_to_use]) + for key, df in bdata.uns.items(): + if "allele_counts" not in key or not isinstance(df, pd.DataFrame): + continue + if len(df) < n_alleles: + key_to_use = key + n_alleles = len(df) + warn( + f"--allele-df-key not provided for tiling screen. Using the most filtered allele counts with {n_alleles} alleles stored in {key_to_use}." + ) + args.allele_df_key = key_to_use + elif args.allele_df_key not in bdata.uns: raise ValueError( - "--allele-df-key not provided for tiling screen. Feed in the key then allele counts in screen.uns[allele_df_key] will be used." + f"--allele-df-key {args.allele_df_key} not in ReporterScreen.uns. Check your input." ) else: raise ValueError( @@ -293,7 +306,7 @@ def check_args(args, bdata): bdata.guides[args.negctrl_col].map(lambda s: s.lower()) == args.negctrl_col_value.lower() ).sum() - if not n_negctrl >= 20: + if not n_negctrl >= 10: raise ValueError( f"Not enough negative control guide in the input data: {n_negctrl}. Check your input arguments." ) diff --git a/bean/preprocessing/data_class.py b/bean/preprocessing/data_class.py index 961fc92..4bfc862 100644 --- a/bean/preprocessing/data_class.py +++ b/bean/preprocessing/data_class.py @@ -200,7 +200,8 @@ def __getitem__(self, guide_idx): ndata.X_masked = ndata.X_masked[:, :, guide_idx] ndata.X_control = ndata.X_control[:, :, guide_idx] ndata.repguide_mask = ndata.repguide_mask[:, guide_idx] - ndata.a0 = ndata.a0[guide_idx] + if hasattr(ndata, "a0") and self.a0 is not None: + ndata.a0 = ndata.a0[guide_idx] return ndata def transform_data(self, X, n_bins=None): @@ -249,8 +250,8 @@ def get_sample_ids_and_sort(self, screen: be.ReporterScreen, condit_id_col: str) class ReporterScreenData(ScreenData): X_bcmatch: torch.Tensor size_factor_bcmatch: torch.Tensor - X_control_bcmatch: torch.Tensor - size_factor_control_bcmatch: torch.Tensor + X_bcmatch_control: torch.Tensor + size_factor_bcmatch_control: torch.Tensor allele_counts: torch.Tensor allele_counts_control: torch.Tensor a0_bcmatch: torch.Tensor @@ -369,8 +370,6 @@ def __getitem__(self, guide_idx): ndata.allele_counts = ndata.allele_counts[:, :, guide_idx, :] if hasattr(ndata, "a0_allele"): ndata.a0_allele = ndata.a0_allele[guide_idx, :] - if hasattr(ndata, "a0") and self.a0 is not None: - ndata.a0 = ndata.a0[guide_idx] if hasattr(ndata, "pi_a0") and self.pi_a0 is not None: ndata.pi_a0 = ndata.pi_a0[guide_idx] if hasattr(ndata, "a0_bcmatch") and self.a0_bcmatch is not None: @@ -1137,6 +1136,20 @@ def set_bcmatch(self, screen): ) self.a0_bcmatch = torch.as_tensor(a0_bcmatch) + def __getitem__(self, guide_idx): + ndata = super().__getitem__(guide_idx) + if hasattr(ndata, "X_bcmatch"): + ndata.X_bcmatch = ndata.X_bcmatch[:, :, guide_idx] + if hasattr(ndata, "X_bcmatch_masked"): + ndata.X_bcmatch_masked = ndata.X_bcmatch_masked[:, :, guide_idx] + if hasattr(ndata, "X_bcmatch_control"): + ndata.X_bcmatch_control = ndata.X_bcmatch_control[:, :, guide_idx] + if hasattr(ndata, "X_bcmatch_control_masked"): + ndata.X_bcmatch_control_masked = ndata.X_bcmatch_control_masked[ + :, :, guide_idx + ] + return ndata + @dataclass class VariantSortingReporterScreenData(VariantReporterScreenData, SortingScreenData): diff --git a/bean/qc/sample_qc.py b/bean/qc/sample_qc.py index 9f60c14..8b66f90 100644 --- a/bean/qc/sample_qc.py +++ b/bean/qc/sample_qc.py @@ -16,21 +16,31 @@ def plot_guide_edit_rates(bdata, ax=None, figsize=(5, 3), title="", n_bins=30): return ax -def plot_sample_edit_rates( - bdata, ax=None, figsize=(5, 3), title="", agg_method="median" -): +def plot_sample_edit_rates(bdata, ax=None, figsize=(5, 3), title="", agg_method="mean"): + if agg_method == "median": + agg = np.nanmedian + elif agg_method == "mean": + agg = np.nanmean + else: + raise ValueError( + "Invalid aggregation method provided. Please pass one of 'median' or 'mean'." + ) set_sample_edit_rates(bdata, agg_method) if ax is None: fig, ax = plt.subplots(figsize=figsize) for i, sample in enumerate(bdata.samples.index): sns.kdeplot( bdata.layers["edit_rate"][:, i], - label=f"{sample}(median {np.nanmedian(bdata.layers['edit_rate'][:, i]):.2f})", + label=f"{sample}({agg_method} {agg(bdata.layers['edit_rate'][:, i]):.2f})", linestyle=linestyles[(i // 10) % 4], clip=(0, 1), ) sns.kdeplot( - bdata.guides.edit_rate, linewidth=2, c="black", label="Bulk median", clip=(0, 1) + bdata.guides.edit_rate, + linewidth=2, + c="black", + label=f"Bulk {agg_method}", + clip=(0, 1), ) ax.set_xlabel("Editing rate") ax.set_title(title) diff --git a/bean/qc/utils.py b/bean/qc/utils.py index f584783..5c6e51c 100644 --- a/bean/qc/utils.py +++ b/bean/qc/utils.py @@ -63,7 +63,7 @@ def parse_args(): "--condition-label", help="Label of column in `bdata.samples` that describes experimental condition. (sorting bin, time, etc.)", type=str, - default="bin", + default="condition", ) parser.add_argument( "--no-editing", @@ -100,7 +100,7 @@ def parse_args(): ) parser.add_argument( "--edit-rate-thres", - help="Median editing rate threshold per sample to mask out.", + help="Mean editing rate threshold per sample to mask out.", type=float, default=0.1, ) diff --git a/bin/bean-qc b/bin/bean-qc index c83d6f2..1f79ab9 100644 --- a/bin/bean-qc +++ b/bin/bean-qc @@ -35,6 +35,7 @@ def main(): exp_id=args.out_report_prefix, recalculate_edits=args.recalculate_edits, base_edit_data=args.base_edit_data, + ), kernel_name="bean_python3", ) diff --git a/bin/bean-run b/bin/bean-run index 1ab8aac..550d0ec 100644 --- a/bin/bean-run +++ b/bin/bean-run @@ -101,7 +101,7 @@ def main(args, bdata): if "edit_rate" not in ndata.screen.guides.columns: ndata.screen.get_edit_from_allele() ndata.screen.get_edit_mat_from_uns(rel_pos_is_reporter=True) - ndata.screen.get_guide_edit_rate() + ndata.screen.get_guide_edit_rate(unsorted_condition_label=args.control_condition_label) target_info_df = _get_guide_target_info( ndata.screen, args, cols_include=[args.negctrl_col] ) @@ -117,6 +117,7 @@ def main(args, bdata): pd.DataFrame(pd.Series(ndata.edit_index)) .reset_index() .rename(columns={"index": "edit"}), + control_tag=args.control_guide_tag, splice_sites=None if args.splice_site_path is None else splice_site, ) target_info_df["effective_edit_rate"] = _obtain_effective_edit_rate(ndata).cpu() @@ -130,7 +131,6 @@ def main(args, bdata): print("syn idx: ", adj_negctrl_idx) guide_info_df = ndata.screen.guides - del bdata info(f"Running inference for {model_label}...") diff --git a/docs/tutorials/ldl_cds.md b/docs/tutorials/ldl_cds.md new file mode 100644 index 0000000..c3de47a --- /dev/null +++ b/docs/tutorials/ldl_cds.md @@ -0,0 +1,112 @@ +# Tiling sorting screen tutorial +Tiling screen that tiles gRNA densely across locus or multiple loci, selected based on FACS signal quantiles. + + + + + + + + + + +
Library designTiling (gRNAs tile each locus densely)
tiling library design
SelectionCells are sorted based on FACS signal quantiles
variant library design
+ +

+ +## 1. Count gRNA & reporter ([`bean-count-samples`](../../README#bean-count-samples-count-reporter-screen-data)) +``` +screen_id=my_sorting_tiling_screen + +bean-count-samples \ +--input tests/data/sample_list_tiling.csv `# Contains fastq file path; see test file for example.`\ +-b A `# Base A is edited (into G)` \ +-f tests/data/test_guide_info_tiling_chrom.csv `# Contains gRNA metadata; see test file for example.`\ +-o ./ `# Output directory` \ +-r `# Quantify reporter edits` \ +-n ${screen_id} `# ID of the screen` \ +--tiling +``` +Make sure you follow the [input file format](../../README#input-file-format) for seamless downstream steps. This will produce `./bean_count_${screen_id}.h5ad`. + +## 2. QC ([`bean-qc`](../../README#bean-qc-qc-of-reporter-screen-data)) +Base editing data will include QC about editing efficiency. As QC uses predefined column names and values, beware to follow the [input file guideline](../../README#input-file-format), but you can change the parameters with the full argument list of [`bean-qc`](../../README#bean-qc-qc-of-reporter-screen-data). (Common factors you may want to tweak is `--ctrl-cond=bulk` and `--lfc-conds=top,bot` if you have different sample condition labels.) +``` +bean-qc \ + bean_count_${screen_id}.h5ad `# Input ReporterScreen .h5ad file path` \ + -o bean_count_${screen_id}_masked.h5ad `# Output ReporterScreen .h5ad file path` \ + -r qc_report_${screen_id} `# Prefix for QC report` \ + [--tiling] `# Not required if you have passed --tiling in counting step` +``` + + + +If the data does not include reporter editing data, you can provide `--no-editing` flag to omit the editing rate QC. + +## 3. Filter alleles ([`bean-filter`](../../README#bean-filter-filtering-and-optionally-translating-alleles)) +As tiling library doesn't have designated per-gRNA target variant, any base edit observed in reporter may be the candidate variant, while having too many variants with very low editing rate significantly decreases the power. Variants are filtered based on multiple criteria in `bean-fitler`. + +If the screen targets coding sequence, it's beneficial to translate edits into coding varaints whenever possible for better power. For translation, provide `--translate` and one of the following: +``` +[ --translate-gene-name GENE_SYMBOL OR + --translate-genes-list path_to_gene_names_file.txt OR + --translate-fasta gene_exon.fa, OR + --translate-fastas-csv gene_exon_fas.csv] +``` +where `path_to_gene_names_file.txt` has one gene symbol per line, and gene symbol uses its MANE transcript (hg38) coordinates of exons. In order to use other reference versions or transcript ID, you'll need to feed in fasta file. See detailed formatting of fasta file [here](../../README#translating-alleles). + +Example allele filtering given we're translating based on MANE transcript exons of multiple gene symbols: + +```bash +bean-filter ./bean_count_${screen_id}_masked.h5ad \ +-o ./bean_count_${screen_id}_alleleFiltered \ +--filter-target-basechange `# Filter based on intended base changes. If -b A was provided in bean-count, filters for A>G edit. If -b C was provided, filters for C>T edit.`\ +--filter-window --edit-start-pos 0 --edit-end-pos 19 `# Filter based on editing window in spacer position within reporter.`\ +--filter-allele-proportion 0.1 --filter-sample-proportion 0.3 `#Filter based on allele proportion larger than 0.1 in at least 0.3 (30%) of the control samples.` \ +--translate --translate-genes-list tests/data/gene_symbols.txt +``` + +Ouptut file `` shows number of alleles per guide and number of guides per variant, where we want high enough values for the latter. See the typical output for dataset with good editing coverage & filtering result [here](../example_filtering_ouptut/). + +## 4. Quantify variant effect ([`bean-run`](../../README#bean-run-quantify-variant-effects)) +By default, `bean-run [sorting,survival] tiling` uses most filtered allele counts table for variant identification and quantification of their effects. **Check [allele filtering output](../example_filtering_ouptut/)** and choose alternative filtered allele counts table if necessary. + +`bean-run` can take 3 run options to quantify editing rate: +1. From **reporter + accessibility** + 1-1. If your gRNA metadata table (`tests/data/test_guide_info.csv` above) included per-gRNA accessibility score, + ``` + bean-run sorting tiling \ + ./bean_count_${screen_id}_alleleFiltered.h5ad \ + -o tests/test_res/var/ \ + --fit-negctrl \ + --scale-by-acc \ + --accessibility-col accessibility + ``` + 1-2. If your gRNA metadata table (`tests/data/test_guide_info.csv` above) included per-gRNA chromosome & position and you have bigWig file with accessibility signal, + ``` + bean-run sorting tiling \ + ./bean_count_${screen_id}_alleleFiltered.h5ad \ + -o tests/test_res/var/ \ + --fit-negctrl \ + --scale-by-acc \ + --accessibility-bw accessibility.bw + ``` + +2. From **reporter** + ``` + bean-run sorting tiling \ + ./bean_count_${screen_id}_alleleFiltered.h5ad \ + -o tests/test_res/var/ \ + --fit-negctrl + ``` +3. No reporter information, assume the same editing efficiency of all gRNAs. + Use this option if your data don't have editing rate information. + ``` + bean-run sorting tiling \ + ./bean_count_${screen_id}_alleleFiltered.h5ad \ + -o tests/test_res/var/ \ + --fit-negctrl \ + --uniform-edit + ``` + +See [full argument list](../../README#optional-parameters) to accommodate different input sample & guide metadata columns/values and run options. \ No newline at end of file diff --git a/docs/tutorials/ldl_var.md b/docs/tutorials/ldl_var.md new file mode 100644 index 0000000..263899a --- /dev/null +++ b/docs/tutorials/ldl_var.md @@ -0,0 +1,84 @@ +# Variant sorting screen tutorial +GWAS variant screen with per-variant gRNA tiling design, selected based on FACS signal quantiles. + + + + + + + + + + +
Library designVariant (gRNAs tile each target variant)
variant library design
SelectionCells are sorted based on FACS signal quantiles
variant library design
+ +

+ +## 1. Count gRNA & reporter ([`bean-count-samples`](../../README#bean-count-samples-count-reporter-screen-data)) +``` +screen_id=my_sorting_tiling_screen + +bean-count-samples \ +--input tests/data/sample_list.csv `# Contains fastq file path; see test file for example.`\ +-b A `# Base A is edited (into G)` \ +-f tests/data/test_guide_info.csv `# Contains gRNA metadata; see test file for example.`\ +-o ./ `# Output directory` \ +-r `# Quantify reporter edits` \ +-n ${screen_id} `# ID of the screen to be counted` +``` +Make sure you follow the [input file format](../../README#input-file-format) for seamless downstream steps. This will produce `./bean_count_${screen_id}.h5ad`. + +## 2. QC ([`bean-qc`](../../README#bean-qc-qc-of-reporter-screen-data)) +Base editing data will include QC about editing efficiency. As QC uses predefined column names and values, beware to follow the [input file guideline](../../README#input-file-format), but you can change the parameters with the full argument list of [`bean-qc`](../../README#bean-qc-qc-of-reporter-screen-data). (Common factors you may want to tweak is `--ctrl-cond=bulk` and `--lfc-conds=top,bot` if you have different sample condition labels.) +``` +bean-qc \ + bean_count_${screen_id}.h5ad `# Input ReporterScreen .h5ad file path` \ + -o bean_count_${screen_id}_masked.h5ad `# Output ReporterScreen .h5ad file path` \ + -r qc_report_${screen_id} `# Prefix for QC report` +``` + + + +If the data does not include reporter editing data, you can provide `--no-editing` flag to omit the editing rate QC. + + +## 3. Quantify variant effect ([`bean-run`](../../README#bean-run-quantify-variant-effects)) + +`bean-run` can take 3 run options to quantify editing rate: +1. From **reporter + accessibility** + If your gRNA metadata table (`tests/data/test_guide_info.csv` above) included per-gRNA accessibility score, + ``` + bean-run sorting variant \ + tests/data/bean_count_${screen_id}_masked.h5ad \ + -o tests/test_res/var/ \ + --fit-negctrl \ + --scale-by-acc \ + --accessibility-col accessibility + ``` + If your gRNA metadata table (`tests/data/test_guide_info.csv` above) included per-gRNA chromosome & position and you have bigWig file with accessibility signal, + ``` + bean-run sorting variant \ + tests/data/bean_count_${screen_id}_masked.h5ad \ + -o tests/test_res/var/ \ + --fit-negctrl \ + --scale-by-acc \ + --accessibility-bw accessibility.bw + ``` + +2. From **reporter** + ``` + bean-run sorting variant \ + tests/data/bean_count_${screen_id}_masked.h5ad \ + -o tests/test_res/var/ \ + --fit-negctrl + ``` +3. No reporter information, assume the same editing efficiency of all gRNAs. + Use this option if your data don't have editing rate information. + ``` + bean-run sorting variant \ + tests/data/bean_count_${screen_id}_masked.h5ad \ + -o tests/test_res/var/ \ + --fit-negctrl \ + --uniform-edit + ``` +See [full argument list](../../README#optional-parameters) to accommodate different input sample & guide metadata columns/values and run options. \ No newline at end of file diff --git a/imgs/sorting_bins@8x.png b/imgs/sorting_bins@8x.png new file mode 100644 index 0000000..609c137 Binary files /dev/null and b/imgs/sorting_bins@8x.png differ diff --git a/notebooks/sample_quality_report.ipynb b/notebooks/sample_quality_report.ipynb index 49d88f1..65a0d92 100644 --- a/notebooks/sample_quality_report.ipynb +++ b/notebooks/sample_quality_report.ipynb @@ -53,7 +53,7 @@ "posctrl_val = \"PosCtrl\"\n", "lfc_thres = -0.1\n", "replicate_label = \"rep\"\n", - "condition_label = \"bin\"\n", + "condition_label = \"condition\"\n", "comp_cond1 = \"top\"\n", "comp_cond2 = \"bot\"\n", "ctrl_cond = \"bulk\"\n", @@ -399,8 +399,8 @@ " bdata.samples.median_corr_X.isnull() | (bdata.samples.median_corr_X < corr_X_thres),\n", " \"mask\",\n", "] = 0\n", - "if \"median_editing_rate\" in bdata.samples.columns.tolist():\n", - " bdata.samples.loc[bdata.samples.median_editing_rate < edit_rate_thres, \"mask\"] = 0\n", + "if \"mean_editing_rate\" in bdata.samples.columns.tolist():\n", + " bdata.samples.loc[bdata.samples.mean_editing_rate < edit_rate_thres, \"mask\"] = 0\n", "if (\n", " isinstance(replicate_label, str)\n", " and len(bdata.samples[replicate_label].unique()) > 1\n", @@ -424,7 +424,6 @@ "rep_n_samples = bdata_filtered.samples.groupby(replicate_label)[\"mask\"].sum()\n", "print(rep_n_samples)\n", "rep_has_too_small_sample = rep_n_samples.loc[rep_n_samples < 2].index.tolist()\n", - "rep_has_too_small_sample\n", "print(\n", " f\"Excluding reps {rep_has_too_small_sample} that has less than 2 samples per replicate.\"\n", ")\n", @@ -441,6 +440,16 @@ "bdata_filtered = bdata_filtered[:, samples_include]" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if isinstance(replicate_label, str) and len(bdata_filtered.samples[replicate_label].unique()) <= 1 or isinstance(replicate_label, list) and len(bdata_filtered.samples[replicate_label].drop_duplicates() <= 1): \n", + " raise ValueError(\"Too small number of replicate left after QC. Check the input data or adjust the QC metric thresholds.\")" + ] + }, { "cell_type": "code", "execution_count": null, diff --git a/setup.py b/setup.py index f41178c..4745dcb 100644 --- a/setup.py +++ b/setup.py @@ -8,7 +8,7 @@ setup( name="crispr-bean", - version="0.3.0", + version="0.3.1", python_requires=">=3.8.0", author="Jayoung Ryu", author_email="jayoung_ryu@g.harvard.edu", diff --git a/tests/data/gene_symbols.txt b/tests/data/gene_symbols.txt new file mode 100644 index 0000000..f067e0a --- /dev/null +++ b/tests/data/gene_symbols.txt @@ -0,0 +1,3 @@ +LDLR +SMARCA4 +SPC24 \ No newline at end of file diff --git a/tests/data/sample_list_tiling.csv b/tests/data/sample_list_tiling.csv new file mode 100644 index 0000000..dd89fce --- /dev/null +++ b/tests/data/sample_list_tiling.csv @@ -0,0 +1,7 @@ +R1_fastq,R2_fastq,sample_id,replicate,condition,upper_quantile,lower_quantiles +tests/data/test_tiling_R1.fastq,tests/data/test_tiling_R2.fastq,1_bot,1,bot,0.3,0.0 +tests/data/test_tiling_R1.fastq,tests/data/test_tiling_R2.fastq,1_bulk,1,bulk,1.0,0.0 +tests/data/test_tiling_R1.fastq,tests/data/test_tiling_R2.fastq,1_top,1,top,1.0,0.7 +tests/data/test_tiling_R1.fastq,tests/data/test_tiling_R2.fastq,2_bot,2,bot,0.3,0.0 +tests/data/test_tiling_R1.fastq,tests/data/test_tiling_R2.fastq,2_bulk,2,bulk,1.0,0.0 +tests/data/test_tiling_R1.fastq,tests/data/test_tiling_R2.fastq,2_top,2,top,1.0,0.7 \ No newline at end of file diff --git a/tests/data/tiling_mini_screen.h5ad b/tests/data/tiling_mini_screen.h5ad index e4f94a0..fbd0696 100644 Binary files a/tests/data/tiling_mini_screen.h5ad and b/tests/data/tiling_mini_screen.h5ad differ diff --git a/tests/data/tiling_mini_screen_missing.h5ad b/tests/data/tiling_mini_screen_missing.h5ad index 5e32a15..af3b2af 100644 Binary files a/tests/data/tiling_mini_screen_missing.h5ad and b/tests/data/tiling_mini_screen_missing.h5ad differ diff --git a/tests/data/var_mini_screen.h5ad b/tests/data/var_mini_screen.h5ad index 74eb17e..d8b36bb 100644 Binary files a/tests/data/var_mini_screen.h5ad and b/tests/data/var_mini_screen.h5ad differ diff --git a/tests/data/var_mini_screen_missing.h5ad b/tests/data/var_mini_screen_missing.h5ad index 4e58ffe..b038822 100644 Binary files a/tests/data/var_mini_screen_missing.h5ad and b/tests/data/var_mini_screen_missing.h5ad differ diff --git a/tests/test_count.py b/tests/test_count.py index 882f4dc..b6e20f5 100644 --- a/tests/test_count.py +++ b/tests/test_count.py @@ -42,6 +42,19 @@ def test_count_samples(): @pytest.mark.order(5) +def test_count_samples_tiling(): + cmd = "bean-count-samples --input tests/data/sample_list_tiling.csv -b A -f tests/data/test_guide_info_tiling_chrom.csv -o tests/test_res/ -r" + try: + subprocess.check_output( + cmd, + shell=True, + universal_newlines=True, + ) + except subprocess.CalledProcessError as exc: + raise exc + + +@pytest.mark.order(6) def test_count_chroms(): cmd = "bean-count --R1 tests/data/test_tiling_R1.fastq --R2 tests/data/test_tiling_R2.fastq -b A -f tests/data/test_guide_info_tiling_chrom.csv -o tests/test_res/ -r" try: diff --git a/tests/test_filter.py b/tests/test_filter.py index f98db3b..2baaf55 100644 --- a/tests/test_filter.py +++ b/tests/test_filter.py @@ -2,7 +2,7 @@ import subprocess -@pytest.mark.order(10) +@pytest.mark.order(11) def test_filter_varscreen(): cmd = "bean-filter tests/data/var_mini_screen_masked.h5ad -o tests/data/var_mini_screen_annotated -s 0 -e 19 -w -b -t -ap 0.1 -sp 0.3" try: @@ -15,7 +15,7 @@ def test_filter_varscreen(): raise exc -@pytest.mark.order(11) +@pytest.mark.order(12) def test_filter_tiling_screen(): cmd = "bean-filter tests/data/tiling_mini_screen_masked.h5ad -o tests/data/tiling_mini_screen_annotated -s 0 -e 19 -w -b -t -ap 0.1 -sp 0.3" try: @@ -28,9 +28,9 @@ def test_filter_tiling_screen(): raise exc -@pytest.mark.order(12) +@pytest.mark.order(13) def test_filter_tiling_screen_translate_genename(): - cmd = "bean-filter tests/data/tiling_mini_screen_masked.h5ad -o tests/data/tiling_mini_screen_annotated_wrong -s 0 -e 19 -w -b -t -ap 0.1 -sp 0.3 --translate --translate-gene LDLR" + cmd = "bean-filter tests/data/tiling_mini_screen_masked.h5ad -o tests/data/tiling_mini_screen_annotated_wrong -s 0 -e 19 -w -b -ap 0.1 -sp 0.3 --translate --translate-gene LDLR" try: subprocess.check_output( cmd, @@ -41,9 +41,22 @@ def test_filter_tiling_screen_translate_genename(): raise exc -@pytest.mark.order(13) +@pytest.mark.order(14) def test_filter_tiling_screen_translate_fasta(): - cmd = "bean-filter tests/data/tiling_mini_screen_masked.h5ad -o tests/data/tiling_mini_screen_annotated_wrong -s 0 -e 19 -w -b -t -ap 0.1 -sp 0.3 --translate --translate-fasta tests/data/ldlr_exons.fa" + cmd = "bean-filter tests/data/tiling_mini_screen_masked.h5ad -o tests/data/tiling_mini_screen_annotated_wrong -s 0 -e 19 -w -b -ap 0.1 -sp 0.3 --translate --translate-fasta tests/data/ldlr_exons.fa" + try: + subprocess.check_output( + cmd, + shell=True, + universal_newlines=True, + ) + except subprocess.CalledProcessError as exc: + raise exc + + +@pytest.mark.order(15) +def test_filter_tiling_screen_translate_genenames(): + cmd = "bean-filter tests/data/tiling_mini_screen_masked.h5ad -o tests/data/tiling_mini_screen_alleleFiltered -s 0 -e 19 -w -b -t -ap 0.1 -sp 0.3 --translate --translate-genes-list tests/data/gene_symbols.txt" try: subprocess.check_output( cmd, diff --git a/tests/test_qc.py b/tests/test_qc.py index 5af73e5..4ceb76e 100644 --- a/tests/test_qc.py +++ b/tests/test_qc.py @@ -2,7 +2,7 @@ import subprocess -@pytest.mark.order(6) +@pytest.mark.order(7) def test_qc(): cmd = "bean-qc tests/data/var_mini_screen.h5ad -o tests/data/var_mini_screen_masked.h5ad -r tests/test_res/qc_report_var_mini_screen --count-correlation-thres 0.6" try: @@ -15,7 +15,7 @@ def test_qc(): raise exc -@pytest.mark.order(7) +@pytest.mark.order(8) def test_qc_tiling(): cmd = "bean-qc tests/data/tiling_mini_screen.h5ad -o tests/data/tiling_mini_screen_masked.h5ad -r tests/test_res/qc_report_tiling_mini_screen --count-correlation-thres 0.6 --posctrl-col ''" try: @@ -28,27 +28,27 @@ def test_qc_tiling(): raise exc -@pytest.mark.order(8) +@pytest.mark.order(9) def test_dummy_insertion_varscreen(): cmd = "bean-qc tests/data/var_mini_screen_missing.h5ad -o tests/data/var_mini_screen_missing_masked.h5ad -r tests/test_res/qc_report_var_mini_screen_missing --count-correlation-thres 0.6" try: subprocess.check_output( - cmd, - shell=True, - universal_newlines=True, + cmd, shell=True, universal_newlines=True, stderr=subprocess.STDOUT ) + raise ValueError("Filtering should fail with too small number of replicates.") except subprocess.CalledProcessError as exc: - raise exc + if "Too small number of replicate left after QC" not in exc.output: + raise exc -@pytest.mark.order(9) +@pytest.mark.order(10) def test_dummy_insertion_tilingscreen(): cmd = "bean-qc tests/data/tiling_mini_screen_missing.h5ad -o tests/data/tiling_mini_screen_missing_masked.h5ad -r tests/test_res/qc_report_tiling_mini_screen_missing --count-correlation-thres 0.6 --posctrl-col ''" try: subprocess.check_output( - cmd, - shell=True, - universal_newlines=True, + cmd, shell=True, universal_newlines=True, stderr=subprocess.STDOUT ) + raise ValueError("Filtering should fail with too small number of replicates.") except subprocess.CalledProcessError as exc: - raise exc + if "Too small number of replicate left after QC" not in exc.output: + raise exc diff --git a/tests/test_run.py b/tests/test_run.py index baa0a80..1455707 100644 --- a/tests/test_run.py +++ b/tests/test_run.py @@ -2,9 +2,9 @@ import subprocess -@pytest.mark.order(13) +@pytest.mark.order(16) def test_run_variant_wacc(): - cmd = "bean-run sorting variant tests/data/var_mini_screen_annotated.h5ad --scale-by-acc --acc-bw-path tests/data/accessibility_signal_chr6.bw -o tests/test_res/var/ --repguide-mask None" + cmd = "bean-run sorting variant tests/data/var_mini_screen_annotated.h5ad --scale-by-acc --acc-bw-path tests/data/accessibility_signal_chr6.bw -o tests/test_res/var/ --repguide-mask None --n-iter 10" try: subprocess.check_output( cmd, @@ -15,9 +15,9 @@ def test_run_variant_wacc(): raise exc -@pytest.mark.order(14) +@pytest.mark.order(17) def test_run_variant_noacc(): - cmd = "bean-run sorting variant tests/data/var_mini_screen_annotated.h5ad -o tests/test_res/var/ " + cmd = "bean-run sorting variant tests/data/var_mini_screen_annotated.h5ad -o tests/test_res/var/ --n-iter 10" try: subprocess.check_output( cmd, @@ -28,9 +28,9 @@ def test_run_variant_noacc(): raise exc -@pytest.mark.order(15) +@pytest.mark.order(18) def test_run_variant_wo_negctrl_uniform(): - cmd = "bean-run sorting variant tests/data/var_mini_screen_annotated.h5ad -o tests/test_res/var/ --uniform-edit " + cmd = "bean-run sorting variant tests/data/var_mini_screen_annotated.h5ad -o tests/test_res/var/ --uniform-edit --n-iter 10" try: subprocess.check_output( cmd, @@ -41,9 +41,48 @@ def test_run_variant_wo_negctrl_uniform(): raise exc -@pytest.mark.order(16) +@pytest.mark.order(19) +def test_run_variant_wacc_negctrl(): + cmd = "bean-run sorting variant tests/data/var_mini_screen_annotated.h5ad --scale-by-acc --acc-bw-path tests/data/accessibility_signal_chr6.bw -o tests/test_res/var/ --repguide-mask None --n-iter 10 --fit-negctrl " + try: + subprocess.check_output( + cmd, + shell=True, + universal_newlines=True, + ) + except subprocess.CalledProcessError as exc: + raise exc + + +@pytest.mark.order(20) +def test_run_variant_noacc_negctrl(): + cmd = "bean-run sorting variant tests/data/var_mini_screen_annotated.h5ad -o tests/test_res/var/ --fit-negctrl --n-iter 10" + try: + subprocess.check_output( + cmd, + shell=True, + universal_newlines=True, + ) + except subprocess.CalledProcessError as exc: + raise exc + + +@pytest.mark.order(21) +def test_run_variant_uniform_negctrl(): + cmd = "bean-run sorting variant tests/data/var_mini_screen_annotated.h5ad -o tests/test_res/var/ --uniform-edit --fit-negctrl --n-iter 10" + try: + subprocess.check_output( + cmd, + shell=True, + universal_newlines=True, + ) + except subprocess.CalledProcessError as exc: + raise exc + + +@pytest.mark.order(22) def test_run_tiling_wo_negctrl(): - cmd = "bean-run sorting tiling tests/data/tiling_mini_screen_annotated.h5ad --scale-by-acc --acc-bw-path tests/data/accessibility_signal.bw -o tests/test_res/tiling/ --allele-df-key allele_counts_spacer_0_19_A.G_translated_prop0.1_0.3 --control-guide-tag None --repguide-mask None" + cmd = "bean-run sorting tiling tests/data/tiling_mini_screen_annotated.h5ad --scale-by-acc --acc-bw-path tests/data/accessibility_signal.bw -o tests/test_res/tiling/ --allele-df-key allele_counts_spacer_0_19_A.G_translated_prop0.1_0.3 --control-guide-tag None --repguide-mask None --n-iter 10" try: subprocess.check_output( cmd, @@ -54,9 +93,9 @@ def test_run_tiling_wo_negctrl(): raise exc -@pytest.mark.order(17) +@pytest.mark.order(23) def test_run_tiling_with_wo_negctrl_noacc(): - cmd = "bean-run sorting tiling tests/data/tiling_mini_screen_annotated.h5ad -o tests/test_res/tiling/ --allele-df-key allele_counts_spacer_0_19_A.G_translated_prop0.1_0.3 --control-guide-tag None --repguide-mask None" + cmd = "bean-run sorting tiling tests/data/tiling_mini_screen_annotated.h5ad -o tests/test_res/tiling/ --allele-df-key allele_counts_spacer_0_19_A.G_translated_prop0.1_0.3 --control-guide-tag None --repguide-mask None --n-iter 10" try: subprocess.check_output( cmd, @@ -67,9 +106,35 @@ def test_run_tiling_with_wo_negctrl_noacc(): raise exc -@pytest.mark.order(18) +@pytest.mark.order(23) def test_run_tiling_with_wo_negctrl_uniform(): - cmd = "bean-run sorting tiling tests/data/tiling_mini_screen_annotated.h5ad -o tests/test_res/tiling/ --uniform-edit --allele-df-key allele_counts_spacer_0_19_A.G_translated_prop0.1_0.3 --control-guide-tag None --repguide-mask None" + cmd = "bean-run sorting tiling tests/data/tiling_mini_screen_annotated.h5ad -o tests/test_res/tiling/ --uniform-edit --allele-df-key allele_counts_spacer_0_19_A.G_translated_prop0.1_0.3 --control-guide-tag None --repguide-mask None --n-iter 10" + try: + subprocess.check_output( + cmd, + shell=True, + universal_newlines=True, + ) + except subprocess.CalledProcessError as exc: + raise exc + + +@pytest.mark.order(24) +def test_run_tiling_negctrl(): + cmd = "bean-run sorting tiling tests/data/tiling_mini_screen_annotated.h5ad --scale-by-acc --acc-bw-path tests/data/accessibility_signal.bw -o tests/test_res/tiling/ --allele-df-key allele_counts_spacer_0_19_A.G_translated_prop0.1_0.3 --fit-negctrl --negctrl-col strand --negctrl-col-value neg --control-guide-tag neg --repguide-mask None --n-iter 10" + try: + subprocess.check_output( + cmd, + shell=True, + universal_newlines=True, + ) + except subprocess.CalledProcessError as exc: + raise exc + + +@pytest.mark.order(25) +def test_run_tiling_with_negctrl_noacc(): + cmd = "bean-run sorting tiling tests/data/tiling_mini_screen_annotated.h5ad -o tests/test_res/tiling/ --allele-df-key allele_counts_spacer_0_19_A.G_translated_prop0.1_0.3 --fit-negctrl --negctrl-col strand --negctrl-col-value neg --control-guide-tag neg --repguide-mask None --n-iter 10" try: subprocess.check_output( cmd, @@ -78,3 +143,19 @@ def test_run_tiling_with_wo_negctrl_uniform(): ) except subprocess.CalledProcessError as exc: raise exc + + +@pytest.mark.order(26) +def test_run_tiling_with_negctrl_uniform(): + cmd = "bean-run sorting tiling tests/data/tiling_mini_screen_annotated.h5ad -o tests/test_res/tiling/ --uniform-edit --allele-df-key allele_counts_spacer_0_19_A.G_translated_prop0.1_0.3 --fit-negctrl --negctrl-col strand --negctrl-col-value neg --control-guide-tag neg --repguide-mask None --n-iter 10" + try: + subprocess.check_output( + cmd, + shell=True, + universal_newlines=True, + ) + except subprocess.CalledProcessError as exc: + raise exc + + +# Add fit_negctrl examples