Change the default behavior of bean-qc to not remove the replicates even when it has bad replicates
jykr committed Mar 25, 2024
1 parent 4316376 commit c858819
Showing 8 changed files with 164 additions and 101 deletions.
105 changes: 73 additions & 32 deletions README.md
@@ -206,9 +206,18 @@ Above command produces `prefix_editing_preference.[html,ipynb]` as editing prefe
## `bean-qc`: QC of reporter screen data
```bash
bean-qc \
my_sorting_screen.h5ad `# Input ReporterScreen .h5ad file path` \
my_sorting_screen.h5ad `# Input ReporterScreen .h5ad file path` \
-o my_sorting_screen_masked.h5ad `# Output ReporterScreen .h5ad file path` \
-r qc_report_my_sorting_screen `# Prefix for QC report`
-r qc_report_my_sorting_screen `# Prefix for QC report` \

# Inspect the output qc_report_my_sorting_screen.html to tweak QC threshold

bean-qc \
my_sorting_screen.h5ad \
-o my_sorting_screen_masked.h5ad \
-r qc_report_my_sorting_screen \
`# [--count-correlation-thres 0.7 ...]` \
-b
```

`bean-qc` supports following quality control and masks samples with low quality. Specifically:
@@ -229,36 +238,68 @@ Above command produces


#### Additional Parameters
* `--tiling` (default: `None`): If set as `True` or `False`, it sets the screen object to be tiling (`True`) or variant (`False`)-targeting screen when calculating editing rate.
* `--replicate-label` (default: `"rep"`): Label of column in `bdata.samples` that describes replicate ID.
* `--condition-label` (default: `"condition"`)": Label of column in `bdata.samples` that describes experimental condition. (sorting bin, time, etc.).
* `--sample-covariates` (default: `None`): Comma-separated list of column names in `bdata.samples` that describes non-selective experimental condition (drug treatment, etc.). The values in the `bdata.samples` should NOT contain `.`.
* `--no-editing` (default: `False`): Ignore QC about editing. Can be used for QC of other editing modalities.

Editing rate quantification
* `--ctrl-cond` (default: `"bulk"`): Value in of column in `ReporterScreen.samples[condition_label]` where guide-level editing rate to be calculated

Editing rate is calculated with following parameters in variant screens:
* `--target-pos-col` (default: `"target_pos"`): Target position column in `bdata.guides` specifying target edit position in reporter.

For tiling screens:
* `--rel-pos-is-reporter` (default: `False`): Specifies whether `edit_start_pos` and `edit_end_pos` are relative to reporter position. If `False`, those are relative to spacer position.
* `--edit-start-pos` (default: `2`): Edit start position to quantify editing rate on, 0-based inclusive.
* `--edit-end-pos` (default: `7`): Edit end position to quantify editing rate on, 0-based exclusive.

LFC of positive controls
* `--posctrl-col` (default: `group`): Column name in .h5ad.guides DataFrame that specifies guide category.
* `--posctrl-val` (default: `PosCtrl`): Value in .h5ad.guides[`posctrl_col`] that specifies guide will be used as the positive control in calculating log fold change.
* `--lfc-conds` (default: `"top,bot"`): Values in of column in `ReporterScreen.samples[condition_label]` for LFC will be calculated between, delimited by comma


Sample filtering thresholds
* `--count-correlation-thres` (default: `0.7`): Threshold of guide count correlation to mask out.
* `--edit-rate-thres` (default: `0.1`): Mean editing rate threshold per sample to mask out.
* `--lfc-thres` (default: `0.1`): Positive guides' correlation threshold to filter out.

Other
* `--recalculate-edits` (default: `False`): Even when `ReporterScreen.layers['edit_count']` exists, recalculate the edit counts from `ReporterScreen.uns['allele_count']`."
##### Optional arguments:
* `-o OUT_SCREEN_PATH`, `--out-screen-path OUT_SCREEN_PATH`
Path where quality-filtered ReporterScreen object to be written to
* `-r OUT_REPORT_PREFIX`, `--out-report-prefix OUT_REPORT_PREFIX`
Output prefix of qc report (prefix.html, prefix.ipynb)

##### QC thresholds:
* `--count-correlation-thres COUNT_CORRELATION_THRES`
Correlation threshold to mask out.
* `--edit-rate-thres EDIT_RATE_THRES`
Mean editing rate threshold per sample to mask out.
* `--lfc-thres LFC_THRES`
Positive guides' correlation threshold to filter out.

##### Run options:
* `-b`, `--remove-bad-replicates`
Remove replicates with at least two of its samples meet the QC threshold (bean-run does not support having only one sorting bin sample for a replicate).
* `-i`, `--ignore-missing-samples`
If the flag is not provided and the ReporterScreen object does not contain all conditions for
each replicate, fake empty samples are added. If the flag is provided, no dummy samples are added.
* `--no-editing` Ignore QC about editing. Can be used for QC of other editing modalities.
* `--dont-recalculate-edits`
When ReporterScreen.layers['edit_count'] exists, do not recalculate the edit counts from
ReporterScreen.uns['allele_count'].

##### Input `.h5ad` formatting:
Note that these arguments will change the way the QC metrics are calculated for guides, samples, or replicates.
* `--tiling TILING` Specify that the guide library is tiling library without 'n guides per target' design
* `--replicate-label REPLICATE_LABEL`
Label of column in `bdata.samples` that describes replicate ID.
* `--sample-covariates SAMPLE_COVARIATES`
Comma-separated list of column names in `bdata.samples` that describes non-selective
experimental condition. (drug treatment, etc.)
* `--condition-label CONDITION_LABEL`
Label of column in `bdata.samples` that describes experimental condition. (sorting bin, time,
etc.)
###### Editing rate calculation
* `--ctrl-cond CTRL_COND`
Value in the `ReporterScreen.samples[condition_label]` column for which guide-level editing rate
is calculated
* `--rel-pos-is-reporter`
Specifies whether `edit_start_pos` and `edit_end_pos` are relative to reporter position. If
`False`, those are relative to spacer position.
Editing rate is calculated with following parameters in
* Variant screens:
* `--target-pos-col TARGET_POS_COL`
Target position column in `bdata.guides` specifying target edit position in reporter
* tiling screens:
* `--edit-start-pos EDIT_START_POS`
Edit start position to quantify editing rate on, 0-based inclusive.
* `--edit-end-pos EDIT_END_POS`
Edit end position to quantify editing rate on, 0-based exclusive.
###### LFC of positive controls
* `--posctrl-col POSCTRL_COL`
Column name in ReporterScreen.guides DataFrame that specifies guide category. To use all
gRNAs, feed empty string ''.
* `--posctrl-val POSCTRL_VAL`
Value in ReporterScreen.guides[`posctrl_col`] that specifies guide will be used as the
positive control in calculating log fold change.
* `--lfc-conds LFC_CONDS`
Comma-delimited values in the `ReporterScreen.samples[condition_label]` column between which LFC
is calculated

<br/><br/>
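
The `--edit-start-pos` / `--edit-end-pos` convention described in the README (0-based inclusive start, 0-based exclusive end) is the same half-open convention Python slices use. The following is only an illustration of that indexing convention under the documented defaults of 2 and 7; `edit_window` and the spacer sequence are hypothetical and are not part of bean.

```python
def edit_window(spacer: str, edit_start_pos: int = 2, edit_end_pos: int = 7) -> str:
    """Return the bases inside the editing window.

    The start is 0-based inclusive and the end is 0-based exclusive,
    which matches Python's slice semantics.
    """
    if not 0 <= edit_start_pos <= edit_end_pos <= len(spacer):
        raise ValueError("window must lie within the spacer")
    return spacer[edit_start_pos:edit_end_pos]


spacer = "ACGTACGTACGTACGTACGT"  # made-up 20-nt spacer, for illustration only
window = edit_window(spacer)     # positions 2, 3, 4, 5, 6 (position 7 is excluded)
print(window, len(window))
```

With the defaults, the window therefore covers five positions, not six.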

6 changes: 5 additions & 1 deletion bean/framework/Edit.py
@@ -175,7 +175,11 @@ def get_range(self):
min(edit.pos for edit in self.edits),
max(edit.pos for edit in self.edits),
)


def set_uid(self, uid):
self.edits = {edit.set_uid(uid) for edit in self.edits}
return self

def get_uid(self):
uid = None
if (
Binary file not shown.
2 changes: 1 addition & 1 deletion bean/model/run.py
@@ -222,7 +222,7 @@ def parse_args():
parser.add_argument(
"--allele-df-key",
type=str,
default=None,
default="allele_counts",
help="screen.uns[allele_df_key] will be used as the allele count.",
)
parser.add_argument(
106 changes: 61 additions & 45 deletions bean/qc/utils.py
@@ -23,122 +23,138 @@ def parse_args():
parser.add_argument(
"bdata_path", help="Path to the ReporterScreen object to run QC on", type=str
)
thres_parser = parser.add_argument_group("QC thresholds")
run_parser = parser.add_argument_group("Run options")
input_parser = parser.add_argument_group("Input .h5ad formatting")

thres_parser.add_argument(
"--count-correlation-thres",
help="Correlation threshold to mask out.",
type=float,
default=0.7,
)
thres_parser.add_argument(
"--edit-rate-thres",
help="Mean editing rate threshold per sample to mask out.",
type=float,
default=0.1,
)
thres_parser.add_argument(
"--lfc-thres",
help="Positive guides' correlation threshold to filter out.",
type=float,
default=-0.1,
)

parser.add_argument(
"-o",
"--out-screen-path",
help="Path where quality-filtered ReporterScreen object to be written to",
type=str,
)
parser.add_argument(
"-r",
"--out-report-prefix",
help="Output prefix of qc report (prefix.html, prefix.ipynb)",
type=str,
)

run_parser.add_argument(
"-b", "--remove-bad-replicates",
help="Remove replicates with at least two of its samples meet the QC threshold.",
action="store_true",
)
run_parser.add_argument(
"-i",
"--ignore-missing-samples",
help="If the flag is not provided and the ReporterScreen object does not contain all conditions for each replicate, fake empty samples are added. If the flag is provided, no dummy samples are added.",
action="store_true",
)
parser.add_argument(
"-r",
"--out-report-prefix",
help="Output prefix of qc report (prefix.html, prefix.ipynb)",
type=str,
run_parser.add_argument(
"--no-editing",
help="Ignore QC about editing. Can be used for QC of other editing modalities.",
action="store_true",
)
parser.add_argument(
run_parser.add_argument(
"--dont-recalculate-edits",
help="When ReporterScreen.layers['edit_count'] exists, do not recalculate the edit counts from ReporterScreen.uns['allele_count'].",
action="store_true",
)

input_parser.add_argument(
"--tiling",
dest="tiling",
type=lambda x: bool(distutils.util.strtobool(x)),
help="Specify that the guide library is tiling library without 'n guides per target' design",
)
parser.add_argument(
input_parser.add_argument(
"--replicate-label",
help="Label of column in `bdata.samples` that describes replicate ID.",
type=str,
default="rep",
)
parser.add_argument(
input_parser.add_argument(
"--sample-covariates",
help="Comma-separated list of column names in `bdata.samples` that describes non-selective experimental condition. (drug treatment, etc.)",
type=str,
default=None,
)
parser.add_argument(
input_parser.add_argument(
"--condition-label",
help="Label of column in `bdata.samples` that describes experimental condition. (sorting bin, time, etc.)",
type=str,
default="condition",
)
parser.add_argument(
"--no-editing",
help="Ignore QC about editing. Can be used for QC of other editing modalities.",
action="store_true",
)
parser.add_argument(
input_parser.add_argument(
"--target-pos-col",
help="Target position column in `bdata.guides` specifying target edit position in reporter",
type=str,
default="target_pos",
)
parser.add_argument(
input_parser.add_argument(
"--rel-pos-is-reporter",
help="Specifies whether `edit_start_pos` and `edit_end_pos` are relative to reporter position. If `False`, those are relative to spacer position.",
action="store_true",
default=False,
)
parser.add_argument(
input_parser.add_argument(
"--edit-start-pos",
help="Edit start position to quantify editing rate on, 0-based inclusive.",
default=2,
)
parser.add_argument(
input_parser.add_argument(
"--edit-end-pos",
help="Edit end position to quantify editing rate on, 0-based exclusive.",
default=7,
)
parser.add_argument(
"--count-correlation-thres",
help="Correlation threshold to mask out.",
type=float,
default=0.7,
)
parser.add_argument(
"--edit-rate-thres",
help="Mean editing rate threshold per sample to mask out.",
type=float,
default=0.1,
)
parser.add_argument(

input_parser.add_argument(
"--posctrl-col",
help="Column name in ReporterScreen.guides DataFrame that specifies guide category. To use all gRNAs, feed empty string ''.",
type=str,
default="target_group",
)
parser.add_argument(
input_parser.add_argument(
"--posctrl-val",
help="Value in ReporterScreen.guides[`posctrl_col`] that specifies guide will be used as the positive control in calculating log fold change.",
type=str,
default="PosCtrl",
)
parser.add_argument(
"--lfc-thres",
help="Positive guides' correlation threshold to filter out.",
type=float,
default=-0.1,
)
parser.add_argument(

input_parser.add_argument(
"--lfc-conds",
help="Values in of column in `ReporterScreen.samples[condition_label]` for LFC will be calculated between, delimited by comma",
type=str,
default="top,bot",
)
parser.add_argument(
input_parser.add_argument(
"--ctrl-cond",
help="Values in of column in `ReporterScreen.samples[condition_label]` for guide-level editing rate to be calculated",
type=str,
default="bulk",
)
parser.add_argument(
"--recalculate-edits",
help="Even when ReporterScreen.layers['edit_count'] exists, recalculate the edit counts from ReporterScreen.uns['allele_count'].",
action="store_true",
)


args = parser.parse_args()
if args.out_screen_path is None:
args.out_screen_path = f"{args.bdata_path.rsplit('.h5ad', 1)[0]}.filtered.h5ad"
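
The regrouping in this hunk can be sketched in isolation. `add_argument_group` only changes how options are grouped in `--help`; parsed values still land in one flat namespace. The sketch below is a reduced stand-in, not the bean parser: `str2bool` is a hypothetical replacement for `distutils.util.strtobool` (`distutils` was removed from the standard library in Python 3.12), and `type=int` on `--edit-start-pos` is an assumption added here, since an option declared without `type` yields a string when passed on the command line.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse common true/false spellings (hypothetical stand-in for strtobool)."""
    lowered = value.strip().lower()
    if lowered in {"y", "yes", "t", "true", "on", "1"}:
        return True
    if lowered in {"n", "no", "f", "false", "off", "0"}:
        return False
    raise argparse.ArgumentTypeError(f"invalid truth value: {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="bean-qc-sketch")
    parser.add_argument("bdata_path", type=str)
    parser.add_argument("-o", "--out-screen-path", type=str)

    # Groups affect only the layout of --help output.
    thres = parser.add_argument_group("QC thresholds")
    thres.add_argument("--count-correlation-thres", type=float, default=0.7)

    run = parser.add_argument_group("Run options")
    run.add_argument("-b", "--remove-bad-replicates", action="store_true")

    fmt = parser.add_argument_group("Input .h5ad formatting")
    fmt.add_argument("--tiling", type=str2bool, default=None)
    # Assumption: an explicit int type, so CLI values are not left as strings.
    fmt.add_argument("--edit-start-pos", type=int, default=2)
    return parser


def parse(argv):
    args = build_parser().parse_args(argv)
    if args.out_screen_path is None:
        # Same default-path derivation as in the hunk above.
        stem = args.bdata_path.rsplit(".h5ad", 1)[0]
        args.out_screen_path = f"{stem}.filtered.h5ad"
    return args


args = parse(["screen.h5ad", "--tiling", "true", "--edit-start-pos", "3"])
print(args)
```

Because `-b` is a `store_true` flag, replicates are kept unless the flag is passed, which is the default this commit switches to.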
4 changes: 2 additions & 2 deletions bin/bean-qc
@@ -33,9 +33,9 @@ def main():
comp_cond2=args.lfc_cond2,
ctrl_cond=args.ctrl_cond,
exp_id=args.out_report_prefix,
recalculate_edits=args.recalculate_edits,
recalculate_edits=not args.dont_recalculate_edits,
base_edit_data=args.base_edit_data,

remove_bad_replicates=args.remove_bad_replicates,
),
kernel_name="bean_python3",
)
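
A note on inverting a `store_true` flag such as `dont_recalculate_edits`: on a Python `bool`, the bitwise `~` operator returns an `int` (`-1` or `-2`), and both values are truthy, whereas `not` returns the logical inverse. Python 3.12 and later also emit a `DeprecationWarning` for `~` on a `bool`. A minimal sketch with a throwaway parser, not the bean entry point:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--dont-recalculate-edits", action="store_true")

results = []
for argv in ([], ["--dont-recalculate-edits"]):
    flag = parser.parse_args(argv).dont_recalculate_edits
    bitwise = ~flag     # int: -1 when flag is False, -2 when flag is True
    logical = not flag  # bool: the logical inversion
    results.append((flag, bitwise, logical))
    print(f"flag={flag} ~flag={bitwise} bool(~flag)={bool(bitwise)} not flag={logical}")
```

Any downstream `if recalculate_edits:` check sees a truthy value in both cases when `~` is used, so the flag would have no effect.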