Skip to content

Commit

Permalink
Merge pull request #300 from NCI-CGR/287-dynamically-adjust-minimum-m…
Browse files Browse the repository at this point in the history
…af-for-hwe-calculation-and-add-option-to-set-hwe-p-value-threshold

287 dynamically adjust minimum maf for hwe calculation and add option to set hwe p value threshold
  • Loading branch information
kliao12 authored Jun 12, 2024
2 parents 8913152 + 14a7b74 commit 719b6c0
Show file tree
Hide file tree
Showing 2 changed files with 37 additions and 9 deletions.
6 changes: 3 additions & 3 deletions docs/sub_workflows/subject_qc.rst
Original file line number Diff line number Diff line change
Expand Up @@ -133,9 +133,9 @@ Hardy Weinberg

- ``subject_level/<population>/controls_unrelated_maf{maf_for_hwe}_snps_autosomes.hwe``

Here we pull out only control subjects.
We then calculate Hardy Weinberg Equilibrium.
We use only controls, but cases may have SNPs that are out of HWE.
Here we calculate Hardy Weinberg Equilibrium (HWE) and produce plots for all populations (from graf-pop) that have >50 controls. Populations with <50 controls but > 50 cases + control are also run.

MAF threshold for computing HWE is the minimum(software_params.maf_for_hwe, sqrt(5/n)) where n is the number of controls or cases + controls (if number of controls < 50).

Subject QC Summary Tables
-------------------------
Expand Down
40 changes: 34 additions & 6 deletions src/cgr_gwas_qc/workflow/sub_workflows/subject_qc.smk
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ from more_itertools import flatten
from cgr_gwas_qc import load_config
from cgr_gwas_qc.workflow.scripts import subject_qc_table
import shutil
import math

cfg = load_config()

Expand Down Expand Up @@ -654,11 +655,12 @@ def _get_controls(wildcards):
]


rule population_controls_Subject_IDs:
checkpoint population_controls_Subject_IDs:
input:
rules.subject_qc_table.output[0],
output:
"subject_level/{population}/controls.txt",
"subject_level/{population}/hwe_maf_thres.txt",
params:
"subject_level/{population}/test_controls.txt",
"subject_level/{population}/test_case.txt",
Expand Down Expand Up @@ -722,6 +724,15 @@ rule population_controls_Subject_IDs:
"threshold, using control and case samples for HWE",
)
shutil.copyfile(params[2], output[0])
with open(output[1], "w") as f:
f.write(
str(
round(
min(cfg.config.software_params.maf_for_hwe, math.sqrt(5 / num_lines2)),
2,
)
)
)
else:
print(
num_lines0,
Expand All @@ -730,6 +741,15 @@ rule population_controls_Subject_IDs:
"threshold for HWE",
)
shutil.copyfile(params[0], output[0])
with open(output[1], "w") as f:
f.write(
str(
round(
min(cfg.config.software_params.maf_for_hwe, math.sqrt(5 / num_lines0)),
2,
)
)
)


use rule keep_ids from plink as pull_population_unrelated_controls with:
Expand Down Expand Up @@ -817,13 +837,21 @@ use rule hwe from plink as population_level_unrelated_controls_hwe with:
"{population}_controls_filter"


def compute_maf(wildcards):
with open(
checkpoints.population_controls_Subject_IDs.get(population=wildcards.population).output[1]
) as f:
value = f.read().strip()
return expand(
rules.population_level_unrelated_controls_hwe.output[0],
maf=value,
allow_missing=True,
)


rule plot_hwe:
input:
expand(
rules.population_level_unrelated_controls_hwe.output[0],
maf=cfg.config.software_params.maf_for_hwe,
allow_missing=True,
),
compute_maf,
params:
population="{population}",
output:
Expand Down

0 comments on commit 719b6c0

Please sign in to comment.