# CITATION.cff
cff-version: 1.2.0
message: "If you use this software, please cite both the article from preferred-citation and the software itself."
authors:
  - family-names: "Evans"
    given-names: "Benjamin"
    orcid: "https://orcid.org/0000-0002-1734-6070"
  - family-names: "Slowinski"
    given-names: "Piotr"
    orcid: "https://orcid.org/0000-0002-6612-9902"
title: "Distribution Proportion Estimation"
version: 1.0.0
doi: 10.5281/zenodo.5512651
date-released: 2021-09-16
url: "https://github.com/bdevans/DPE"
preferred-citation:
  type: article
  authors:
    - family-names: "Evans"
      given-names: "Benjamin D."
      orcid: "https://orcid.org/0000-0002-1734-6070"
    - family-names: "Slowinski"
      given-names: "Piotr"
      orcid: "https://orcid.org/0000-0002-6612-9902"
    - family-names: "Hattersley"
      given-names: "Andrew T."
    - family-names: "Jones"
      given-names: "Samuel E."
      orcid: "https://orcid.org/0000-0003-0153-922X"
    - family-names: "Sharp"
      given-names: "Seth"
    - family-names: "Kimmitt"
      given-names: "Robert A."
    - family-names: "Weedon"
      given-names: "Michael N."
    - family-names: "Oram"
      given-names: "Richard A."
    - family-names: "Tsaneva-Atanasova"
      given-names: "Krasimira"
      orcid: "https://orcid.org/0000-0002-6294-7051"
    - family-names: "Thomas"
      given-names: "Nicholas J."
  doi: "10.1038/s41467-021-26501-7"
  journal: "Nature Communications"
  month: 11
  start: 6441 # First page number
  end: 6452 # Last page number
  title: "Estimating disease prevalence in large datasets using genetic risk scores"
  issue: 1
  volume: 12
  year: 2021
  abstract: "Clinical classification is essential for estimating disease prevalence but is difficult, often requiring complex investigations. The widespread availability of population level genetic data makes novel genetic stratification techniques a highly attractive alternative. We propose a generalizable mathematical framework for determining disease prevalence within a cohort using genetic risk scores. We compare and evaluate methods based on the means of genetic risk scores’ distributions; the Earth Mover’s Distance between distributions; a linear combination of kernel density estimates of distributions; and an Excess method. We demonstrate the performance of genetic stratification to produce robust prevalence estimates. Specifically, we show that robust estimates of prevalence are still possible even with rarer diseases, smaller cohort sizes and less discriminative genetic risk scores, highlighting the general utility of these approaches. Genetic stratification techniques offer exciting new research tools, enabling unbiased insights into disease prevalence and clinical characteristics unhampered by clinical classification criteria."