# Mediplexis – Drop-off Estimator
Mediplexis, a pharma company based in the United States, has several products across various therapy areas. For one of its popular drugs, which addresses 4 conditions, the company has observed that a lot of patients are dropping off the therapy, and it wants to determine whether patients who are going to drop off can be detected at an early stage.
Mediplexis wants to run a POC with ZS to:
Create a prediction algorithm that determines whether patients who are going to drop off can be detected at an early stage, based on a set of early-indicator metrics for patients which the company has anonymized and provided to ZS for analysis.
The company classifies each patient into one of 3 categories:
Persistent (P)
Dose Stretcher (DS)
Drop-off (DO)
## Datasets
Participants will be provided with the following datasets:
train.csv
test.csv
profile.csv
submission_format.csv
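A minimal sketch of how these files might be loaded for exploration, assuming standard comma-separated CSVs; apart from PID and PID_State, the column contents are not specified in the problem statement:

```python
import pandas as pd

# Load the four provided files (assumed to be standard CSVs).
train = pd.read_csv("train.csv")                    # early-indicator metrics plus PID_State labels
test = pd.read_csv("test.csv")                      # same metrics, labels withheld
profile = pd.read_csv("profile.csv")                # anonymized patient profile data
submission = pd.read_csv("submission_format.csv")  # template: PID, PID_State

print(train.shape, test.shape, profile.shape)
print(train.head())
```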
A valid submission has the following format:
| PID  | PID_State |
|------|-----------|
| 5001 | DO        |
| 5002 | DS        |
| 5003 | P         |
| 5004 | DO        |
| 5005 | DO        |
| 5006 | DS        |
| 5007 | P         |
| 5008 | DS        |
| 5009 | DO        |
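As an illustration, the sketch below trains a classifier on the labelled data and writes a file in this format. The random forest is only a placeholder (any multi-class model works), and the assumed layout is that every column other than PID and PID_State is a feature:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

# Assumed layout: every column except PID and PID_State is a feature.
features = [c for c in train.columns if c not in ("PID", "PID_State")]

# Placeholder model; string labels ("P", "DS", "DO") are handled directly.
model = RandomForestClassifier(random_state=0)
model.fit(train[features], train["PID_State"])

submission = pd.DataFrame({
    "PID": test["PID"],
    "PID_State": model.predict(test[features]),
})
submission.to_csv("submission.csv", index=False)
```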
## Evaluation Metric
Let $T_p$ be the true positives, $F_p$ the false positives, $T_n$ the true negatives and $F_n$ the false negatives. Precision and recall are then defined as:

$$\text{Precision} = \frac{T_p}{T_p + F_p} \qquad \text{Recall} = \frac{T_p}{T_p + F_n}$$

Misclassification Error Rate is the selected evaluation metric for the challenge:

$$\text{Misclassification Error Rate} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\left(P_i \neq A_i\right)$$

where:
$P_i$ : Predicted class for each patient
$A_i$ : Actual class for each patient (not provided to you for the test dataset)
$n$ : Total number of patients in the test dataset
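A short sketch of this metric for local validation against a held-out split; the helper below is an assumption for illustration, not part of the challenge tooling:

```python
import numpy as np

def misclassification_error(actual, predicted):
    """Fraction of patients whose predicted class differs from the actual class."""
    actual = np.asarray(actual)
    predicted = np.asarray(predicted)
    return np.mean(actual != predicted)

# Example: 2 of 5 predictions are wrong, so the error rate is 0.4.
print(misclassification_error(["P", "DS", "DO", "P", "DS"],
                              ["P", "DO", "DO", "DS", "DS"]))
```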
During the competition, only a subset of the test dataset will be used for evaluating submissions. The chosen subset will be the same for every participant. However, the final standings will be evaluated against the remaining subset of the test data.
## Ranking
During the contest, we will evaluate the misclassification error rate of the prediction model on 40% of the pre-sampled test dataset. After the contest, we will perform the evaluation on the remaining 60% of the dataset. This final evaluation will only be performed on your last uploaded output file (i.e., the most recently uploaded), so be sure that your final submission is your best output file (i.e., the one achieving the lowest misclassification error rate).