Update survey with a description of the schema.

digitraceslab · Sep 30, 2024 · 9aac8c4
1 parent 985d9a3
commit 9aac8c4
Showing 1 changed file with 2 additions and 6 deletions.
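To illustrate the schema this change describes, here is a minimal plain-Python sketch of the conversion: question titles become a questionnaire prefix plus a question number, and answer text becomes a number. A dict stands in for one DataFrame row. The mappings and the `convert_row` helper are hypothetical names chosen for this example; they are not the conversion utilities the notebook refers to.

```python
# Hypothetical mappings for illustration only; not the project's own utilities.
PHQ2_QUESTIONS = {
    "Little interest or pleasure in doing things": "PHQ2_1",
    "Feeling down, depressed, or hopeless": "PHQ2_2",
}
PHQ2_ANSWERS = {
    "Not at all": 0,
    "Several days": 1,
    "More than half the days": 2,
    "Nearly every day": 3,
}


def convert_row(row):
    """Rename question keys to PREFIX_NUMBER and map answer text to numbers."""
    converted = {}
    for key, value in row.items():
        if key in PHQ2_QUESTIONS:
            converted[PHQ2_QUESTIONS[key]] = PHQ2_ANSWERS[value]
        else:
            # Non-question columns, such as `user`, are kept unchanged.
            converted[key] = value
    return converted


# One survey row: a user ID plus one entry per answered question.
raw_row = {
    "user": "u1",
    "Little interest or pleasure in doing things": "Several days",
    "Feeling down, depressed, or hopeless": "Not at all",
}
converted_row = convert_row(raw_row)
print(converted_row)
```

With a real DataFrame, the same two dictionaries could presumably be applied to the column titles and the cell values instead of to a single dict.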
diff --git a/docs/user_guide/preprocessing/survey.ipynb b/docs/user_guide/preprocessing/survey.ipynb
@@ -7,13 +7,9 @@
    "source": [
     "# Survey Data\n",
     "\n",
-    "Surveys consist of columns\n",
-    "* `id` for the question identifier\n",
-    "* `answer` for the answer of the question\n",
-    "* `q` which is the text of the question presented to the user (optional)\n",
-    "* As usual, the DataFrame index is the timestamp of the answer.  It is the convention that all responses in a one single survey instance have the same timestamp, and this is used to link surveys together.\n",
+    "A single survey row can contain answers to multiple questions. The survey DataFrame should contain a `user` column holding the user ID, plus one column per survey question (see the example below for clarification). Each column title identifies the question, and the value on a given row is the answer to it. As usual, the DataFrame index is the timestamp of the answer.\n",
     "\n",
-    "The raw on-disk format is \"long\", that is, one row per answer, which is \"tidy data\".  This provides the most flexible format, but often you need to do other transformations.\n"
+    "Question titles should be converted into strings consisting of a questionnaire prefix and a question number. For example, the first question in \"PHQ2\" would be \"PHQ2_1\". We provide utilities for converting some common questionnaires to this format, as shown below. Similarly, answers should be converted into numerical values."
    ]
   },
   {