Tableau API
There are two API endpoints. Both use:
- API keys: These are different from the API keys used to download data. Generate an API key by going to the Manage Credentials page, under "New Credentials", and copying the Access Key and Secret Key that appear at the top of the page. The Secret Key will never be displayed again, so make sure you copy it somewhere!
- Study IDs: The way you reference a study (<study_object_id>) is the 24-character random string visible on the View Study page, right under the study's name.
Path: /api/v0/studies/<study_object_id>/summary-statistics/daily/wdc
This endpoint, the Tableau WDC (Web Data Connector) API, is designed to be accessed only from Tableau.
More documentation is needed on how to set that up.
Path: /api/v0/studies/<study_object_id>/summary-statistics/daily
This endpoint is the JSON API. You must pass your Access Key and Secret Key in the headers X-Access-Key-Id and X-Access-Key-Secret.
You can call this endpoint using cURL. For the studies.beiwe.org deployment, here's how you'd pull all summary statistics from an entire study:
curl "https://studies.beiwe.org/api/v0/studies/<study_object_id>/summary-statistics/daily" -H "X-Access-Key-Id: abcd..." -H "X-Access-Key-Secret: efgh..."
You can filter by date, participant ID, and which fields you want. This will make the query run faster. If the API query takes longer than 60 seconds to run, you'll get a 504 timeout error instead of a response. Add filters as GET parameters to the URL, using standard URL parameter format: add the first parameter to the URL like this: ?<parameter_name>=<parameter_value>, and all subsequent parameters like this: &<parameter_name>=<parameter_value>.
- Start date: start_date. Must be in YYYY-MM-DD format. Example: ?start_date=2021-03-15
- End date: end_date. Must be in YYYY-MM-DD format. Example: ?end_date=2021-04-01
- Participant IDs: participant_ids. Comma-separated list, with no other characters wrapping the list or each item. Example for two participant IDs: ?participant_ids=ouq7r382,r3h9qp2o
- Fields: fields. Comma-separated list. Example: &fields=date,participant_id,accelerometer_bytes,distance_from_home
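Since a query that runs longer than 60 seconds returns a 504, one possible way to stay under the limit is to split a long date range into shorter windows and send one filtered request per window. The sketch below only builds the start_date/end_date pairs. The helper name date_windows and the 7-day window size are illustrative choices, and the sketch assumes end_date is inclusive, which this page does not state.

```python
from datetime import date, timedelta


def date_windows(start, end, days=7):
    """Yield (start_date, end_date) string pairs covering start..end.

    Hypothetical helper: assumes both bounds are inclusive, so
    consecutive windows do not overlap.
    """
    step = timedelta(days=days)
    current = start
    while current <= end:
        # Clamp the last window so it never runs past the requested end.
        window_end = min(current + step - timedelta(days=1), end)
        yield current.isoformat(), window_end.isoformat()
        current = window_end + timedelta(days=1)


for start_date, end_date in date_windows(date(2021, 3, 15), date(2021, 4, 1)):
    print(f"?start_date={start_date}&end_date={end_date}")
```

If end_date turns out to be exclusive on your deployment, adjust the window arithmetic so that no day is skipped.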
Example usage: let's say we want to get data from the study with ID "R69zae1Y7Lw6yuwVUR4BOALY", for March 15, 2021 through April 1, 2021, for participant "ouq7r382", and we only want the values of "accelerometer_bytes" and "distance_from_home". Here's how to make that API call with cURL:
curl "https://studies.beiwe.org/api/v0/studies/R69zae1Y7Lw6yuwVUR4BOALY/summary-statistics/daily?start_date=2021-03-15&end_date=2021-04-01&participant_ids=ouq7r382&fields=date,participant_id,accelerometer_bytes,distance_from_home" -H "X-Access-Key-Id: abcd..." -H "X-Access-Key-Secret: efgh..."
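The cURL call above could also be made from Python using only the standard library. This is a minimal sketch, not part of the Beiwe codebase: the function names build_request and fetch_daily_summary are made up for illustration, and the keys are the same placeholders used in the cURL example.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://studies.beiwe.org"  # deployment used in the examples above


def build_request(study_id, access_key, secret_key, **filters):
    """Build a GET request for the daily summary statistics endpoint."""
    url = f"{BASE_URL}/api/v0/studies/{study_id}/summary-statistics/daily"
    # Drop unset filters; keep commas literal, as in the cURL example.
    params = {k: v for k, v in filters.items() if v is not None}
    if params:
        url += "?" + urlencode(params, safe=",")
    headers = {"X-Access-Key-Id": access_key, "X-Access-Key-Secret": secret_key}
    return urllib.request.Request(url, headers=headers)


def fetch_daily_summary(request, timeout=60):
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_request(
    "R69zae1Y7Lw6yuwVUR4BOALY", "abcd...", "efgh...",
    start_date="2021-03-15", end_date="2021-04-01",
    participant_ids="ouq7r382",
    fields="date,participant_id,accelerometer_bytes,distance_from_home",
)
print(request.full_url)
```

Calling fetch_daily_summary(request) with real keys would send the request. A 504 response surfaces as urllib.error.HTTPError, so callers may want to catch that and retry with narrower filters.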
For complete documentation on how to use the API endpoint, copy the file beiwe-backend/api/tableau_api/spec.yaml and paste it into any OpenAPI viewer, like Swagger Editor.
The JSON API and the Tableau WDC API should return the same type of data; the WDC API is really just a wrapper on the JSON API.
The JSON API returns a list of dictionaries, where each dictionary represents the summary statistics for one participant on one day. If no dictionary is returned for a given participant and day, there is no summary statistics data for that participant on that day.
If a summary statistic is null, that means it's either 0 or not calculated.
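To illustrate how a client might handle missing days and null statistics, here is a small sketch that parses a response body and indexes it by participant and day. The sample body is made up for illustration (including the values and the date format), and only the field names come from the examples on this page.

```python
import json
from collections import defaultdict

# Made-up response body for illustration; real values will differ.
sample_body = """[
  {"date": "2021-03-15", "participant_id": "ouq7r382",
   "accelerometer_bytes": 1048576, "distance_from_home": null},
  {"date": "2021-03-17", "participant_id": "ouq7r382",
   "accelerometer_bytes": null, "distance_from_home": 12.5}
]"""

rows = json.loads(sample_body)

# Index rows by participant, then by day. A missing (participant, day)
# key means the API returned no dictionary for that day.
by_participant = defaultdict(dict)
for row in rows:
    by_participant[row["participant_id"]][row["date"]] = row

days = by_participant["ouq7r382"]
print("2021-03-16" in days)  # no row was returned for this day

# JSON null becomes None in Python: the value is either 0 or not calculated.
missing = [name for name, value in days["2021-03-15"].items() if value is None]
print(missing)
```

Because null can mean either 0 or not calculated, it is probably safer to keep None as a distinct value than to coerce it to 0 during analysis.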
There are two types of summary statistics:
- Forest-generated statistics: these are calculated by running Forest, and include interesting statistics like "maximum distance from home" and "number of incoming phone calls".
- Data quantity/volume statistics: these are calculated automatically, whether or not Forest is run. These tell you the number of bytes of each type of data, for each day.
  - Each statistic is the integer number of bytes of decrypted data.
  - A "day" is defined as midnight to midnight in the study's time zone. So for US Eastern Daylight Time, that'll be 4am UTC one day to 4am UTC the next. (When you download the data, it's all timestamped in UTC.)
  - This is based on the quantity of batched data, not _raw_data. So if data has been uploaded from the phone but not yet batched (maybe the data processing/batching script crashed), it won't be reflected in this total. But as soon as data gets batched, it should show up in the data quantity statistics.
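When comparing these daily totals against downloaded data, which is timestamped in UTC, it can help to compute the UTC bounds of a study day. The sketch below does that with Python's zoneinfo module. The function name study_day_bounds_utc is hypothetical, and "America/New_York" is assumed here as the time zone name for a US Eastern study.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo


def study_day_bounds_utc(day, study_tz):
    """Return the UTC start and end of a study-local calendar day."""
    tz = ZoneInfo(study_tz)
    start_local = datetime.combine(day, time.min, tzinfo=tz)
    # Combine the next calendar date instead of adding 24 hours, so days
    # with a daylight saving transition still end at local midnight.
    end_local = datetime.combine(day + timedelta(days=1), time.min, tzinfo=tz)
    return start_local.astimezone(timezone.utc), end_local.astimezone(timezone.utc)


start, end = study_day_bounds_utc(date(2021, 3, 15), "America/New_York")
print(start.isoformat(), end.isoformat())
```

For March 15, 2021, when Eastern Daylight Time is in effect, both bounds fall at 04:00 UTC, matching the example in the list above.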