Home
Welcome to the GitHub Wiki for the WillisAPI Client, the official Python interface for interacting with Brooklyn Health’s WillisAPI. Here, you will find setup and user instructions.
Before you can get started with WillisAPI, you will need an account with Brooklyn Health. If you don’t have one yet and are interested in using the API, be sure to get in touch at [email protected].
Before you begin, we recommend you create a virtual environment for the WillisAPI Client.
pip install virtualenv
virtualenv willisapi_client --python=3.9
source willisapi_client/bin/activate
Note: For stable behavior, we recommend Python 3.9 or later.
To install the client, simply run:
pip install willisapi_client
Then, start a Python session. From there, you’ll want to run:
import willisapi_client as willisapi
See below for instructions on login, upload, and download.
Once you are in a Python session, use the following function to log in:
key, expiration = willisapi.login('username', 'password')
Your username is your email. Enter the same password used during account setup.
The function will return:
- key: This is the access key that will be required to use the API. It is valid for 24 hours.
- expiration: The date/time when the key will expire, in the format YYYY-MM-DD-HH-MM-SS.
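Because the key is only valid for 24 hours, it can help to check the expiration string before starting a long job. The following is a minimal sketch using only the standard library. The helper names and the five-minute safety margin are assumptions made for illustration, and the text does not say which time zone the expiration uses.

```python
from datetime import datetime, timedelta

# YYYY-MM-DD-HH-MM-SS, the format described above.
EXPIRATION_FORMAT = "%Y-%m-%d-%H-%M-%S"


def parse_expiration(expiration):
    """Convert the expiration string returned by login into a datetime."""
    return datetime.strptime(expiration, EXPIRATION_FORMAT)


def key_is_valid(expiration, now=None, margin=timedelta(minutes=5)):
    """Return True if the key has more than `margin` left before it expires.

    Assumption: `expiration` and `now` are in the same time zone. The
    documentation does not state which zone the API uses.
    """
    now = datetime.now() if now is None else now
    return parse_expiration(expiration) - now > margin


# Usage, after logging in as shown above:
# if not key_is_valid(expiration):
#     key, expiration = willisapi.login('username', 'password')
```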
To upload data, use the following function:
summary = willisapi.upload(key, 'data.csv')
Enter the key that was saved during login. data.csv is where information on the data to be uploaded will be given.
It is a CSV organized in the following format:
| project_name | pt_id_external | age | sex | race | language | time_collected | study_arm | clinical_score_a | clinical_score_b | clinical_score_c | clinical_score_d | clinical_score_e | file_path | workflow_tags |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
In this CSV:
- project_name: This is the name of the project or study for which you are uploading data. It is critical that this value remains exactly the same for a given project. Otherwise, data will be categorized separately in Brooklyn Health’s backend and may not get processed.
- pt_id_external: This is the unique identifier of the participant, specified by the user.
- age (optional): This is the participant's age, specified by the user as a number.
- sex (optional): This is the participant's sex, specified by the user as either M/F or 0/1.
- race (optional): This is the participant's race, specified by the user as a string or a number.
- language: This specifies the language contained in the file. Use the language codes provided by Brooklyn Health. If the language column is left empty, US English is assumed.
- time_collected (optional): This is the timepoint associated with the file. It can be a date in YYYY-MM-DD format or simply a label such as ‘screening’, ‘baseline’, or ‘visit2’. Though optional for a successful upload, it is necessary for datasets with repeated measures.
- study_arm (optional): This is the arm the participant was assigned to in the study. It can be a string (e.g., 'placebo', 'treatment') or a number (0, 1, 2, ...).
- clinical_score_a (optional): This is the participant's clinical score (1), specified by the user, and can be represented using any string or number.
- clinical_score_b (optional): This is the participant's clinical score (2), specified by the user, and can be represented using any string or number.
- clinical_score_c (optional): This is the participant's clinical score (3), specified by the user, and can be represented using any string or number.
- clinical_score_d (optional): This is the participant's clinical score (4), specified by the user, and can be represented using any string or number.
- clinical_score_e (optional): This is the participant's clinical score (5), specified by the user, and can be represented using any string or number.
- file_path: This is the local path of the file that you are trying to upload. Please enter the full file path, including the file extension (e.g., .wav, .mp4, etc.).
- workflow_tags: This tag determines the processing workflow the uploaded file will undergo. The workflow tag will be provided to you by your contact at Brooklyn Health.
A sample CSV file is included in the repo. We recommend you use this as a starting point when creating your own data.csv. It will ensure all column headings and subsequent entries are formatted correctly.
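If you would rather generate data.csv from a script than edit it by hand, the following sketch writes one using Python's csv module. The column headings are copied from the table above. The function name and the example values (project name, participant ID, path, and workflow tag) are placeholders made up for illustration, so the sample CSV in the repo remains the authoritative reference.

```python
import csv

# Column headings as listed in the table above.
COLUMNS = [
    "project_name", "pt_id_external", "age", "sex", "race", "language",
    "time_collected", "study_arm", "clinical_score_a", "clinical_score_b",
    "clinical_score_c", "clinical_score_d", "clinical_score_e",
    "file_path", "workflow_tags",
]


def write_data_csv(rows, csv_path="data.csv"):
    """Write one CSV row per file; fields missing from a row are left empty."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return csv_path


# Placeholder values for illustration only.
example_rows = [
    {
        "project_name": "my_study",
        "pt_id_external": "pt001",
        "time_collected": "baseline",
        "file_path": "/path/to/pt001_baseline.wav",
        "workflow_tags": "tag_from_brooklyn_health",
    },
]

# write_data_csv(example_rows)  # creates data.csv in the working directory
```

DictWriter raises a ValueError if a row contains a key that is not in COLUMNS, which catches misspelled headings early.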
Once triggered, the function verifies the provided information before securely uploading the data to Brooklyn Health’s servers. Upload speed depends on your internet connection. Where possible, an upload progress bar is displayed.
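You may want to catch obvious problems locally before starting an upload. The sketch below is a hypothetical pre-flight check and not the client's own verification. It treats the columns not marked "(optional)" as required (an inference from the list above, with language left out because an empty value falls back to US English) and confirms that every file_path points to an existing file.

```python
import csv
import os

# Assumption: these are the columns that must be present and non-empty.
REQUIRED = ["project_name", "pt_id_external", "file_path", "workflow_tags"]


def precheck_csv(csv_path):
    """Return a list of human-readable problems found in the upload CSV."""
    problems = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        for column in REQUIRED:
            if column not in header:
                problems.append(f"missing column: {column}")
        # Row 1 is the header, so data rows start at 2.
        for row_number, row in enumerate(reader, start=2):
            for column in REQUIRED:
                if column in header and not (row[column] or "").strip():
                    problems.append(f"row {row_number}: empty {column}")
            file_path = (row.get("file_path") or "").strip()
            if file_path and not os.path.isfile(file_path):
                problems.append(f"row {row_number}: file not found: {file_path}")
    return problems


# Usage:
# for problem in precheck_csv("data.csv"):
#     print(problem)
```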
After uploading, the summary will show successful and failed uploads. If there are failures (some are expected in large uploads), you can re-run the command with the same CSV, and the API will automatically skip files that were uploaded successfully in the previous attempt.
If, for any reason, you want to force a re-upload of files that have already been uploaded, use the command as shown below and the API will replace the previously uploaded files.
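Re-running after failures can be automated with a small loop. The structure of the summary object is not described here, so this sketch makes no assumption about it: both the upload call and the failure check are passed in as callables. The names upload_until_clean and has_failures are hypothetical.

```python
def upload_until_clean(upload, has_failures, max_attempts=3):
    """Call `upload()` until `has_failures(summary)` is False or attempts run out.

    Both arguments are callables supplied by the caller, so this helper does
    not depend on what the summary object looks like.
    Returns the last summary and the number of attempts made.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    summary = None
    attempt = 0
    for attempt in range(1, max_attempts + 1):
        summary = upload()
        if not has_failures(summary):
            break
    return summary, attempt


# Usage, where my_failure_check is a function you write after inspecting
# the summary returned by a real upload:
# summary, attempts = upload_until_clean(
#     lambda: willisapi.upload(key, 'data.csv'),
#     has_failures=my_failure_check,
# )
```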
summary = willisapi.upload(key, 'data.csv', force_upload=True)
After uploading, Brooklyn Health will begin to process the data. Your Brooklyn Health contact will notify you when processing is complete, allowing you to download the output measures.
To download processed measures, use the following function:
measures = willisapi.download(key, project_name='project_name')
project_name was defined by you in the data.csv.
On triggering this function, Brooklyn Health will compile all measures from the specified project into a dataframe. This dataframe, assigned to measures in the call above, will be formatted as shown below:
| project_name | pt_id_external | filename | time_collected | [measureN] | [measureN+1] |
| --- | --- | --- | --- | --- | --- |
| ... | ... | ... | ... | ... | ... |
The table begins with identifiers for the data file, specifically the project_name, pt_id_external, filename, and time_collected if originally specified by the user. The remainder of the columns are digital health measures calculated from the data file.
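To work with the measures separately from the identifiers, you can split the column names into the two groups described above. This is a minimal sketch that operates on any sequence of column names. The helper name is hypothetical, and the commented usage assumes the returned dataframe is a pandas DataFrame, which is not stated explicitly here.

```python
# Identifier columns named in the description above.
ID_COLUMNS = ["project_name", "pt_id_external", "filename", "time_collected"]


def split_columns(columns):
    """Split column names into (identifier columns, measure columns)."""
    columns = list(columns)
    # time_collected may be absent if it was not specified at upload.
    id_columns = [name for name in ID_COLUMNS if name in columns]
    measure_columns = [name for name in columns if name not in ID_COLUMNS]
    return id_columns, measure_columns


# Usage, assuming `measures` is a pandas DataFrame:
# id_columns, measure_columns = split_columns(measures.columns)
# measures[id_columns + measure_columns].to_csv("measures.csv", index=False)
```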
Each measure name is described in the OpenWillis documentation, specifically in each function’s dedicated page under List of Functions. You can also reach out to your technical contact at Brooklyn Health for help understanding the output variables.
We’re happy to say that our diarization correction model, WillisDiarize, can be accessed through WillisAPI.
To do so, use the following function:
transcript_corrected = willisapi.willis_diarize(key, 'transcript.json')
Use the key from when you logged in. transcript.json is the path to a JSON file of a transcription acquired through AWS or WhisperX.
Upon triggering this function, Brooklyn Health will process the input transcription using the WillisDiarize model and then return the JSON back to you with corrected speaker labels.
Note: We currently only support transcription JSONs from AWS and WhisperX.
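Before sending a transcript, it can be worth confirming that the file parses as JSON, and afterwards you will probably want to save the corrected result. The helpers below are a hypothetical sketch using only the standard library. The text says only that the JSON is returned, so save_corrected accepts either a JSON string or an already-parsed object.

```python
import json


def load_transcript(path):
    """Parse the transcript file, raising ValueError if it is not valid JSON."""
    with open(path, encoding="utf-8") as handle:
        try:
            return json.load(handle)
        except json.JSONDecodeError as error:
            raise ValueError(f"{path} is not valid JSON: {error}") from error


def save_corrected(result, path):
    """Write the corrected transcript to disk and return the path.

    Assumption: `result` is either a JSON string or a parsed object
    (dict or list); the return type is not documented here.
    """
    data = json.loads(result) if isinstance(result, str) else result
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(data, handle, ensure_ascii=False, indent=2)
    return path


# Usage:
# load_transcript('transcript.json')  # fails early on malformed input
# transcript_corrected = willisapi.willis_diarize(key, 'transcript.json')
# save_corrected(transcript_corrected, 'transcript_corrected.json')
```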
––
Brooklyn Health is a small team of clinicians, scientists, and engineers based in Brooklyn, NY.
We develop and maintain OpenWillis, an open source Python library for digital health measurement.