vjbytes102 edited this page Sep 24, 2024 · 16 revisions

Welcome to the GitHub Wiki for the WillisAPI Client, the official Python interface for interacting with Brooklyn Health’s WillisAPI. Here, you will find setup and usage instructions.

Gaining access

Before you can get started with WillisAPI, you will need an account with Brooklyn Health. If you don’t have one yet and are interested in using the API, please reach out to [email protected].

Getting started

Before you begin, we recommend you create a virtual environment for the WillisAPI Client.

pip install virtualenv
virtualenv willisapi_client --python=3.9
source willisapi_client/bin/activate

Note: For stable behavior, we recommend Python 3.9 or later.
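A quick way to confirm that the active interpreter meets the version recommendation is sketched below. The 3.9 minimum comes from the virtualenv command above, and the helper name python_is_supported is illustrative only; it is not part of the client.

```python
import sys

# Minimum version taken from the virtualenv command above (--python=3.9).
MIN_VERSION = (3, 9)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) part of version_info is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if python_is_supported():
        print("Python version OK")
    else:
        print(f"Python {sys.version.split()[0]} found; 3.9 or later is recommended.")
```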

To install the client, simply run:

pip install willisapi_client

Then start a Python session and import the client:

import willisapi_client as willisapi

See below for instructions on login, upload, and download.

Login and access key

Once you are in a Python session, use the following function to log in:

key, expiration = willisapi.login('username', 'password')

Your username is the email address associated with your account, and your password is the one you set during account setup.
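To avoid hardcoding a password in a script, the credentials can be read at runtime instead. This is a minimal sketch that assumes two environment variable names, WILLIS_USERNAME and WILLIS_PASSWORD, which are hypothetical and not defined by the client. If they are unset, it falls back to an interactive prompt using the standard-library getpass module.

```python
import getpass
import os


def get_credentials(env=os.environ):
    """Return (username, password) from hypothetical env vars, prompting if they are unset."""
    # WILLIS_USERNAME / WILLIS_PASSWORD are illustrative names, not part of willisapi_client.
    username = env.get("WILLIS_USERNAME") or input("Email: ")
    password = env.get("WILLIS_PASSWORD") or getpass.getpass("Password: ")
    return username, password


# Usage (requires willisapi_client and a valid account):
# key, expiration = willisapi.login(*get_credentials())
```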

The function will return:

  • key: This is the access key that will be required to use the API. It is valid for 24 hours.
  • expiration: The date/time when the key will expire, in the format YYYY-MM-DD-HH-MM-SS.
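Because the key is valid for only 24 hours, a long-running script may want to check it before each call. The sketch below assumes expiration is returned as a string in the YYYY-MM-DD-HH-MM-SS format described above. The timezone of that timestamp is not stated here, so the comparison assumes UTC, which is worth confirming with Brooklyn Health. The helper name key_is_expired is hypothetical.

```python
from datetime import datetime, timezone

# Format described above: YYYY-MM-DD-HH-MM-SS
EXPIRATION_FORMAT = "%Y-%m-%d-%H-%M-%S"


def key_is_expired(expiration, now=None):
    """Return True if the expiration time is at or before `now`.

    Assumes `expiration` is a string in UTC (an assumption, not confirmed by the docs).
    """
    expires_at = datetime.strptime(expiration, EXPIRATION_FORMAT).replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expires_at


# Usage: log in again once the key has expired.
# if key_is_expired(expiration):
#     key, expiration = willisapi.login(username, password)
```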

Digital phenotyping

Uploading data

To upload data, use the following function:

summary = willisapi.upload(key, 'data.csv')

Pass the key returned at login.

data.csv is the path to a CSV file that describes the data to be uploaded.

It is a CSV organized in the following format:

project_name,pt_id_external,age,sex,race,language,time_collected,study_arm,clinical_score_a,clinical_score_b,clinical_score_c,clinical_score_d,clinical_score_e,file_path,workflow_tags
...

In this CSV:

  • project_name: This is the name of the project or study for which you are uploading data. Please note that it is critical this value remains exactly the same for a given project. Otherwise, data will be categorized separately in Brooklyn Health’s backend and may not get processed.

  • pt_id_external: This is the unique identifier of the participant, specified by the user.

  • age (optional): This is the participant's age, specified by the user as a number.

  • sex (optional): This is the participant's sex, specified by the user as either M/F or 0/1.

  • race (optional): This is the participant's race, specified by the user as a string or a number.

  • language: This specifies the language contained in the file. Use the language codes provided by Brooklyn Health to specify the language your file contains. If the language column is left empty, US English is assumed.

  • time_collected (optional): This is the timepoint associated with the file. It can be a date in YYYY-MM-DD format or a label such as ‘screening’, ‘baseline’, or ‘visit2’. Although it is optional for a successful upload, it is required for datasets with repeated measures.

  • study_arm (optional): This is the arm the participant was assigned to in the study. It can be a string (e.g., 'placebo', 'treatment') or a number (0, 1, 2, ...).

  • clinical_score_a (optional): This is the participant's clinical score (1), specified by the user, and can be represented using any string or number.

  • clinical_score_b (optional): This is the participant's clinical score (2), specified by the user, and can be represented using any string or number.

  • clinical_score_c (optional): This is the participant's clinical score (3), specified by the user, and can be represented using any string or number.

  • clinical_score_d (optional): This is the participant's clinical score (4), specified by the user, and can be represented using any string or number.

  • clinical_score_e (optional): This is the participant's clinical score (5), specified by the user, and can be represented using any string or number.

  • file_path: This is the local path of the file you are uploading. Please enter the full file path, including the file extension (e.g., .wav, .mp4).

  • workflow_tags: This tag determines the processing workflow the uploaded file will undergo. The workflow tag will be provided to you by your contact at Brooklyn Health.

A sample CSV file is included in the repo. We recommend you use this as a starting point when creating your own data.csv. It will ensure all column headings and subsequent entries are formatted correctly.
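For a long list of files, data.csv can also be generated from Python. This sketch uses the standard-library csv module and the column names from the header row above. The project name, participant ID, file path, and tag value are placeholders, the helper name write_upload_csv is hypothetical, and the sample CSV in the repo remains the authoritative template.

```python
import csv

# Column order as listed in the data.csv header row above.
COLUMNS = [
    "project_name", "pt_id_external", "age", "sex", "race", "language",
    "time_collected", "study_arm", "clinical_score_a", "clinical_score_b",
    "clinical_score_c", "clinical_score_d", "clinical_score_e",
    "file_path", "workflow_tags",
]


def write_upload_csv(path, rows):
    """Write rows (dicts keyed by column name) to `path`; missing columns are left empty.

    DictWriter raises ValueError if a row contains a key that is not a known
    column, which catches misspelled column names.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS, restval="")
        writer.writeheader()
        writer.writerows(rows)


# Placeholder values for illustration only.
rows = [
    {"project_name": "example_study", "pt_id_external": "P001",
     "time_collected": "baseline", "file_path": "/data/P001_baseline.wav",
     "workflow_tags": "<tag from Brooklyn Health>"},
]
# write_upload_csv("data.csv", rows)
```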

When called, the function validates the information in the CSV before securely uploading the data to Brooklyn Health’s servers. Upload speed depends on your internet connection. Where possible, an upload progress bar is displayed.
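The client's own validation is what counts, but a local pre-check can catch obvious problems before a long upload starts. The helper below, precheck_upload_csv, is hypothetical and does not replicate the client's checks. The rules it applies are taken from the column descriptions above: project_name spelled identically within a project, a file_path that exists and has an extension, and sex given as M/F or 0/1.

```python
import csv
import os

# Accepted values for the optional sex column, per the description above (M/F or 0/1).
ALLOWED_SEX = {"", "M", "F", "0", "1"}


def precheck_upload_csv(path):
    """Return a list of problems found in an upload CSV; an empty list means none were found."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    problems = []

    # project_name must be spelled identically for every row of a project, so flag
    # names that differ only by letter case or surrounding whitespace.
    variants = {}
    for row in rows:
        name = row.get("project_name") or ""
        variants.setdefault(name.strip().lower(), set()).add(name)
    for names in variants.values():
        if len(names) > 1:
            problems.append(f"project_name spelled inconsistently: {sorted(names)}")

    # Line 1 is the header, so data rows start at line 2 (assumes no multi-line fields).
    for line_no, row in enumerate(rows, start=2):
        file_path = row.get("file_path") or ""
        if not os.path.splitext(file_path)[1]:
            problems.append(f"line {line_no}: file_path has no extension: {file_path!r}")
        elif not os.path.isfile(file_path):
            problems.append(f"line {line_no}: file not found: {file_path!r}")
        sex = (row.get("sex") or "").strip()
        if sex not in ALLOWED_SEX:
            problems.append(f"line {line_no}: unexpected sex value: {sex!r}")
    return problems


# Usage:
# problems = precheck_upload_csv("data.csv")
# if not problems:
#     summary = willisapi.upload(key, "data.csv")
```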

Once the upload finishes, summary reports which uploads succeeded and which failed. If some uploads fail (a few failures are expected in large uploads), re-run the command with the same CSV; the API automatically skips files that were uploaded successfully in the previous attempt.
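Because files that were already uploaded are skipped, retrying amounts to calling upload again. The sketch below wraps that in a loop. The structure of summary is not documented on this page, so the caller must supply a function, count_failures, that extracts the number of failed uploads from it. Both that function and the helper upload_until_done are hypothetical.

```python
import time


def upload_until_done(upload, key, csv_path, count_failures, max_attempts=3, wait_seconds=30):
    """Call upload(key, csv_path) until no failures remain or attempts run out.

    `upload` would normally be willisapi.upload. `count_failures` must be written
    by the caller, since the summary format is not described here.
    Returns the summary from the last attempt.
    """
    summary = None
    for attempt in range(1, max_attempts + 1):
        summary = upload(key, csv_path)
        if count_failures(summary) == 0:
            break
        if attempt < max_attempts:
            time.sleep(wait_seconds)  # pause before retrying the remaining files
    return summary


# Usage (my_count_failures is yours to write once you know the summary format):
# summary = upload_until_done(willisapi.upload, key, "data.csv", my_count_failures)
```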

To force files that were previously uploaded to be uploaded again, pass force_upload=True as shown below; the API will replace the previously uploaded files.

summary = willisapi.upload(key, 'data.csv', force_upload=True)

Downloading measures

After uploading, Brooklyn Health will begin to process the data. Your Brooklyn Health contact will notify you when processing is complete, allowing you to download the output measures.

To download processed measures, use the following function:

measures = willisapi.download(key, project_name='project_name')

project_name is the value you entered in the project_name column of data.csv.

When called, this function compiles all measures from the specified project into a dataframe, returned as measures, with the following layout:

project_name pt_id_external filename time_collected [measureN] [measureN+1]
... ... ... ... ... ...

The table begins with identifiers for the data file, specifically the project_name, pt_id_external, filename, and time_collected (if originally specified by the user). The remaining columns are the digital health measures calculated from the data file.
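Since every column after the identifiers is a measure, it can help to separate the two groups before analysis. This is a minimal sketch that assumes the identifier column names shown in the layout above. It works on any list of column names, so with a pandas DataFrame it would be called on measures.columns. The helper name split_columns is hypothetical.

```python
# Identifier columns from the layout above; time_collected may be absent
# if it was not specified in data.csv.
ID_COLUMNS = ("project_name", "pt_id_external", "filename", "time_collected")


def split_columns(columns):
    """Split column names into (identifier columns, measure columns), preserving order."""
    columns = list(columns)
    ids = [c for c in columns if c in ID_COLUMNS]
    measure_cols = [c for c in columns if c not in ID_COLUMNS]
    return ids, measure_cols


# Usage with the downloaded dataframe (assuming it is a pandas DataFrame):
# id_cols, measure_cols = split_columns(measures.columns)
# measure_table = measures[measure_cols]
```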

Each measure is described in the OpenWillis documentation, on the dedicated page for the function that computes it under List of Functions. You can also reach out to your technical contact at Brooklyn Health for help interpreting the output variables.

WillisDiarize

We’re happy to say that our diarization correction model, WillisDiarize, can be accessed through WillisAPI.

To do so, use the following function:

transcript_corrected = willisapi.willis_diarize(key, 'transcript.json')

Use the key returned at login. transcript.json is the path to a JSON file containing a transcription produced by AWS or WhisperX.

When called, this function sends the input transcription to Brooklyn Health, where it is processed by the WillisDiarize model, and returns the JSON to you with corrected speaker labels.

Note: We currently only support transcription JSONs from AWS and WhisperX.
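To see what the correction changed, the speaker labels before and after can be compared segment by segment. The sketch below assumes WhisperX-style JSON, in which a top-level "segments" list holds segments that each carry a "speaker" label. It also assumes the corrected transcript keeps the same segmentation and comes back either as a dict or as a JSON string. AWS transcripts use a different layout and would need their own parsing. The helper name changed_speaker_labels is hypothetical.

```python
import json


def changed_speaker_labels(original, corrected):
    """Return (segment_index, old_label, new_label) for each segment whose speaker changed.

    Both arguments may be dicts or JSON strings in an assumed WhisperX-style layout:
    {"segments": [{"speaker": ..., "text": ...}, ...]}. Segments are paired by
    position, so this is only meaningful if the segmentation is unchanged.
    """
    if isinstance(original, str):
        original = json.loads(original)
    if isinstance(corrected, str):
        corrected = json.loads(corrected)
    changes = []
    pairs = zip(original.get("segments", []), corrected.get("segments", []))
    for i, (before, after) in enumerate(pairs):
        if before.get("speaker") != after.get("speaker"):
            changes.append((i, before.get("speaker"), after.get("speaker")))
    return changes


# Usage (requires willisapi_client and a valid key):
# with open("transcript.json", encoding="utf-8") as f:
#     original = json.load(f)
# transcript_corrected = willisapi.willis_diarize(key, "transcript.json")
# for i, old, new in changed_speaker_labels(original, transcript_corrected):
#     print(f"segment {i}: {old} -> {new}")
```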

––

Brooklyn Health is a small team of clinicians, scientists, and engineers based in Brooklyn, NY.

We develop and maintain OpenWillis, an open-source Python library for digital health measurement.