Vocal acoustics v1.0

Date completed	April 13, 2023
Release where first appeared	OpenWillis v1.0
Researcher / Developer	Vijay Yadav

import openwillis as ow

framewise, pauses, summary = ow.vocal_acoustics(audio_path = 'audio.wav')

Calculates a set of vocal acoustic features from an input audio file (only .wav files are supported).

First, the vocal acoustic properties that have framewise values are calculated with Parselmouth and saved in framewise. These include the following variables:
- Fundamental frequency (f0), measured in Hertz
- Formant frequencies 1 through 4 (f1, f2, f3, and f4), measured in Hertz
- Loudness, measured in decibels
- Harmonics-to-noise ratio (hnr)
Pydub is used to detect the presence of voice in the audio file. This information is compiled into the pauses output, which lists each pause, when it started, when it ended, and its duration.
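As a rough illustration of how a pauses table of this shape could be assembled, the sketch below turns a sequence of per-frame voice flags into pause intervals. It does not use Pydub and is not the OpenWillis implementation; the function name find_pauses and the frame_dur and min_pause parameters are hypothetical.

```python
def find_pauses(voiced, frame_dur=0.01, min_pause=0.0):
    """Return pause intervals from per-frame voice flags.

    voiced    -- sequence of booleans, True where voice was detected
    frame_dur -- assumed duration of one frame in seconds (hypothetical)
    min_pause -- assumed minimum pause length in seconds (hypothetical)
    """
    pauses = []
    start = None  # index of the first frame of the current silent run
    # A trailing True sentinel closes a silent run that reaches the end.
    for i, is_voiced in enumerate(list(voiced) + [True]):
        if not is_voiced and start is None:
            start = i  # a silent run begins here
        elif is_voiced and start is not None:
            begin, end = start * frame_dur, i * frame_dur
            if end - begin >= min_pause:
                pauses.append(
                    {"start": begin, "end": end, "duration": end - begin}
                )
            start = None
    return pauses


# Made-up flags for illustration: two silent runs, one reaching the end.
flags = [True, True, False, False, False, True, False, False]
detected = find_pauses(flags, frame_dur=0.5)
for pause in detected:
    print(pause)
```

Each entry carries a start time, an end time, and a duration in seconds, mirroring the columns described for the pauses output.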
In the summary output, the mean, standard deviation, minimum, maximum, and range of each of the variables from the first step are saved.
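The five summary statistics can be sketched for a single framewise variable as follows. This is a minimal illustration using only the standard library, not the OpenWillis code; the function name summarize, the dictionary keys, and the sample values are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption.

```python
import statistics


def summarize(values):
    """Mean, standard deviation, minimum, maximum, and range of one variable."""
    return {
        "mean": statistics.mean(values),
        # Sample standard deviation (n - 1 denominator); this is an assumption.
        "stddev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }


# Made-up framewise values for two variables, for illustration only.
frames = {
    "f0": [100.0, 110.0, 120.0],
    "loudness": [60.0, 62.0, 64.0],
}
summary_sketch = {name: summarize(vals) for name, vals in frames.items()}
for name, stats in summary_sketch.items():
    print(name, stats)
```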
The information stored in pauses is compiled into three variables, also saved in summary:
- Number of pauses per minute (pause_rate)
- Mean duration of pauses (pause_meandur), measured in seconds
- Silence ratio (silence_ratio), the percentage of frames with no voice detected
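The three pause variables can be derived from a list of pause intervals roughly as shown below. This is a sketch under stated assumptions rather than the OpenWillis implementation: the function name pause_summary and the total_duration argument are hypothetical, and the silence ratio is approximated here as the percentage of recording time spent in pauses instead of a count of unvoiced frames.

```python
def pause_summary(pauses, total_duration):
    """Summarise pause intervals (in seconds) for one recording.

    pauses         -- list of dicts with "start" and "end" times in seconds
    total_duration -- length of the recording in seconds (hypothetical argument)
    """
    durations = [p["end"] - p["start"] for p in pauses]
    n_pauses = len(durations)
    silent_time = sum(durations)
    return {
        # Pauses per minute of audio.
        "pause_rate": n_pauses / (total_duration / 60.0),
        # Mean pause duration in seconds; 0.0 when no pauses were found.
        "pause_meandur": silent_time / n_pauses if n_pauses else 0.0,
        # Assumption: share of time in pauses stands in for share of frames.
        "silence_ratio": 100.0 * silent_time / total_duration,
    }


# Made-up pauses in a 30-second recording, for illustration only.
example_pauses = [
    {"start": 1.0, "end": 2.5},
    {"start": 3.0, "end": 4.0},
    {"start": 10.0, "end": 12.0},
]
pause_stats = pause_summary(example_pauses, total_duration=30.0)
print(pause_stats)
```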
Parselmouth is also used to calculate an additional set of variables that pertain to the entire audio file rather than to individual frames. These are saved directly in the summary output:
- Jitter (absolute)
- Jitter (rap)
- Jitter (ppq5)
- Jitter (ddp)
- Shimmer (absolute)
- Shimmer (db)
- Shimmer (apq3)
- Shimmer (apq5)
- Shimmer (apq11)
- Shimmer (dda)
- Glottal-to-noise excitation ratio
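To make the jitter variants more concrete, the sketch below computes three of them from a list of glottal period durations, following the definitions given in the Praat manual. It illustrates the formulas only and is not what Parselmouth or OpenWillis runs internally: Praat extracts the periods from the audio itself and excludes implausible period pairs, which this sketch does not do. The function name jitter_measures and the sample periods are hypothetical. The shimmer measures are defined analogously on peak amplitudes rather than period durations.

```python
def jitter_measures(periods):
    """Compute absolute jitter, RAP, and DDP from period durations in seconds."""
    n = len(periods)
    if n < 3:
        raise ValueError("at least three periods are needed")
    mean_period = sum(periods) / n

    # Absolute jitter: mean absolute difference between consecutive periods,
    # expressed in seconds.
    absolute = sum(
        abs(periods[i] - periods[i - 1]) for i in range(1, n)
    ) / (n - 1)

    # RAP: mean absolute difference between each period and the average of
    # itself and its two neighbours, divided by the mean period.
    rap = sum(
        abs(periods[i] - (periods[i - 1] + periods[i] + periods[i + 1]) / 3)
        for i in range(1, n - 1)
    ) / (n - 2) / mean_period

    # DDP equals three times RAP by definition.
    return {"absolute": absolute, "rap": rap, "ddp": 3 * rap}


# Made-up periods alternating between 10 ms and 11 ms, for illustration only.
example_periods = [0.010, 0.011, 0.010, 0.011, 0.010]
jitter = jitter_measures(example_periods)
print(jitter)
```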

Type	str
Description	Path to the audio file; only .wav files are supported.

Type	data-type
Description	Framewise output of the acoustic properties that can be calculated for individual frames. Columns represent variables; rows represent frames.


Type	data-type
Description	List of all pauses detected in the audio file, with start times, end times, and durations. This is the precursor for the pause variables in the summary output. All values are in seconds.


Type	data-type
Description	Final output of all vocal acoustic measures calculated from the input audio file.

The data frame is the transpose of the table below:

Here, we use this function to process a sample audio file included in the repository.

import openwillis as ow

framewise, pauses, summary = ow.vocal_acoustics(audio_path = 'data/trim.wav')

framewise.head(2)

Below are dependencies specific to calculation of this measure.

Dependency	License	Justification
Parselmouth	GPL 3.0 License	Python interface to the Praat software library, a long-trusted source for measurement methods in vocal acoustics
Pydub	MIT License	Open-source and accurate methods for analysis of audio files; used here to parse speech versus silence in audio files

OpenWillis was developed by a small team of clinicians, scientists, and engineers based in Brooklyn, NY.
