Vocal acoustics v1.0
anzar edited this page Jun 14, 2023 · 1 revision
Date completed | April 13, 2023 |
Release where first appeared | OpenWillis v1.0 |
Researcher / Developer | Vijay Yadav |
import openwillis as ow
framewise, pauses, summary = ow.vocal_acoustics(audio_path = 'audio.wav')
Calculates a set of vocal acoustic features from an input audio file (only .wav files are supported).
- First, the vocal acoustic properties that have framewise values are calculated with Parselmouth and saved in framewise. These are the following variables:
- Fundamental frequency (f0), measured in Hertz
- Formant frequencies 1 through 4 (f1, f2, f3, and f4), measured in Hertz
- Loudness, measured in decibels
- Harmonics-to-noise ratio (hnr)
- Pydub is used to detect the presence of voice in the audio file. This information is compiled into the pauses output, which lists each pause, when it started, when it ended, and its duration.
- In the summary output, the mean, standard deviation, minimum, maximum, and range of each of the variables from the first step are saved.
- The information stored in pauses is compiled into three variables, also saved in summary:
- Number of pauses per minute (pause_rate)
- Mean duration of pauses (pause_meandur), measured in seconds
- Silence ratio (silence_ratio), the percentage of frames with no voice detected
- Parselmouth is used to calculate an additional set of variables that describe the audio file as a whole rather than individual frames. These are saved directly in the summary output:
- Jitter (absolute)
- Jitter (rap)
- Jitter (ppq5)
- Jitter (ddp)
- Shimmer (absolute)
- Shimmer (db)
- Shimmer (apq3)
- Shimmer (apq5)
- Shimmer (apq11)
- Shimmer (dda)
- Glottal-to-noise excitation ratio
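To make the jitter variables above more concrete, here is a minimal sketch of the standard Praat-style definitions, computed from a list of consecutive glottal period lengths in seconds. It is illustrative only: the function name `jitter_measures` is hypothetical, the mapping of the `jitter` key to relative (local) jitter is an assumption, and the sketch leaves out the period filtering that Praat applies. In OpenWillis these values come from Parselmouth, not from code like this.

```python
from statistics import mean


def jitter_measures(periods):
    """Simplified jitter measures from consecutive glottal periods (seconds).

    Hypothetical helper for illustration; not part of OpenWillis or Parselmouth.
    """
    if len(periods) < 3:
        raise ValueError("need at least three periods")

    mean_period = mean(periods)

    # Absolute jitter: mean absolute difference between consecutive periods.
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    jitter_abs = mean(diffs)

    # RAP: mean absolute deviation of each period from the average of itself
    # and its two neighbours, divided by the mean period.
    deviations = [
        abs(periods[i] - (periods[i - 1] + periods[i] + periods[i + 1]) / 3.0)
        for i in range(1, len(periods) - 1)
    ]
    rap = mean(deviations) / mean_period

    return {
        "jitter": jitter_abs / mean_period,  # assumed to mean local (relative) jitter
        "jitter_abs": jitter_abs,
        "jitter_rap": rap,
        "jitter_ddp": 3.0 * rap,  # DDP is three times RAP by definition
    }
```

The shimmer variables follow the same pattern, with peak amplitudes taking the place of period lengths.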
Type | str |
Description | path to the audio file; only .wav files are supported |
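Because only .wav files are supported, it can help to check the path before calling the function. The sketch below is one way to do that; `is_wav_path` is a hypothetical helper, not part of OpenWillis, and it only inspects the file extension, not the file contents.

```python
from pathlib import Path


def is_wav_path(audio_path):
    """Return True when the path ends in .wav (case-insensitive).

    Hypothetical helper; it does not open or validate the file itself.
    """
    return Path(audio_path).suffix.lower() == ".wav"
```

A caller could skip files, or convert them first, whenever `is_wav_path(audio_path)` returns False.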
Type | data-type |
Description | framewise output of acoustic properties that can be calculated for individual frames. columns represent variables, rows represent frames |
What the data frame looks like:
frame | f0 | f1 | f2 | f3 | f4 | loudness | hnr |
0 | | | | | | | |
1 | | | | | | | |
... |
Type | data-type |
Description | list of all pauses detected in the audio file, with start times, end times, and durations. precursor for pause variables in summary output. all values are in seconds. |
What the data frame looks like:
pause_start | pause_end | pause_duration |
... |
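As a rough illustration of how a pause table like this could feed the pause variables in summary, the sketch below converts silent ranges given in milliseconds into rows with values in seconds, then derives the three summary values. Several things here are assumptions: the `[start_ms, end_ms]` input shape (the form returned by Pydub's `detect_silence`), the function names, the use of time share as a stand-in for the share of frames in silence_ratio, and reporting 0.0 as the mean duration when no pauses are found. This is not the OpenWillis implementation.

```python
from statistics import mean


def pauses_from_ms_ranges(silent_ranges_ms):
    """Convert [start_ms, end_ms] pairs into pause rows with values in seconds."""
    rows = []
    for start_ms, end_ms in silent_ranges_ms:
        start, end = start_ms / 1000.0, end_ms / 1000.0
        rows.append(
            {"pause_start": start, "pause_end": end, "pause_duration": end - start}
        )
    return rows


def summarize_pauses(rows, total_duration_s):
    """Derive pause summary values from pause rows and the audio length in seconds."""
    durations = [row["pause_duration"] for row in rows]
    return {
        # Assumption: report 0.0 when no pauses were detected.
        "pause_meandur": mean(durations) if durations else 0.0,
        # Pauses per minute of audio.
        "pause_rate": len(rows) / (total_duration_s / 60.0),
        # Assumption: share of time in silence, as a percentage, stands in
        # for the share of frames with no voice detected.
        "silence_ratio": 100.0 * sum(durations) / total_duration_s,
    }
```

For example, two pauses of 0.5 s and 1.0 s in a 30-second recording give a mean pause duration of 0.75 s, a rate of 4 pauses per minute, and a silence ratio of 5 percent.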
Type | data-type |
Description | final output of all vocal acoustic measures calculated from the input audio file. |
The data frame is the transpose of the table below:
f0_mean | |
f0_stdev | |
f0_min | |
f0_max | |
f0_range | |
f1_mean | |
f1_stdev | |
f1_min | |
f1_max | |
f1_range | |
... | |
loudness_mean | |
loudness_stdev | |
loudness_min | |
loudness_max | |
loudness_range | |
hnr | |
jitter | |
jitter_abs | |
jitter_rap | |
jitter_ppq5 | |
jitter_ddp | |
shimmer | |
shimmer_db | |
shimmer_apq3 | |
shimmer_apq5 | |
shimmer_apq11 | |
shimmer_dda | |
gne_ratio | |
pause_meandur | |
pause_rate | |
silence_ratio |
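The `_mean`, `_stdev`, `_min`, `_max`, and `_range` entries follow one naming pattern per framewise variable. A minimal sketch of that pattern in plain Python is shown below; `summarize_column` is a hypothetical helper, and the use of the sample standard deviation is an assumption, since this page does not say which form OpenWillis reports.

```python
from statistics import mean, stdev


def summarize_column(name, values):
    """Build the five summary entries for one framewise variable.

    Hypothetical helper; needs at least two values for the standard deviation.
    """
    lowest, highest = min(values), max(values)
    return {
        f"{name}_mean": mean(values),
        f"{name}_stdev": stdev(values),  # assumption: sample standard deviation
        f"{name}_min": lowest,
        f"{name}_max": highest,
        f"{name}_range": highest - lowest,
    }
```

Calling `summarize_column("f0", f0_values)` on the f0 column of the framewise output would produce the keys `f0_mean`, `f0_stdev`, `f0_min`, `f0_max`, and `f0_range`.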
Here, we use this function to process a sample audio file included in the repository.
import openwillis as ow
framewise, pauses, summary = ow.vocal_acoustics(audio_path = 'data/trim.wav')
framewise.head(2)
frame | f0 | loudness | hnr | form1freq | form2freq | form3freq | form4freq |
0 | 107.72 | 49.71 | 7.88 | 439.77 | 1720.29 | 2662.75 | 4328.91 |
1 | 105.88 | 48.59 | 9.10 | 376.80 | 2513.84 | 2667.70 | 4105.55 |
Below are dependencies specific to calculation of this measure.
Dependency | License | Justification |
Parselmouth | GPL 3.0 License | Python interface to the Praat software library, a long-trusted source of measurement methods in vocal acoustics |
Pydub | MIT License | Open-source and accurate methods for analyzing audio files; used here to separate speech from silence |
OpenWillis was developed by a small team of clinicians, scientists, and engineers based in Brooklyn, NY.