
Speech Transcription with Vosk v1.0

Date completed: October 18, 2023
Release where first appeared: OpenWillis v1.6
Researcher / Developer: Vijay Yadav, Anzar Abbas

1 – Syntax

import openwillis as ow

transcript_json, transcript_text = ow.speech_transcription_vosk(filepath = '', language = '', transcribe_interval = [0])

2 – Methods

This function transcribes speech into text using the Vosk Speech Recognition Toolkit, an open-source toolkit that performs offline speech transcription without requiring significant computational resources. It is best suited for researchers transcribing speech on their personal machines.

A limitation of this function is that it assumes the source audio contains speech from a single speaker only. The transcription output does not contain speaker labels and cannot distinguish speech from multiple speakers. For that functionality, see the Speech Transcription with Whisper or Speech Transcription with AWS functions in OpenWillis.

Vosk supports several languages/dialects, namely English, Indian English, German, French, Spanish, Portuguese, Chinese, Russian, Turkish, Vietnamese, Italian, Dutch, Catalan, Arabic, Greek, Farsi, Filipino, Ukrainian, Kazakh, Swedish, Japanese, Esperanto, Hindi, Czech, and Polish.

By default, the function assumes the input language is English (language code en). See below for all other language codes.

Optionally, the user may specify a section of the source audio they want to transcribe by entering transcribe_interval = [a,b], where a is the start time and b is the end time, both in seconds.

From the transcription, two outputs are provided. The first is transcript_json, which contains start and end times for each word as well as a confidence score for the transcription. The second is transcript_text, which is simply a string of the entire transcription, without any punctuation.
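
As a minimal illustration of a typical call, the sketch below transcribes the first 30 seconds of a Spanish-language recording; the file path is a placeholder, not part of the API.

import openwillis as ow

# Placeholder path for illustration; most audio and video file types are supported
transcript_json, transcript_text = ow.speech_transcription_vosk(
    filepath = 'interview_es.wav',
    language = 'es',
    transcribe_interval = [0, 30]
)

print(transcript_text)  # plain-text transcript, no punctuation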


3 – Inputs

3.1 – filepath

Type: String
Description: A local path to the file that the user wants to process. Most audio and video file types are supported.

3.2 – language

Type: String; optional, default is English (en)
Description: Language of the source audio file
English: en
Indian English: en-in
Chinese: cn
Russian: ru
French: fr
German: de
Spanish: es
Portuguese or Brazilian Portuguese: pt
Greek: gr
Turkish: tr
Vietnamese: vn
Italian: it
Dutch: nl
Catalan: ca
Arabic: ar
Farsi: fa
Filipino: ph
Ukrainian: uk
Kazakh: kz
Swedish: sv
Japanese: ja
Esperanto: eo
Hindi: hi
Czech: cz
Polish: pl
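
For example, to transcribe a French recording, the corresponding code is passed via the language argument (the file path below is a placeholder):

transcript_json, transcript_text = ow.speech_transcription_vosk(filepath = 'recording.wav', language = 'fr')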

3.3 – transcribe_interval

Type: List; optional, default is [0]
Description: List specifying the start and end time (in seconds) of the portion of the audio file that the user wants to transcribe. For example, the user can input [0, 10] to transcribe the first 10 seconds, [11, 20] to transcribe a chunk from the middle of the file, or [20] to transcribe everything after 20 seconds.
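
For illustration, the three forms described above correspond to calls like the following (the file path is a placeholder):

transcript_json, transcript_text = ow.speech_transcription_vosk(filepath = 'audio.wav', transcribe_interval = [0, 10])   # first 10 seconds
transcript_json, transcript_text = ow.speech_transcription_vosk(filepath = 'audio.wav', transcribe_interval = [11, 20])  # a chunk from the middle
transcript_json, transcript_text = ow.speech_transcription_vosk(filepath = 'audio.wav', transcribe_interval = [20])      # everything after 20 seconds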

4 – Outputs

4.1 – transcript_json

Type: JSON
Description: A word-by-word transcript saved as a JSON, including start and end times for each word and a confidence score for the transcription
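
The exact key names inside transcript_json are not specified here; the sketch below assumes Vosk-style word entries with word, start, end, and conf fields, so verify these names against your own output before relying on them.

# Field names are assumptions based on typical Vosk output, not a documented schema
for entry in transcript_json.get('result', []):
    print(entry['word'], entry['start'], entry['end'], entry['conf'])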

4.2 – transcript_text

Type: String
Description: The entire transcription compiled into a single string, without punctuation

5 – Dependencies

Below are dependencies specific to the calculation of this measure.

Dependency: vosk-api
License: Apache 2.0
Justification: Open-source toolkit for offline speech transcription that is the backbone of this function