This tutorial demonstrates speech-to-text recognition.
It uses the QuartzNet 15x5 model, which performs automatic speech recognition. QuartzNet's design is based on Jasper, a convolutional architecture trained with Connectionist Temporal Classification (CTC) loss. The model is available from Open Model Zoo.
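Because the model is trained with CTC loss, its raw output is a sequence of per-frame character scores rather than finished text. The sketch below illustrates how such an output is commonly turned into a string with greedy CTC decoding: pick the best symbol per frame, collapse consecutive repeats, then drop the blank symbol. The alphabet, the position of the blank symbol, and the function names here are assumptions chosen for illustration, not taken from the model's documentation.

```python
from itertools import groupby

# Assumed alphabet for illustration: space, a-z and apostrophe.
# The CTC blank symbol is assumed to be the last index.
ALPHABET = " abcdefghijklmnopqrstuvwxyz'"
BLANK = len(ALPHABET)


def ctc_greedy_decode(frame_scores):
    """Decode per-frame scores (one list of len(ALPHABET) + 1 scores per frame)."""
    # 1. Take the highest-scoring symbol index in every frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    # 2. Collapse runs of the same index into a single occurrence.
    collapsed = [index for index, _ in groupby(best)]
    # 3. Remove blanks and map the remaining indices to characters.
    return "".join(ALPHABET[index] for index in collapsed if index != BLANK)


def one_hot(index, size=len(ALPHABET) + 1):
    """Hypothetical helper: build a frame where one symbol has all the score."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Four synthetic frames: "h", "h", blank, "i".
h, i_ = ALPHABET.index("h"), ALPHABET.index("i")
frames = [one_hot(h), one_hot(h), one_hot(BLANK), one_hot(i_)]
print(ctc_greedy_decode(frames))
```

The blank symbol is what lets CTC represent doubled letters: two identical characters separated by a blank survive the collapse step, while two adjacent identical frames merge into one character.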
If you have not done so already, please follow the Installation Guide to install all required dependencies.