
Speech (ASR)

Audio data requires different preprocessing than text or images; Stanford's course CS224S can be a good starting point.
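As a rough illustration of what this preprocessing usually involves, the sketch below turns a raw waveform into a log power spectrogram: pre-emphasis, slicing into overlapping frames, a Hamming window, and an FFT per frame. The 16 kHz sample rate, 25 ms window, 10 ms hop and 0.97 pre-emphasis coefficient are common defaults assumed here, not values taken from the course. Many systems go further and apply a mel filterbank (and a DCT for MFCCs) on top of this output. NumPy is assumed to be available.

```python
import numpy as np

def log_power_spectrogram(signal, sample_rate=16000, win_ms=25.0, hop_ms=10.0,
                          preemph=0.97, eps=1e-10):
    """Return a (num_frames, n_fft // 2 + 1) array of log power spectra.

    Assumed defaults: 16 kHz audio, 25 ms windows, 10 ms hop, 0.97 pre-emphasis.
    """
    signal = np.asarray(signal, dtype=np.float64)
    # Pre-emphasis boosts high frequencies: y[t] = x[t] - a * x[t-1]
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])

    win_len = int(round(sample_rate * win_ms / 1000))  # 400 samples at 16 kHz
    hop_len = int(round(sample_rate * hop_ms / 1000))  # 160 samples at 16 kHz
    if len(emphasized) < win_len:
        # Zero-pad very short inputs so that at least one frame exists
        emphasized = np.pad(emphasized, (0, win_len - len(emphasized)))
    num_frames = 1 + (len(emphasized) - win_len) // hop_len

    # Index matrix of overlapping frames: row i starts at sample i * hop_len
    idx = np.arange(win_len)[None, :] + hop_len * np.arange(num_frames)[:, None]
    frames = emphasized[idx] * np.hamming(win_len)

    # Power spectrum via real FFT, zero-padded to the next power of two
    n_fft = 1 << (win_len - 1).bit_length()  # 512 for 400-sample windows
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    return np.log(power + eps)

# Usage: one second of a 440 Hz tone sampled at 16 kHz
t = np.arange(16000) / 16000
spec = log_power_spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (98, 257)
```

Each row is one 10 ms step of the signal, which is the kind of frame-level representation the acoustic models in the papers below consume.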

Books

Non-DL-based

Fundamentals of Speech Recognition [$] by L. Rabiner and B.-H. Juang, 1st Edition
Statistical Methods for Speech Recognition - Language, Speech, and Communication [$] by F. Jelinek, Fourth Printing

DL-based

Automatic Speech Recognition: A Deep Learning Approach - Signals and Communication Technology [$] by D. Yu and L. Deng, 2015 Edition

Toolkits and Tools

Non-DL-based

DL-based

Mozilla - DeepSpeech

Scientific Papers

DNNs as a replacement for Gaussian mixture models

Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups, G. Hinton et al., 2012
Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition, G. Dahl et al., 2012
Acoustic modeling using deep belief networks, A. Mohamed et al., 2012
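In the hybrid DNN-HMM setup these papers describe, the network replaces the GMM by predicting posteriors p(s|x_t) over HMM states, while the HMM decoder expects likelihoods p(x_t|s). By Bayes' rule, p(x_t|s) = p(s|x_t) p(x_t) / p(s), so dividing the posterior by the state prior p(s) gives a likelihood scaled by p(x_t), which is the same for every state at a given frame. The sketch below does this conversion in log space; the function names are hypothetical, and estimating p(s) from state counts in forced alignments of the training data is an assumption about a common practice, not a detail taken from any one paper.

```python
import numpy as np

def scaled_log_likelihoods(log_posteriors, state_priors, floor=1e-8):
    """Convert DNN log-posteriors log p(s|x_t) into scaled log-likelihoods.

    log_posteriors: (num_frames, num_states) array, e.g. log-softmax outputs.
    state_priors:   (num_states,) array of p(s).
    Returns log p(s|x_t) - log p(s), i.e. log p(x_t|s) minus log p(x_t).
    Since log p(x_t) is shared by all states at frame t, every complete state
    path gets the same offset, so the decoder's best path is unchanged.
    """
    # Floor the priors so states never seen in the alignments do not give log(0)
    priors = np.maximum(np.asarray(state_priors, dtype=np.float64), floor)
    return np.asarray(log_posteriors, dtype=np.float64) - np.log(priors)

def priors_from_alignments(state_ids, num_states):
    """Estimate p(s) as relative frequencies of states in frame-level alignments."""
    counts = np.bincount(np.asarray(state_ids), minlength=num_states).astype(np.float64)
    return counts / counts.sum()

# Usage with a toy 3-state alignment and a single frame of posteriors
priors = priors_from_alignments([0, 0, 1, 2, 2, 2], num_states=3)  # [1/3, 1/6, 1/2]
frame = np.log(np.array([[0.5, 0.25, 0.25]]))
scores = scaled_log_likelihoods(frame, priors)
```

Dividing by the prior matters because frequent states (such as silence) would otherwise dominate decoding simply because the network sees them often during training.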

End-to-end models

Deep Speech 2: End-to-end speech recognition in English and Mandarin, D. Amodei et al., 2015
End-to-end attention-based large vocabulary speech recognition, D. Bahdanau et al., 2016
Speech recognition with deep recurrent neural networks, A. Graves et al., 2013
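Deep Speech 2 and the 2013 Graves paper both rely on connectionist temporal classification (CTC), which lets the network emit one label per audio frame, including a special "blank" label, without needing a frame-level alignment (the Bahdanau et al. model uses attention instead). The simplest way to turn CTC outputs into text is best-path (greedy) decoding: pick the most likely label per frame, merge consecutive repeats, then drop blanks. A minimal sketch follows, assuming the blank is label index 0 and using a hypothetical toy alphabet; practical systems usually replace this with a beam search, often combined with a language model.

```python
from itertools import groupby

def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Best-path CTC decoding.

    frame_scores: sequence of per-frame score lists (e.g. log-probabilities),
                  one score per label; index `blank` is the CTC blank.
    alphabet:     maps label index -> character; alphabet[blank] is never emitted.
    """
    # 1. Most likely label at every frame
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    # 2. Merge runs of the same label (groupby yields one key per run)
    collapsed = [label for label, _ in groupby(best)]
    # 3. Drop blanks; a blank between two equal labels keeps them as separate characters
    return "".join(alphabet[label] for label in collapsed if label != blank)

# Usage: one-hot frame scores for the label path h h e - l l - l o o
alphabet = ["-", "h", "e", "l", "o"]
path = [1, 1, 2, 0, 3, 3, 0, 3, 4, 4]
frames = [[1.0 if i == label else 0.0 for i in range(len(alphabet))] for label in path]
print(ctc_greedy_decode(frames, alphabet))  # hello
```

The blank between the two `l` runs is what allows CTC to spell a doubled letter; without it, step 2 would merge them into a single `l`.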