Speech and Audio Processing

Carlos Lizarraga-Celaya edited this page Nov 7, 2024 · 1 revision

Learning Objective

Develop skills in applying deep learning techniques to speech recognition, speech synthesis, and other audio-based applications.

Related Skills

1. Preprocessing and feature extraction from audio data
2. Implementing neural network architectures for speech tasks
3. Deploying speech-based models in production environments

Subtopics

1. Audio data preprocessing (resampling, normalization, windowing)
2. Feature extraction from audio signals (MFCCs, spectrograms, and learned representations such as wav2vec)
3. Speech recognition using recurrent neural networks and Transformers
4. Text-to-speech and speech synthesis using generative models
5. Audio event detection and classification
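
As a small illustration of subtopics 1 and 2, the sketch below chains peak normalization, framing, Hann windowing, and a magnitude spectrogram using only the Python standard library. It is not course material: the function names, the frame length of 64 samples, the hop of 32 samples, and the 1 kHz test tone are all assumptions chosen for the example, and the naive DFT stands in for the FFT that a real pipeline would use.

```python
import cmath
import math


def peak_normalize(signal):
    """Scale samples so the largest absolute value is 1.0 (silence is returned unchanged)."""
    peak = max((abs(s) for s in signal), default=0.0)
    if peak == 0.0:
        return list(signal)
    return [s / peak for s in signal]


def frame_signal(signal, frame_length, hop_length):
    """Split the signal into overlapping frames; a trailing partial frame is dropped."""
    last_start = len(signal) - frame_length
    return [signal[start:start + frame_length]
            for start in range(0, last_start + 1, hop_length)]


def hann_window(length):
    """Periodic Hann window, a common choice for short-time spectral analysis."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]


def magnitude_spectrum(frame):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2 (illustrative; use an FFT in practice)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def spectrogram(signal, frame_length=64, hop_length=32):
    """Magnitude spectrogram as a list of frames, each a list of bin magnitudes."""
    window = hann_window(frame_length)
    frames = frame_signal(peak_normalize(signal), frame_length, hop_length)
    return [magnitude_spectrum([w * x for w, x in zip(window, frame)]) for frame in frames]


# Hypothetical toy input: a 1 kHz tone sampled at 8 kHz, 256 samples long.
sample_rate = 8000
tone = [0.3 * math.sin(2.0 * math.pi * 1000.0 * t / sample_rate) for t in range(256)]

spec = spectrogram(tone)
# Strongest bin in the first frame, converted back to a frequency in Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin * sample_rate / 64)
```

With these settings the tone falls exactly on a DFT bin, so the strongest bin maps back to 1000 Hz. MFCCs would add a mel filterbank, a logarithm, and a discrete cosine transform on top of each spectrogram frame.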

References and Resources

- "Deep Learning for Audio, Signal, and Image Processing" by Sudhakar Kumawat et al.
- "Automatic Speech Recognition: A Deep Learning Approach" by Dong Yu and Li Deng
- Coursera course "Audio Signal Processing for Music Applications" by Universitat Pompeu Fabra