An introduction to our model for dimensional speech emotion recognition based on wav2vec 2.0. The model is available from doi:10.5281/zenodo.6221127 and released under CC BY-NC-SA 4.0. It was created by fine-tuning the pre-trained wav2vec2-large-robust model on MSP-Podcast (v1.7); before fine-tuning, the pre-trained model was pruned from 24 to 12 transformer layers. In this tutorial we use the ONNX export of the model. The original PyTorch model is hosted on Hugging Face. Further details are given in the associated paper.
The model can be used for non-commercial purposes, see CC BY-NC-SA 4.0. For commercial usage, a license for devAIce must be obtained. The source code in this GitHub repository is released under the following license.
Create and activate a Python virtual environment, then install audonnx:

```bash
$ pip install audonnx
```
Load the model and test it on a random signal:

```python
import audeer
import audonnx
import numpy as np

# Download the model archive from Zenodo and unpack it
url = 'https://zenodo.org/record/6221127/files/w2v2-L-robust-12.6bc4a7fd-1.1.0.zip'
cache_root = audeer.mkdir('cache')
model_root = audeer.mkdir('model')

archive_path = audeer.download_url(url, cache_root, verbose=True)
audeer.extract_archive(archive_path, model_root)
model = audonnx.load(model_root)

# One second of Gaussian noise at 16 kHz as a test input
sampling_rate = 16000
signal = np.random.normal(size=sampling_rate).astype(np.float32)
model(signal, sampling_rate)
```
```
{'hidden_states': array([[-0.00711814, 0.00615957, -0.00820673, ..., 0.00666412,
          0.00952989, 0.00269193]], dtype=float32),
 'logits': array([[0.6717072 , 0.6421313 , 0.49881312]], dtype=float32)}
```
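The call above scores a single signal. To score several signals, one option is to loop over them and stack the outputs. The following is a minimal sketch, not part of audonnx: `process_signals` is a hypothetical helper, and it assumes that `model(signal, sampling_rate)` returns a dict with `logits` of shape (1, 3) and `hidden_states` of shape (1, D), as in the output shown above. Any callable with that behavior should work, for example the `model` returned by `audonnx.load()`.

```python
from typing import Callable, Dict, List, Tuple

import numpy as np


def process_signals(
    model: Callable[[np.ndarray, int], Dict[str, np.ndarray]],
    signals: List[np.ndarray],
    sampling_rate: int,
) -> Tuple[np.ndarray, np.ndarray]:
    """Run the model on each signal and stack logits and hidden states.

    Hypothetical helper: assumes each model call returns a dict with
    'logits' of shape (1, 3) and 'hidden_states' of shape (1, D).
    """
    if not signals:
        raise ValueError('expected at least one signal')
    logits, embeddings = [], []
    for signal in signals:
        # Cast to float32, matching the input used in the example above
        output = model(signal.astype(np.float32), sampling_rate)
        # Drop the leading batch axis of size 1
        logits.append(output['logits'][0])
        embeddings.append(output['hidden_states'][0])
    # Shapes: (num_signals, 3) and (num_signals, D)
    return np.stack(logits), np.stack(embeddings)
```

For example, `logits, embeddings = process_signals(model, [signal, signal], sampling_rate)` returns one row per input signal. Processing the signals one at a time keeps memory usage low and allows signals of different lengths.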
The hidden states might be used as embeddings for related speech emotion recognition tasks. The order in the logits output is: arousal, dominance, valence.
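Because the logits follow the fixed order arousal, dominance, valence, it can help to attach the names explicitly. A minimal sketch; `DIMENSIONS` and `to_named_scores` are hypothetical names introduced here for illustration and are not part of audonnx.

```python
from typing import Dict

import numpy as np

# Order of the logits as documented: arousal, dominance, valence
DIMENSIONS = ('arousal', 'dominance', 'valence')


def to_named_scores(logits: np.ndarray) -> Dict[str, float]:
    """Map a (1, 3) or (3,) logits array to a dict keyed by dimension name."""
    values = np.asarray(logits).reshape(-1)
    if values.size != len(DIMENSIONS):
        raise ValueError(f'expected {len(DIMENSIONS)} values, got {values.size}')
    return {name: float(value) for name, value in zip(DIMENSIONS, values)}
```

With the model loaded as above, `to_named_scores(model(signal, sampling_rate)['logits'])` returns a dict with the keys `arousal`, `dominance` and `valence`.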
For a detailed introduction, please check out the notebook.
```bash
$ pip install -r requirements.txt
$ jupyter notebook notebook.ipynb
```
If you use our model in your own work, please cite the following paper:
```bibtex
@article{wagner2023dawn,
  title={Dawn of the Transformer Era in Speech Emotion Recognition: Closing the Valence Gap},
  author={Wagner, Johannes and Triantafyllopoulos, Andreas and Wierstorf, Hagen and Schmitt, Maximilian and Burkhardt, Felix and Eyben, Florian and Schuller, Bj{\"o}rn W},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  pages={1--13},
  year={2023},
}
```