Add spoken digit dataset and baseline model #1090
Comments
The two datasets sound great :) and the example you mention would make for a nice tutorial, e.g. pytorch/tutorials#1204, thoughts?
@vincentqb I expect a lot of overlap with a possible baseline for the SpeechCommands dataset. Is there a simple and lightweight ASR model that we can use for a tutorial?
oops, I updated the link in my previous message to point to the new audio classification tutorial. Is that what you meant?
There is now a PyTorch loader for FSDD at https://github.com/eonu/torch-fsdd
There was also a pull request to add AudioMNIST, but it was closed: #84
🚀 Feature
Given the lack of small yet comprehensive audio tasks, I propose adding a spoken-digit ("speech MNIST") dataset to torchaudio.
Motivation
In the audio domain we often lack small toy scenarios that would be a good equivalent to the ubiquitous MNIST task.
A spoken digit dataset and model would make it easy to sketch and try out audio ML ideas.
Pitch
Add either of the following to torchaudio.datasets (a sketch of a possible loader follows below):
- the Free Spoken Digit Dataset (FSDD)
- AudioMNIST
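A minimal sketch of what such a dataset class could look like, following the (waveform, sample_rate, label, ...) tuple convention of the existing torchaudio.datasets. The class name SpokenDigits and its constructor are hypothetical, not an existing API; the `{digit}_{speaker}_{index}.wav` naming convention is FSDD's.

```python
from pathlib import Path

import torchaudio
from torch.utils.data import Dataset


class SpokenDigits(Dataset):
    # Hypothetical loader over a local directory of FSDD-style recordings.
    def __init__(self, root: str):
        # e.g. root = ".../free-spoken-digit-dataset/recordings"
        self._walker = sorted(Path(root).glob("*.wav"))

    def __len__(self) -> int:
        return len(self._walker)

    def __getitem__(self, n: int):
        path = self._walker[n]
        # FSDD files are named `{digit}_{speaker}_{index}.wav`.
        digit, speaker, _ = path.stem.split("_")
        waveform, sample_rate = torchaudio.load(str(path))
        return waveform, sample_rate, int(digit), speaker
```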
Additional context
Furthermore, it might be a good idea to also add a baseline model, either based on MelSpectrogram -> Conv2d or using the existing wav2letter model; a sketch of the former follows below.
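To make the MelSpectrogram -> Conv2d idea concrete, here is a minimal sketch. The class name, the layer sizes, n_mels=40, and the 8 kHz sample rate (FSDD's rate) are illustrative assumptions, not a proposed final architecture.

```python
import torch
import torchaudio


class MelConvBaseline(torch.nn.Module):
    # Illustrative baseline: a MelSpectrogram front end feeding a small
    # Conv2d stack; all sizes are placeholders.
    def __init__(self, n_classes: int = 10, sample_rate: int = 8000):
        super().__init__()
        self.frontend = torchaudio.transforms.MelSpectrogram(
            sample_rate=sample_rate, n_mels=40
        )
        self.conv = torch.nn.Sequential(
            torch.nn.Conv2d(1, 16, kernel_size=3, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2),
            torch.nn.Conv2d(16, 32, kernel_size=3, padding=1),
            torch.nn.ReLU(),
            # Pool away the time/frequency axes so variable-length
            # clips map to a fixed-size feature vector.
            torch.nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = torch.nn.Linear(32, n_classes)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, 1, time) -> mel: (batch, 1, n_mels, frames)
        mel = self.frontend(waveform)
        features = self.conv(mel).flatten(1)
        return self.classifier(features)


# Usage: ten digit classes on one second of 8 kHz audio.
logits = MelConvBaseline()(torch.randn(8, 1, 8000))  # -> (8, 10)
```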