Note: Uncomment the MFCC extraction block to work with your own sounds. Otherwise, you can use the sample dataset I have provided, dataset.npy, which contains:
- Mel-frequency cepstral coefficients (MFCCs) of shape (178, 44, 13), where 178 is the number of audio clips, 44 is the number of frames (time steps) per clip, and 13 is the number of coefficients.
- Labels of shape (178,).
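The sketch below loads the sample dataset and checks that it has the layout described above. The on-disk format of dataset.npy is not documented here, so this assumes a hypothetical layout: a pickled dict with keys "mfcc" and "labels" saved via np.save. Adjust load_dataset if the real file is stored differently.

```python
import numpy as np

N_FRAMES, N_COEFFS = 44, 13  # per-clip MFCC size stated in this README


def load_dataset(path="dataset.npy"):
    """Load MFCCs and labels.

    Assumption: the file holds a pickled dict with the hypothetical keys
    "mfcc" and "labels"; the real dataset.npy may be organised differently.
    """
    data = np.load(path, allow_pickle=True).item()
    return np.asarray(data["mfcc"]), np.asarray(data["labels"])


def check_shapes(mfcc, labels):
    """Verify the (N, 44, 13) / (N,) layout and return N."""
    if mfcc.ndim != 3 or mfcc.shape[1:] != (N_FRAMES, N_COEFFS):
        raise ValueError(f"expected (N, {N_FRAMES}, {N_COEFFS}), got {mfcc.shape}")
    if labels.shape != (mfcc.shape[0],):
        raise ValueError(f"expected ({mfcc.shape[0]},) labels, got {labels.shape}")
    return mfcc.shape[0]
```

For the sample dataset, check_shapes should return 178 if the assumed layout holds.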
The extraction script generates a 44x13 2D image (an MFCC matrix) for each sound signal, along with a target column (label).
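To show how the per-signal 44x13 images become the (N, 44, 13) array and (N,) labels described earlier, here is a minimal sketch. The function name stack_examples is hypothetical and not taken from the repository's script.

```python
import numpy as np


def stack_examples(examples):
    """Stack (mfcc_image, label) pairs into X of shape (N, 44, 13) and y of shape (N,).

    Hypothetical helper: each mfcc_image is assumed to be a 44x13 array.
    """
    images, labels = zip(*examples)
    X = np.stack(images)   # adds a leading "clip" axis: (N, 44, 13)
    y = np.asarray(labels)  # one target per clip: (N,)
    return X, y
```

np.stack raises an error if any image has a different shape, which catches clips that produced the wrong number of frames.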
- For each type of sound, create a folder inside the audio/ directory.
- To see what I mean, explore the audio folder in this repository, where I have placed an example audio file.
- Once you have created the folders for your sounds, place this script in the same location it occupies in this repository.
- This implementation rejects audio signals with a sample rate lower than 22050 Hz.
- Number of MFCCs: 13.
- Hop length (step between successive frames): 512 samples.
- FFT size: 2048 samples.
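The sketch below ties the folder layout and the parameters above together: each sub-folder of audio/ becomes an integer label, signals below 22050 Hz are rejected, and 13 MFCCs are computed with an FFT size of 2048 and a hop length of 512. It is a from-scratch NumPy MFCC (Hann window, triangular mel filters, log, DCT-II), not the repository's code; the script itself may use a library such as librosa, whose values will differ because it uses a different mel scale and dB scaling by default. The number of mel bands (40), the *.wav extension, and the helper names are assumptions.

```python
from pathlib import Path

import numpy as np

# Parameters from the list above; SAMPLE_RATE is also the minimum accepted rate.
SAMPLE_RATE = 22050
N_MFCC = 13
N_FFT = 2048
HOP_LENGTH = 512
N_MELS = 40  # assumption: the number of mel bands is not stated in this README


def collect_labeled_files(audio_dir):
    """Assign each sub-folder of audio_dir an integer label (sorted by folder name)."""
    root = Path(audio_dir)
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    files = [(f, label)
             for label, name in enumerate(classes)
             for f in sorted((root / name).glob("*.wav"))]  # assumption: WAV files
    return classes, files


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping n_fft // 2 + 1 FFT bins onto n_mels mel bands."""
    mel_points = np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising edge of the triangle
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb


def dct_ii(n_out, n_in):
    """Orthonormal DCT-II matrix keeping only the first n_out coefficients."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    basis = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    basis[0] /= np.sqrt(2.0)
    return basis


def extract_mfcc(signal, sr=SAMPLE_RATE):
    """Return a (frames, N_MFCC) matrix, or None if sr is below SAMPLE_RATE."""
    if sr < SAMPLE_RATE:
        return None  # rejected, mirroring the sample-rate rule above
    # Centre each frame by reflect-padding N_FFT // 2 samples on both sides.
    padded = np.pad(np.asarray(signal, dtype=float), N_FFT // 2, mode="reflect")
    n_frames = 1 + (len(padded) - N_FFT) // HOP_LENGTH
    idx = np.arange(N_FFT)[None, :] + HOP_LENGTH * np.arange(n_frames)[:, None]
    frames = padded[idx] * np.hanning(N_FFT)             # (frames, N_FFT)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2     # (frames, N_FFT // 2 + 1)
    mel = power @ mel_filterbank(sr, N_FFT, N_MELS).T    # (frames, N_MELS)
    log_mel = np.log(mel + 1e-10)                        # avoid log(0)
    return log_mel @ dct_ii(N_MFCC, N_MELS).T            # (frames, N_MFCC)
```

With centred framing, a 1-second clip of 22050 samples yields 1 + 22050 // 512 = 44 frames, so extract_mfcc returns a 44x13 matrix, consistent with the per-clip shape described above.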