BC learning for sounds

Implementation of Learning from Between-class Examples for Deep Sound Recognition by Yuji Tokozume, Yoshitaka Ushiku, and Tatsuya Harada (ICLR 2018).

This repository also contains training code for EnvNet: Learning Environmental Sounds with End-to-end Convolutional Neural Network (Yuji Tokozume and Tatsuya Harada, ICASSP 2017).¹

News

  • (2018/02/16) Added support for the latest ESC datasets
  • (2018/01/29) Our paper was accepted to ICLR 2018

Contents

  • Between-class (BC) learning
    • We generate between-class examples by mixing two training examples belonging to different classes with a random ratio.
    • We then feed the mixed sound to the model and train it to output the mixing ratio (a minimal sketch of this mixing follows this list).
  • Training of EnvNet and EnvNet-v2 on ESC-50, ESC-10 [1], and UrbanSound8K [2] datasets
    • EnvNet-v2: a deeper version of EnvNet. With BC learning, its performance on ESC-50 surpasses the human level.
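
The mixing itself can be sketched in a few lines of NumPy. This is a minimal illustration of the basic scheme only, assuming two equal-length waveforms from different classes; the paper also derives a refined mixing that accounts for sound pressure levels, and the function and variable names here are illustrative rather than the repository's actual API.

    import numpy as np

    def bc_mix(x1, label1, x2, label2, n_classes):
        """Mix two sounds from different classes with a random ratio.

        Returns the mixed waveform and a soft label encoding the mixing
        ratio, which the network is trained to predict (e.g. with a
        KL-divergence loss against its softmax output).
        """
        r = np.random.uniform(0.0, 1.0)  # mixing ratio
        # Normalize so the mixture keeps roughly the same energy as the inputs.
        mixed = (r * x1 + (1.0 - r) * x2) / np.sqrt(r ** 2 + (1.0 - r) ** 2)

        soft_label = np.zeros(n_classes, dtype=np.float32)
        soft_label[label1] = r
        soft_label[label2] = 1.0 - r
        return mixed.astype(np.float32), soft_label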

Setup

  • Install Chainer v1.24 on a machine with a CUDA-capable GPU.
  • Prepare datasets following this page.

Training

  • Template:

      python main.py --dataset [esc50, esc10, or urbansound8k] --netType [envnet or envnetv2] --data path/to/dataset/directory/ (--BC) (--strongAugment)
    
  • Recipes:

    • Standard learning of EnvNet on ESC-50 (around 29% error²):

        python main.py --dataset esc50 --netType envnet --data path/to/dataset/directory/
      
    • BC learning of EnvNet on ESC-50 (around 24% error):

        python main.py --dataset esc50 --netType envnet --data path/to/dataset/directory/ --BC
      
    • BC learning of EnvNet-v2 on ESC-50 with strong data augmentation (around 15% error, the best performance):

        python main.py --dataset esc50 --netType envnetv2 --data path/to/dataset/directory/ --BC --strongAugment
      
  • Notes:

    • Validation accuracy is calculated using 10-crop testing (a sketch of the averaging step follows these notes).
    • By default, K-fold cross-validation is performed using the original fold settings. You can run a particular split with the --split option.
    • Please check opts.py for other command line arguments.
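
As a reference for the first note above, here is a minimal sketch of 10-crop evaluation, assuming crops are taken at equally spaced positions along the waveform and the predicted class probabilities are averaged; `model`, `waveform`, and `crop_len` are illustrative names, not the repository's actual API.

    import numpy as np

    def predict_10_crop(model, waveform, crop_len, n_crops=10):
        """Average class probabilities over equally spaced crops.

        `model` is assumed to map a (1, crop_len) array to a vector of
        class probabilities.
        """
        starts = np.linspace(0, len(waveform) - crop_len, n_crops).astype(int)
        probs = [model(waveform[s:s + crop_len][None, :]) for s in starts]
        return np.mean(probs, axis=0)  # final prediction: argmax of this average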

Results

Error rate (Standard learning → BC learning)

Model                      | ESC-50      | ESC-10      | UrbanSound8K
---------------------------|-------------|-------------|-------------
EnvNet                     | 29.2 → 24.1 | 12.8 → 11.3 | 33.7 → 28.9
EnvNet-v2                  | 25.6 → 18.2 | 14.2 → 10.6 | 30.9 → 23.4
EnvNet-v2 + strong augment | 21.2 → 15.1 | 10.9 → 8.6  | 24.9 → 21.7
Humans [1]                 | 18.7        | 4.3         | -

See also

Between-class Learning for Image Classification (github)


¹ Training/testing schemes are simplified from those in the ICASSP paper.

² This is lower than the error rate reported in the ICASSP paper (36%), mainly because here we use 4 of the 5 folds for training, whereas only 3 folds were used in the ICASSP paper.

Reference

[1] Karol J. Piczak. ESC: Dataset for Environmental Sound Classification. In ACM Multimedia, 2015.

[2] Justin Salamon, Christopher Jacoby, and Juan Pablo Bello. A Dataset and Taxonomy for Urban Sound Research. In ACM Multimedia, 2014.