BC learning for sounds

Implementation of Learning from Between-class Examples for Deep Sound Recognition by Yuji Tokozume, Yoshitaka Ushiku, and Tatsuya Harada (ICLR 2018).

This repository also contains training code for EnvNet: Learning Environmental Sounds with End-to-end Convolutional Neural Network (Yuji Tokozume and Tatsuya Harada, ICASSP 2017).¹

News

  • (2018/02/16) Added support for the latest ESC datasets
  • (2018/01/29) Our paper was accepted to ICLR 2018

Contents

  • Between-class (BC) learning
    • We generate between-class examples by mixing two training examples belonging to different classes with a random ratio.
    • We then feed the mixed sound to the model and train it to output the mixing ratio (a minimal sketch of this mixing follows this list).
  • Training of EnvNet and EnvNet-v2 on ESC-50, ESC-10 [1], and UrbanSound8K [2] datasets
    • EnvNet-v2: a deeper version of EnvNet. With BC learning, its performance on ESC-50 surpasses the human level.
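
The mixing itself can be sketched in a few lines of NumPy. This is a minimal illustration of the basic scheme only, assuming two equal-length waveforms from different classes; the paper also derives a refined mixing that accounts for sound pressure levels, and the function and variable names here are illustrative rather than the repository's actual API.

    import numpy as np

    def bc_mix(x1, label1, x2, label2, n_classes):
        """Mix two sounds from different classes with a random ratio.

        Returns the mixed waveform and a soft label encoding the mixing
        ratio, which the network is trained to predict (e.g. with a
        KL-divergence loss against its softmax output).
        """
        r = np.random.uniform(0.0, 1.0)  # mixing ratio
        # Normalize so the mixture keeps roughly the same energy as the inputs.
        mixed = (r * x1 + (1.0 - r) * x2) / np.sqrt(r ** 2 + (1.0 - r) ** 2)

        soft_label = np.zeros(n_classes, dtype=np.float32)
        soft_label[label1] = r
        soft_label[label2] = 1.0 - r
        return mixed.astype(np.float32), soft_label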

Setup

  • Install Chainer v1.24 on a machine with a CUDA-capable GPU.
  • Prepare datasets following this page.

Training

  • Template:

      python main.py --dataset [esc50, esc10, or urbansound8k] --netType [envnet or envnetv2] --data path/to/dataset/directory/ (--BC) (--strongAugment)
    
  • Recipes:

    • Standard learning of EnvNet on ESC-50 (around 29% error²):

        python main.py --dataset esc50 --netType envnet --data path/to/dataset/directory/
      
    • BC learning of EnvNet on ESC-50 (around 24% error):

        python main.py --dataset esc50 --netType envnet --data path/to/dataset/directory/ --BC
      
    • BC learning of EnvNet-v2 on ESC-50 with strong data augmentation (around 15% error, the best performance):

        python main.py --dataset esc50 --netType envnetv2 --data path/to/dataset/directory/ --BC --strongAugment
      
  • Notes:

    • Validation accuracy is calculated using 10-crop testing (a sketch of the averaging step follows these notes).
    • By default, K-fold cross-validation is performed using the original fold settings. You can run a particular split with the --split option.
    • Please check opts.py for other command line arguments.
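
As a reference for the first note above, here is a minimal sketch of 10-crop evaluation, assuming crops are taken at equally spaced positions along the waveform and the predicted class probabilities are averaged; `model`, `waveform`, and `crop_len` are illustrative names, not the repository's actual API.

    import numpy as np

    def predict_10_crop(model, waveform, crop_len, n_crops=10):
        """Average class probabilities over equally spaced crops.

        `model` is assumed to map a (1, crop_len) array to a vector of
        class probabilities.
        """
        starts = np.linspace(0, len(waveform) - crop_len, n_crops).astype(int)
        probs = [model(waveform[s:s + crop_len][None, :]) for s in starts]
        return np.mean(probs, axis=0)  # final prediction: argmax of this average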

Results

Error rate (Standard learning → BC learning)

Model                      | ESC-50      | ESC-10      | UrbanSound8K
---------------------------|-------------|-------------|-------------
EnvNet                     | 29.2 → 24.1 | 12.8 → 11.3 | 33.7 → 28.9
EnvNet-v2                  | 25.6 → 18.2 | 14.2 → 10.6 | 30.9 → 23.4
EnvNet-v2 + strong augment | 21.2 → 15.1 | 10.9 → 8.6  | 24.9 → 21.7
Humans [1]                 | 18.7        | 4.3         | -

See also

Between-class Learning for Image Classification (github)


¹ Training/testing schemes are simplified from those in the ICASSP paper.

² This is lower than the error rate reported in the ICASSP paper (36%), mainly because here we use 4 of the 5 folds for training, whereas only 3 folds were used in the ICASSP paper.

Reference

[1] Karol J. Piczak. ESC: Dataset for Environmental Sound Classification. In ACM Multimedia, 2015.

[2] Justin Salamon, Christopher Jacoby, and Juan Pablo Bello. A Dataset and Taxonomy for Urban Sound Research. In ACM Multimedia, 2014.