As part of this master's thesis, the following investigations need to be carried out:
- Get familiar with the state-of-the-art models for the two audio processing applications.
- Utilize existing optimization techniques with the audio processing applications.
- Select the methods that are deployable on the available embedded hardware.
- Develop a hardware-aware multi-objective optimization flow.
- Target a specific objective for innovation or competition with the state-of-the-art.
First, set up the required environment and download the dcase20 dataset:
python -m venv envs/torch
source envs/torch/bin/activate
pip install -r requirements.txt
./dcase.sh
export CUBLAS_WORKSPACE_CONFIG=:16:8
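The README does not state why `CUBLAS_WORKSPACE_CONFIG` is exported. A plausible reason (an assumption, not confirmed by the source) is reproducibility: the PyTorch reproducibility notes list `:16:8` and `:4096:8` as the values needed for deterministic cuBLAS behaviour on CUDA 10.2 or later. The sketch below is a hypothetical helper, not part of this repository, that checks whether the variable is set to one of those values before a run starts.

```python
import os

# Values listed in the PyTorch reproducibility notes for deterministic cuBLAS.
DETERMINISTIC_CUBLAS_CONFIGS = {":16:8", ":4096:8"}


def cublas_config_is_deterministic(environ=os.environ):
    """Return True if CUBLAS_WORKSPACE_CONFIG holds a deterministic setting.

    `environ` is any mapping of variable names to values; it defaults to the
    process environment so the helper can be called without arguments.
    """
    return environ.get("CUBLAS_WORKSPACE_CONFIG") in DETERMINISTIC_CUBLAS_CONFIGS


if __name__ == "__main__":
    if not cublas_config_is_deterministic():
        print("CUBLAS_WORKSPACE_CONFIG is not set to :16:8 or :4096:8")
```

If the check fails, re-run the `export` line above in the same shell session before starting training.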
After that, everything should be set up to run the tests:
pytest src/*
Now you can run the baseline for several audio processing applications:
python src/main.py
Set --config-name=asc3 or --config-name=sc to select a different application. See the YAML files for the available configuration options.
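The commands in this README pass settings as `key=value` arguments, with dots addressing nested entries of the YAML configuration. The syntax resembles Hydra-style overrides, but the source does not name the configuration library, so treat that as an assumption. As a rough illustration of how a dotted override maps onto a nested configuration, here is a minimal standalone sketch; `apply_overrides` and `parse_value` are hypothetical names and are not taken from `src/main.py`.

```python
def parse_value(raw):
    """Convert a command-line string to int or float when possible."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw  # leave everything else as a plain string


def apply_overrides(config, overrides):
    """Apply 'a.b.c=value' style overrides to a nested dict in place."""
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            # Create intermediate levels that do not exist yet.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw_value)
    return config


if __name__ == "__main__":
    cfg = {"n_epochs": 100, "optim": {"optim": {"lr": 0.1}}}
    print(apply_overrides(cfg, ["optim.optim.lr=0.001", "n_epochs=5"]))
```

The sketch only covers plain value assignment. It does not model selecting a whole configuration file by name, which is what an argument such as `optim=sgd-steplr` appears to do.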
You can run lottery ticket experiments or training with iterative pruning:
python src/main.py apply=lottery
python src/main.py apply=itr-prune
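For orientation, the general idea behind both modes can be sketched in a few lines of plain Python. This is not the repository's implementation: the function names are hypothetical, weights are a flat list instead of tensors, and training is replaced by a caller-supplied `train_fn`. Each round removes a fraction of the smallest-magnitude surviving weights. In the lottery ticket procedure the survivors are rewound to their initial values before retraining, whereas plain iterative pruning would continue from the trained values.

```python
def magnitude_mask(weights, mask, fraction):
    """Prune `fraction` of the still-active weights with the smallest magnitude."""
    active = [i for i, keep in enumerate(mask) if keep]
    n_prune = int(len(active) * fraction)
    # Sort the active indices by |w| and drop the smallest ones.
    to_prune = sorted(active, key=lambda i: abs(weights[i]))[:n_prune]
    new_mask = list(mask)
    for i in to_prune:
        new_mask[i] = 0
    return new_mask


def lottery_ticket_rounds(init_weights, train_fn, rounds, fraction):
    """Iterative magnitude pruning with rewinding to the initial weights."""
    mask = [1] * len(init_weights)
    for _ in range(rounds):
        # Rewind: restart from the initial values, keeping only unpruned weights.
        weights = [w * m for w, m in zip(init_weights, mask)]
        trained = train_fn(weights, mask)
        mask = magnitude_mask(trained, mask, fraction)
    return mask


if __name__ == "__main__":
    init = [0.5, -0.1, 0.3, -0.8, 0.05, 0.2, -0.6, 0.9]
    # Stand-in for real training: returns the weights unchanged.
    print(lottery_ticket_rounds(init, lambda w, m: w, rounds=2, fraction=0.5))
```

With a pruning fraction `p` per round, roughly `(1 - p) ** rounds` of the weights remain, so two rounds at 50% leave about a quarter of them.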
You can also prune a pretrained model directly:
python src/main.py apply=prune model_dir=path/to/model/state
Set up the required environment for NEMO:
python -m venv envs/nemo
source envs/nemo/bin/activate
pip install -r requirements-nemo.txt
You can run post-training quantization and fine-tuning with NEMO or Nvidia:
python src/main.py model.quant=nemo model_dir=path/to/model/state optim=sgd-steplr optim.optim.lr=0.001 n_epochs=5
python src/main.py model.quant=nvidia model_dir=path/to/model/state optim=sgd-steplr optim.optim.lr=0.001 n_epochs=5
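To make the term concrete, post-training quantization maps trained floating-point values onto a small integer grid. The sketch below shows one common scheme, symmetric per-tensor quantization to signed 8-bit integers, on a plain list of numbers. It is an illustration under that assumption only: the README does not say which scheme NEMO or the Nvidia backend applies here, and the function names are hypothetical.

```python
def quantize_symmetric(values, n_bits=8):
    """Quantize floats to signed integers using a single shared scale."""
    qmax = 2 ** (n_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest grid point and clamp to the representable range.
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate floating-point values."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [-1.0, -0.4, 0.0, 0.3, 1.0]
    q, scale = quantize_symmetric(weights)
    print(q, dequantize(q, scale))
```

The rounding error per value is at most half a quantization step (`scale / 2`). The short fine-tuning run in the commands above (`n_epochs=5` with a small learning rate) is presumably there to recover accuracy lost to that error.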
Quantization-aware training with Brevitas is also available (currently only for SC):
python src/main.py --config-name=sc model.quant=brevitas
python src/main.py --config-name=sc tmodel=vgg19 tmodel_dir=/path/to/teacher/state model_dir=/path/to/student/state optim.optim.lr=0.001
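The `tmodel` and `tmodel_dir` arguments in the last command point to a teacher network alongside a student, which suggests a knowledge-distillation setup. The README does not confirm this or describe the loss that is used, so the following is only a sketch of the classic temperature-scaled distillation loss for a single example, written in plain Python. The names `distillation_loss`, `temperature`, and `alpha` are hypothetical and do not come from the repository.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Blend a soft-target KL term with the usual cross-entropy term."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the temperature-softened distributions.
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Cross-entropy of the student against the ground-truth label.
    ce = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps the soft-target gradients on a comparable scale.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


if __name__ == "__main__":
    print(distillation_loss([1.0, 2.0, 0.5], [0.5, 3.0, 0.2], label=1))
```

When the student reproduces the teacher's logits exactly, the KL term vanishes and only the cross-entropy part remains.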
You can test a pre-trained network:
python src/main.py apply=test model_dir=path/to/model/state
- ASC model: QTI Submission to DCASE 2021: Residual Normalization for Device-Imbalanced Acoustic Scene Classification with Efficient Design
- SC model: Very Deep Convolutional Networks for Large-Scale Image Recognition
- DCASE2022
- Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition
- NEMO (NEural Minimizer for pytOrch)
- Brevitas