Official code for the paper "HILCodec: High Fidelity and Lightweight Neural Audio Codec".
[paper] [samples]
We tested under CUDA=11.7, torch=1.13 and under CUDA=10.2, torch=1.12.
It may work in other environments, but this is not guaranteed.
First, install PyTorch along with torchaudio.
Then, install the other requirements as below.

```bash
conda install librosa -c conda-forge
conda install jupyter notebook matplotlib scipy tensorboard tqdm pyyaml
pip install pesq pystoi
```
Finally, install ONNXRuntime for CPU.
Optionally, install ViSQOL.
For testing, you only need to install ONNXRuntime, librosa, and soundfile.
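To confirm the environment works before training, a quick import check like the one below can help. This snippet is a minimal sketch of ours, not part of the repository:

```python
# Sanity check: verify the main dependencies import and report versions.
import torch
import torchaudio
import librosa
import onnxruntime

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("torchaudio:", torchaudio.__version__)
print("librosa:", librosa.__version__)
print("onnxruntime:", onnxruntime.__version__)
```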
Download the VCTK, DNS-Challenge4, and Jamendo datasets for training.
For validation, we used p225, p226, p227, and p228 from VCTK for clean speech. Real noisy speech recordings from DNS-Challenge4 are used for noisy speech, and Jamendo/99 is used for music.
Downsample all audio files to 24 kHz before training (see scripts/Resampling.ipynb).
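If you prefer a script over the notebook, a minimal resampling sketch might look like the following. The scripts/Resampling.ipynb notebook is the reference procedure; the source and destination paths here are hypothetical:

```python
# Resample every WAV under SRC to 24 kHz mono and mirror the tree under DST.
from pathlib import Path
import librosa
import soundfile as sf

SRC, DST, SR = Path("data/raw"), Path("data/24khz"), 24000  # hypothetical paths

for wav_path in SRC.rglob("*.wav"):
    audio, _ = librosa.load(wav_path, sr=SR, mono=True)  # decode + resample
    out_path = DST / wav_path.relative_to(SRC)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    sf.write(out_path, audio, SR)
```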
Use a configs/...yaml file to change configurations. Modify directories_to_include, directories_to_exclude, and wav_dir.
Also, modify the filelists/infer_24khz.txt or filelists/infer_speech.txt file, which contains the audio files used for inference in TensorBoard.
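To locate these keys before editing, you can load a config and search it recursively. This is a helper sketch of ours; it makes no assumption about where the keys sit inside the YAML, since it finds them wherever they live:

```python
# Print the dotted path and current value of each dataset-related key.
import yaml

with open("configs/hilcodec_music.yaml") as f:
    cfg = yaml.safe_load(f)

def find_key(node, name, path=""):
    if isinstance(node, dict):
        for k, v in node.items():
            p = f"{path}.{k}" if path else k
            if k == name:
                print(p, "=", v)
            find_key(v, name, p)

for key in ("directories_to_include", "directories_to_exclude", "wav_dir"):
    find_key(cfg, key)
```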
Use either train.py or train_torchrun.py for training. Examples:

```bash
CUDA_VISIBLE_DEVICES=0,1 python train.py -c configs/hilcodec_music.yaml -n first_exp -p train.batch_size=16 train.seed=1234 -f
CUDA_VISIBLE_DEVICES=0,1 torchrun --standalone --nproc_per_node=2 train_torchrun.py -c configs/hilcodec_music.yaml -n first_exp -p train.batch_size=16 train.seed=1234 -f
```
Arguments:
- -n: (Required) Directory name for saving checkpoints, the configuration file, and TensorBoard logs.
- -c: (Optional) Configuration file path. If not given, the configuration file already in the directory is used.
- -p: (Optional) Parameters after this flag will update the configuration.
- -f: (Optional) If the directory already exists, an exception is raised to avoid overwriting the config file. Enabling this option forces the config file to be overwritten.
Pre-trained model parameters are provided in the onnx directory. Two versions are available:
- hil_music: trained on a general audio dataset (clean speech, noisy speech, and music).
- hil_speech: trained only on a clean speech dataset.
Modify the variable PATH in test_onnx.py as you want, and run the following:

```bash
python test_onnx.py -n hil_speech --enc --dec
```

The output will be saved at onnx/hil_speech_output.wav.
Use python test_onnx.py --help for information about each argument.
Note that for AudioDec, you must set -H 300.
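If you want to drive the exported models without test_onnx.py, ONNX Runtime can load them directly. The session API below is standard ONNX Runtime; the encoder file name is an assumption, so check the onnx directory and the printed tensor names for the real interface:

```python
# Inspect an exported model's inputs/outputs before wiring up inference.
import onnxruntime as ort

sess = ort.InferenceSession("onnx/hil_speech_encoder.onnx")  # hypothetical file name
for t in sess.get_inputs():
    print("input:", t.name, t.shape, t.type)
for t in sess.get_outputs():
    print("output:", t.name, t.shape, t.type)
```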
You can convert your own trained HILCodec to ONNX for use with ONNXRuntime via scripts/HILCodec Onnx.ipynb.
You can also convert Encodec and AudioDec to ONNX for comparison: download checkpoints from their official repositories and use scripts/Encodec Onnx.ipynb or scripts/AudioDec Onnx.ipynb.
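The notebooks above are the reference conversion path. For orientation only, a generic torch.onnx.export call looks like the sketch below; the stand-in module, tensor names, and input shape are all hypothetical, so replace them with your trained model and its real interface:

```python
# Generic ONNX export sketch, not the repo's actual conversion code.
import torch
import torch.nn as nn

model = nn.Conv1d(1, 8, kernel_size=7, stride=2, padding=3).eval()  # stand-in for a trained module
dummy = torch.randn(1, 1, 24000)  # 1 s of 24 kHz mono audio (assumed layout)

torch.onnx.export(
    model, dummy, "my_model.onnx",
    input_names=["audio"], output_names=["features"],
    dynamic_axes={"audio": {2: "samples"}, "features": {2: "frames"}},
)
```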
You can also download PyTorch checkpoints and TensorBoard logs from Google Drive. Download the .zip files and use scripts/inference.ipynb.
Our training code includes objective metric calculation; set the pesq option in a config file appropriately.
Note that on our server the calculation occasionally crashes (especially when calculating ViSQOL), so it is turned off in the default configs.
To calculate metrics after training, you can use scripts/pesq.ipynb.
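As an alternative to the notebook, PESQ and STOI can be computed directly with the pesq and pystoi packages installed above. This is a sketch under our own assumptions: the file names are placeholders, and since wideband PESQ requires 16 kHz input, both signals are resampled on load:

```python
# Compute PESQ (wideband) and STOI between a reference and a decoded file.
import librosa
from pesq import pesq
from pystoi import stoi

ref, _ = librosa.load("reference.wav", sr=16000)  # placeholder paths
deg, _ = librosa.load("decoded.wav", sr=16000)
n = min(len(ref), len(deg))  # metrics require equal-length signals

print("PESQ:", pesq(16000, ref[:n], deg[:n], "wb"))
print("STOI:", stoi(ref[:n], deg[:n], 16000, extended=False))
```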