aTENNuate (paper)
aTENNuate is a network that can be configured for real-time speech enhancement on raw audio waveforms. It can perform tasks such as audio denoising, super-resolution, and de-quantization. This repo contains the network definition and a set of pre-trained weights for the aTENNuate model.
Note that this repo is meant for evaluating denoising performance on custom audio samples, and is not optimized for inference. It also does not contain the recurrent configuration of the network, so it cannot be used directly for real-time inference on its own. Evaluation should ideally be done on a batch of .wav files at once, as expected by the denoise.py script.
Please contact Brainchip Inc. to learn more about the full real-time audio denoising solution, and please consider citing our work if you find this repo useful.
All you need is a working Python environment; then run:

```shell
pip install attenuate
```
To run the pre-trained network on custom audio samples, simply put the .wav files (or any other format supported by librosa) into the noisy_samples directory (or any directory of your choice), and run the following:
```python
import torch
from attenuate import Denoiser

model = Denoiser()
model.eval()

with torch.no_grad():
    model.from_pretrained("PeaBrane/aTENNuate")
    model.denoise('noisy_samples', denoised_dir='test_samples')
    # denoised_samples = model.denoise('noisy_samples')  # return torch tensors instead
```
The denoised samples will then be saved as .wav files in the directory given by denoised_dir.
The network should be easily interfaced with your custom training pipeline. The network expects an input of shape (batch, 1, length)
and an output of the same shape, which can be sampled at any frequency (though the pre-trained weights operate at 16000 Hz). Note that length should be a multiple of 256, due to the downsampling behavior of the network.
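Since the length must be a multiple of 256, raw waveforms generally need right-padding before being fed to the network. A minimal sketch of such a helper (hypothetical; `pad_to_multiple` is not part of this repo):

```python
import torch
import torch.nn.functional as F

def pad_to_multiple(x: torch.Tensor, multiple: int = 256) -> torch.Tensor:
    # Right-pad the last (length) dimension with zeros so its size
    # becomes a multiple of `multiple`.
    remainder = x.shape[-1] % multiple
    if remainder:
        x = F.pad(x, (0, multiple - remainder))
    return x

# One second of audio at 16 kHz: 16000 = 62 * 256 + 128, so 128 zeros are appended.
waveform = torch.randn(4, 1, 16000)  # (batch, 1, length)
padded = pad_to_multiple(waveform)
print(padded.shape)  # torch.Size([4, 1, 16128])
```

The padded samples can be trimmed back to the original length after the forward pass, since the output shape matches the input shape.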
The model supports torch.compile for training, but the FFT operations will still run in eager mode, since complex numbers are not supported. The model does not yet stably support torch.amp, due to the sensitivity of the SSM layers. It is recommended to train the model with tensorfloat32 instead, which can be enabled by:
```python
import torch
from torch.backends import cudnn

torch.backends.cuda.matmul.allow_tf32 = True
cudnn.allow_tf32 = True
```
| Noisy Sample | Denoised Sample |
|---|---|
| Noisy Sample 1 | Denoised Sample 1 |
| Noisy Sample 2 | Denoised Sample 2 |
| Noisy Sample 3 | Denoised Sample 3 |
Please submit a GitHub issue if you find any bugs. If you'd like to contribute a new feature, feel free to open a GitHub issue to discuss, or email [email protected].