Repo for Network Bending Neural Vocoders demo. Supported by HDI Network, MIMIC project and Google AMI.
Network Bending Neural Vocoders @ NeurIPS 2020, Machine Learning for Creativity and Design Workshop
Studio report: sound synthesis with DDSP and network bending techniques @ 2nd Conference on AI Music Creativity (MuMe + CSMC) 2021
We provide a Python implementation with a GUI, which should run in real time on an ordinary (CPU-only) laptop (tested on a 2019 MacBook Pro). As with other DDSP examples, it pulls the pitch and amplitude from a given audio file and uses them to drive the model.
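As a rough illustration of the kind of conditioning features involved (not this repo's exact extraction code), the sketch below pulls a pitch track and an amplitude envelope from an audio file using `librosa`, which is already among the dependencies listed below; the sample rate and file name are placeholders.

```python
import librosa

# Illustrative feature extraction: pitch (f0) and amplitude from an audio file.
# This is a sketch using librosa, not the exact pipeline used by gui.py.
audio, sr = librosa.load("my_audio_file.wav", sr=16000, mono=True)

# Fundamental frequency track via probabilistic YIN.
f0, voiced_flag, voiced_prob = librosa.pyin(
    audio, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
)

# Amplitude envelope as frame-wise RMS energy.
amplitude = librosa.feature.rms(y=audio)[0]

print(f0.shape, amplitude.shape)  # per-frame conditioning signals for the vocoder
```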
- Clone the repo
- Install the dependencies: `ddsp`, `tensorflow`, `gin`, `numpy`, `pandas`, `scipy`, `librosa`, `sounddevice`, `mido`, `python-rtmidi`
- Run `python gui.py`. This will load the default flute model and 1 minute of Whitney Houston audio as the input file. You can provide your own model, input audio file and midi port (if using a midi controller) as arguments here.
For example:

```
python gui.py -i my_audio_file.wav -m ddsp_sax_model -p "Akai MPD controller 1"
```
```
usage: gui.py [-h] [-i INPUT_AUDIO] [-p MIDI_PORT] [-m MODEL]

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_AUDIO, --input_audio INPUT_AUDIO
                        name of file in audio_data directory
  -p MIDI_PORT, --midi_port MIDI_PORT
                        name of midi port to connect to
  -m MODEL, --model MODEL
                        name of folder containing model checkpoint in Models
                        folder
```
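If you are not sure what name to pass with `-p`, the MIDI input ports visible to `mido` can be listed from Python; this is standard `mido` usage rather than anything specific to this repo:

```python
import mido

# Print the names of the MIDI input ports mido can see;
# pass one of these strings to gui.py with -p / --midi_port.
for name in mido.get_input_names():
    print(name)
```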
When you are ready to begin, or want to apply new settings, click Update.
You have 5 slots in which to chain layer transformations. Transformations are applied to the activations following each of these layers:

- The first fully connected layer
- The recurrent layer
- The second fully connected layer
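Conceptually, each slot pairs one of these layers with a transformation to apply to its output activations, and the slots are applied in order. The sketch below is only an illustration of that idea, not the repo's internals; the layer names are placeholders, not the model's real identifiers.

```python
import numpy as np

# Illustration only: a "slot" pairs a layer name with a transform applied to
# that layer's output activations (shape: time x units).
def apply_slots(activations_by_layer, slots):
    for layer_name, transform in slots:
        activations_by_layer[layer_name] = transform(activations_by_layer[layer_name])
    return activations_by_layer

slots = [
    ("gru", lambda a: 1.0 - a),           # the "1 - activations" transform
    ("fc2", lambda a: np.zeros_like(a)),  # the "set activations to 0" transform
]

acts = {name: np.random.randn(1000, 512) for name in ("fc1", "gru", "fc2")}
bent = apply_slots(acts, slots)
```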
The available transformations are:

- `oscillate`: add an oscillation to the activations in the time dimension. This has two parameters (`freq` and `depth`).
- Set activations to 0. This has no parameters.
- 1 - activations. This has no parameters.
- `threshold`: set all values below the threshold to the minimum value of the activations matrix, and all values above the threshold to the maximum value of the activations matrix. This has one parameter (`thresh`).
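As a rough numpy sketch of what the `oscillate` and `threshold` transforms described above could look like when applied to an activations matrix (time × units); this illustrates the descriptions, not the repo's actual implementation, and the frame rate here is an assumed value:

```python
import numpy as np

def oscillate(activations, freq, depth, frame_rate=250):
    # Add a sinusoidal oscillation along the time dimension.
    t = np.arange(activations.shape[0]) / frame_rate
    lfo = depth * np.sin(2 * np.pi * freq * t)
    return activations + lfo[:, np.newaxis]

def threshold(activations, thresh):
    # Values below the threshold go to the matrix minimum,
    # values above it go to the matrix maximum.
    lo, hi = activations.min(), activations.max()
    return np.where(activations < thresh, lo, hi)

# Example: activations with shape (time_frames, units)
acts = np.random.randn(1000, 512).astype(np.float32)
bent = threshold(oscillate(acts, freq=2.0, depth=0.5), thresh=0.0)
```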
The proportion of units the transform is applied to is set with a value from 0 to 1. This can be updated with the text input, or controlled by midi (mapped 0-1); provide the midi cc channel in the `midi` input. Audio is generated in 4-second blocks, so any changes apply to the subsequent block of audio.
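One plausible way to read "proportion of units" (again a sketch under that assumption, not the repo's code) is as a random mask over the unit dimension, so the transform only touches that fraction of the activation columns:

```python
import numpy as np

def apply_to_proportion(activations, transform, proportion, rng=None):
    # Apply `transform` to a randomly chosen `proportion` (0-1) of the units
    # (columns) of the activations matrix, leaving the rest untouched.
    if rng is None:
        rng = np.random.default_rng()
    n_units = activations.shape[1]
    n_selected = int(round(proportion * n_units))
    selected = rng.choice(n_units, size=n_selected, replace=False)
    out = activations.copy()
    out[:, selected] = transform(activations[:, selected])
    return out

# Example: zero out 30% of the units.
acts = np.random.randn(1000, 512).astype(np.float32)
bent = apply_to_proportion(acts, lambda a: np.zeros_like(a), proportion=0.3)
```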
Some transforms (`oscillate` and `threshold`) have parameters that can be controlled by midi or set by an LFO. Either provide the midi cc channel in the `midi` input or an LFO frequency in the `lfo_freq` input. The `min` and `max` inputs set the range / mapping for either the LFO or the midi controller.
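As a sketch of the mapping (assumed behaviour, since the README does not spell it out), a midi cc value (0-127) or a sine LFO output (-1 to 1) would be rescaled into the `[min, max]` range before being used as the parameter value:

```python
import time
import numpy as np

def map_midi_cc(cc_value, out_min, out_max):
    # Rescale a midi cc value (0-127) into [out_min, out_max].
    return out_min + (cc_value / 127.0) * (out_max - out_min)

def map_lfo(lfo_freq, out_min, out_max, t=None):
    # Sample a sine LFO at time t (seconds) and rescale from [-1, 1]
    # into [out_min, out_max].
    if t is None:
        t = time.time()
    lfo = np.sin(2 * np.pi * lfo_freq * t)
    return out_min + (lfo + 1.0) / 2.0 * (out_max - out_min)

# Example: drive a `thresh` parameter from a cc value of 64, mapped to [-1, 1].
thresh = map_midi_cc(64, -1.0, 1.0)
```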