A Computational Creativity project: music video generation from melody and user interaction
- MTG Jamendo Mood Classifier: Trained on the MTG-Jamendo dataset on top of Discogs-EffNet embeddings, this model by the Music Technology Group (MTG) at the Universitat Pompeu Fabra (UPF) predicts 56 mood and theme tags for audio files (see the inference sketch after this list).
- MusicGen: Trained on 20,000 hours of licensed music, this model by Meta’s FAIR team generates music from a text prompt, optionally conditioned on an input melody (see the generation sketch after this list).
- AnimateDiff: Trained on WebVid-10M, a dataset of stock videos, this model by researchers from the Shanghai AI Laboratory, CUHK, and Stanford generates videos from a text prompt, using epiCRealism as its text-to-image base model (see the sketch after this list).
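The mood classifier can be run with Essentia's TensorFlow wrappers. Below is a minimal inference sketch that follows the usage example published alongside the Essentia models; the input filename is a placeholder.

```python
from essentia.standard import (
    MonoLoader,
    TensorflowPredict2D,
    TensorflowPredictEffnetDiscogs,
)

# The models expect 16 kHz mono audio; "song.mp3" is a placeholder.
audio = MonoLoader(filename="song.mp3", sampleRate=16000, resampleQuality=4)()

# The Discogs-EffNet backbone computes the embeddings...
embedding_model = TensorflowPredictEffnetDiscogs(
    graphFilename="tf_graph_files/discogs-effnet-bs64-1.pb",
    output="PartitionedCall:1",
)
embeddings = embedding_model(audio)

# ...and the classification head maps them to 56 mood/theme activations,
# one row per analysis patch.
classifier = TensorflowPredict2D(
    graphFilename="tf_graph_files/mtg_jamendo_moodtheme-discogs-effnet-1.pb"
)
predictions = classifier(embeddings)
```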
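For MusicGen, melody-conditioned generation is available through Meta's audiocraft library. The sketch below assumes the facebook/musicgen-melody checkpoint; the prompt, duration, and melody file are illustrative, not this project's exact settings.

```python
import torchaudio
from audiocraft.data.audio import audio_write
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained("facebook/musicgen-melody")
model.set_generation_params(duration=8)  # seconds of audio to generate

# Load the melody to condition on; "melody.wav" is a placeholder.
melody, sr = torchaudio.load("melody.wav")

# generate_with_chroma expects melody_wavs of shape (batch, channels, samples).
wav = model.generate_with_chroma(
    descriptions=["upbeat electronic track"],
    melody_wavs=melody[None],
    melody_sample_rate=sr,
)
audio_write("generated", wav[0].cpu(), model.sample_rate, strategy="loudness")
```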
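For AnimateDiff, one way to pair the motion adapter with an epiCRealism base model is the Hugging Face diffusers pipeline, sketched below; the model IDs, scheduler settings, and prompt are assumptions, not necessarily what this project uses.

```python
import torch
from diffusers import AnimateDiffPipeline, DDIMScheduler, MotionAdapter
from diffusers.utils import export_to_gif

# Assumed checkpoints: the AnimateDiff v1.5-2 motion adapter and an
# epiCRealism mirror on the Hugging Face Hub.
adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16
)
pipe = AnimateDiffPipeline.from_pretrained(
    "emilianJR/epiCRealism", motion_adapter=adapter, torch_dtype=torch.float16
)
pipe.scheduler = DDIMScheduler.from_pretrained(
    "emilianJR/epiCRealism",
    subfolder="scheduler",
    clip_sample=False,
    timestep_spacing="linspace",
    beta_schedule="linear",
    steps_offset=1,
)
pipe.enable_model_cpu_offload()  # keeps VRAM usage manageable

output = pipe(
    prompt="a calm sunrise over the ocean, cinematic",  # placeholder prompt
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=25,
)
export_to_gif(output.frames[0], "clip.gif")
```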
- Install dependencies with poetry:

  ```sh
  poetry install
  ```
- Download the Essentia models and save them to the `tf_graph_files/` folder:

  ```sh
  wget https://essentia.upf.edu/models/music-style-classification/discogs-effnet/discogs-effnet-bs64-1.pb -P tf_graph_files
  wget https://essentia.upf.edu/models/classification-heads/mtg_jamendo_moodtheme/mtg_jamendo_moodtheme-discogs-effnet-1.pb -P tf_graph_files
  ```
- Run the Gradio app:

  ```sh
  cd src && poetry run python app.py
  ```
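For orientation, a stripped-down version of the app's interface could look like the following Gradio sketch; `melody_to_video` is a hypothetical stand-in for the real pipeline (mood classification, MusicGen, AnimateDiff), and the interface details are assumptions.

```python
import gradio as gr

def melody_to_video(audio_path: str) -> str:
    """Hypothetical pipeline: classify moods, generate music with MusicGen,
    render visuals with AnimateDiff, and return the path to the final video."""
    raise NotImplementedError

demo = gr.Interface(
    fn=melody_to_video,
    inputs=gr.Audio(sources=["microphone", "upload"], type="filepath"),
    outputs=gr.Video(),
    title="Hum Me a Melody",  # assumed title, after the notebook's name
)

if __name__ == "__main__":
    demo.launch()
```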
- `hum_me_a_melody_gradio_final.ipynb`: a Google Colab compatible notebook