Hum Me a Melody

A Computational Creativity project: a multi-modal system that generates short music videos from a hummed melody and user interaction.

Process Diagram

Models used

  • MTG Jamendo Mood Classifier: Built by the Music Technology Group (MTG) at Universitat Pompeu Fabra (UPF) and trained on the MTG-Jamendo dataset, this model predicts 56 mood and theme tags for an input audio file (a loading sketch appears under Setup steps below).

  • MusicGen: Trained on 20,000 hours of licensed music, this model by Meta’s FAIR team generates music from a text prompt, optionally conditioned on an input audio clip such as the hummed melody (see the first sketch after this list).

  • AnimateDiff: Trained on WebVid-10M, a dataset of stock videos, this model by ByteDance generates short videos from a text prompt, using epiCRealism as its text-to-image base model (see the second sketch after this list).
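
The music-generation step might look like the following minimal sketch using Meta's audiocraft library. The checkpoint name, prompt text, and file paths are illustrative assumptions, not taken from this repository.

```python
import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load the melody-conditioned MusicGen variant (assumed checkpoint name).
model = MusicGen.get_pretrained("facebook/musicgen-melody")
model.set_generation_params(duration=8)  # seconds of audio to generate

# The hummed recording supplies the melody; mood tags could supply the prompt.
melody, sr = torchaudio.load("hum.wav")  # hypothetical input file
wav = model.generate_with_chroma(
    descriptions=["upbeat happy pop"],  # e.g. top tags from the mood classifier
    melody_wavs=melody[None],           # shape (batch, channels, samples)
    melody_sample_rate=sr,
)
audio_write("generated_track", wav[0].cpu(), model.sample_rate, strategy="loudness")
```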
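For the video side, a comparable sketch via Hugging Face diffusers follows the ByteDance AnimateDiff-Lightning model card; this README does not say which AnimateDiff variant the project uses, so treat the repo ID, step count, and prompt below as assumptions.

```python
import torch
from diffusers import AnimateDiffPipeline, MotionAdapter, EulerDiscreteScheduler
from diffusers.utils import export_to_gif
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

device, dtype, steps = "cuda", torch.float16, 4

# Load the distilled motion adapter and attach it to the epiCRealism base model.
adapter = MotionAdapter().to(device, dtype)
ckpt = hf_hub_download("ByteDance/AnimateDiff-Lightning",
                       f"animatediff_lightning_{steps}step_diffusers.safetensors")
adapter.load_state_dict(load_file(ckpt, device=device))

pipe = AnimateDiffPipeline.from_pretrained(
    "emilianJR/epiCRealism", motion_adapter=adapter, torch_dtype=dtype).to(device)
pipe.scheduler = EulerDiscreteScheduler.from_config(
    pipe.scheduler.config, timestep_spacing="trailing", beta_schedule="linear")

# The prompt would normally be derived from the predicted mood/theme tags.
frames = pipe(prompt="a sunny meadow, cinematic lighting", guidance_scale=1.0,
              num_inference_steps=steps).frames[0]
export_to_gif(frames, "clip.gif")
```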

Setup steps

  1. Install dependencies with Poetry:

poetry install

  2. Download the Essentia models and save them to the tf_graph_files/ folder:

wget https://essentia.upf.edu/models/music-style-classification/discogs-effnet/discogs-effnet-bs64-1.pb -P tf_graph_files
wget https://essentia.upf.edu/models/classification-heads/mtg_jamendo_moodtheme/mtg_jamendo_moodtheme-discogs-effnet-1.pb -P tf_graph_files
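
Once downloaded, these graphs can be loaded with Essentia's TensorFlow wrappers. The snippet below follows the usage example on the Essentia models page, with a placeholder input filename:

```python
from essentia.standard import MonoLoader, TensorflowPredictEffnetDiscogs, TensorflowPredict2D

# Essentia's EffNet-based models expect 16 kHz mono audio.
audio = MonoLoader(filename="hum.wav", sampleRate=16000, resampleQuality=4)()

# The Discogs-EffNet graph produces embeddings for the classification head.
embedding_model = TensorflowPredictEffnetDiscogs(
    graphFilename="tf_graph_files/discogs-effnet-bs64-1.pb",
    output="PartitionedCall:1",
)
embeddings = embedding_model(audio)

# The mood/theme head returns activations for the 56 tags.
classifier = TensorflowPredict2D(
    graphFilename="tf_graph_files/mtg_jamendo_moodtheme-discogs-effnet-1.pb")
predictions = classifier(embeddings)
```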
  3. Run the Gradio app:

cd src && poetry run python app.py
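
For orientation, an app of this shape would expose roughly an audio-in, video-out Gradio interface. The skeleton below is hypothetical and is not the repository's actual src/app.py:

```python
import gradio as gr

def hum_to_video(audio_path: str) -> str:
    # Placeholder for the full pipeline: classify mood, generate music,
    # render video, and combine the two.
    ...
    return "output.mp4"  # hypothetical output path

demo = gr.Interface(
    fn=hum_to_video,
    inputs=gr.Audio(sources=["microphone", "upload"], type="filepath"),
    outputs=gr.Video(),
    title="Hum Me a Melody",
)

if __name__ == "__main__":
    demo.launch()
```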

Notebooks

hum_me_a_melody_gradio_final.ipynb - a Google Colab-compatible notebook
