This is the official repository of ISMIR 2024 paper "Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional Representation".
Paper | Demo page | Model weights | Dataset: EMOPIA+ | Dataset: Pop1K7 & Pop1K7-emo
- Python 3.8 and CUDA 10.2 recommended
- Install dependencies (required)
pip install -r requirements.txt
- For stage 2, install fast-transformers or transformers (required; choose one backbone)
# fast-transformers (the package used in the paper; it may fail to build on some CUDA versions)
pip install --user pytorch-fast-transformers
# transformers
pip install transformers==4.28.0
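- A minimal sketch to check which stage-2 backbone your environment supports (assumption: the pytorch-fast-transformers package installs the module name fast_transformers):
# Quick check of which stage-2 backbone is usable in this environment.
try:
    import fast_transformers  # installed by pytorch-fast-transformers
    print("Performer backbone available: use --model_type=performer")
except ImportError:
    print("fast-transformers missing: install transformers and use --model_type=gpt2")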
- Install midi2audio to synthesize generated MIDI to audio (optional)
pip install midi2audio
wget https://freepats.zenvoid.org/Piano/SalamanderGrandPiano/SalamanderGrandPiano-SF2-V3+20200602.tar.xz
tar -xJvf SalamanderGrandPiano-SF2-V3+20200602.tar.xz
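- A minimal synthesis sketch with midi2audio, assuming the fluidsynth binary is installed on your system; the .sf2 filename and the example MIDI path are placeholders, so check the actual names after extraction and generation:
from midi2audio import FluidSynth

# Path to the extracted SoundFont (verify the exact filename after untarring).
SOUNDFONT = "SalamanderGrandPiano-SF2-V3+20200602/SalamanderGrandPiano-V3+20200602.sf2"

fs = FluidSynth(sound_font=SOUNDFONT, sample_rate=44100)
# example.mid is a hypothetical output file from the generation steps below.
fs.midi_to_audio("generation/emopia_functional_two/example.mid", "example.wav")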
Method: Two-stage generation with functional representation
- Download and unzip the processed events and the best model weights (make sure you are in the repository root directory).
- Stage1: Generate a lead sheet conditioned on Positive or Negative emotion (i.e., Valence Modeling).
python3 stage1_compose/inference.py \
--configuration=stage1_compose/config/emopia_finetune.yaml \
--representation=functional \
--mode=lead_sheet \
--inference_params=best_weight/Functional-two/emopia_lead_sheet_finetune/ep016_loss0.685_params.pt \
--output_dir=generation/emopia_functional_two
- Stage2: Generate a piano performance from the stage1 lead sheet to convey the four-quadrant (4Q) emotions (i.e., Arousal Modeling).
- (Option 1) with Performer backbone (install fast-transformers)
python3 stage2_accompaniment/inference.py \
--model_type=performer \
--configuration=stage2_accompaniment/config/emopia_finetune.yaml \
--representation=functional \
--inference_params=best_weight/Functional-two/emopia_acccompaniment_finetune/ep300_loss0.338_params.pt \
--output_dir=generation/emopia_functional_two
- (Option 2) with GPT-2 backbone (install transformers)
python3 stage2_accompaniment/inference.py \
--model_type=gpt2 \
--configuration=stage2_accompaniment/config/emopia_finetune_gpt2.yaml \
--representation=functional \
--inference_params=best_weight/Functional-two/emopia_acccompaniment_finetune_gpt2/ep300_loss0.120_params.pt \
--output_dir=generation/emopia_functional_two
- To output synthesized audio along with the MIDI files, add --play_midi to the commands above (this uses the optional midi2audio setup).
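- If you prefer a single script, the two commands above can be chained with subprocess, since stage 2 reads the lead sheets that stage 1 wrote to the shared --output_dir; the sketch below reuses the exact flags from above under that assumption:
import subprocess

out_dir = "generation/emopia_functional_two"

# Stage 1: lead sheet generation (valence modeling).
subprocess.run([
    "python3", "stage1_compose/inference.py",
    "--configuration=stage1_compose/config/emopia_finetune.yaml",
    "--representation=functional",
    "--mode=lead_sheet",
    "--inference_params=best_weight/Functional-two/emopia_lead_sheet_finetune/ep016_loss0.685_params.pt",
    f"--output_dir={out_dir}",
], check=True)

# Stage 2: accompaniment generation (arousal modeling), Performer backbone.
subprocess.run([
    "python3", "stage2_accompaniment/inference.py",
    "--model_type=performer",
    "--configuration=stage2_accompaniment/config/emopia_finetune.yaml",
    "--representation=functional",
    "--inference_params=best_weight/Functional-two/emopia_acccompaniment_finetune/ep300_loss0.338_params.pt",
    f"--output_dir={out_dir}",
], check=True)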
- For two-stage generation with REMI:
# stage1
python3 stage1_compose/inference.py \
--configuration=stage1_compose/config/emopia_finetune.yaml \
--representation=remi \
--mode=lead_sheet \
--inference_params=best_weight/REMI-two/emopia_lead_sheet_finetune/ep016_loss0.846_params.pt \
--output_dir=generation/emopia_remi_two
# stage2
# (Option 1) with Performer backbone (install fast-transformers)
python3 stage2_accompaniment/inference.py \
--model_type=performer \
--configuration=stage2_accompaniment/config/emopia_finetune.yaml \
--representation=remi \
--inference_params=best_weight/REMI-two/emopia_acccompaniment_finetune/ep300_loss0.350_params.pt \
--output_dir=generation/emopia_remi_two
# (Option 2) with GPT-2 backbone (install transformers)
python3 stage2_accompaniment/inference.py \
--model_type=gpt2 \
--configuration=stage2_accompaniment/config/emopia_finetune_gpt2.yaml \
--representation=remi \
--inference_params=best_weight/REMI-two/emopia_acccompaniment_finetune_gpt2/ep300_loss0.136_params.pt \
--output_dir=generation/emopia_remi_two
- For one-stage generation with REMI (baseline):
python3 stage1_compose/inference.py \
--configuration=stage1_compose/config/emopia_finetune_full.yaml \
--representation=remi \
--mode=full_song \
--inference_params=best_weight/REMI-one/emopia_finetune/ep100_loss0.620_params.pt \
--output_dir=generation/emopia_remi_one
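- To sanity-check the generated files, any MIDI parser works; here is a small sketch with miditoolkit (the filename below is hypothetical, use whatever appears in your output directory):
from miditoolkit.midi import parser as mid_parser

midi = mid_parser.MidiFile("generation/emopia_remi_one/sample_01.mid")  # hypothetical filename
print("ticks per beat:", midi.ticks_per_beat)
for inst in midi.instruments:
    print(f"{inst.name}: {len(inst.notes)} notes")
print("tempo changes:", len(midi.tempo_changes))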
The following takes two-stage generation with the functional representation as an example.
- Use the provided events directly, or convert MIDI files to events following the data preparation steps.
- Stage1: Valence Modeling (lead sheet generation)
# pre-train on HookTheory
python3 stage1_compose/train.py \
--configuration=stage1_compose/config/hooktheory_pretrain.yaml \
--representation=functional
# finetune on EMOPIA (remember to add pretrained params in `emopia_finetune.yaml`)
python3 stage1_compose/train.py \
--configuration=stage1_compose/config/emopia_finetune.yaml \
--representation=functional
- Stage2: Arousal Modeling (performance generation)
- (Option 1) with Performer backbone (install fast-transformers)
# pre-train on Pop1k7
python3 stage2_accompaniment/train.py \
--model_type=performer \
--configuration=stage2_accompaniment/config/pop1k7_pretrain.yaml \
--representation=functional
# finetune on EMOPIA (remember to add pretrained params in `emopia_finetune.yaml`)
python3 stage2_accompaniment/train.py \
--model_type=performer \
--configuration=stage2_accompaniment/config/emopia_finetune.yaml \
--representation=functional
- (Option 2) with GPT-2 backbone (install transformers)
# pre-train on Pop1k7
python3 stage2_accompaniment/train.py \
--model_type=gpt2 \
--configuration=stage2_accompaniment/config/pop1k7_pretrain_gpt2.yaml \
--representation=functional
# finetune on EMOPIA (remember to add pretrained params in `emopia_finetune_gpt2.yaml`)
python3 stage2_accompaniment/train.py \
--model_type=gpt2 \
--configuration=stage2_accompaniment/config/emopia_finetune_gpt2.yaml \
--representation=functional
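- The fine-tuning configs expect the checkpoint produced by the corresponding pretraining run; the exact key name depends on the config schema, so a quick way to locate the field is to dump the YAML (a minimal sketch):
import yaml

# Dump the fine-tuning config to find the field that should point to the
# pretrained checkpoint saved by the pretraining run above.
with open("stage2_accompaniment/config/emopia_finetune.yaml") as f:
    cfg = yaml.safe_load(f)
print(yaml.dump(cfg, sort_keys=False))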
We open-source the processed MIDI data as follows:
- EMOPIA+, for fine-tuning both stages, derived from the emotion-annotated multi-modal dataset EMOPIA.
- We applied Midi_Toolkit for melody extraction, together with the chord recognition method (link), to extract lead sheets from the piano performances.
- To refine key signatures, we applied both MIDI-based (Midi toolbox) and audio-based (madmom) key detection methods and manually corrected the clips where the two methods disagreed; a sketch of this cross-check follows the list below.
- Pop1K7-emo, for pretraining the second stage, derived from the piano performance dataset AILabs.tw Pop1K7.
- Please refer to Compound Word Transformer for the lead sheet extraction algorithms.
- Key signatures are detected using Midi toolbox.
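- For reference, the flavour of the key cross-check can be reproduced with common open-source tools; the sketch below substitutes music21 for the MIDI-toolbox step on the symbolic side and uses madmom's CNN key recognizer on the audio side, so it is an approximation of the procedure rather than our exact pipeline:
from madmom.features.key import CNNKeyRecognitionProcessor, key_prediction_to_label
from music21 import converter

midi_path, audio_path = "clip.mid", "clip.wav"  # hypothetical paired clip

# Symbolic key estimate (music21's key analysis as a stand-in for the MIDI toolbox).
midi_key = converter.parse(midi_path).analyze("key")

# Audio key estimate (madmom CNN key recognition).
audio_key = key_prediction_to_label(CNNKeyRecognitionProcessor()(audio_path))

print("MIDI-based :", midi_key)
print("audio-based:", audio_key)
# Clips where the two estimates disagree are flagged for manual correction.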
If you find this project useful, please cite our paper:
@inproceedings{emodisentanger2024,
  author    = {Jingyue Huang and Ke Chen and Yi-Hsuan Yang},
  title     = {Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional Representation},
  booktitle = {Proceedings of the International Society for Music Information Retrieval Conference, {ISMIR}},
  year      = {2024}
}