EMO-Disentanger

This is the official repository of ISMIR 2024 paper "Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional Representation".

Paper | Demo page | Model weights | Dataset: EMOPIA+ | Dataset: Pop1K7 & Pop1K7-emo

Environment

  • Python 3.8 and CUDA 10.2 recommended
  • Install dependencies (required)
pip install -r requirements.txt
# fast-transformers (the backbone used in the paper; may fail to build on some CUDA versions)
pip install --user pytorch-fast-transformers

# transformers
pip install transformers==4.28.0
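
To quickly check which optional backbone your environment actually supports, here is a minimal sanity-check sketch in Python (it only assumes PyTorch was installed via requirements.txt; only the backend you plan to use needs to import cleanly):

# sanity_check.py (hypothetical helper, not part of the repository)
import torch
print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())

try:
    import fast_transformers  # required for the Performer backbone
    print("pytorch-fast-transformers: OK")
except ImportError as err:
    print("pytorch-fast-transformers: not available ->", err)

try:
    import transformers  # required for the GPT-2 backbone
    print("transformers:", transformers.__version__)
except ImportError as err:
    print("transformers: not available ->", err)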
  • Install midi2audio to synthesize generated MIDI to audio (optional)
pip install midi2audio
wget https://freepats.zenvoid.org/Piano/SalamanderGrandPiano/SalamanderGrandPiano-SF2-V3+20200602.tar.xz
tar -xJvf SalamanderGrandPiano-SF2-V3+20200602.tar.xz  # the archive is xz-compressed, so use -J rather than -z
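
Once the soundfont is extracted, generated MIDI files can also be rendered from Python; below is a minimal sketch using midi2audio's FluidSynth wrapper (it requires the FluidSynth system binary, and the .sf2 filename and the example MIDI path are assumptions to adjust to your local files):

# render_example.py (hypothetical helper, not part of the repository)
from midi2audio import FluidSynth

# the exact .sf2 filename depends on the extracted archive; check your local copy
sound_font = "SalamanderGrandPiano-SF2-V3+20200602/SalamanderGrandPiano-V3+20200602.sf2"
fs = FluidSynth(sound_font=sound_font)

# "example.mid" is a placeholder for any MIDI generated into the output directory
fs.midi_to_audio("generation/emopia_functional_two/example.mid", "example.wav")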

Quick Start

Emotion-driven piano performance generation (with our trained models)

Method: Two-stage generation with functional representation

  1. Download and unzip the events and the best weights (make sure you are in the repository root directory).
  2. Stage 1: Generate a lead sheet conditioned on Positive or Negative emotion (i.e., Valence Modeling).
python3 stage1_compose/inference.py \
        --configuration=stage1_compose/config/emopia_finetune.yaml \
        --representation=functional \
        --mode=lead_sheet \
        --inference_params=best_weight/Functional-two/emopia_lead_sheet_finetune/ep016_loss0.685_params.pt \
        --output_dir=generation/emopia_functional_two
  3. Stage 2: Generate a piano performance from the Stage 1 lead sheet to convey the 4Q emotions (i.e., Arousal Modeling).
  • (Option 1) with Performer backbone (install fast-transformers)
python3 stage2_accompaniment/inference.py \
        --model_type=performer \
        --configuration=stage2_accompaniment/config/emopia_finetune.yaml \
        --representation=functional \
        --inference_params=best_weight/Functional-two/emopia_acccompaniment_finetune/ep300_loss0.338_params.pt \
        --output_dir=generation/emopia_functional_two
  • (Option 2) with GPT-2 backbone (install transformers)
python3 stage2_accompaniment/inference.py \
        --model_type=gpt2 \
        --configuration=stage2_accompaniment/config/emopia_finetune_gpt2.yaml \
        --representation=functional \
        --inference_params=best_weight/Functional-two/emopia_acccompaniment_finetune_gpt2/ep300_loss0.120_params.pt \
        --output_dir=generation/emopia_functional_two
  4. To output synthesized audio along with the MIDI files, add --play_midi to the commands above.

Other methods

  1. For two-stage generation with REMI:
# stage1
python3 stage1_compose/inference.py \
        --configuration=stage1_compose/config/emopia_finetune.yaml \
        --representation=remi \
        --mode=lead_sheet \
        --inference_params=best_weight/REMI-two/emopia_lead_sheet_finetune/ep016_loss0.846_params.pt \
        --output_dir=generation/emopia_remi_two

# stage2
# (Option 1) with Performer backbone (install fast-transformers)
python3 stage2_accompaniment/inference.py \
        --model_type=performer \
        --configuration=stage2_accompaniment/config/emopia_finetune.yaml \
        --representation=remi \
        --inference_params=best_weight/REMI-two/emopia_acccompaniment_finetune/ep300_loss0.350_params.pt \
        --output_dir=generation/emopia_remi_two

# (Option 2) with GPT-2 backbone (install transformers)
python3 stage2_accompaniment/inference.py \
        --model_type=gpt2 \
        --configuration=stage2_accompaniment/config/emopia_finetune_gpt2.yaml \
        --representation=remi \
        --inference_params=best_weight/REMI-two/emopia_acccompaniment_finetune_gpt2/ep300_loss0.136_params.pt \
        --output_dir=generation/emopia_remi_two
  2. For one-stage generation with REMI (baseline):
python3 stage1_compose/inference.py \
        --configuration=stage1_compose/config/emopia_finetune_full.yaml \
        --representation=remi \
        --mode=full_song \
        --inference_params=best_weight/REMI-one/emopia_finetune/ep100_loss0.620_params.pt \
        --output_dir=generation/emopia_remi_one

Train the models yourself

The steps below use two-stage generation with functional representation as an example.

  1. Use the provided events directly, or convert MIDI files to events by following the steps.
  2. Stage 1: Valence Modeling (lead sheet generation)
# pre-train on HookTheory
python3 stage1_compose/train.py \
        --configuration=stage1_compose/config/hooktheory_pretrain.yaml \
        --representation=functional

# fine-tune on EMOPIA (remember to add the pretrained checkpoint path in `emopia_finetune.yaml`)
python3 stage1_compose/train.py \
        --configuration=stage1_compose/config/emopia_finetune.yaml \
        --representation=functional
  3. Stage 2: Arousal Modeling (performance generation)
  • (Option 1) with Performer backbone (install fast-transformers)
# pre-train on Pop1k7
python3 stage2_accompaniment/train.py \
        --model_type=performer \
        --configuration=stage2_accompaniment/config/pop1k7_pretrain.yaml \
        --representation=functional 

# fine-tune on EMOPIA (remember to add the pretrained checkpoint path in `emopia_finetune.yaml`)
python3 stage2_accompaniment/train.py \
        --model_type=performer \
        --configuration=stage2_accompaniment/config/emopia_finetune.yaml \
        --representation=functional
  • (Option 2) with GPT-2 backbone (install transformers)
# pre-train on Pop1k7
python3 stage2_accompaniment/train.py \
        --model_type=gpt2 \
        --configuration=stage2_accompaniment/config/pop1k7_pretrain_gpt2.yaml \
        --representation=functional 

# fine-tune on EMOPIA (remember to add the pretrained checkpoint path in `emopia_finetune_gpt2.yaml`)
python3 stage2_accompaniment/train.py \
        --model_type=gpt2 \
        --configuration=stage2_accompaniment/config/emopia_finetune_gpt2.yaml \
        --representation=functional

Dataset

We open-source the processed MIDI data listed below (a quick inspection sketch follows the list):

  • EMOPIA+ for fine-tuning both stages, derived from the emotion-annotated multi-modal dataset EMOPIA.
    • We applied Midi_Toolkit for melody extraction and a chord recognition tool (link) to extract lead sheets from the piano performances.
    • To refine the key signatures, we applied both MIDI-based (Midi toolbox) and audio-based (madmom) key detection methods and manually corrected the clips where the two methods disagreed.
  • Pop1K7-emo for pre-training the second stage, derived from the piano performance dataset AILabs.tw Pop1K7.
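
If you want to take a quick look at the processed clips, here is a minimal inspection sketch using miditoolkit (pip install miditoolkit if needed; the file path below is a placeholder, and the exact track layout of the released files is an assumption to verify against your download):

# inspect_clip.py (hypothetical helper, not part of the repository)
from miditoolkit.midi.parser import MidiFile

midi = MidiFile("EMOPIA+/example_clip.mid")  # placeholder path; point it at a downloaded file
print("ticks per beat :", midi.ticks_per_beat)
print("key signatures :", midi.key_signature_changes)
print("tempo changes  :", midi.tempo_changes[:3])
for inst in midi.instruments:
    print(f"track '{inst.name}': {len(inst.notes)} notes (program {inst.program})")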

Citation

If you find this project useful, please cite our paper:

@inproceedings{emodisentanger2024,
  author = {Jingyue Huang and Ke Chen and Yi-Hsuan Yang},
  title = {Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional Representation},
  booktitle = {Proceedings of the International Society for Music Information Retrieval Conference, {ISMIR}},
  year = {2024}
}
