This is the official repository of ISMIR 2024 paper "Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional Representation".
Paper | Demo page | Model weights | Dataset: EMOPIA+ | Dataset: Pop1K7 & Pop1K7-emo
- Python 3.8 and CUDA 10.2 recommended
- Install dependencies (required)
pip install -r requirements.txt
- For stage 2, install fast-transformers or transformers (required; choose one backbone)
# fast-transformers (the package used in the paper; it may fail to build on some CUDA versions)
pip install --user pytorch-fast-transformers
# transformers
pip install transformers==4.28.0
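- A minimal sketch to check which stage-2 backbone your environment supports (assumption: the pytorch-fast-transformers package installs the module name fast_transformers):
# Quick check of which stage-2 backbone is usable in this environment.
try:
    import fast_transformers  # installed by pytorch-fast-transformers
    print("Performer backbone available: use --model_type=performer")
except ImportError:
    print("fast-transformers missing: install transformers and use --model_type=gpt2")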
- Install midi2audio to synthesize generated MIDI to audio (optional)
pip install midi2audio
wget https://freepats.zenvoid.org/Piano/SalamanderGrandPiano/SalamanderGrandPiano-SF2-V3+20200602.tar.xz
tar -xJvf SalamanderGrandPiano-SF2-V3+20200602.tar.xz
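- A minimal synthesis sketch with midi2audio, assuming the fluidsynth binary is installed on your system; the .sf2 filename and the example MIDI path are placeholders, so check the actual names after extraction and generation:
from midi2audio import FluidSynth

# Path to the extracted SoundFont (verify the exact filename after untarring).
SOUNDFONT = "SalamanderGrandPiano-SF2-V3+20200602/SalamanderGrandPiano-V3+20200602.sf2"

fs = FluidSynth(sound_font=SOUNDFONT, sample_rate=44100)
# example.mid is a hypothetical output file from the generation steps below.
fs.midi_to_audio("generation/emopia_functional_two/example.mid", "example.wav")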
Method: Two-stage generation with functional representation
- Download and unzip the processed events and the best model weights (make sure you are in the repository root directory).
- Stage1: Generate a lead sheet conditioned on Positive or Negative emotion (i.e., Valence Modeling).
python3 stage1_compose/inference.py \
--configuration=stage1_compose/config/emopia_finetune.yaml \
--representation=functional \
--mode=lead_sheet \
--inference_params=best_weight/Functional-two/emopia_lead_sheet_finetune/ep016_loss0.685_params.pt \
--output_dir=generation/emopia_functional_two
- Stage2: Generate a piano performance from the stage1 lead sheet to convey the four-quadrant (4Q) emotions (i.e., Arousal Modeling).
- (Option 1) with Performer backbone (install fast-transformers)
python3 stage2_accompaniment/inference.py \
--model_type=performer \
--configuration=stage2_accompaniment/config/emopia_finetune.yaml \
--representation=functional \
--inference_params=best_weight/Functional-two/emopia_acccompaniment_finetune/ep300_loss0.338_params.pt \
--output_dir=generation/emopia_functional_two
- (Option 2) with GPT-2 backbone (install transformers)
python3 stage2_accompaniment/inference.py \
--model_type=gpt2 \
--configuration=stage2_accompaniment/config/emopia_finetune_gpt2.yaml \
--representation=functional \
--inference_params=best_weight/Functional-two/emopia_acccompaniment_finetune_gpt2/ep300_loss0.120_params.pt \
--output_dir=generation/emopia_functional_two
- To output synthesized audio along with the MIDI files, add --play_midi to the commands above (this uses the optional midi2audio setup).
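- If you prefer a single script, the two commands above can be chained with subprocess, since stage 2 reads the lead sheets that stage 1 wrote to the shared --output_dir; the sketch below reuses the exact flags from above under that assumption:
import subprocess

out_dir = "generation/emopia_functional_two"

# Stage 1: lead sheet generation (valence modeling).
subprocess.run([
    "python3", "stage1_compose/inference.py",
    "--configuration=stage1_compose/config/emopia_finetune.yaml",
    "--representation=functional",
    "--mode=lead_sheet",
    "--inference_params=best_weight/Functional-two/emopia_lead_sheet_finetune/ep016_loss0.685_params.pt",
    f"--output_dir={out_dir}",
], check=True)

# Stage 2: accompaniment generation (arousal modeling), Performer backbone.
subprocess.run([
    "python3", "stage2_accompaniment/inference.py",
    "--model_type=performer",
    "--configuration=stage2_accompaniment/config/emopia_finetune.yaml",
    "--representation=functional",
    "--inference_params=best_weight/Functional-two/emopia_acccompaniment_finetune/ep300_loss0.338_params.pt",
    f"--output_dir={out_dir}",
], check=True)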
- For two-stage generation with REMI:
# stage1
python3 stage1_compose/inference.py \
--configuration=stage1_compose/config/emopia_finetune.yaml \
--representation=remi \
--mode=lead_sheet \
--inference_params=best_weight/REMI-two/emopia_lead_sheet_finetune/ep016_loss0.846_params.pt \
--output_dir=generation/emopia_remi_two
# stage2
# (Option 1) with Performer backbone (install fast-transformers)
python3 stage2_accompaniment/inference.py \
--model_type=performer \
--configuration=stage2_accompaniment/config/emopia_finetune.yaml \
--representation=remi \
--inference_params=best_weight/REMI-two/emopia_acccompaniment_finetune/ep300_loss0.350_params.pt \
--output_dir=generation/emopia_remi_two
# (Option 2) with GPT-2 backbone (install transformers)
python3 stage2_accompaniment/inference.py \
--model_type=gpt2 \
--configuration=stage2_accompaniment/config/emopia_finetune_gpt2.yaml \
--representation=remi \
--inference_params=best_weight/REMI-two/emopia_acccompaniment_finetune_gpt2/ep300_loss0.136_params.pt \
--output_dir=generation/emopia_remi_two
- For one-stage generation with REMI (baseline):
python3 stage1_compose/inference.py \
--configuration=stage1_compose/config/emopia_finetune_full.yaml \
--representation=remi \
--mode=full_song \
--inference_params=best_weight/REMI-one/emopia_finetune/ep100_loss0.620_params.pt \
--output_dir=generation/emopia_remi_one
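- To sanity-check the generated files, any MIDI parser works; here is a small sketch with miditoolkit (the filename below is hypothetical, use whatever appears in your output directory):
from miditoolkit.midi import parser as mid_parser

midi = mid_parser.MidiFile("generation/emopia_remi_one/sample_01.mid")  # hypothetical filename
print("ticks per beat:", midi.ticks_per_beat)
for inst in midi.instruments:
    print(f"{inst.name}: {len(inst.notes)} notes")
print("tempo changes:", len(midi.tempo_changes))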
The following takes two-stage generation with the functional representation as an example.
- Use the provided events directly, or convert MIDI files to events following the data preparation steps.
- Stage1: Valence Modeling (lead sheet generation)
# pre-train on HookTheory
python3 stage1_compose/train.py \
--configuration=stage1_compose/config/hooktheory_pretrain.yaml \
--representation=functional
# finetune on EMOPIA (remember to add pretrained params in `emopia_finetune.yaml`)
python3 stage1_compose/train.py \
--configuration=stage1_compose/config/emopia_finetune.yaml \
--representation=functional
- Stage2: Arousal Modeling (performance generation)
- (Option 1) with Performer backbone (install fast-transformers)
# pre-train on Pop1k7
python3 stage2_accompaniment/train.py \
--model_type=performer \
--configuration=stage2_accompaniment/config/pop1k7_pretrain.yaml \
--representation=functional
# finetune on EMOPIA (remember to add pretrained params in `emopia_finetune.yaml`)
python3 stage2_accompaniment/train.py \
--model_type=performer \
--configuration=stage2_accompaniment/config/emopia_finetune.yaml \
--representation=functional
- (Option 2) with GPT-2 backbone (install transformers)
# pre-train on Pop1k7
python3 stage2_accompaniment/train.py \
--model_type=gpt2 \
--configuration=stage2_accompaniment/config/pop1k7_pretrain_gpt2.yaml \
--representation=functional
# finetune on EMOPIA (remember to add pretrained params in `emopia_finetune_gpt2.yaml`)
python3 stage2_accompaniment/train.py \
--model_type=gpt2 \
--configuration=stage2_accompaniment/config/emopia_finetune_gpt2.yaml \
--representation=functional
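- The fine-tuning configs expect the checkpoint produced by the corresponding pretraining run; the exact key name depends on the config schema, so a quick way to locate the field is to dump the YAML (a minimal sketch):
import yaml

# Dump the fine-tuning config to find the field that should point to the
# pretrained checkpoint saved by the pretraining run above.
with open("stage2_accompaniment/config/emopia_finetune.yaml") as f:
    cfg = yaml.safe_load(f)
print(yaml.dump(cfg, sort_keys=False))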
We open-source the processed MIDI data as follows:
- EMOPIA+, for fine-tuning both stages, derived from the emotion-annotated multi-modal dataset EMOPIA.
- We applied Midi_Toolkit for melody extraction, together with the chord recognition method (link), to extract lead sheets from the piano performances.
- To refine key signatures, we applied both MIDI-based (Midi toolbox) and audio-based (madmom) key detection methods and manually corrected the clips where the two methods disagreed; a sketch of this cross-check follows the list below.
- Pop1K7-emo, for pretraining the second stage, derived from the piano performance dataset AILabs.tw Pop1K7.
- Please refer to Compound Word Transformer for the lead sheet extraction algorithms.
- Key signatures are detected using Midi toolbox.
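- For reference, the flavour of the key cross-check can be reproduced with common open-source tools; the sketch below substitutes music21 for the MIDI-toolbox step on the symbolic side and uses madmom's CNN key recognizer on the audio side, so it is an approximation of the procedure rather than our exact pipeline:
from madmom.features.key import CNNKeyRecognitionProcessor, key_prediction_to_label
from music21 import converter

midi_path, audio_path = "clip.mid", "clip.wav"  # hypothetical paired clip

# Symbolic key estimate (music21's key analysis as a stand-in for the MIDI toolbox).
midi_key = converter.parse(midi_path).analyze("key")

# Audio key estimate (madmom CNN key recognition).
audio_key = key_prediction_to_label(CNNKeyRecognitionProcessor()(audio_path))

print("MIDI-based :", midi_key)
print("audio-based:", audio_key)
# Clips where the two estimates disagree are flagged for manual correction.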
If you find this project useful, please cite our paper:
@inproceedings{emodisentanger2024,
  author    = {Jingyue Huang and Ke Chen and Yi-Hsuan Yang},
  title     = {Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional Representation},
  booktitle = {Proceedings of the International Society for Music Information Retrieval Conference, {ISMIR}},
  year      = {2024}
}