# Performed MIDI dataset

The dataset contains 1104 performances of 205 opuses of classical polyphonic piano music.

| Composer      | Performances | Opuses |
|---------------|--------------|--------|
| Bach          | 172          | 59     |
| Balakirev     | 10           | 1      |
| Beethoven     | 279          | 64     |
| Brahms        | 1            | 1      |
| Chopin        | 296          | 36     |
| Debussy       | 3            | 2      |
| Glinka        | 2            | 1      |
| Haydn         | 45           | 12     |
| Liszt         | 122          | 16     |
| Mozart        | 16           | 3      |
| Prokofiev     | 8            | 1      |
| Rachmaninoff  | 9            | 5      |
| Ravel         | 36           | 5      |
| Schubert      | 64           | 15     |
| Schumann      | 28           | 10     |
| Scriabin      | 13           | 2      |

## Objectives

Build a dataset of classical polyphonic piano MIDI performances, where each performance is associated with:

- an XML score (MusicXML format)
- a MIDI score: a quantized MIDI file equivalent to the XML score, but in MIDI format and with the repetitions unravelled
- the beat/downbeat positions in the MIDI score
- the beat/downbeat positions in the performed MIDI
- an alignment between each event in the MIDI performance and the corresponding event in the XML score
- an alignment between each event in the MIDI performance and the corresponding event in the MIDI score

Usage:

- ground truth for MIDI piano automatic music transcription (AMT) tasks:
  - MIDI beat/downbeat tracking
  - MIDI quantization
  - voice separation
  - score structuring
- ground truth for piano expressive performance models

## Source Dataset

The dataset is obtained by curating the following dataset by Jeong et al., with manual cleaning of the beat mark annotations.

References:

- Dasaem Jeong, Taegyun Kwon, Yoojin Kim, Juhan Nam. "Graph Neural Network for Music Score Data and Modeling Expressive Piano Performance." Proceedings of the 36th International Conference on Machine Learning, PMLR 97:3060-3070, 2019. http://proceedings.mlr.press/v97/jeong19a.html
- Dasaem Jeong, Taegyun Kwon, Juhan Nam. "VirtuosoNet: A Hierarchical Attention RNN for Generating Expressive Piano Performance from Music Score." NIPS 2018. https://github.com/jdasam/virtuosoNet
- Dasaem Jeong, Taegyun Kwon, Yoojin Kim, Kyogu Lee, Juhan Nam. "VirtuosoNet: A Hierarchical RNN-based System for Modeling Expressive Piano Performance." ISMIR 2019. http://archives.ismir.net/ismir2019/paper/000112.pdf

The source dataset contains a set of performances from the Yamaha competition (multiple performances of the same opus are present), each named with the original name `[performance].MID` (e.g. `Zhdanov08.MID`). The following files are associated with each performance:

- XML score from MuseScore in MusicXML format: `musicxml_cleaned.musicxml`
- MIDI score (quantized MIDI) produced with MuseScore from the MusicXML file, duplicating any repetitions in the score: `midi_cleaned.mid`
- automatic alignment (txt file) between the performed MIDI and the score MIDI: `[performance]_infer_match.txt`
- automatic alignment (txt file) between the performed MIDI and the XML score: `[performance]_infer_corresp.txt`

The automatic alignment is produced with the following algorithm:

- Eita Nakamura, Kazuyoshi Yoshii, Haruhiro Katayose. "Performance Error Detection and Post-Processing for Fast and Accurate Symbolic Music Alignment." ISMIR 2017. https://eita-nakamura.github.io/articles/EN_etal_ErrorDetectionAndRealignment_ISMIR2017.pdf
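The alignment files can be read into pairs of onset times for downstream processing. This is a hedged sketch, not a definitive parser: the exact column layout of the alignment output is an assumption here (whitespace-separated fields with the performance onset in column 1 and the matched score onset in column 6); check the header of the actual `_infer_corresp.txt` files and adjust the indices if they differ.

```python
def read_alignment(path, perf_col=1, score_col=6):
    """Return a list of (performance_onset, score_onset) pairs in seconds.

    Assumed layout: whitespace-separated fields, comment/header lines
    starting with "//" or "#", performance onset in column `perf_col`
    and matched score onset in column `score_col`.
    """
    pairs = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith(("//", "#")):
                continue  # skip header/comment lines
            fields = line.split()
            try:
                perf_onset = float(fields[perf_col])
                score_onset = float(fields[score_col])
            except (IndexError, ValueError):
                continue  # unmatched or malformed entries (e.g. "*")
            pairs.append((perf_onset, score_onset))
    return pairs
```

Skipping lines that fail to parse is a deliberate choice: alignment output typically marks unmatched notes with placeholder symbols rather than numeric onsets.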


## Curation procedure

The curation procedure is available at: https://github.com/fosfrancesco/performed-midi-dataset

In theory, the automatic extraction of beats in the performance works in two steps:

1. extract the beats from the MIDI score using the pretty_midi library (it is better to start from the MIDI score than from the XML score because the repetitions are already unravelled);
2. for every beat in the MIDI score, compute the corresponding beat in the MIDI performance, using the automatic midi2midi alignment file.
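Step 2 reduces to a lookup from score beat times into the alignment. A minimal sketch of that mapping, assuming the alignment has already been parsed into `(performance_onset, score_onset)` pairs in seconds; the tolerance value is an arbitrary choice for illustration, not something fixed by the dataset:

```python
def map_score_beats_to_performance(score_beats, alignment, tol=0.025):
    """Map each score beat time to a performance time via aligned events.

    `alignment` is a list of (performance_onset, score_onset) pairs in
    seconds. A beat maps to the performance onset of an aligned event
    whose score onset falls on the beat (within `tol` seconds); None
    marks beats with no event at that position -- exactly the failure
    case that the manual correction step below fixes.
    """
    perf_beats = []
    for b in score_beats:
        candidates = [p for (p, s) in alignment if abs(s - b) <= tol]
        perf_beats.append(min(candidates) if candidates else None)
    return perf_beats


# Step 1 would produce `score_beats` from the quantized MIDI score,
# e.g. with pretty_midi:
#   score = pretty_midi.PrettyMIDI("midi_cleaned.mid")
#   score_beats = score.get_beats()        # beat times in seconds
#   score_downbeats = score.get_downbeats()
```

Taking the earliest candidate onset (`min`) when several events share a beat is a design choice here: chord notes aligned to the same score beat can have slightly different performed onsets.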

In practice, this automatic extraction presents the following problems:

- the beats in the MIDI score (step 1) are not always computed correctly:
  - extra beats at the beginning
  - downbeats marked as beats and vice versa
  - extra downbeats in repetitions
  - misaligned beats (e.g. Ravel's Pavane)
- the computation of the beats in the MIDI performance (step 2) may not be possible if:
  - no event happens at the beat position
  - the automatic midi2midi alignment is not correct because of player mistakes
  - the automatic midi2midi alignment is not correct because the algorithm does not handle some situations (e.g. trills)

So the idea is to use the automatic extraction of beats and downbeats as a base and manually correct it.

The new workflow is:

1. extract the beats/downbeats from the MIDI score using the pretty_midi library (file `ann_quant.txt`);
2. manually correct the beats/downbeats of the MIDI score (file `ann_quant_cleaned.txt`):
   - choose a folder from the dataset (from the authors above)
   - open Audacity (https://www.audacityteam.org/)
   - File -> Import -> MIDI file: `midi_cleaned.mid`
   - File -> Import -> Labels: `ann_quant.txt`
     - labels are annotated as `b` (beat) or `db` (downbeat)
     - labels that were automatically detected to have some problem are annotated with a `W` after the name
   - File -> Import -> Audio: `quant_click.wav`
   - manually correct the wrong labels
   - export labels: File -> Export -> Labels: `ann_quant_cleaned.txt`
3. generate the beats/downbeats for the MIDI performances, using the corrected MIDI score annotations and the automatic midi2midi alignment (file `ann_unquant.txt`);
4. manually correct the beats/downbeats of the MIDI performance (file `ann_unquant_cleaned.txt`).
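Since the annotation files are imported and exported through Audacity's label tracks, they use Audacity's label format: one label per line as `start<TAB>end<TAB>text`, where the text is `b` or `db` as described above. A small reader/writer sketch; treating the problem marker as a plain `W` suffix on the label text is an assumption:

```python
def read_beat_labels(path):
    """Return a list of (time, kind, warned) with kind in {"b", "db"}."""
    labels = []
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            start, _end, text = line.rstrip("\n").split("\t")
            warned = text.endswith("W")  # assumed problem-marker suffix
            kind = text[:-1] if warned else text
            labels.append((float(start), kind, warned))
    return labels


def write_beat_labels(path, labels):
    """Write labels in Audacity format (point labels: start == end)."""
    with open(path, "w") as f:
        for time, kind, warned in labels:
            text = kind + ("W" if warned else "")
            f.write(f"{time:.6f}\t{time:.6f}\t{text}\n")
```

Writing `start == end` produces point labels, which is how Audacity represents markers without duration.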

See the annotation table at: https://docs.google.com/spreadsheets/d/1Ep2KYIxaz0Uwedi0f-FO4QLtoAq_Lv05Yybx8kayeBM