This is the source code for the paper "Exploiting Stereo Sound Channels to Boost Performance of Neural Network-Based Music Transcription", accepted to the Special Session on Deep Learning at the 18th IEEE International Conference on Machine Learning and Applications (IEEE ICMLA-2019).
Instructions on Using This Code.
- Install the dependent libraries: tensorflow (1.13.1), librosa (0.6.2), and magenta (0.4.0). Note that the function `apply_sustain_control_changes` provided by `magenta.music` has a minor defect. We have fixed it and, for your convenience, uploaded the script `sequences_lib.py` that contains the corrected function.
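  The following is a minimal sketch of using the fixed function in place of magenta's bundled copy; the MIDI file name is a placeholder:

  ```python
  # Minimal sketch: expand sustain-pedal events with the corrected
  # apply_sustain_control_changes from the repo-local sequences_lib.py
  # instead of magenta's bundled copy. "example.mid" is a placeholder.
  import magenta.music as mm

  import sequences_lib  # the fixed script uploaded with this repo

  ns = mm.midi_file_to_sequence_proto('example.mid')  # load a NoteSequence
  # Returns a new NoteSequence whose note durations are extended for as
  # long as the sustain pedal is held down.
  ns_sustained = sequences_lib.apply_sustain_control_changes(ns)
  ```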
- Download the MAPS dataset (http://www.tsi.telecom-paristech.fr/aao/en/2010/07/08/maps-database-a-piano-database-for-multipitch-estimation-and-automatic-transcription-of-music/) and unzip it. Then create an environment variable named `maps` that points to the directory of this dataset.
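  As a quick check that the variable is visible to Python (an illustrative sketch):

  ```python
  # Illustrative check that the environment variable "maps" is set and
  # points at the unzipped MAPS dataset directory.
  import os

  maps_dir = os.environ.get('maps')
  assert maps_dir and os.path.isdir(maps_dir), \
      'set the environment variable "maps" to the MAPS dataset directory'
  print('MAPS dataset found at', maps_dir)
  ```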
- The all-in-one script for training, validation, and test is `main.py`. Open this script and search for a function named `split_train_valid_and_test_files_fn`. In this function, populate `test_dirs` with the actual directories of the close and ambient settings recorded with the Disklavier piano, and populate `train_dirs` with the actual directories of the other seven settings generated by synthesizers; see the sketch below.
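  An illustrative sketch of this split follows. The nine setting names are the standard MAPS ones (ENSTDkCl and ENSTDkAm are the real Disklavier recordings; the other seven are synthesized), but the exact directory layout, including the MUS subfolder, is an assumption you should adapt to your unzipped copy:

  ```python
  # Illustrative sketch of the train/test split described above; adapt the
  # paths to the layout of your unzipped MAPS copy (the MUS subfolder holds
  # the full musical pieces).
  import os

  maps_dir = os.environ['maps']

  # Test: the two real-piano (Disklavier) settings, close and ambient.
  test_dirs = [os.path.join(maps_dir, s, 'MUS')
               for s in ('ENSTDkCl', 'ENSTDkAm')]

  # Train: the seven synthesizer-generated settings.
  train_dirs = [os.path.join(maps_dir, s, 'MUS')
                for s in ('AkPnBcht', 'AkPnBsdf', 'AkPnCGdD', 'AkPnStgb',
                          'SptkBGAm', 'SptkBGCl', 'StbgTGd2')]
  ```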
- For your convenience, we have uploaded our trained model to the folder `saved_model`. The name of the model is `d0_epoch_9_of_15`. You can run inference directly with this model.
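  As a quick sanity check before inference, you can list the variables stored in the checkpoint with TensorFlow 1.x utilities (a sketch, assuming the checkpoint prefix matches the uploaded files):

  ```python
  # List the variables stored in the released checkpoint (TensorFlow 1.x),
  # a quick way to confirm the files are intact before running inference.
  import tensorflow as tf

  reader = tf.train.NewCheckpointReader('saved_model/d0_epoch_9_of_15')
  for name, shape in sorted(reader.get_variable_to_shape_map().items()):
      print(name, shape)
  ```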
- In `main.py`, search for `self.train_or_inference` to configure the script to run in inference or training mode. Detailed instructions on how to configure it are given there.
- You can view the model graph, performance measures, and trained parameters with TensorBoard, e.g., by pointing `tensorboard --logdir` at the event-file directory written during training or inference.