This repository contains the source code for the paper "Exploiting Stereo Sound Channels to Boost Performance of Neural Network-Based Music Transcription", which was accepted by the special session on deep learning at the 18th IEEE International Conference on Machine Learning and Applications (IEEE ICMLA-2019).

Instructions on Using This Code.

  1. Install the required libraries: tensorflow (1.13.1), librosa (0.6.2), and magenta (0.4.0). Note that the function apply_sustain_control_changes provided by magenta.music has a minor defect. We have fixed it and, for your convenience, uploaded the script sequences_lib.py, which contains the corrected function.

  2. Download the MAPS dataset (http://www.tsi.telecom-paristech.fr/aao/en/2010/07/08/maps-database-a-piano-database-for-multipitch-estimation-and-automatic-transcription-of-music/) and unzip it. Then set an environment variable named maps that points to the dataset directory.

  3. The all-in-one script for training, validation, and testing is main.py. Open this script and find the function named split_train_valid_and_test_files_fn. In this function, populate test_dirs with the actual directories of the close and ambient settings generated by the Disklavier piano, and populate train_dirs with the actual directories of the other 7 settings generated by the synthesizer.

  4. For your convenience, we have uploaded our trained model to the folder saved_model. The model is named d0_epoch_9_of_15. You can run inference directly with this model.

  5. In main.py, search for self.train_or_inference to configure the script to run in inference mode or training mode. Detailed configuration instructions are provided there.

  6. You can view the model, performance measures, and trained parameters with TensorBoard.
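
Steps 2 and 3 can be sketched in Python as follows. This is a minimal sketch and not code from this repository: the helper name split_maps_settings is hypothetical, and it assumes that the unzipped dataset has one subdirectory per MAPS setting directly under the directory referenced by the maps environment variable, with the two Disklavier settings named ENSTDkCl (close) and ENSTDkAm (ambient). Adjust the names if your directory layout differs.

```python
import os

# Assumed directory names of the two Disklavier settings (close and ambient).
# Change these if your unzipped MAPS layout uses different names.
DISKLAVIER_SETTINGS = ("ENSTDkCl", "ENSTDkAm")


def split_maps_settings(maps_root):
    """Hypothetical helper: split the setting directories under maps_root.

    Returns (train_dirs, test_dirs), where test_dirs holds the Disklavier
    settings and train_dirs holds every other setting directory.
    """
    settings = sorted(
        name for name in os.listdir(maps_root)
        if os.path.isdir(os.path.join(maps_root, name))
    )
    test_dirs = [os.path.join(maps_root, s)
                 for s in settings if s in DISKLAVIER_SETTINGS]
    train_dirs = [os.path.join(maps_root, s)
                  for s in settings if s not in DISKLAVIER_SETTINGS]
    return train_dirs, test_dirs


# Read the dataset location from the environment variable named "maps".
maps_root = os.environ.get("maps")
if maps_root and os.path.isdir(maps_root):
    train_dirs, test_dirs = split_maps_settings(maps_root)
    print("train_dirs:", train_dirs)
    print("test_dirs:", test_dirs)
else:
    print("Set the environment variable 'maps' to the MAPS dataset directory.")
```

The printed lists can then be copied into test_dirs and train_dirs inside split_train_valid_and_test_files_fn in main.py. With a complete dataset, you would expect 7 training directories and 2 test directories.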