Analysis and detection of singing techniques
in repertoires of J-POP solo singers

Yuya Yamamoto, Juhan Nam, Hiroko Terasawa

Paper on arXiv | Conference page

Abstract of the paper

In this paper, we focus on singing techniques within the scope of music information retrieval research. We investigate how singers use singing techniques using real-world recordings of famous solo singers in Japanese popular music songs (J-POP). First, we built a new dataset of singing techniques. The dataset consists of 168 commercial J-POP songs, and each song is annotated using various singing techniques with timestamps and vocal pitch contours. We also present descriptive statistics of singing techniques on the dataset to clarify what and how often singing techniques appear. We further explored the difficulty of the automatic detection of singing techniques using previously proposed machine learning techniques. In the detection, we also investigate the effectiveness of auxiliary information (i.e., pitch and distribution of label duration), not only providing the baseline. The best result achieves 40.4% at macro-average F-measure on nine-way multi-class detection. We provide the annotation of the dataset and its detail on the appendix website (this site). https://yamathcy.github.io/ISMIR2022J-POP/

Dataset "COSIAN"

Description

We built a new dataset named COSIAN (a COllection of SInging voice ANnotation) to conduct the analysis. COSIAN is a collection of annotations for Japanese popular (J-POP) songs, focusing on the singing style and expression of famous solo singers.

It consists of 168 songs by 21 female and 21 male singers. Each singer contributes four songs with moods that differ from one another.

What is the motivation?

Understanding the singing voice more

The basic concept of this work is to analyze singers' characteristics by clarifying how they render a song. One straightforward way to do this is to annotate the presence of singing techniques, which are produced by fluctuating the pitch, timbre, and so on. However, no such dataset existed, so we decided to build one.

Metadata

The metadata consists of a song list containing the following information:

<iframe width="1000" height="500" src="https://docs.google.com/spreadsheets/d/e/2PACX-1vRAkgcnUAJkbBLqnpvs2qk9uAdqkVyjygsI7wvrBC4zrpKhc_lVTIR0xTm5Yk6I-aFt1O5DQqxVITj1/pubhtml?gid=1530300283&single=true&widget=true&headers=false"></iframe>

Annotations

  • Singing techniques: Strongly labeled annotations of singing techniques (i.e., the technique types and their timestamps), which may overlap in time.

(CAUTION) Audio files are not included below!

-> If you want the annotation files, access here and request permission. The annotations are for research purposes only.

The request must include the following; otherwise, it will be rejected.

  • Name
  • Affiliation
  • Email Address
  • Agree to the License

  • Pitch (not publicly available): Since pitch is an essential component of singing technique analysis, we further annotated melodic pitch using Tony, followed by manual correction such as removing the unvoiced parts and reverberation tails.

Because of copyright restrictions, we do not provide raw audio tracks. Instead, we provide links to music streaming services for each song in COSIAN.

  • Spotify links: We provide a Spotify link for each song in COSIAN.

  • YouTube links: We also provide YouTube links in a YouTube playlist. Note that the playlist contains only official music videos, without alignment information.

  • Amazon Music links (work in progress): We will also provide Amazon Music links so that you can purchase the CD recordings, which are what we actually used in the task.

  • Apple Music links: In addition to Amazon Music, we also provide Apple Music links. When purchasing tracks via Apple Music, please use the links in the "apple_music" column of the spreadsheet.

Annotation procedure

We used Sonic Visualiser to annotate the singing techniques, aided by both sound playback and visualization of the spectrograms and pitchgrams.
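The overlapping, strongly labeled annotations described under "Annotations" can be turned into frame-level targets for a detector. The following is a minimal sketch, not the authors' pipeline: the `(onset, offset, label)` tuple layout, the technique names, and the hop size are all assumptions made for illustration, and the actual annotation file format may differ.

```python
import numpy as np

# Hypothetical technique names; the real label set is defined by COSIAN itself.
TECHNIQUES = ["vibrato", "falsetto", "scoop"]


def to_frame_labels(events, techniques, duration, hop):
    """Convert (onset_sec, offset_sec, label) events to a multi-hot frame matrix.

    Rows are frames of length `hop` seconds, columns are techniques.
    Overlapping events simply set several columns to 1 in the same row.
    """
    n_frames = int(np.ceil(duration / hop))
    index = {name: i for i, name in enumerate(techniques)}
    frames = np.zeros((n_frames, len(techniques)), dtype=np.int8)
    for onset, offset, label in events:
        start = int(round(onset / hop))
        end = min(int(round(offset / hop)), n_frames)
        frames[start:end, index[label]] = 1
    return frames


# Two made-up events that overlap between 1.0 s and 1.25 s.
events = [(0.5, 1.25, "vibrato"), (1.0, 1.5, "falsetto")]
frames = to_frame_labels(events, TECHNIQUES, duration=2.0, hop=0.25)
```

A multi-hot matrix is used here rather than a single class index per frame because the annotations may overlap, so one frame can carry more than one technique.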

Annotated singing techniques

Overview

Examples of each singing technique

Data statistics

  • Annotated duration
  • Song release year
  • Count and duration of singing techniques
  • Distribution of duration of each singing technique
  • Singer-wise count of singing techniques
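Statistics such as the count and total duration of each technique can be derived directly from timestamped annotations. The sketch below assumes a hypothetical list of `(onset, offset, label)` tuples with made-up values; it is not the script used to produce the figures on this page.

```python
from collections import defaultdict


def technique_stats(events):
    """Return {label: (count, total_duration_sec)} for (onset, offset, label) events."""
    counts = defaultdict(int)
    durations = defaultdict(float)
    for onset, offset, label in events:
        counts[label] += 1
        durations[label] += offset - onset
    return {label: (counts[label], durations[label]) for label in counts}


# Made-up events for illustration only.
events = [
    (0.5, 1.25, "vibrato"),
    (1.0, 1.5, "falsetto"),
    (3.0, 3.75, "vibrato"),
]
stats = technique_stats(events)
for label, (count, total) in stats.items():
    print(f"{label}: {count} occurrence(s), {total:.2f} s in total")
```

Grouping the same events by singer instead of by label would give singer-wise counts in the same way.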

Detected examples

These examples were automatically detected by the Focal-GT model. Note that the videos are only samples of the audio clips; for the task itself, we used audio from the CD recordings.

Good examples

#1: Sakura / Ikimono gakari
Video clip 1:30-1:36 Label (Upper: ground truth label, lower: detected labels)
<iframe width="400" height="300" src="https://www.youtube.com/embed/61z-cqg28R8?start=90" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
#2: Omoi ga Kasanaru Sono Mae ni / Ken Hirai
Video clip 2:38-2:45 Label (Upper: ground truth label, lower: detected labels)
<iframe width="400" height="300" src="https://www.youtube.com/embed/n2zqrJMuJvM?start=158" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

Bad examples

We observed that a common type of misdetection is the detection of regions that are too short or that switch labels too frequently.

#1: Readymade / Ado
Video clip 0:50-0:55 Label (Upper: ground truth label, lower: detected labels)
<iframe width="400" height="300" src="https://www.youtube.com/embed/jg09lNupc1s?start=50" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
#2: Honey / L'Arc~en~Ciel
Video clip 0:14-0:20 Label (Upper: ground truth label, lower: detected labels)
<iframe width="400" height="300" src="https://www.youtube.com/embed/WmM-KTcG3QY?start=14" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
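The failure mode noted under "Bad examples" (regions that are too short or that switch frequently) can be inspected by collapsing frame-wise predictions into runs and flagging the short ones. This is an illustrative sketch only, not part of the paper's method: the `"none"` background label, the hop size, and the minimum-duration threshold are all assumptions.

```python
from itertools import groupby


def runs(frame_labels, hop):
    """Collapse a frame-wise label sequence into (label, onset_sec, duration_sec) runs."""
    segments = []
    start = 0
    for label, group in groupby(frame_labels):
        length = len(list(group))
        segments.append((label, start * hop, length * hop))
        start += length
    return segments


def short_runs(segments, min_duration, background="none"):
    """Return non-background runs shorter than `min_duration` seconds."""
    return [s for s in segments if s[0] != background and s[2] < min_duration]


# Made-up frame-wise predictions with one isolated single-frame detection.
pred = ["none", "none", "vibrato", "none",
        "vibrato", "vibrato", "vibrato", "vibrato", "none"]
segments = runs(pred, hop=0.25)
flagged = short_runs(segments, min_duration=0.5)
```

Counting the flagged runs per song, or the number of runs per second, gives a rough measure of how fragmented a prediction is.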

Contact

If you have any questions about the paper, please contact the first author, Yuya Yamamoto. We also accept issues in the GitHub repository.

License

COSIAN contains copyrighted material. We share COSIAN with researchers under the following conditions:

  • COSIAN may only be used by the individual signing below and by members of the research group or organisation of this individual. This permission is not transferable.
  • COSIAN may be used only for non-commercial research purposes.
  • COSIAN (or data enabling its reproduction) may not be sold, leased, published or distributed to any third party without written permission from the COSIAN administrator.

University of Tsukuba and KAIST shall not be held liable for any errors in the content of COSIAN or for any damage arising from the use of COSIAN. The COSIAN administrator may update these conditions of use at any time.

Citation

If you use COSIAN, please cite the ISMIR 2022 paper:

@inproceedings{yamamoto2022analysis,
         author = {Yamamoto, Yuya and Nam, Juhan and Terasawa, Hiroko},
         title = {Analysis and Detection of Singing Techniques in Repertoires of J-POP solo singers},
         booktitle = {Proceedings of the 23rd International Society for Music Information Retrieval Conference (ISMIR)},
         year = {2022}
}