Human Voice Dataset

A collection of human voice recordings covering various ways of singing (note pitch, vowel, consonant, etc.).

This dataset is built to ease research on voice-based musical controllers. It can help to benchmark voice feature detection algorithms (pitch detection, onset detection) as well as form a training corpus for machine learning algorithms.

The current version provides recordings from 1 singer, but the dataset is expected to grow in the coming weeks.

Voice features

Voice features are enumerated along several dimensions. Notes form one dimension, explored from the lowest to the highest note of the singer's range in semitone intervals. E.g.:

c3.wav
c#3.wav
d3.wav
...
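The semitone enumeration above can be reproduced with a few lines of Python. This is only a sketch: the `note_range` helper and the `NOTE_NAMES` table are hypothetical names introduced here, and the code assumes sharp-based lowercase names with a single-digit octave, as in `c#3.wav`.

```python
# Sharp-based note names within one octave, lowercase as in the file names above.
NOTE_NAMES = ["c", "c#", "d", "d#", "e", "f", "f#", "g", "g#", "a", "a#", "b"]


def note_range(lowest, highest):
    """Return note names from lowest to highest (inclusive), one semitone apart."""
    def index(note):
        # Split "c#3" into name "c#" and octave 3, then count semitones from c0.
        name, octave = note[:-1], int(note[-1])
        return octave * 12 + NOTE_NAMES.index(name)

    return [
        f"{NOTE_NAMES[i % 12]}{i // 12}"
        for i in range(index(lowest), index(highest) + 1)
    ]


print(note_range("c3", "e3"))  # ['c3', 'c#3', 'd3', 'd#3', 'e3']
```

Appending `.wav` to each name gives the expected file names for a note series.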

Vowels (i.e. formants) also form a dimension, with a finite set of values (a, e, i, ...). Vowels are available in separate files with names like

_-a-[note].wav
_-e-[note].wav
_-i-[note].wav
_-o-[note].wav
_-ou-[note].wav
_-u-[note].wav

Some features (e.g. consonants) need to be pronounced with a vowel to be understandable, so we iterate over consonant-vowel pairs: 't' is not pronounced alone, it is used in 'ta', 'tu', 'to', etc.

t-a-[note].wav
t-u-[note].wav
t-o-[note].wav
d-a-[note].wav
d-u-[note].wav
d-o-[note].wav
...

In short, for each singer the dataset provides (gray indicates samples that are not available yet):

notes : generally in the note range C1-C3, resulting in 24 notes (over 1 vowel: {a})
vowels : a, e, i, o, ou, u (over 2 notes: {c3, f3})
consonants (occlusives) : _ (none), b, c, d, f, g, l, m, n, p, q, r, s, t, v, w, y, z (for 1 vowel: {a}, over 2 notes: {c3, f3})
dynamics : volume change, pitch bend, vibrato

Dataset structure

Note name pattern

[consonant]-[vowel]-[note]-[dynamic].wav

consonant : _, t, d, b, l, ...
vowel     : a, e, i, ...
note      : c3, c#3, d3, ...
dynamic   : _, vibrato, bend, ...
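A file name following the pattern above can be split back into its fields as sketched below. The `Sample` type and `parse_sample_name` function are hypothetical helpers, not part of the dataset. The sketch assumes that a name with only three fields (such as `t-a-c3.wav`) simply omits the dynamic, and treats that as `_`.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    consonant: str  # "_" means no consonant
    vowel: str
    note: str
    dynamic: str    # "_" means no dynamic


def parse_sample_name(filename):
    """Split a sample file name into consonant, vowel, note and dynamic."""
    if not filename.endswith(".wav"):
        raise ValueError(f"not a .wav file: {filename}")
    parts = filename[:-len(".wav")].split("-")
    if len(parts) == 3:
        # Assumption: a missing dynamic field is equivalent to "_".
        parts.append("_")
    if len(parts) != 4:
        raise ValueError(f"unexpected name pattern: {filename}")
    return Sample(*parts)


print(parse_sample_name("t-a-c3.wav"))
# Sample(consonant='t', vowel='a', note='c3', dynamic='_')
```

Names that carry only a note, such as `c3.wav`, do not match the pattern and raise a `ValueError` in this sketch.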

Browse the dataset online or see the overview below :

data/voices/

martin/
- notes/
  - sources/
    - notes.wav
    - notes-markers.txt
    - recording.properties
    - singer.properties
  - exports/
    - mono-44100/
    - mono-22050/
      - c3-_-a.wav
      - c#3.wav
      - ...
- voyels/
  - sources/
  - exports/
    - mono-44100/
      - _-a-c3.wav
      - _-a-c4.wav
      - _-a-c5.wav
      - _-e-c3.wav
      - _-i-c3.wav
      - _-o-c3.wav
- consonants/
  - sources/
  - exports/
    - mono-44100/
      - b-a-c3.wav
      - b-e-c3.wav
      - b-a-g2.wav
      - b-e-g2.wav
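The layout above lends itself to a simple index of exported samples per singer and series. The following is a sketch only: `index_exports` is a hypothetical helper, and it assumes that every series follows the `[singer]/[serie]/exports/[version]/` layout shown in the overview.

```python
from collections import defaultdict
from pathlib import Path


def index_exports(root, version="mono-44100"):
    """Map (singer, serie) to the sorted .wav file names exported for one version."""
    index = defaultdict(list)
    # Layout: [root]/[singer]/[serie]/exports/[version]/[name].wav
    for wav in sorted(Path(root).glob(f"*/*/exports/{version}/*.wav")):
        singer, serie = wav.parts[-5], wav.parts[-4]
        index[(singer, serie)].append(wav.name)
    return dict(index)
```

Calling `index_exports("data/voices")` from the repository root would then return one entry per singer and series that has files for the requested version.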

[singer]/[serie]/sources/singer.properties

age : 34
gender : male
nationality : french

[singer]/[serie]/sources/recording.properties

recorder : Roland R05
information : recording device at 20cm of the mouth
date : 2014
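The property files shown above can be read with a small parser such as the one sketched here. `parse_properties` is a hypothetical helper, and the sketch assumes that every non-blank line has the `key : value` form shown in the examples.

```python
def parse_properties(text):
    """Parse 'key : value' lines into a dict, skipping blank lines."""
    properties = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        properties[key.strip()] = value.strip()
    return properties


example = """age : 34
gender : male
nationality : french
"""
print(parse_properties(example))
# {'age': '34', 'gender': 'male', 'nationality': 'french'}
```

All values are kept as strings. Converting `age` or `date` to numbers is left to the caller.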

Benchmarks

Vocobox applications make it possible to evaluate pitch detection in various ways: bulk evaluation on note datasets, live evaluation with a microphone, etc.

See this benchmark to learn more about pitch evaluation on the Human Voice Dataset with TarsosDSP.
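When benchmarking a pitch detector on the note samples, the note in the file name gives the expected pitch. The sketch below is not the Vocobox or TarsosDSP evaluation code. It assumes equal temperament with A4 = 440 Hz and scientific pitch notation (c4 is middle C). The dataset's octave numbering may follow a different convention, so check it against the recordings before relying on the results. `note_to_frequency` and `cents_error` are hypothetical helpers.

```python
import math

# Semitone offset of each note name from C within one octave.
SEMITONES = {"c": 0, "c#": 1, "d": 2, "d#": 3, "e": 4, "f": 5,
             "f#": 6, "g": 7, "g#": 8, "a": 9, "a#": 10, "b": 11}


def note_to_frequency(note, a4=440.0):
    """Expected frequency in Hz of a note such as 'c3' or 'c#3'."""
    name, octave = note[:-1], int(note[-1])
    # MIDI convention (assumed): C4 is note 60 and A4 is note 69.
    midi = 12 * (octave + 1) + SEMITONES[name]
    return a4 * 2 ** ((midi - 69) / 12)


def cents_error(detected_hz, note):
    """Signed distance, in cents, between a detected pitch and the expected note."""
    return 1200 * math.log2(detected_hz / note_to_frequency(note))
```

A detector output can then be scored per sample, for instance by counting a detection as correct when the absolute error is below 50 cents (half a semitone).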

Adding voice samples to the dataset

Devices

For the first voice recording, we simply used:

a piano with a metronome and headphones to indicate note duration and pitch to the singer.
a Roland R05 recording device placed 20 cm from the singer's mouth.

Recording

Each note is sung 3 to 10 times, for 1 second each, at tempo 60. A series of notes is recorded in a single file, saved in

[singer]/[serie]/sources/[name].wav

Information about the recording conditions is added in

[singer]/[serie]/sources/recording.properties
[singer]/[serie]/sources/singer.properties

Slicing notes

We use Audacity to precisely mark the start and stop of each voice event, and to export a sound slice for each note of the recording.

Markers can be saved in text files, so they can be reused on modified versions of the recording (mono, lower quality, etc.).

Markers are stored next to the original recording:

[singer]/[serie]/sources/[name]-markers.txt

Split notes are exported to:

[singer]/[serie]/exports/[version]/[name].wav
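The slicing step can also be scripted once markers exist. The sketch below assumes that the markers file uses Audacity's label export layout, i.e. one `start<TAB>end<TAB>label` line per note with times in seconds. The source does not state this, so check a real markers file first. `read_markers` and `slice_recording` are hypothetical helpers built on Python's standard `wave` module.

```python
import wave
from pathlib import Path


def read_markers(path):
    """Read (start_sec, end_sec, label) rows from a tab-separated label file."""
    markers = []
    for line in Path(path).read_text().splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        markers.append((float(fields[0]), float(fields[1]), fields[2].strip()))
    return markers


def slice_recording(source_wav, markers_txt, export_dir):
    """Write one WAV file per marker, named after the marker label."""
    export_dir = Path(export_dir)
    export_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(source_wav), "rb") as source:
        params = source.getparams()
        rate = source.getframerate()
        for start, end, label in read_markers(markers_txt):
            # Jump to the marker start, then read the marked duration.
            source.setpos(int(start * rate))
            frames = source.readframes(int((end - start) * rate))
            target = export_dir / f"{label}.wav"
            with wave.open(str(target), "wb") as out:
                out.setparams(params)  # frame count is corrected on close
                out.writeframes(frames)
            written.append(target)
    return written
```

The slices keep the sample rate and channel count of the source file. Producing a mono or lower-rate version would need a separate conversion step, which this sketch does not attempt.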

Enhancing the dataset

To add samples to this dataset, follow these steps. First, clone this repository from your terminal:

git clone https://github.com/vocobox/human-voice-dataset.git

Then copy your [singer] folder next to the other singers and, back in your terminal, type:

git add .
git add -u
git commit -m "[new singer] barbara"
git push origin master

You may also want to learn how to make pull requests.

Other useful sound datasets

Piano notes dataset :

MAPS Database

Singing Voice dataset :

http://www.isophonics.net/SingingVoiceDataset

Speech databases :

Voices and instruments :

http://mtg.upf.edu/download/datasets/irmas/
