The TV Speech and Music (TVSM) dataset contains speech and music activity labels for a variety of TV shows, along with audio features extracted from the corresponding professionally produced, high-quality audio. The dataset aims to facilitate research on speech and music detection tasks.
- The dataset can be downloaded via Zenodo.org.
- The paper can be downloaded via EURASIP open access.
- This repo contains the materials and codebase needed to reproduce the baseline experiments in the paper.
@ARTICLE{Hung2022,
title={A Large TV Dataset for Speech and Music Activity Detection},
author={Hung, Yun-Ning and Wu, Chih-Wei and Orife, Iroro and Hipple, Aaron and Wolcott, William and Lerch, Alexander},
journal={EURASIP Journal on Audio, Speech, and Music Processing},
volume={2022},
number={1},
pages={21},
year={2022},
publisher={Springer}
}
The TVSM dataset is licensed under the Apache License 2.0.
The downloaded dataset has the following structure:
└─── README.txt
└─── TVSM-cuesheet/
│ └─── labels/
│ └─── mel_features/
│ └─── mfcc/
│ └─── vgg_features/
│ └─── TVSM-xxxx_metadata.csv
└─── TVSM-pseudo/
└─── TVSM-test/
- README.txt: basic information about the dataset
- TVSM-cuesheet/: the smaller training subset. The labels are derived from cue sheet information
- TVSM-pseudo/: the larger training subset. The labels are pseudo-labels generated by a model pre-trained on TVSM-cuesheet
- TVSM-test/: the testing subset. The labels are annotated by human annotators
Each subset folder has the same structure:
- labels/: speech and music activation labels for each sample. Each row of a CSV file contains a "start time", an "end time", and a class tag "s" (speech) or "m" (music); see the parsing sketch after this list
- mel_features/: Mel spectrogram features extracted from the audio of each sample
- mfcc/: MFCC features extracted from the audio of each sample
- vgg_features/: VGGish features extracted from the audio of each sample
- TVSM-xxxx_metadata.csv: the metadata of each sample
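As a quick illustration of the label format, the sketch below parses one label CSV into frame-level speech and music activation vectors. It assumes the CSV has no header row and exactly the three columns described above; the file path, hop size, and helper name are placeholders for illustration, not part of the dataset tooling.

```python
import numpy as np
import pandas as pd

def load_label_matrix(csv_path, hop=0.1):
    """Convert a TVSM label CSV (rows of start time, end time, "s"/"m")
    into frame-level activation vectors sampled every `hop` seconds."""
    # Assumption: no header row and the three columns described above.
    df = pd.read_csv(csv_path, header=None, names=["start", "end", "label"])
    n_frames = int(np.ceil(df["end"].max() / hop))
    speech = np.zeros(n_frames, dtype=np.float32)
    music = np.zeros(n_frames, dtype=np.float32)
    for _, row in df.iterrows():
        lo, hi = int(row["start"] / hop), int(np.ceil(row["end"] / hop))
        target = speech if str(row["label"]).strip() == "s" else music
        target[lo:hi] = 1.0  # mark the segment's frames as active
    return speech, music

speech, music = load_label_matrix("labels/example.csv")  # placeholder path
```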
For more information, please refer to our paper.
Thanks to @owlwang for the contribution! The easy-to-use inference code is now included in inference/:
cd inference
python3 inference.py --audio_path test.wav --output_dir output/ --format csv/csv_prob
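If you export frame-wise probabilities with --format csv_prob, you may want to turn them into (start, end) segments by thresholding, as in the sketch below. The column name "speech", the 0.1 s frame hop, and the output path are assumptions rather than the tool's documented schema; inspect the actual CSV header before relying on them.

```python
import pandas as pd

def probs_to_segments(df, column, threshold=0.5, hop=0.1):
    """Merge consecutive above-threshold frames into (start, end) segments.
    `hop` is the assumed time step between frames, in seconds."""
    active = df[column].to_numpy() > threshold
    segments, start = [], None
    for i, on in enumerate(active):
        if on and start is None:
            start = i * hop                        # segment opens at this frame
        elif not on and start is not None:
            segments.append((start, i * hop))      # segment closes
            start = None
    if start is not None:                          # close a segment running to the end
        segments.append((start, len(active) * hop))
    return segments

df = pd.read_csv("output/test.csv")  # placeholder path and schema
print(probs_to_segments(df, "speech"))
```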
Interested in running inference on existing samples? See predictor.py for usage:
cd training_code
python3 predictor.py --audio_path test.wav
Please install Git LFS first, then run git-lfs pull to restore the checkpoints.
If you are using a newer pytorch_lightning version, please replace line 31 in SM_detector.py with self.save_hyperparameters(hparams).
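For context, the change amounts to replacing the deprecated direct assignment of hparams with the save_hyperparameters call, roughly as below. This is only a sketch: the class body is an illustrative stand-in for SM_detector.py, and it assumes line 31 currently assigns hparams directly.

```python
import pytorch_lightning as pl

class SMDetector(pl.LightningModule):  # illustrative stand-in, not the repo's exact class
    def __init__(self, hparams):
        super().__init__()
        # Older pytorch_lightning style (assumed content of line 31):
        #   self.hparams = hparams
        # Newer pytorch_lightning versions require:
        self.save_hyperparameters(hparams)
```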
└─── Evaluation_Output/
│ └─── AVASpeech/
│ │ └─── T2
│ │ └─── TVSM-cuesheet
│ │ └─── TVSM-pseudo
│ └─── ...
└─── Models/
└─── training_code/
- Evaluation_Output: the outputs generated by the three models across five evaluation sets (a scoring sketch follows this list)
- T2: baseline method
- TVSM-cuesheet: CRNN-P-Cue method
- TVSM-pseudo: CRNN-P-Pseu method
- Models: the pre-trained checkpoints for the CRNN-P-Cue and CRNN-P-Pseu methods
- training_code: code for training the model
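To score a model's output against the reference labels yourself, one option is to rasterize both the reference and predicted segments to frames and compute a frame-level F1 with scikit-learn, as sketched below. The hop size and the frame-level protocol are assumptions for illustration; the paper's exact evaluation setup may differ.

```python
import numpy as np
from sklearn.metrics import f1_score

def frame_f1(ref_segments, pred_segments, duration, hop=0.1):
    """Frame-level F1 between reference and predicted (start, end) segment lists."""
    n_frames = int(np.ceil(duration / hop))

    def rasterize(segments):
        # Mark every frame covered by any segment as active (1).
        frames = np.zeros(n_frames, dtype=int)
        for start, end in segments:
            frames[int(start / hop):int(np.ceil(end / hop))] = 1
        return frames

    return f1_score(rasterize(ref_segments), rasterize(pred_segments))

# Hypothetical example: one reference and one predicted speech segment.
print(frame_f1([(0.0, 3.0)], [(0.5, 3.2)], duration=5.0))
```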
If you encounter the error "batch response: This repository is over its data quota. Account responsible for LFS...", you can download the model checkpoints from Google Drive instead.
Please feel free to contact [email protected] or open an issue here if you have any questions about the dataset or the support code.