Add ComputeKaldiPitch prototype #1063

mthrok · 2020-12-02T17:18:46Z

Superseded by #1243

This is the initial step to add Kaldi's pitch feature.

Approach

Kaldi's pitch feature is somewhat convoluted and it involves resampling and Viterbi algorithm, which is not easy to reproduce correctly in another language (i.e. Python). Therefore I took an approach to reuse Kaldi's original implementation as much as possible.

In this approach, I re-implemented the minimum interface of the family of Kaldi's matrix library, (such as kaldi::VectorBase, kaldi::MatrixBase, etc...) with torch::Tensor class, and pulled the source files required to compile kaldi/src/feat/pitch-functions.cc with minimum modifications. (Commenting out some #include statements and correcting the type definitions. This is done in my fork of kaldi.)

The reason I used torch::Tensor class is that, this way, we do not need to be worried about the availability of a BLAS library, and the existing build process should just work. (with the exception of the addition of git submodule initialization) Once we find out a reliable way to detect BLAS library that PyTorch is linked against, then we should be able to switch to use bare Vector/Matrix classes.

The resulting torchaudio.functional.compute_kaldi_pitch function produces numerically identical result as the compute-kaldi-pitch-feats binary, but it is very slow. (x60) One major reason seems to be element-wise access, which Kaldi uses a lot but is not efficient in PyTorch.

There are three major things to consider forward;

1. Speed

The possible way to improve the speed is;

Modify the higher level implementation like ComputeKaldiPitch function to apply operation in vectorized manner.
Use the stripped version of the original implementation of Kaldi's Vector/Matrix classes and compile them with the same BLAS library that PyTorch uses. In Conda environment this is kind of easy, but we need to figure out the correct way to do so cross-platform (Linux/Windows/macOS, conda/pip)

2. Interface

Batch

Currently torchaudio.functional.compute_kaldi_pitch is a simple wrapper around kaldi::ComputeKaldiPitch and can only process one waveform at a time. We need to extend the function so that it can handle batched samples. A naive approach will be to parallelize the operation over batch dimension.

Return values

Currently the returned Tensor is 2D with [time, pitch then NCCF]. Providing an easy way to get pitch or NCCF without manually doing index slicing is preferred. One possible way is named Tensor.

3. Implementation detail

There are miscellaneous implementation details to be polished. Examples are;

I found that bare KALDI_ASSERT hangs, so they have to be replaced with TORCH_CHECK or TORCH_INTERNAL_ASSERT
Better way to maintain modified Kaldi. We should maintain a fork of Kaldi in pytorch org.

vincentqb · 2020-12-11T18:13:15Z

To consider: would git subtree allow for a more appropriate way of handling the history and modification of kaldi code than git submodule ? e.g. here

mthrok · 2020-12-11T18:13:22Z

Proposed improved approach;

find out the way to submoduling only needed (unmodified files) so that we do not need to keep original source code that we do not need to maintain the copy of unmodified files.
Only keep the modified source code.

facebook-github-bot added the CLA Signed label Dec 2, 2020

mthrok force-pushed the pitch-feature branch 2 times, most recently from 6b09bd8 to dbe8039 Compare December 2, 2020 18:47

mthrok force-pushed the pitch-feature branch 2 times, most recently from ab00886 to e9439e8 Compare December 10, 2020 21:28

mthrok force-pushed the pitch-feature branch from e9439e8 to 71a5584 Compare December 15, 2020 16:42

This was referenced Dec 17, 2020

add warp transducer as submodule vincentqb/audio#5

Closed

add warp transducer #1099

Closed

vincentqb mentioned this pull request Jan 8, 2021

DRAFT #1163

Closed

mthrok force-pushed the pitch-feature branch 2 times, most recently from 044bfc2 to 061dabe Compare January 15, 2021 23:36

mthrok added this to the v0.8 milestone Jan 25, 2021

vincentqb mentioned this pull request Jan 25, 2021

Roadmap ahead for torchaudio #1196

Closed

mthrok force-pushed the pitch-feature branch 2 times, most recently from a004912 to 1a72ebc Compare January 26, 2021 21:32

Add ComputeKaldiPitch

09c54fb

mthrok force-pushed the pitch-feature branch from 1a72ebc to 09c54fb Compare January 27, 2021 20:33

mthrok closed this Feb 8, 2021

mthrok deleted the pitch-feature branch February 8, 2021 19:25

mthrok mentioned this pull request Feb 8, 2021

Add Kaldi Pitch feature #1243

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add ComputeKaldiPitch prototype #1063

Add ComputeKaldiPitch prototype #1063

mthrok commented Dec 2, 2020 •

edited

Loading

vincentqb commented Dec 11, 2020

mthrok commented Dec 11, 2020

Add ComputeKaldiPitch prototype #1063

Add ComputeKaldiPitch prototype #1063

Conversation

mthrok commented Dec 2, 2020 • edited Loading

Approach

1. Speed

2. Interface

Batch

Return values

3. Implementation detail

vincentqb commented Dec 11, 2020

mthrok commented Dec 11, 2020

mthrok commented Dec 2, 2020 •

edited

Loading