Superseded by #1243
This is the initial step to add Kaldi's pitch feature.
Approach
Kaldi's pitch feature is somewhat convoluted: it involves resampling and the Viterbi algorithm, which are not easy to reproduce correctly in another language (e.g. Python). Therefore, I took the approach of reusing Kaldi's original implementation as much as possible.
In this approach, I re-implemented the minimum interface of Kaldi's matrix library classes (such as `kaldi::VectorBase`, `kaldi::MatrixBase`, etc.) with the `torch::Tensor` class, and pulled in the source files required to compile `kaldi/src/feat/pitch-functions.cc` with minimal modifications (commenting out some `#include` statements and correcting the type definitions; this is done in my fork of Kaldi).
The reason I used the `torch::Tensor` class is that, this way, we do not need to worry about the availability of a BLAS library, and the existing build process should just work (with the exception of the added `git submodule` initialization). Once we find a reliable way to detect the BLAS library that PyTorch is linked against, we should be able to switch to the bare Vector/Matrix classes.
The resulting `torchaudio.functional.compute_kaldi_pitch` function produces numerically identical results to the `compute-kaldi-pitch-feats` binary, but it is very slow (60x slower). One major reason seems to be element-wise access, which Kaldi uses a lot but which is not efficient in PyTorch.
There are three major things to consider going forward:
1. Speed
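To make the cost of element-wise access concrete, here is a toy Python comparison. `TensorBackedVector` is a hypothetical stand-in for a Kaldi-style `VectorBase` wrapped around a tensor (it mirrors Kaldi's `Dim()` naming and `v(i)` accessor, not the PR's actual C++ code), assuming each element read goes through a separate tensor operation. Summing through the accessor costs one operation per element, while the vectorized version issues a single `sum()` call.

```python
import time

import torch


class TensorBackedVector:
    """Hypothetical toy stand-in for a Kaldi-style VectorBase backed by a torch.Tensor.

    Only Dim() and the v(i) element accessor are mimicked.
    """

    def __init__(self, data: torch.Tensor):
        self.data = data

    def Dim(self) -> int:
        # Kaldi-style name for the number of elements.
        return self.data.numel()

    def __call__(self, i: int) -> float:
        # Each element read is a tensor indexing op plus conversion to a Python float.
        return self.data[i].item()


def sum_elementwise(v: TensorBackedVector) -> float:
    # Kaldi-style loop: one accessor call per element.
    total = 0.0
    for i in range(v.Dim()):
        total += v(i)
    return total


def sum_vectorized(v: TensorBackedVector) -> float:
    # A single call that reduces the whole tensor at once.
    return v.data.sum().item()


if __name__ == "__main__":
    v = TensorBackedVector(torch.rand(20_000, dtype=torch.float64))
    for fn in (sum_elementwise, sum_vectorized):
        start = time.perf_counter()
        result = fn(v)
        elapsed = time.perf_counter() - start
        print(f"{fn.__name__}: {result:.6f} in {elapsed:.4f}s")
```

Timings depend on the machine, and Python overhead exaggerates the gap compared with C++, but the per-element pattern is the same kind of cost that vectorizing the computation aims to remove.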
One possible way to improve the speed is to rewrite the `ComputeKaldiPitch` function to apply operations in a vectorized manner.
2. Interface
Batch
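A minimal sketch of the naive batching idea: run a per-waveform function on each item of the batch and stack the results. `batched_pitch` and `pitch_fn` are hypothetical names; `pitch_fn` stands in for a single-waveform pitch function and is assumed to return a `[time, 2]` tensor.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

import torch


def batched_pitch(
    waveforms: torch.Tensor,
    pitch_fn: Callable[[torch.Tensor], torch.Tensor],
    max_workers: int = 4,
) -> torch.Tensor:
    """Apply a hypothetical single-waveform pitch function to every item of a batch.

    waveforms: [batch, samples]; returns [batch, time, 2].
    """
    if waveforms.dim() != 2:
        raise ValueError("expected a [batch, samples] tensor")
    # pool.map preserves input order, so output i corresponds to waveform i.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = list(pool.map(pitch_fn, waveforms.unbind(0)))
    # Every item has the same number of samples, so (with identical options)
    # each yields the same number of frames and the results can be stacked.
    return torch.stack(outputs, dim=0)


if __name__ == "__main__":
    # Hypothetical stand-in for a real pitch function: pairs of samples as "frames".
    fake_pitch = lambda w: w.reshape(-1, 2)
    print(batched_pitch(torch.arange(12.0).reshape(3, 4), fake_pitch).shape)
```

Threads only run concurrently if `pitch_fn` releases the GIL, which depends on how the extension is bound; otherwise this degrades to a sequential loop. Looping over the batch on the C++ side (for example with `at::parallel_for`) would avoid that question.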
Currently, `torchaudio.functional.compute_kaldi_pitch` is a simple wrapper around `kaldi::ComputeKaldiPitch` and can only process one waveform at a time. We need to extend the function so that it can handle batched samples. A naive approach would be to parallelize the operation over the batch dimension.
Return values
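As a lighter alternative to PyTorch's named tensors (a prototype feature), here is a sketch that swaps in a `NamedTuple`, so callers can write `out.pitch` or `out.nccf` instead of slicing. `PitchOutput` and `split_pitch_feature` are hypothetical names, and the sketch assumes the last dimension holds the pitch followed by the NCCF.

```python
from typing import NamedTuple

import torch


class PitchOutput(NamedTuple):
    """Hypothetical structured return type; field names are illustrative."""

    pitch: torch.Tensor  # [..., time]
    nccf: torch.Tensor  # [..., time]


def split_pitch_feature(feature: torch.Tensor) -> PitchOutput:
    """Split a [..., time, 2] tensor into pitch and NCCF (assumed column order)."""
    if feature.size(-1) != 2:
        raise ValueError("expected the last dimension to have size 2")
    # unbind returns views, so no data is copied.
    pitch, nccf = feature.unbind(-1)
    return PitchOutput(pitch=pitch, nccf=nccf)


if __name__ == "__main__":
    out = split_pitch_feature(torch.tensor([[100.0, 0.25], [110.0, 0.75]]))
    print(out.pitch, out.nccf)
```

A `NamedTuple` still unpacks like a plain tuple (`pitch, nccf = out`), so it stays compatible with positional use.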
Currently, the returned Tensor is 2D with shape `[time, 2]`, where the last dimension holds the pitch followed by the NCCF. Providing an easy way to get the pitch or the NCCF without manual index slicing is preferable. One possible way is named tensors.
3. Implementation detail
There are miscellaneous implementation details to be polished. For example:
`KALDI_ASSERT` hangs, so its uses have to be replaced with `TORCH_CHECK` or `TORCH_INTERNAL_ASSERT`.