RFC: The future of Kaldi compliance module #1269

Open
mthrok opened this issue Feb 15, 2021 · 24 comments

@mthrok
Collaborator

mthrok commented Feb 15, 2021

Request For Comment: The future of Kaldi-compatible features

Problems

torchaudio.compliance.kaldi implements functions that try to reproduce Kaldi's feature extraction. This module has many issues and causes headaches for maintainers.

  1. Inconsistent design
    While the rest of the torchaudio library is standardized to work with floating-point Tensors in the value range [-1.0, 1.0], the ported Kaldi implementations do not necessarily follow this (example). (See the sketch after this list.)
  2. Does not support the batch dimension well: support batching for kaldi compliant feature extraction functions #675, torchaudio.compliance.kaldi.fbank #1245
  3. compliance is not the right name: Compliance should be named compatibility (or similar) #281
  4. Although it is called compliance, it does not match the results of Kaldi's CLI
    I think the natural expectation users get from compliance.kaldi is that they can obtain results that match Kaldi (and possibly in an easy manner):
    The spectrogram computed by "torchaudio.compliance.kaldi.spectrogram" and "compute-spectrogram-feats" are different #332, Problems with Kaldi MFCCs #328, Fbank features are different from Kaldi Fbank #400
  5. It's slow: more efficient resample module #908, Torchaudio resampling could be faster and simpler #1057
    Code translated verbatim from Kaldi's original C++ is typically slower than the original implementation (100x is not uncommon). To overcome this, the code has to be rewritten to use PyTorch-style operations. (Kaldi performs per-element access very efficiently, which incurs a lot of overhead in the PyTorch framework.) This creates a huge maintenance overhead: first, implementing our custom code, and second, catching up with upstream Kaldi.
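
A minimal sketch of the value-range mismatch in problem 1 (my illustration, not part of the RFC; "abc.wav" is a placeholder input file): torchaudio.load returns float waveforms in [-1.0, 1.0], while reproducing Kaldi's numbers with compliance.kaldi typically requires scaling the waveform to the 16-bit PCM range first.

# Illustration only: the same fbank call on a [-1.0, 1.0] waveform and on an
# int16-scaled copy; only the latter lines up with Kaldi's compute-fbank-feats.
import torchaudio
import torchaudio.compliance.kaldi as kaldi

waveform, sample_rate = torchaudio.load("abc.wav")  # float32, roughly in [-1.0, 1.0]

fbank_float = kaldi.fbank(waveform, sample_frequency=sample_rate, dither=0.0)
fbank_int16 = kaldi.fbank(waveform * (1 << 15), sample_frequency=sample_rate, dither=0.0)

print((fbank_float - fbank_int16).abs().max())  # the two results differ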

Possible solutions

Before going into the details of possible solutions, I note that removing the Kaldi compatibility features from torchaudio was also mentioned as a possibility; they were not part of torchaudio's original plan. I am not familiar with this matter and I am not advocating for it, but I am keeping it as a possibility.

The following describes some partial solutions and the considerations I have gathered so far.

Module location.

For problems like 1 (inconsistent design) and 3 (naming), we can add a new interface to torchaudio.functional.

We have some Kaldi-compatible functions in torchaudio.functional module, such as sliding_window_cmn and compute_kaldi_pitch. They are placed under torchaudio.functional because they are confirmed to work on floating-point Tensors and their behaviors are consistent with other feature implementations.

We can do a similar thing to provide the same set of Kaldi features, but working on floating-point Tensors like the other feature extractions. We can start by adding interfaces to torchaudio.functional that perform value normalization and call the corresponding functions in the compliance.kaldi module, then deprecate the compliance.kaldi module and eventually remove it.
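
A rough sketch of what such a torchaudio.functional wrapper could look like (the name and the exact normalization are my assumptions, not a settled API):

import torch
import torchaudio.compliance.kaldi as kaldi


def kaldi_fbank(waveform: torch.Tensor, sample_rate: int, **kwargs) -> torch.Tensor:
    """Fbank from a float waveform in [-1.0, 1.0] (hypothetical interface)."""
    scaled = waveform * (1 << 15)  # rescale to the 16-bit range the ported code expects
    return kaldi.fbank(scaled, sample_frequency=sample_rate, **kwargs)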

Implementation

For problems 4 (numerical parity) and 5 (speed), we have other approaches to make Kaldi features available and compatible with PyTorch.

  1. Keep porting Kaldi in Python
    The current state.
  2. Build and bind libkaldi-feat
    This would resolve most of the headaches. The usual maintenance cost would be drastically reduced, though we need to figure out a robust way to build Kaldi against the MKL that PyTorch uses. Users would get what they naturally expect: the same results as Kaldi.
  3. Re-build Kaldi's vector/matrix classes on top of PyTorch's Tensor class
    This is somewhat of a hybrid of 1 and 2. Details can be found at https://mthrok.github.io/tkaldi/. This is how I added the pitch feature in Add Kaldi Pitch feature #1243.

The following table summarizes the pros and cons.

For problem 2 (batch support), one simple approach is to use at::parallel_for to parallelize the computation over the batch. This can be applied if the core implementation is in C++. (A Python-level sketch of what batch support means for these functions follows the table.)

|  | Bind libkaldi-feat | Reimpl libkaldi-matrix | Reimplement in Python |
| --- | --- | --- | --- |
| Numerical Compatibility | ✅ Baseline | ✅ Easy | 🚫 Extremely Difficult (none of the existing features meet this criterion) |
| Execution Speed | ✅ Baseline | 🚫 Slow (60x~) | 🚫 Slow on CPU, even slower on GPU |
| Dev Scalability (easy to add another feature?) | ✅ Easy (add a new wrapper function) | 🍰 Small Effort (extend the Matrix interface as needed) | 🚫 Very Time Consuming (understand and translate the code, verify the result) |
| Upstream Adaptation (easy to follow the changes Kaldi makes?) | ✅ Easy (change the upstream commit and the wrapper function) | ✅ Easy (pull the upstream code, change the wrapper function) | 🚫 Practically impossible (I do not know where the existing code comes from) |
| Maint Cost: initial build setup | 🤨 Moderate Effort (custom build + MKL setup) | 🍰 Small Effort (custom build) | ✅ None |
| Maint Cost: long-term maintenance | 🍰 Small Effort (mostly the wrapper functions) | 🍰 Small Effort (mostly the wrapper functions) | 🚫 High (all the related code, ~1K LoC) |
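
As a Python-level illustration of the batch-support gap (problem 2), a naive wrapper could flatten the leading dimensions and loop over utterances; this is my sketch, not a proposed API, and at::parallel_for would play the equivalent role on the C++ side.

import torch
import torchaudio.compliance.kaldi as kaldi


def batched_fbank(waveforms: torch.Tensor, sample_rate: int, **kwargs) -> torch.Tensor:
    """waveforms: (..., time) -> features: (..., num_frames, num_mel_bins)."""
    batch_shape = waveforms.shape[:-1]
    flat = waveforms.reshape(-1, waveforms.shape[-1])
    feats = [kaldi.fbank(w.unsqueeze(0), sample_frequency=sample_rate, **kwargs) for w in flat]
    return torch.stack(feats).reshape(*batch_shape, *feats[0].shape)
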
@mthrok
Collaborator Author

mthrok commented Feb 15, 2021

cc Kaldi experts @danpovey @csukuangfj @pzelasko @mravanelli @sw005320

Since I have very limited experience with Kaldi and do not know how the Kaldi project thinks of PyTorch/torchaudio, it would be great if you could give your feedback. Thanks!

@pzelasko

The binding of libkaldi-feat looks like a good solution. We are already following this approach for some components in K2/Snowfall (e.g. kaldialign for WER/alignment computation and kaldilm for LM compilation into FSA).
However, my concerns are:

  • will it run on GPU?
  • will it be compatible with autograd?
  • will you be able to efficiently batch it?

If the answer to any of these is no, then we should consider other options.

@danpovey

Some comments:

  • The existing Kaldi project is kind of in maintenance mode right now, as I am working on new projects (k2/lhotse/snowfall) that are intended to replace it.
  • I suspect many of the uses of the kaldi compliance module right now are some kind of legacy support, e.g. people who have converted Kaldi models to PyTorch for inference.
  • There is probably a quite limited set of configurations that people are actually using in practice.

I suspect the users of torchaudio's Kaldi compliance fall into two very distinct categories: (1) people training new models who don't care about compatibility, and (2) people porting models from Kaldi (a category that is decreasing over time) who are probably using one of Kaldi's standard feature configurations.

For feature extractors that need to support autograd, I think that can be done separately from the Kaldi compliance module if you end up doing the Kaldi compliance in a way that's not convenient for that.
In principle Kaldi's matrix library can be configured to use various BLAS implementations; IMO it would be more ideal to figure out how to use whatever BLAS is being used by NumPy or PyTorch and try to use that, but the build issues might be complex.

It might be possible to start from the kaldi10feat project (https://github.com/danpovey/kaldi10feat) and modify it to be compatible with current Kaldi. (Note: kaldi10 is a project that I was planning but which does not exist and will not exist.) It has a NumPy dependency. It was intended to be a simple feature configuration without all the bells, whistles and options of Kaldi's, but with reasonable defaults. Some of the differences include:

  • I think it gets the frame count in a way that corresponds to --snip-edges=false in Kaldi, which is a simpler formula and tends to be less hassle but is not Kaldi's default (for compatibility reasons we were never able to change the default in Kaldi).
  • It doesn't use pre-emphasis. (--preemph-coeff=0.0)
  • It puts the signal in the range [-1,1] instead of [-32k, 32k]
  • It doesn't use dithering (--dither=0.0)
  • It sets the energy floor to a different value than Kaldi's default, see the code for details.
  • Currently it only goes as far as mel filterbank values, without supporting MFCC.

Those might be all the differences, if I am not forgetting anything.
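
For reference, here is a rough mapping of those kaldi10feat-style choices onto torchaudio.compliance.kaldi.fbank parameters (my sketch, not from the comment; the energy-floor difference is left out because the exact value lives in the kaldi10feat code).

import torch
import torchaudio.compliance.kaldi as kaldi

waveform = torch.rand(1, 16000) * 2 - 1  # placeholder waveform, already in [-1, 1]

feats = kaldi.fbank(
    waveform,
    sample_frequency=16000,
    snip_edges=False,             # frame count as with --snip-edges=false
    preemphasis_coefficient=0.0,  # no pre-emphasis
    dither=0.0,                   # no dithering
    # the energy floor also differs from Kaldi's default; see the kaldi10feat code for the value
)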

@csukuangfj
Collaborator

csukuangfj commented Feb 26, 2021

I would propose another alternative: reimplement, instead of binding, libkaldi-feat using torch::Tensor in C++
and provide a Python interface for it.

I've got a preliminary working version that produces the same output as kaldi's compute-fbank-feats using
default parameters with --dither=0.

The following sample wave abc.wav

sox -n -r 16000 -b 16 abc.wav synth 320 sine 300

is used to benchmark the implementation.

I've modified compute-fbank-feats.cc to include only the feature extraction time. The change is shown below

diff --git a/src/featbin/compute-fbank-feats.cc b/src/featbin/compute-fbank-feats.cc
index e52b30baf..63735c985 100644
--- a/src/featbin/compute-fbank-feats.cc
+++ b/src/featbin/compute-fbank-feats.cc
@@ -18,6 +18,8 @@
 // See the Apache 2 License for the specific language governing permissions and
 // limitations under the License.

+#include <chrono>
+
 #include "base/kaldi-common.h"
 #include "feat/feature-fbank.h"
 #include "feat/wave-reader.h"
@@ -145,8 +147,16 @@ int main(int argc, char *argv[]) {
       SubVector<BaseFloat> waveform(wave_data.Data(), this_chan);
       Matrix<BaseFloat> features;
       try {
+        std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();
         fbank.ComputeFeatures(waveform, wave_data.SampFreq(),
                               vtln_warp_local, &features);
+        std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
+        std::cout << "Time difference = "
+                  << std::chrono::duration_cast<std::chrono::microseconds>(
+                         end - begin)
+                             .count() /
+                         1000000.
+                  << "[s]" << std::endl;
       } catch (...) {
         KALDI_WARN << "Failed to compute features for utterance " << utt;
         continue;

And the benchmark command is

(base) fangjun:~/open-source/kaldi/src/featbin$ ./compute-fbank-feats --dither=0 scp:1.scp ark:1.ark
./compute-fbank-feats --dither=0 scp:1.scp ark:1.ark
Time difference = 0.313595[s]
LOG (compute-fbank-feats[5.5.880~4-3e446]:main():compute-fbank-feats.cc:195)  Done 1 out of 1 utterances.

1.scp contains

1 abc.wav

The execution time is about 0.3 seconds and does not vary too much across multiple runs.

The reimplementation, which is still an early draft (https://github.com/csukuangfj/kaldifeat), needs only about 0.1 seconds.
The benchmark command for it is

git clone https://github.com/csukuangfj/kaldifeat
cd kaldifeat
mkdir build
cd build
cmake ..  # configure before building
make -j
../kaldifeat/python/tests/test_kaldifeat.py

The outputs are

(py38) fangjun:~/open-source/kaldifeat/build$ ../kaldifeat/python/tests/test_kaldifeat.py
elapsed seconds: 0.096343
(py38) fangjun:~/open-source/kaldifeat/build$ ../kaldifeat/python/tests/test_kaldifeat.py
elapsed seconds: 0.083852
(py38) fangjun:~/open-source/kaldifeat/build$ ../kaldifeat/python/tests/test_kaldifeat.py
elapsed seconds: 0.141683
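
As a rough point of comparison (my addition, not from the thread), the existing Python port in torchaudio.compliance.kaldi can be timed on the same 320-second, 16 kHz sine with something like the following.

import math
import time

import torch
import torchaudio.compliance.kaldi as kaldi

sample_rate = 16000
duration = 320  # seconds, matching the sox-generated abc.wav above
t = torch.arange(duration * sample_rate, dtype=torch.float32) / sample_rate
waveform = torch.sin(2 * math.pi * 300 * t).unsqueeze(0)  # (1, num_samples), 300 Hz sine

start = time.perf_counter()
feats = kaldi.fbank(waveform, sample_frequency=sample_rate, dither=0.0)
print(f"elapsed seconds: {time.perf_counter() - start:.6f}  shape: {tuple(feats.shape)}")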

@csukuangfj
Collaborator

will it be compatible with autograd?

@pzelasko

As there are no learnable parameters in the feature extraction, could you explain what's the use of autograd
in the context of feature extraction?

@pzelasko

@csukuangfj Adversarial attacks and defense research
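
To make the autograd use case concrete (my sketch, using torchaudio's differentiable MelSpectrogram transform rather than compliance.kaldi): adversarial-perturbation research needs gradients of a downstream loss with respect to the raw waveform, which only works if the feature extraction itself is differentiable.

import torch
import torchaudio

# Stand-in waveform; in practice this would be a real utterance.
waveform = torch.randn(1, 16000, requires_grad=True)
melspec = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=80)

features = melspec(waveform)
loss = features.sum()  # stand-in for a downstream model's loss
loss.backward()

# Gradient w.r.t. the raw waveform is what drives an adversarial perturbation.
perturbation = 1e-3 * waveform.grad.sign()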

@csukuangfj
Collaborator

It's even faster when CUDA is available:

(py38) fangjun:~/open-source/kaldifeat/build$ ../kaldifeat/python/tests/test_kaldifeat.py
elapsed seconds cpu: 0.153896
elapsed seconds cuda:0: 0.005204
(py38) fangjun:~/open-source/kaldifeat/build$ ../kaldifeat/python/tests/test_kaldifeat.py
elapsed seconds cpu: 0.080523
elapsed seconds cuda:0: 0.00535
(py38) fangjun:~/open-source/kaldifeat/build$ ../kaldifeat/python/tests/test_kaldifeat.py
elapsed seconds cpu: 0.085813
elapsed seconds cuda:0: 0.005682

@danpovey

@csukuangfj let's wait to see whether that's the direction they want to go in before spending lots of time on this.
Since your repo is public, if they want to use that as a starting point they can easily do that, but right now I think getting our new model working is more urgent.

@csukuangfj
Collaborator

@csukuangfj let's wait to see whether that's the direction they want to go in before spending lots of time on this.

Since your repo is public, if they want to use that as a starting point they can easily do that, but right now I think getting our new model working is more urgent.

Will focus on our snowfall training.

@mthrok
Collaborator Author

mthrok commented Mar 1, 2021

Hi @pzelasko @danpovey @csukuangfj

Thanks for sharing your thoughts.

So the torchaudio team wants to resolve the issues with the current module, namely the inconsistent design / bad performance / high maintenance cost. But to do that we need a direction, and that's where we would like to hear from users.

I was pushing for binding libkaldi only because it easily solves the performance and maintenance-cost problems. This is a very attractive option to me since torchaudio's engineering resources are very limited. Of course, with this approach, we cannot add new capabilities like differentiability, which blocks research on defense against adversarial attacks, differentiable DSP, or end-to-end training with a downstream task.

Meanwhile, @cpuhrsch shares a view similar to @csukuangfj's: we should be pushing the boundary by taking advantage of what PyTorch provides. I do agree with that, if we had an infinite amount of engineering resources.

My understanding of @danpovey's view is that complete API compatibility with Kaldi is not necessary and we can reduce the options, which could simplify some implementations.

Does either K2 or Lhotse provide feature extraction? My understanding of the speech recognition landscape is that K2 handles the WFST (LM), the PyTorch model handles the AM, and Lhotse handles data and training recipes. I am not sure whether Lhotse is going to implement feature extraction on its own, but if Lhotse adds an efficient feature extraction, torchaudio can just retire the current module and do nothing. On the contrary, if adding efficient implementations in torchaudio is beneficial to the whole ecosystem, we can prioritize that too.

So I think the long-term direction is to:

  • Provide features that are useful for the future audience of the ecosystem (K2, Lhotse, PyTorch, torchaudio, etc.)
  • Provide efficient implementations with GPU support
  • Full backward API compatibility with Kaldi's features is not necessary, but numerical parity for the basic configurations would be nice
  • Autograd support is valuable

@csukuangfj
Your kaldifeat looks great. I assume it is not your primary work. I have integrated it into torchaudio in #1326, and I will try to see if this approach can support autograd.

@danpovey

danpovey commented Mar 1, 2021

For the time being our plan was to rely on torchaudio for feature extraction, I'm afraid.
That's what we are currently doing.

@h-vetinari

Hey all

I've been trying to package torchaudio for conda-forge, and one of the first things I wanted to do (based on how conda-forge does things) is reuse our own kaldi, which we have built such that the BLAS implementation can be switched; in particular, it should be compatible with MKL.

The problem I have is that I don't know how to patch this, because it's unclear how much the additional implementation is wrapper/extension/reimplementation.

For the moment I think I'll have to disable it, even though that's a pity because it seems we have all the necessary pieces...

@mthrok
Collaborator Author

mthrok commented Jul 18, 2022

Hi @h-vetinari

Thanks for the feedback. The Kaldi module discussed in this issue is the torchaudio.compliance module, which is pure Python. The ones that require an external Kaldi are torchaudio.functional.kaldi_pitch and torchaudio.transforms.KaldiPitch.

These are separate issues, and there are more things that need to be done for proper packaging in conda-forge (I think the same kind of changes should be made to the libsox integration). Can you perhaps create a new issue?

@h-vetinari

h-vetinari commented Jul 18, 2022

Thanks for the quick response!

These are separate issues, and there are more things that need to be done for proper packaging in conda-forge (I think the same kind of changes should be made to the libsox integration). Can you perhaps create a new issue?

The libsox integration is easier, because I just needed

Author: H. Vetinari <[email protected]>
Date:   Mon Jul 18 23:29:19 2022 +0200

    use our own sox builds

diff --git a/third_party/CMakeLists.txt b/third_party/CMakeLists.txt
index ef984d5..8ae419c 100644
--- a/third_party/CMakeLists.txt
+++ b/third_party/CMakeLists.txt
@@ -5,12 +5,8 @@ set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fvisibility=hidden")
 ################################################################################
 # sox
 ################################################################################
-add_library(libsox INTERFACE)
 if (BUILD_SOX)
-  add_subdirectory(sox)
-  target_include_directories(libsox INTERFACE ${SOX_INCLUDE_DIR})
-  target_link_libraries(libsox INTERFACE ${SOX_LIBRARIES})
-  list(APPEND TORCHAUDIO_THIRD_PARTIES libsox)
+  find_package(sox REQUIRED)
 endif()

 ################################################################################

whereas for kaldi, it looks to me like I cannot replace https://github.com/pytorch/audio/tree/main/third_party/kaldi/src/matrix with vanilla kaldi, due to the extra pytorch integration happening there.

I'm happy to open an issue, but first I'd need to understand better what needs to be done. From what I can tell from the OP, using libkaldi-feat (which we have available in conda-forge) would work really well for us, hence I started commenting here.

PS. It's also not super clean for torchaudio to insert itself into the (C++) namespace of another project, IMHO, so that'd be another thing in favour of refactoring that.

@mthrok
Collaborator Author

mthrok commented Jul 18, 2022

The libsox integration is easier

Actually, this approach incurs subtle, hard-to-debug issues. We patch libsox because the latest publicly available release (14.4.2, from 2015) has a bug in in-memory decoding. This has been fixed upstream, but the code has diverged and there is no official release that includes the fix. So, to rely completely on an external libsox, we need to replace the file-like-object support with something else. We have the FFmpeg binding now, so we can do that.
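
For illustration (my sketch, assuming the FFmpeg-based torchaudio.io.StreamReader is available and using a placeholder "abc.wav"): decoding from an in-memory, file-like object without going through the patched libsox could look roughly like this.

import io

import torch
from torchaudio.io import StreamReader

# Load the bytes into memory to mimic the in-memory / file-like use case that
# currently depends on the patched libsox.
with open("abc.wav", "rb") as f:
    data = io.BytesIO(f.read())

reader = StreamReader(data)
reader.add_basic_audio_stream(frames_per_chunk=4096)
chunks = [chunk for (chunk,) in reader.stream()]
waveform = torch.cat(chunks)  # (num_frames, num_channels)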

whereas for kaldi, it looks to me like I cannot replace https://github.com/pytorch/audio/tree/main/third_party/kaldi/src/matrix with vanilla kaldi, due to the extra pytorch integration happening there.

I'm happy to open an issue, but first I'd need to understand better what needs to be done. From what I can tell from the OP, using libkaldi-feat (which we have available in conda-forge) would work really well for us, hence I started commenting here.

PS. It's also not super clean for torchaudio to insert itself into the (C++) namespace of another project, IMHO, so that'd be another thing in favour of refactoring that.

To enable the use of an external Kaldi, some part of the code has to be re-implemented. Originally, I was simply binding the Kaldi code, copying the values of a Tensor into Kaldi's Matrix format. This was a clean, efficient, low-maintenance approach, but unfortunately I could not get approval for it, so I had to resort to the hack. Now the situation is different, and I think we can go back to the original approach, which allows using an external Kaldi and should reduce the maintenance cost on your end as well.

Let's bring the conversation to a new issue. There are things that are not clear to me about conda-forge as well. I will ask my questions there.

@h-vetinari

h-vetinari commented Jul 19, 2022

Thanks again for the input.

Let's bring the conversation to a new issue.

I still don't understand how the situation and overall goal would be different from "move to libkaldi-feat", as discussed in the OP here. Could you open that issue perhaps, as you seem to have a clearer picture in mind?

Actually, this approach incurs subtle, hard-to-debug issues. We patch libsox because the latest publicly available release (14.4.2, from 2015) has a bug in in-memory decoding.

Could you let me know which commits upstream are necessary to fix the issue? Conda-forge controls its entire ecosystem, so we could add that patch to our sox-builds and get the best of both worlds.

@h-vetinari

Actually, this approach incurs subtle, hard-to-debug issues. We patch libsox because the latest publicly available release (14.4.2, from 2015) has a bug in in-memory decoding.

Could you let me know which commits upstream are necessary to fix the issue? Conda-forge controls its entire ecosystem, so we could add that patch to our sox-builds and get the best of both worlds.

Ping @mthrok - I'd be happy to fix the bugs you mention in our packaging of sox, not least because I want to (re-)use those builds for torchaudio.

@danpovey

I don't like to be a downer on Kaldi integration, but in my opinion this is a "tragedy of the commons" situation: for your individual needs it would likely be convenient to use libkaldi-feat as a library, but adding one more external code dependency to the PyTorch codebase, which (a) is already very large and (b) has many other things depending on it, is a bad thing. The commons here is PyTorch itself, and the overuse is cramming it with external dependencies.

Are the pitch features actually needed for something? In recent years, people have been relying more on having neural nets implicitly learn the pitch, which avoids the need for a pitch extractor (and pitch extractors add extra latency!). I just wonder what the underlying thing you needed from Kaldi is.

@h-vetinari

where for your individual needs it would likely be convenient to use libkaldi-feat as a library, but adding one extra external code dependency to the PyTorch codebase, which (a) is already very large, and (b) many other things depend on it, is in my opinion a bad thing.

I agree that there are many dependencies, and that this has a cost. But the current status is that without having a kaldi installation (and the corresponding headers) torchaudio isn't installable, so it is a dependency already.

Removing that dependence is a legitimate question, but very different from what I was discussing (which was: if you're going to use kaldi, can you use it in a more standard way).

@h-vetinari

Actually, this approach incurs subtle, hard-to-debug issues. We patch libsox because the latest publicly available release (14.4.2, from 2015) has a bug in in-memory decoding.

Could you let me know which commits upstream are necessary to fix the issue? Conda-forge controls its entire ecosystem, so we could add that patch to our sox-builds and get the best of both worlds.

Found #1297 and how it solved the issue differently from the upstream patch.

AFAICT, the TODO from that PR is still open:

[ ] Upstream the patch to sox project

Trying to backport this to conda-forge in conda-forge/sox-feedstock#27.

@h-vetinari

@mthrok: So, to completely rely on external libsox, we need to replace file-like object support with something else. We have FFmpeg binding now, so we can do that.

Coming back to this topic - in light of the status of kaldi and there apparently existing an alternative through FFmpeg, do you think it would be reasonable to package a version of torchaudio that just completely disables kaldi-support? Or would that be too deep of a cut?

@h-vetinari

And could ffmpeg replace the usage of sox as well...?

@danpovey

I think removing the Kaldi support is probably a good idea. I don't know enough about what features from sox are used to comment on the sox/ffmpeg issue.

@mthrok
Collaborator Author

mthrok commented May 15, 2023

@mthrok: So, to completely rely on external libsox, we need to replace file-like object support with something else. We have FFmpeg binding now, so we can do that.

Coming back to this topic - in light of the status of kaldi and there apparently existing an alternative through FFmpeg, do you think it would be reasonable to package a version of torchaudio that just completely disables kaldi-support? Or would that be too deep of a cut?

@h-vetinari We can disable the features built on top of the Kaldi code by setting this environment variable.

And could ffmpeg replace the usage of sox as well...?

We are making progress here: #2950.
On the main branch, we have switched to FFmpeg as the default I/O mechanism (when available).

Two remaining items are

I think removing the Kaldi support is probably a good idea. I don't know enough about what features from sox are used to comment on the sox/ffmpeg issue.

The original request came from @sw005320.

mthrok added a commit to mthrok/audio that referenced this issue May 24, 2023
This commit removes compute_kaldi_pitch function and the underlying
Kaldi integration from torchaudio.

Kaldi pitch function was added in a short period of time by
integrating the original Kaldi implementation, instead of
reimplementing it in PyTorch.

The Kaldi integration employed a hack which replaces the base
vector/matrix implementation of Kaldi with PyTorch Tensor so that
there is only one blas library within torchaudio.

Recently, we are making torchaudio more lean, and we don't see
a wide adoption of kaldi_pitch feature, so we decided to remove them.

See some of the discussion pytorch#1269
facebook-github-bot pushed a commit that referenced this issue Jun 2, 2023
Summary:
This commit removes compute_kaldi_pitch function and the underlying Kaldi integration from torchaudio.

Kaldi pitch function was added in a short period of time by integrating the original Kaldi implementation, instead of reimplementing it in PyTorch.

The Kaldi integration employed a hack which replaces the base vector/matrix implementation of Kaldi with PyTorch Tensor so that there is only one blas library within torchaudio.

Recently, we are making torchaudio more lean, and we don't see a wide adoption of kaldi_pitch feature, so we decided to remove them.

See some of the discussion #1269

Pull Request resolved: #3368

Differential Revision: D46406176

Pulled By: mthrok

fbshipit-source-id: ee5e24d825188f379979ddccd680c7323b119b1e