-
Notifications
You must be signed in to change notification settings - Fork 661
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: The future of Kaldi compliance module #1269
Comments
cc Kaldi experts @danpovey @csukuangfj @pzelasko @mravanelli @sw005320 Since I have very limited experience in Kaldi and do not know how Kaldi project thinks of PyTorch/torchaudio, it is nice if you can your feedback. Thanks! |
The binding of libkaldi-feat looks like a good solution. We are already following this approach for some components in K2/Snowfall (e.g. kaldialign for WER/alignment computation and kaldilm for LM compilation into FSA).
If the answer to any of these is no, then we should consider other options. |
Some comments:
I suspect the users of torchaudio's Kaldi compliance fall into very distinct categories: (1) people training new models who don't care about compatibility, (2), and this category is decreasing over time: people porting models from Kaldi who are probably using one of Kaldi's standard feature configurations. For feature extractors that need to support autograd, I think that can be done separately from the Kaldi compliance module if you end up doing the Kaldi compliance in a way that's not convenient for that. It might be possible to start from the kaldi10feat project (https://github.com/danpovey/kaldi10feat) and modify it to be compatible with current Kaldi. (Note: kaldi10 is a project that I was planning but which does not exist and will not exist). It has a NumPy dependency. It was intended to be a simple feature configuration without all the bells and whistles and options of Kaldi's, but with reasonable defaults. Some of the differences include:
|
I would propose another alternative: reimplement, instead of binding, libkaldi-feat using I've got a preliminary working version that produces the same output as kaldi's The following sample wave
is used to benchmark the implementation. I've modified
And the benchmark command is (base) fangjun:~/open-source/kaldi/src/featbin$ ./compute-fbank-feats --dither=0 scp:1.scp ark:1.ark
./compute-fbank-feats --dither=0 scp:1.scp ark:1.ark
Time difference = 0.313595[s]
LOG (compute-fbank-feats[5.5.880~4-3e446]:main():compute-fbank-feats.cc:195) Done 1 out of 1 utterances.
The execution time is about 0.3 seconds and does not vary too much across multiple runs. The reimplementation, which is still in its early draft (https://github.com/csukuangfj/kaldifeat), needs only about 0.1 seconds. git clone https://github.com/csukuangfj/kaldifeat
cd kaldifeat
mkdir build
cd build
make -j
./kaldifeat/python/tests/test_kaldifeat.py The outputs are
|
As there are no learnable parameters in the feature extraction, could you explain what's the use of autograd |
@csukuangfj Adversarial attacks and defense research |
It's even faster when CUDA is available:
|
@csukuangfj let's wait to see whether that's the direction they want to go in before spending lots of time on this. |
Will focus on our snowfall training. |
Hi @pzelasko @danpovey @csukuangfj Thanks for sharing your thoughts. So torchaudio team wants to resolve the issue with the current module. That is inconsistent design / bad performance / high maintenance cost. But to do that we need a direction and that's where we would like to hear from the users. I was pushing for binding libkaldi only because this easily solves performance and maintenance cost. This is very attractive option to me since torchaudio's engineering resource is very limited. Of course, with this approach, we cannot add anew feature like differentiability, which blocks the research for the defense against adversarial attack, differential DSP or E2E training with downstream task. Meanwhile @cpuhrsch shares the similar view with @csukuangfj. We should be pushing the boundary taking advantage of PyTorch provides, which I do agree, if we have infinite amount of engineering resource. My understanding of @danpovey's is that complete API compatibility with Kaldi is not necessary and we can reduce the options, which could simplify some implementations. Does either of K2 or Lhotse provide feature extractions? My understanding of the scene of Speech Recognition is that K2 handles WFST(LM), PyTorch model handles AM, Lhotse handles data handling and training recipes. I am not sure if Lhotse is going to implement the feature extractions on its own, but if Lhotse is going to add an efficient feature extraction, torchaudio can just retire the current module and do nothing. In contrary, if adding efficient implementations in torchaudio is beneficial to the whole ecosystem, we can prioritize that too. So I think the long term direction is to
@csukuangfj |
For the time being our plan was to rely on torchaudio for feature extraction, I'm afraid. |
Hey all I've been trying to package torchaudio for conda-forge, and one of the first things I wanted to do (based on how conda-forge does things) is reuse our own kaldi - which we have built such that we can change the BLAS implementation, in particular it should be compatible with MKL. The problem I have is that I don't know how to patch this, because it's unclear how much the additional implementation is wrapper/extension/reimplementation. For the moment I think I'll have to disable it, even though that's a pity because it seems we have all the necessary pieces... |
Hi @h-vetinari Thanks for the feedback. The Kaldi module in this issue is about These are separate issues and there are more things need to be done for proper packaging in coda-forge, (I think that the same kind of changes should be made to libsox integration) can you perhaps create a new issue? |
Thanks for the quick response!
The libsox integration is easier, because I just needed Author: H. Vetinari <[email protected]>
Date: Mon Jul 18 23:29:19 2022 +0200
use our own sox builds
diff --git a/third_party/CMakeLists.txt b/third_party/CMakeLists.txt
index ef984d5..8ae419c 100644
--- a/third_party/CMakeLists.txt
+++ b/third_party/CMakeLists.txt
@@ -5,12 +5,8 @@ set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fvisibility=hidden")
################################################################################
# sox
################################################################################
-add_library(libsox INTERFACE)
if (BUILD_SOX)
- add_subdirectory(sox)
- target_include_directories(libsox INTERFACE ${SOX_INCLUDE_DIR})
- target_link_libraries(libsox INTERFACE ${SOX_LIBRARIES})
- list(APPEND TORCHAUDIO_THIRD_PARTIES libsox)
+ find_package(sox REQUIRED)
endif()
################################################################################ whereas for kaldi, it looks to me like I cannot replace https://github.com/pytorch/audio/tree/main/third_party/kaldi/src/matrix with vanilla kaldi, due to the extra pytorch integration happening there. I'm happy to open an issue, but first I'd need to understand better what needs to be done. From what I can tell from the OP, using PS. It's also not super clean for torchaudio to insert itself into the (C++) namespace of another project, IMHO, so that'd be another thing in favour of refactoring that. |
Actually, this approach incurs subtle, hard-to-debug issue. We patch libsox because the most publicly available version (14.4.2 released in 2015) has some bug when it comes to in-memory decoding. This has been fixed in upstream, but the code has been diverged, and there is no official release that includes this code. So, to completely rely on external libsox, we need to replace file-like object support with something else. We have FFmpeg binding now, so we can do that.
To enable the external use of Kaldi, some par of the code has to be re-implemented. Originally, I was simply binding the Kaldi code, copying the values of Tensor into Kaldi's Matrix format. This was clean, efficient, low-maintenance approach, but unfortunately I could not get an approval with this approach, so I had to do the hack. Now the time is different, and I think we can go back to the original approach, which allows to use external Kaldi, which should reduce the maintenance cost on your end as-well. Let's bring the the conversation to a new issue. There are things not clear to me on condo-forge as well. I will ask my question there. |
Thanks again for the input.
I still don't understand how the situation and overall goal would be different from "move to libkaldi-feat", as discussed in the OP here. Could you open that issue perhaps, as you seem to have a clearer picture in mind?
Could you let me know which commits upstream are necessary to fix the issue? Conda-forge controls its entire ecosystem, so we could add that patch to our sox-builds and get the best of both worlds. |
Ping @mthrok - I'd be happy to fix the bugs you mention in our packaging of sox, not least because I want to (re-)use those builds for torchaudio. |
I don't like to be a downer on Kaldi integration, but in my opinion this is a "tragedy of the commons" situation, where for your individual needs it would likely be convenient to use libkaldi-feat as a library, but adding one extra external code dependency to the PyTorch codebase, which (a) is already very large, and (b) many other things depend on it, is in my opinion a bad thing. The commons being PyTorch itself, and the overuse being cramming it with external dependencies. Are the pitch features actually needed for something? In recent years, people have been relying more on having neural nets implicitly learn the pitch, which avoids the need for a pitch extractor (and pitch extractors add extra latency!). I just wonder what is the underlying thing that you needed from Kaldi |
I agree that there are many dependencies, and that this has a cost. But the current status is that without having a kaldi installation (and the corresponding headers) torchaudio isn't installable, so it is a dependency already. Removing that dependence is a legitimate question, but very different from what I was discussing (which was: if you're going to use kaldi, can you use it in a more standard way) |
Found #1297 and how it solved the issue differently from the upstream patch. AFAICT, the TODO from that PR is still open:
Trying to backport this to conda-forge in conda-forge/sox-feedstock#27. |
Coming back to this topic - in light of the status of kaldi and there apparently existing an alternative through FFmpeg, do you think it would be reasonable to package a version of torchaudio that just completely disables kaldi-support? Or would that be too deep of a cut? |
And could ffmpeg replace the usage of sox as well...? |
I think removing the Kaldi support is probably a good idea. I don't know enough about what features from sox are used to comment on the sox/ffmpeg issue. |
@h-vetinari We can disable the feature built on-top of Kaldi code, by setting this environment variable.
We are making progress here #2950. Two remaining items are
The original request came from @sw005320. |
This commit removes compute_kaldi_pitch function and the underlying Kaldi integration from torchaudio. Kaldi pitch function was added in a short period of time by integrating the original Kaldi implementation, instead of reimplementing it in PyTorch. The Kaldi integration employed a hack which replaces the base vector/matrix implementation of Kaldi with PyTorch Tensor so that there is only one blas library within torchaudio. Recently, we are making torchaudio more lean, and we don't see a wide adoption of kaldi_pitch feature, so we decided to remove them. See some of the discussion pytorch#1269
This commit removes compute_kaldi_pitch function and the underlying Kaldi integration from torchaudio. Kaldi pitch function was added in a short period of time by integrating the original Kaldi implementation, instead of reimplementing it in PyTorch. The Kaldi integration employed a hack which replaces the base vector/matrix implementation of Kaldi with PyTorch Tensor so that there is only one blas library within torchaudio. Recently, we are making torchaudio more lean, and we don't see a wide adoption of kaldi_pitch feature, so we decided to remove them. See some of the discussion pytorch#1269
This commit removes compute_kaldi_pitch function and the underlying Kaldi integration from torchaudio. Kaldi pitch function was added in a short period of time by integrating the original Kaldi implementation, instead of reimplementing it in PyTorch. The Kaldi integration employed a hack which replaces the base vector/matrix implementation of Kaldi with PyTorch Tensor so that there is only one blas library within torchaudio. Recently, we are making torchaudio more lean, and we don't see a wide adoption of kaldi_pitch feature, so we decided to remove them. See some of the discussion pytorch#1269
This commit removes compute_kaldi_pitch function and the underlying Kaldi integration from torchaudio. Kaldi pitch function was added in a short period of time by integrating the original Kaldi implementation, instead of reimplementing it in PyTorch. The Kaldi integration employed a hack which replaces the base vector/matrix implementation of Kaldi with PyTorch Tensor so that there is only one blas library within torchaudio. Recently, we are making torchaudio more lean, and we don't see a wide adoption of kaldi_pitch feature, so we decided to remove them. See some of the discussion pytorch#1269
This commit removes compute_kaldi_pitch function and the underlying Kaldi integration from torchaudio. Kaldi pitch function was added in a short period of time by integrating the original Kaldi implementation, instead of reimplementing it in PyTorch. The Kaldi integration employed a hack which replaces the base vector/matrix implementation of Kaldi with PyTorch Tensor so that there is only one blas library within torchaudio. Recently, we are making torchaudio more lean, and we don't see a wide adoption of kaldi_pitch feature, so we decided to remove them. See some of the discussion pytorch#1269
Summary: This commit removes compute_kaldi_pitch function and the underlying Kaldi integration from torchaudio. Kaldi pitch function was added in a short period of time by integrating the original Kaldi implementation, instead of reimplementing it in PyTorch. The Kaldi integration employed a hack which replaces the base vector/matrix implementation of Kaldi with PyTorch Tensor so that there is only one blas library within torchaudio. Recently, we are making torchaudio more lean, and we don't see a wide adoption of kaldi_pitch feature, so we decided to remove them. See some of the discussion #1269 Pull Request resolved: #3368 Differential Revision: D46406176 Pulled By: mthrok fbshipit-source-id: ee5e24d825188f379979ddccd680c7323b119b1e
Request For Comment: The future of Kaldi-compatible features
Problems
torchaudio.compliance.kaldi
implements functionalities that tries to reproduce Kaldi's feature extractions, and this module has many issues, and causing headache for maintainers.While the rest of the
torchaudio
library is standardized to work with floating-point Tensor with value range[-1.0, 1.0]
, the ported Kaldi implementations are not necessarily following this (example)compliance
is not the right name Compliance should be named compatibility (or similar) #281compliance
, it does not match with the result of Kaldi's CLII think the natural expectation that users get from the
compliance.kaldi
is that you can get a result that matches Kaldi, (and possibly in an easy manner)The spectrogram computed by "torchaudio.compliance.kaldi.spectrogram" and "compute-spectrogram-feats" are different #332, Problems with Kaldi MFCCs #328, Fbank features are different from Kaldi Fbank #400,
The code translated verbatim from the original Kaldi's C++ code typically is slower than the original implementation. (100x is not uncommon) To overcome this the code has to be changed to adopt PyTorch-style operation. (Kaldi does par element access very efficiently, which incurs a lot of overhead for PyTorch framework) This incurs huge maintenance overhead, 1 for implementing our custom code and 2 for catching up with upstream Kaldi.
Possible solutions
Before going into the detail of possible solutions, I note that removing Kaldi compatibility features from
torchaudio
was also mentioned as a possibility. It was not included in the original plan oftorchaudio
. I am not familiar on this matter and I am not advocating this but keeping it as a possibility.The following describes some partial solutions and considerations I have put so far.
Module location.
For problems like 1 (inconsistent design) and 3 (naming), we can add a new interface to
torchaudio.functional
.We have some Kaldi-compatible functions in
torchaudio.functional
module, such as sliding_window_cmn and compute_kaldi_pitch. They are placed undertorchaudio.functional
because they are confirmed to work on floating-point Tensors and their behaviors are consistent with other feature implementations.We can do the similar thing so that we can provide the same set of Kaldi features abut that works on float-point Tensor like the other feature extractions too. We can start by adding interface to
torchaudio.functional
that does value normalization and call the corresponding function incompliance.kaldi
module. Then deprecate thecompliance.kaldi
module and eventually remove it.Implementation
For problems like 4 (numerical parity) and 5(speed). We have other approaches to make Kaldi features available and compatible with PyTorch.
The current state.
libkaldi-feat
This will resolve the most of the headaches. The usual maintenance cost will be drastically reduced, though we need to figure out a robust way to build Kaldi with MKL that PyTorch is using. Users will get what they would naturally expect. The same result as Kaldi.
This is somewhat like a hybrid approach of 1 and 2. Detail can be found in https://mthrok.github.io/tkaldi/. This is how I added Pitch feature in Add Kaldi Pitch feature #1243
The following table summarizes the pros and cons.
For problems 2 (batch support), one simple approach that can be added is to use
at::parallel_for
to parallelize the batch computation. This can be applied if the core implementation is in C++.(None of the existing features meet this criteria)
Even slower on GPU
(Easy to add another feature?)
(Add new wrapper function)
(Extend Matrix interface as needed)
(Understand, translate the code, verify the result)
(Easy to follow the changes Kaldi makes?)
(Change the upstream commit, and wrapper function)
(Pull the upstream code, Change the wrapper function)
(I do not know where the existing code comes from)
(Custom Build + MKL setup)
(Custom Build)
(Mostly about wrapper func)
(Mostly about wrapper func)
(All the related codes, 1K LoC)
The text was updated successfully, but these errors were encountered: