Migrate kaldi fbank #672

Merged
merged 5 commits into from
Jun 3, 2020

bhargavkathivarapu
Contributor

Hi,

This PR migrates the Kaldi fbank test (#597) from test/test_compliance_kaldi.py to test/kaldi_compatibility_impl.py.

Signed-off-by: Bhargav Kathivarapu <[email protected]>
Signed-off-by: Bhargav Kathivarapu <[email protected]>
@mthrok
Collaborator

mthrok commented Jun 1, 2020

Hi @bhargavkathivarapu

Thanks for working on this!
I think the CI issue is intermittent (the webhook probably did not get delivered to CircleCI), so it should work if you push a commit again.

I tried your change and it almost worked. Here is the fix I suggest. (Note that you need to re-add parameterized to environment.yml when running the tests on CI.)

  1. [nit] I moved import json. PEP 8 recommends grouping imports as standard library, blank line, third-party libraries, blank line, local modules.
  2. It turned out that when passing an iterable as a single parameter, it has to be wrapped in a tuple or a parameterized.param object. (I also renamed the helper function to reflect this change.)
  3. There was an issue with the order of the decorators. Flipping them fixed it.
diff --git a/test/kaldi_compatibility_impl.py b/test/kaldi_compatibility_impl.py
index aed5a35..880004d 100644
--- a/test/kaldi_compatibility_impl.py
+++ b/test/kaldi_compatibility_impl.py
@@ -1,4 +1,5 @@
 """Test suites for checking numerical compatibility against Kaldi"""
+import json
 import shutil
 import unittest
 import subprocess
@@ -9,8 +10,7 @@ import torchaudio.functional as F
 import torchaudio.compliance.kaldi

 from . import common_utils
-from parameterized import parameterized
-import json
+from parameterized import parameterized, param


 def _not_available(cmd):
@@ -49,9 +49,9 @@ def _run_kaldi(command, input_type, input_value):
     return torch.from_numpy(result.copy())  # copy supresses some torch warning


-def _load_jsonl(path):
+def _load_params(path):
     with open(path, 'r') as file:
-        return [json.loads(line) for line in file]
+        return [param(json.loads(line)) for line in file]


 class Kaldi(common_utils.TestBaseMixin):
@@ -75,8 +75,8 @@ class Kaldi(common_utils.TestBaseMixin):
         kaldi_result = _run_kaldi(command, 'ark', tensor)
         self.assert_equal(result, expected=kaldi_result)

+    @parameterized.expand(_load_params(common_utils.get_asset_path('kaldi_test_fbank_args.json')))
     @unittest.skipIf(_not_available('compute-fbank-feats'), '`compute-fbank-feats` not available')
-    @parameterized.expand(_load_jsonl(common_utils.get_asset_path('kaldi_test_fbank_args.json')))
     def test_fbank(self, kwargs):
         """fbank should be numerically compatible with compute-fbank-feats"""
         wave_file = common_utils.get_asset_path('kaldi_file.wav')
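
The tuple-wrapping point in the review comment can be seen without the library at all. This is a minimal sketch, assuming the parameterizing decorator hands each entry to the test with `*` unpacking; the `call_test` helper and the sample dict are hypothetical stand-ins, not parameterized's own code:

```python
def call_test(*args):
    """Stand-in for a test method: return the positional args it received."""
    return args


# One decoded line of a JSON-lines asset (the values are illustrative only).
kwargs = {"num_mel_bins": 4, "dither": 0.0}

# Unpacking the bare dict iterates over its keys, so the test would receive
# two strings instead of one dict.
bare = call_test(*kwargs)

# Wrapping the dict in a 1-tuple keeps it together as a single argument.
wrapped = call_test(*(kwargs,))

print(bare)
print(wrapped)
```

Wrapping each entry with param(...), as the diff does, serves the same purpose as the 1-tuple here: the test receives the whole dict as its single kwargs argument.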

Signed-off-by: Bhargav Kathivarapu <[email protected]>
@bhargavkathivarapu
Contributor Author

@mthrok, I made the changes.
With the existing threshold for fbank (rtol=1e-4 and atol=1e-8), 9 out of 97 argument configurations do not satisfy the threshold.

The old Kaldi compliance test uses rtol=1e-1 and atol=1e-3 for fbank. Should we change the threshold to that?
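
To make the two thresholds concrete, here is a rough sketch that assumes the comparison follows the usual allclose-style rule, |actual - expected| <= atol + rtol * |expected|. The exact rule inside the test helper is not shown in this thread, so `is_close` is a hypothetical stand-in. The two numbers are the pair reported in the test_fbank_64 failure in this thread:

```python
def is_close(actual, expected, rtol, atol):
    """allclose-style check (assumed rule, not the test helper's own code)."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# Values from the test_fbank_64 failure message; which of the two is the
# Kaldi output does not change the outcome of either check below.
actual, expected = 10.725326538085938, 9.91409969329834

strict = is_close(actual, expected, rtol=1e-4, atol=1e-8)  # current threshold
loose = is_close(actual, expected, rtol=1e-1, atol=1e-3)   # old compliance test

print("strict passes:", strict)
print("loose passes:", loose)
```

Under this assumed rule the looser threshold accepts a difference of about 0.81 on a value near 10, so relaxing the tolerance would hide that mismatch rather than explain it.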

@mthrok
Collaborator

mthrok commented Jun 2, 2020

Hi @bhargavkathivarapu

Looking at the log, most of them are still close enough, but one of them looks way off.

____________________________________________________________________________________________________ TestKaldi_CPU_Float32.test_fbank_64 _____________________________________________________________________________________________________

a = (<test.common_utils.TestKaldi_CPU_Float32 testMethod=test_fbank_64>,)

    @wraps(func)
    def standalone_func(*a):
>       return func(*(a + p.args), **p.kwargs)

/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/parameterized/parameterized.py:530:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test/kaldi_compatibility_impl.py:87: in test_fbank
    self.assert_equal(result, expected=kaldi_result, rtol=1e-4, atol=1e-8)
test/kaldi_compatibility_impl.py:60: in assert_equal
    self.assertEqual(output, expected, rtol=rtol, atol=atol)
../pytorch/torch/testing/_internal/common_utils.py:1083: in assertEqual
    self.assertTrue(result, msg=message)
E   AssertionError: False is not true : Tensors failed to compare as equal! With rtol=0.0001 and atol=1e-08, found 1 element(s) (out of 5) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 0.8112268447875977 (10.725326538085938 vs. 9.91409969329834), which occurred at index (0, 3).
------------------------------------------------------------------------------------------------------------ Captured stderr call ------------------------------------------------------------------------------------------------------------
compute-fbank-feats --blackman-coeff=3.0442 --energy-floor=4.0677 --frame-length=1.0625 --frame-shift=1.125 --high-freq=5086 --htk-compat=true --low-freq=1013 --num-mel-bins=4 --preemphasis-coefficient=0.99 --raw-energy=false --remove-dc-offset=false --round-to-power-of-two=true --snip-edges=true --subtract-mean=false --use-energy=true --use-log-fbank=true --use-power=false --vtln-high=4997 --vtln-low=4836 --vtln-warp=1.9525 --window-type=hamming --dither=0.0 scp:- ark:-
LOG (compute-fbank-feats[5.5.689~1-2c7e7]:main():compute-fbank-feats.cc:185)  Done 1 out of 1 utterances.

I think some values are outside the expected range, which makes me question the validity of the original test.

$ compute-fbank-feats --help

Create Mel-filter bank (FBANK) feature files.
Usage:  compute-fbank-feats [options...] <wav-rspecifier> <feats-wspecifier>

Options:
  --allow-downsample          : If true, allow the input waveform to have a higher frequency than the specified --sample-frequency (and we'll downsample). (bool, default = false)
  --allow-upsample            : If true, allow the input waveform to have a lower frequency than the specified --sample-frequency (and we'll upsample). (bool, default = false)
  --blackman-coeff            : Constant coefficient for generalized Blackman window. (float, default = 0.42)
  --channel                   : Channel to extract (-1 -> expect mono, 0 -> left, 1 -> right) (int, default = -1)
  --debug-mel                 : Print out debugging information for mel bin computation (bool, default = false)
  --dither                    : Dithering constant (0.0 means no dither). If you turn this off, you should set the --energy-floor option, e.g. to 1.0 or 0.1 (float, default = 1)
  --energy-floor              : Floor on energy (absolute, not relative) in FBANK computation. Only makes a difference if --use-energy=true; only necessary if --dither=0.0.  Suggested values: 0.1 or 1.0 (float, default = 0)
  --frame-length              : Frame length in milliseconds (float, default = 25)
  --frame-shift               : Frame shift in milliseconds (float, default = 10)
  --high-freq                 : High cutoff frequency for mel bins (if <= 0, offset from Nyquist) (float, default = 0)
  --htk-compat                : If true, put energy last.  Warning: not sufficient to get HTK compatible features (need to change other parameters). (bool, default = false)
  --low-freq                  : Low cutoff frequency for mel bins (float, default = 20)
  --max-feature-vectors       : Memory optimization. If larger than 0, periodically remove feature vectors so that only this number of the latest feature vectors is retained. (int, default = -1)
  --min-duration              : Minimum duration of segments to process (in seconds). (float, default = 0)
  --num-mel-bins              : Number of triangular mel-frequency bins (int, default = 23)
  --output-format             : Format of the output files [kaldi, htk] (string, default = "kaldi")
  --preemphasis-coefficient   : Coefficient for use in signal preemphasis (float, default = 0.97)
  --raw-energy                : If true, compute energy before preemphasis and windowing (bool, default = true)
  --remove-dc-offset          : Subtract mean from waveform on each frame (bool, default = true)
  --round-to-power-of-two     : If true, round window size to power of two by zero-padding input to FFT. (bool, default = true)
  --sample-frequency          : Waveform data sample frequency (must match the waveform file, if specified there) (float, default = 16000)
  --snip-edges                : If true, end effects will be handled by outputting only frames that completely fit in the file, and the number of frames depends on the frame-length.  If false, the number of frames depends only on the frame-shift, and we reflect the data at the ends. (bool, default = true)
  --subtract-mean             : Subtract mean of each feature file [CMS]; not recommended to do it this way.  (bool, default = false)
  --use-energy                : Add an extra dimension with energy to the FBANK output. (bool, default = false)
  --use-log-fbank             : If true, produce log-filterbank, else produce linear. (bool, default = true)
  --use-power                 : If true, use power, else use magnitude. (bool, default = true)
  --utt2spk                   : Utterance to speaker-id map (if doing VTLN and you have warps per speaker) (string, default = "")
  --vtln-high                 : High inflection point in piecewise linear VTLN warping function (if negative, offset from high-mel-freq (float, default = -500)
  --vtln-low                  : Low inflection point in piecewise linear VTLN warping function (float, default = 100)
  --vtln-map                  : Map from utterance or speaker-id to vtln warp factor (rspecifier) (string, default = "")
  --vtln-warp                 : Vtln warp factor (only applicable if vtln-map not specified) (float, default = 1)
  --window-type               : Type of window ("hamming"|"hanning"|"povey"|"rectangular"|"sine"|"blackmann") (string, default = "povey")
  --write-utt2dur             : Wspecifier to write duration of each utterance in seconds, e.g. 'ark,t:utt2dur'. (string, default = "")

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)

Looking at the defaults, some of the values are very far from them.
For example, --frame-length, --frame-shift and --energy-floor seem way off to me.
Do you have any insight into this?

@bhargavkathivarapu
Contributor Author

bhargavkathivarapu commented Jun 2, 2020

@mthrok Assuming these arguments were generated by the fbank argument generation script at test/compliance/generate_fbank_data.py, the following configuration is used to generate those 3 args:

wave_len = 20
'energy_floor': '%.4f' % (random.random() * 5),
'frame_length': '%.4f' % (float(random.randint(3, wave_len - 1)) / 16000 * 1000),
'frame_shift': '%.4f' % (float(random.randint(1, wave_len - 1)) / 16000 * 1000)

I am not sure about the exact range of values these arguments must take.
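
The arithmetic in those expressions can be worked through directly. This is a small sketch, assuming the constant 16000 is a sample rate in Hz (it matches the default --sample-frequency in the help output); the helper names are hypothetical:

```python
SAMPLE_RATE = 16000  # Hz; assumed meaning of the constant in the script
WAVE_LEN = 20        # wave_len from the script excerpt


def samples_to_ms(num_samples, sample_rate=SAMPLE_RATE):
    """Convert a length in samples to milliseconds."""
    return num_samples / sample_rate * 1000


def ms_to_samples(ms, sample_rate=SAMPLE_RATE):
    """Convert a length in milliseconds to the nearest number of samples."""
    return round(ms * sample_rate / 1000)


# randint(3, wave_len - 1) and randint(1, wave_len - 1) are inclusive ranges.
frame_length_range_ms = (samples_to_ms(3), samples_to_ms(WAVE_LEN - 1))
frame_shift_range_ms = (samples_to_ms(1), samples_to_ms(WAVE_LEN - 1))

# Kaldi defaults from the help output: 25 ms frame length, 10 ms frame shift.
default_frame_length_samples = ms_to_samples(25)
default_frame_shift_samples = ms_to_samples(10)

print("frame_length range [ms]: %.4f - %.4f" % frame_length_range_ms)
print("frame_shift range [ms]:  %.4f - %.4f" % frame_shift_range_ms)
print("default frame length/shift [samples]:",
      default_frame_length_samples, default_frame_shift_samples)
```

Read this way, the script only ever produces frames of at most 19 samples (a little over 1 ms), while the defaults correspond to 400 and 160 samples. The failing case's --frame-length=1.0625 and --frame-shift=1.125 are 17 and 18 samples under the same assumption.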

@mthrok
Collaborator

mthrok commented Jun 3, 2020

I got some advice from an experienced Kaldi user:

  • --frame-length and --frame-shift should be integer values of around 10 - 50 [ms] if perturbed
  • --preemphasis-coefficient should be less than 1.0, around 0.90 - 0.99
  • --low-freq should be around 20 - 50 (and less than --high-freq)
  • --high-freq: a typical value is -200 for 8000 Hz audio (3800 Hz == 8000 / 2 - 200). We should pick numbers found in Kaldi examples.

So I think the failing parameters are not valid use cases, and we should remove them.
For now, we can leave the passing cases as they are and revisit their validity later as a follow-up.
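
For the follow-up review of the remaining cases, a check along these lines could flag configurations outside the advised ranges. This is only a sketch: the PR itself simply removed the failing entries, the `out_of_range` helper is hypothetical, the snake_case key names are assumed from the generation script, and --high-freq is left unchecked because the advice gives no simple range for it:

```python
# Ranges paraphrased from the advice above; treat them as rough guidance.
ADVISED_RANGES = {
    "frame_length": (10.0, 50.0),             # milliseconds
    "frame_shift": (10.0, 50.0),              # milliseconds
    "preemphasis_coefficient": (0.90, 0.99),
    "low_freq": (20.0, 50.0),                 # Hz
}


def out_of_range(kwargs):
    """Return the names of options whose values fall outside the advised ranges."""
    violations = []
    for name, (low, high) in ADVISED_RANGES.items():
        if name in kwargs and not (low <= float(kwargs[name]) <= high):
            violations.append(name)
    return violations


# Values taken from the failing test_fbank_64 command line in this thread.
failing_case = {
    "frame_length": 1.0625,
    "frame_shift": 1.125,
    "preemphasis_coefficient": 0.99,
    "low_freq": 1013,
    "high_freq": 5086,
}

# Kaldi defaults from the help output.
default_case = {
    "frame_length": 25,
    "frame_shift": 10,
    "preemphasis_coefficient": 0.97,
    "low_freq": 20,
}

print(out_of_range(failing_case))
print(out_of_range(default_case))
```

The check only tests ranges; the advice that frame length and shift be integer values would need a separate condition.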

Signed-off-by: Bhargav Kathivarapu <[email protected]>
Signed-off-by: Bhargav Kathivarapu <[email protected]>
@bhargavkathivarapu
Contributor Author

@mthrok, I removed those 9 failing cases from the JSON. All unit tests pass now.

Collaborator

@mthrok mthrok left a comment

Looks good, thanks!

@mthrok mthrok merged commit 8a03087 into pytorch:master Jun 3, 2020
@mthrok
Collaborator

mthrok commented Jun 3, 2020

@bhargavkathivarapu I forgot to mention this, but could you do the honors and remove the migrated tests (test_compliance_kaldi.py::Test_Kaldi::test_fbank) and data (ark) files?

@bhargavkathivarapu
Contributor Author

Yeah, I will remove all the compliance files at once, after all tests are migrated.

mpc001 pushed a commit to mpc001/audio that referenced this pull request Aug 4, 2023
* fix: Namespace issue of Reduction in CPP MNIST

- Changed the at::Reduction::Sum into Reduction::Sum

Solved issue pytorch#672

Signed-off-by: Arkadip <[email protected]>

* Update cpp/mnist/mnist.cpp

Co-Authored-By: Will Feng <[email protected]>

Co-authored-by: Will Feng <[email protected]>