🚀 Feature Request: Add Kaldi Pitch Feature #686

Closed
mthrok opened this issue Jun 4, 2020 · 7 comments

Comments

@mthrok
Collaborator

mthrok commented Jun 4, 2020

🚀 Feature

Add a feature equivalent to Kaldi's compute-kaldi-pitch-feats.

Motivation

From #679 (comment)

We found that the pitch feature always improved performance for several tonal languages (e.g., Chinese) and did not degrade performance for other languages.
So, espnet1 decided to use log Mel filterbank + pitch features as the default.
However, pitch feature extraction is rather complicated, and we had some difficulty implementing it entirely with torch functions.
So, espnet2 decided to use only log Mel filterbank features instead.
We still observe a slight degradation in ASR performance, but that can be mitigated by some tuning.
We're now moving to espnet2, so we don't need it in the long term, but it would probably be quite beneficial in the short term or for people who keep using espnet1.

@mthrok
Collaborator Author

mthrok commented Jun 4, 2020

@sw005320 For reference, could you point me to ESPNet1's implementation of pitch?

@mthrok mthrok added the Kaldi label Jun 4, 2020
@sw005320

sw005320 commented Jun 4, 2020

We simply call Kaldi pitch extraction. We don't have our own pitch extraction.

@mthrok
Collaborator Author

mthrok commented Jun 4, 2020

I see, thanks!

@mthrok
Collaborator Author

mthrok commented Jun 15, 2020

Some thoughts on spec:

Interface

from torch import Tensor


def compute_pitch_feats(
        waveform: Tensor,
        delta_pitch: float = 0.005,
        frame_length: float = 25.,
        frame_shift: float = 10.,
        frames_per_chunk: int = 0,
        lowpass_cutoff: float = 1000.,
        lowpass_filter_width: int = 1,
        max_f0: float = 400.,
        max_frames_latency: int = 0,
        min_f0: float = 50.,
        nccf_ballast: float = 7000.,
        nccf_ballast_online: bool = False,
        penalty_factor: float = 0.1,
        recompute_frame: int = 500,
        resample_frequency: float = 4000.,
        sample_frequency: float = 16000.,
        simulate_first_pass_online: bool = False,
        snip_edges: bool = True,
        soft_min_f0: float = 10.,
        upsample_filter_width: int = 5,
) -> Tensor:
    ...
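To make the units in the signature above concrete: `frame_length` and `frame_shift` are in milliseconds, and `snip_edges` controls how the signal edges are framed. The sketch below follows the usual Kaldi framing convention for counting frames. The helper name `kaldi_num_frames` is hypothetical, and the Kaldi pitch extractor frames the resampled signal internally, so treat the counts as an approximation of its behavior rather than a guarantee.

```python
def kaldi_num_frames(
    num_samples: int,
    sample_frequency: float = 16000.0,
    frame_length: float = 25.0,  # milliseconds
    frame_shift: float = 10.0,   # milliseconds
    snip_edges: bool = True,
) -> int:
    """Count frames using the usual Kaldi framing convention (hypothetical helper)."""
    # Convert the millisecond settings into sample counts.
    window = int(sample_frequency * frame_length / 1000)
    shift = int(sample_frequency * frame_shift / 1000)
    if snip_edges:
        # Only frames that fit entirely inside the signal are kept.
        if num_samples < window:
            return 0
        return 1 + (num_samples - window) // shift
    # Otherwise the frame count depends only on the shift (rounded to nearest).
    return (num_samples + shift // 2) // shift


one_second = 16000
print(kaldi_num_frames(one_second))                    # snip_edges=True
print(kaldi_num_frames(one_second, snip_edges=False))  # snip_edges=False
```

With the defaults, one second of 16 kHz audio gives a 400-sample window and a 160-sample shift.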

Implementation

Test

  1. A new test suite in Kaldi compatibility test
  2. A set of parameters to test, similar to "Revise parameters for Kaldi mfcc compatibility test" (#689)
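A rough sketch of how such a parameter set could be turned into reference commands for Kaldi's compute-kaldi-pitch-feats binary. The helpers `to_kaldi_flags` and `parameter_grid` are hypothetical, the chosen values are only examples, and `scp:wav.scp` / `ark:-` are placeholder specifiers. It assumes the Kaldi options are the hyphenated forms of the keyword names in the interface above.

```python
from itertools import product


def to_kaldi_flags(params):
    """Convert keyword-style parameters into Kaldi command-line flags."""
    flags = []
    for name, value in sorted(params.items()):
        if isinstance(value, bool):
            # Kaldi boolean options are spelled true/false.
            value = "true" if value else "false"
        flags.append(f"--{name.replace('_', '-')}={value}")
    return flags


def parameter_grid(choices):
    """Yield one dict per combination of the candidate values."""
    names = sorted(choices)
    for values in product(*(choices[name] for name in names)):
        yield dict(zip(names, values))


# Example grid: two sampling frequencies times two snip_edges settings.
grid = list(parameter_grid({
    "sample_frequency": [8000, 16000],
    "snip_edges": [True, False],
}))

# Placeholder input/output specifiers; a real test would point at actual data.
commands = [
    ["compute-kaldi-pitch-feats", *to_kaldi_flags(p), "scp:wav.scp", "ark:-"]
    for p in grid
]
for command in commands:
    print(" ".join(command))
```

Each command could then be run against the same audio as the torch implementation and the two outputs compared within a tolerance.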

@mthrok
Collaborator Author

mthrok commented Aug 17, 2020

@sw005320 I am looking at the Kaldi implementation and wondering if we can limit the number of parameters we expose.
For example, I do not think we need the parameters for online feature extraction. Is there a set of parameters you expect users to change?

https://kaldi-asr.org/doc/pitch-functions_8h_source.html#l00042
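One way to picture the split being asked about, as a sketch only: the grouping below is an assumption based on reading the Kaldi option descriptions, not a decision made in this thread, and the names `ALL_PARAMS`, `ONLINE_RELATED`, and `split_params` are hypothetical.

```python
# All keyword parameters from the proposed compute_pitch_feats signature.
ALL_PARAMS = (
    "delta_pitch", "frame_length", "frame_shift", "frames_per_chunk",
    "lowpass_cutoff", "lowpass_filter_width", "max_f0", "max_frames_latency",
    "min_f0", "nccf_ballast", "nccf_ballast_online", "penalty_factor",
    "recompute_frame", "resample_frequency", "sample_frequency",
    "simulate_first_pass_online", "snip_edges", "soft_min_f0",
    "upsample_filter_width",
)

# Assumption: these mainly matter for online or chunked extraction and
# could be hidden from a first, offline-only interface.
ONLINE_RELATED = frozenset({
    "frames_per_chunk",
    "max_frames_latency",
    "nccf_ballast_online",
    "recompute_frame",
    "simulate_first_pass_online",
})


def split_params(names=ALL_PARAMS, online=ONLINE_RELATED):
    """Return (exposed, hidden) parameter names, preserving signature order."""
    exposed = tuple(n for n in names if n not in online)
    hidden = tuple(n for n in names if n in online)
    return exposed, hidden


exposed, hidden = split_params()
print("exposed:", ", ".join(exposed))
print("hidden: ", ", ".join(hidden))
```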

@sw005320

Sorry for my late response...
We usually change only the sampling frequency (yes, it is necessary) and keep the other parameters at their defaults; this works robustly on various ASR tasks.

Also, I have not tried the online pitch feature, so I cannot comment on that part...

mthrok pushed a commit to mthrok/audio that referenced this issue Feb 26, 2021
@mthrok mthrok removed the help wanted label Mar 3, 2021
@mthrok
Collaborator Author

mthrok commented Mar 3, 2021

The Kaldi pitch feature was added in #1243 and will be released as a beta feature in the upcoming 0.8.0 release. We welcome feedback on the feature.

@mthrok mthrok closed this as completed Mar 3, 2021
mpc001 pushed a commit to mpc001/audio that referenced this issue Aug 4, 2023