Fixed some bugs in mel filterbanks. #36

Open. yorange1 wants to merge 2 commits into base: master.

yorange1 commented Nov 5, 2021

I wrote some code to compare the mel filterbanks in librosa, python_speech_features and speechpy, and found two problems in speechpy.

    1. The lower band edge of the mel filterbank is initialized incorrectly.
    2. The conversion from frequency to FFT bin number is wrong.

import matplotlib.pyplot as plt
import numpy as np
import librosa
import python_speech_features as psf
import speechpy

n_fft = 256        # The number of FFT components
n_filter = 20      # The number of filters in the filterbank
samplerate = 16000 # The samplerate of the signal
low_freq = 0       # The lowest band edge of the filters
high_freq = 8000   # The highest band edge of the filters

librosa_fbanks = librosa.filters.mel(
    sr=samplerate, n_fft=n_fft, n_mels=n_filter, fmin=low_freq, fmax=high_freq, norm=None)
print("Librosa mel fbanks shape:{}".format(librosa_fbanks.shape))

psf_fbanks = psf.base.get_filterbanks(
    nfilt=n_filter, nfft=n_fft, samplerate=samplerate, lowfreq=low_freq, highfreq=high_freq)
print("PSF mel fbanks shape:{}".format(psf_fbanks.shape))

coefficients = n_fft // 2 + 1  # The number of one-sided FFT bins
speechpy_fbanks = speechpy.feature.filterbanks(
    n_filter, coefficients, sampling_freq=samplerate, low_freq=low_freq, high_freq=high_freq)
print("Speechpy mel fbanks shape:{}".format(speechpy_fbanks.shape))

fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(10, 10))

# Frequency axis in Hz: bin k of an n_fft-point FFT sits at k * samplerate / n_fft
x = np.arange(speechpy_fbanks.shape[1]) * (samplerate / n_fft)

for i in range(librosa_fbanks.shape[0]):
    axes[0].plot(x, librosa_fbanks[i])
axes[0].set_title("librosa mel fbanks")

for i in range(psf_fbanks.shape[0]):
    axes[1].plot(x, psf_fbanks[i])
axes[1].set_title("psf mel fbanks")

for i in range(speechpy_fbanks.shape[0]):
    axes[2].plot(x, speechpy_fbanks[i])
axes[2].set_title("speechpy mel fbanks")

plt.show()

[Figure: mel filterbanks from librosa, python_speech_features and speechpy, one subplot each]

As shown in the figure, speechpy ignores the setting low_freq=0, and its filterbank covers only about half of the frequency band.
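The coverage visible in the plot can also be checked numerically. The helper below is only a sketch: covered_band is a hypothetical name, not part of any of the three libraries, and it assumes the filterbank is a 2-D array of shape (n_filters, n_fft // 2 + 1) whose bin k sits at k * samplerate / n_fft Hz.

```python
def covered_band(fbanks, samplerate, n_fft):
    """Return (lowest, highest) frequency in Hz at which any filter is non-zero.

    Hypothetical helper for illustration. `fbanks` is any 2-D sequence
    (nested lists or a NumPy array) with one row per filter.
    """
    n_bins = len(fbanks[0])
    # Bins where at least one filter has a positive weight
    active = [k for k in range(n_bins) if any(row[k] > 0 for row in fbanks)]
    if not active:
        return None
    hz_per_bin = samplerate / n_fft
    return active[0] * hz_per_bin, active[-1] * hz_per_bin


# Toy filterbank: two triangular filters over 9 bins (n_fft = 16)
toy = [
    [0, 0.5, 1, 0.5, 0, 0, 0, 0, 0],
    [0, 0, 0, 0.5, 1, 0.5, 0, 0, 0],
]
print(covered_band(toy, samplerate=16000, n_fft=16))  # (1000.0, 5000.0)
```

Calling it on librosa_fbanks, psf_fbanks and speechpy_fbanks from the script above would give one pair of band limits per library to compare against low_freq and high_freq.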

The first problem is caused by

low_freq = low_freq or 300.

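When low_freq is 0, the expression low_freq or 300 evaluates to 300 instead of 0, because 0 is falsy in Python. An explicitly requested lower edge of 0 Hz is therefore silently replaced by 300 Hz.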
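A minimal sketch of the pitfall and one common way around it. The helper name resolve_low_freq and the use of None as the "not given" sentinel are assumptions for illustration; the actual change in this pull request may look different.

```python
def resolve_low_freq(low_freq=None, default=300.0):
    """Hypothetical helper: fall back to `default` only when no value was given."""
    # An explicit None check keeps 0 as a valid lower band edge.
    return default if low_freq is None else low_freq


# The original pattern treats 0 like "not provided":
print(0 or 300)                # 300
# The None check preserves an explicit 0 Hz edge:
print(resolve_low_freq(0))     # 0
print(resolve_low_freq(None))  # 300.0
```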

The second problem is a calculation error.

freq_index = np.floor(
    (coefficients + 1) * hertz / sampling_freq).astype(int)

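coefficients is equal to fftpoints/2 + 1, so the highest frequency (sampling_freq/2) maps to an index of only about coefficients/2, and the filters span just the lower half of the available bins. The calculation should use fftpoints instead of coefficients.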
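The arithmetic can be checked with the parameters from the comparison script (fftpoints = 256, sampling_freq = 16000, high_freq = 8000). This is a sketch, not the library code: freq_to_bin is a hypothetical helper, and keeping the + 1 from the quoted expression when switching to fftpoints is an assumption.

```python
import math


def freq_to_bin(hertz, sampling_freq, n_points):
    """Hypothetical helper mirroring the quoted formula for a single frequency."""
    return math.floor((n_points + 1) * hertz / sampling_freq)


fftpoints = 256
coefficients = fftpoints // 2 + 1  # 129 one-sided FFT bins, indices 0..128
sampling_freq = 16000
high_freq = 8000

# Current behaviour: the top band edge lands in the middle of the spectrum
print(freq_to_bin(high_freq, sampling_freq, coefficients))  # 65
# Using the FFT size instead reaches the last bin
print(freq_to_bin(high_freq, sampling_freq, fftpoints))     # 128
```

With coefficients in the formula, bins 66 to 128 never receive any filter weight, which matches the half-covered band in the speechpy subplot.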

I have fixed both bugs in this pull request and hope you will review and merge it. Thank you!

arfon (Contributor) commented Nov 5, 2021

@arfon

???

yorange1 (Author) commented Nov 5, 2021

@arfon

???

sorry, I made a mistake.

yorange1 (Author) commented

@astorfi I hope to get your review. Thank you very much!

Alex-EEE commented

@yorange1 I came across your fix because I just noticed the same issue myself. I've been trying to reach @astorfi on here and through the email addresses I could find, with no response.

I'm thinking of filing a PEP 541 request to take over the speechpy pip package. PEP 541 lets you take over a package if the admin is MIA.

Would you be interested in being co-admin with me?
