Fixed some bugs in mel filterbanks. #36

yorange1 · 2021-11-05T05:09:01Z

I wrote some code to compare the mel filterbanks in librosa, python_speech_features, and speechpy, and found two problems.

1. The lower band edge of the mel filterbank may be initialized incorrectly.
2. The conversion from frequency to FFT bin number is wrong.

import matplotlib.pyplot as plt
import numpy as np
import librosa
import python_speech_features as psf
import speechpy

n_fft = 256        # The number of FFT components
n_filter = 20      # The number of filters in the filterbank
samplerate = 16000 # The samplerate of the signal
low_freq = 0       # The lowest band edge of the filters
high_freq = 8000   # The highest band edge of the filters

librosa_fbanks = librosa.filters.mel(
    sr=samplerate, n_fft=n_fft, n_mels=n_filter, fmin=low_freq, fmax=high_freq, norm=None)
print("Librosa mel fbanks shape:{}".format(librosa_fbanks.shape))

psf_fbanks = psf.base.get_filterbanks(
    nfilt=n_filter, nfft=n_fft, samplerate=samplerate, lowfreq=low_freq, highfreq=high_freq)
print("PSF mel fbanks shape:{}".format(psf_fbanks.shape))

coefficients = int(n_fft/2 + 1)
speechpy_fbanks = speechpy.feature.filterbanks(
    n_filter, coefficients, sampling_freq=samplerate, low_freq=low_freq, high_freq=high_freq)
print("Speechpy mel fbanks shape:{}".format(speechpy_fbanks.shape))

fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(10, 10))

# Frequency axis in Hz: FFT bin k sits at k * samplerate / n_fft
x = np.arange(speechpy_fbanks.shape[1]) * samplerate / n_fft

for i in range(librosa_fbanks.shape[0]):
    axes[0].plot(x, librosa_fbanks[i])
axes[0].set_title("librosa mel fbanks")

for i in range(psf_fbanks.shape[0]):
    axes[1].plot(x, psf_fbanks[i])
axes[1].set_title("psf mel fbanks")

for i in range(speechpy_fbanks.shape[0]):
    axes[2].plot(x, speechpy_fbanks[i])
axes[2].set_title("speechpy mel fbanks")

plt.show()

As shown in the figure, the low_freq setting passed to speechpy's filterbanks does not take effect, and the filterbank covers only half of the frequency band.

The first problem is caused by

low_freq = low_freq or 300.

When low_freq is 0, the expression low_freq or 300 evaluates to 300 instead of 0, because 0 is falsy in Python.
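A minimal sketch of this pitfall, written independently of the speechpy source. The function names and `DEFAULT_LOW_FREQ` are hypothetical, and the `is None` check is just one common way to keep an explicit 0; it is not necessarily the exact change made in this PR.

```python
DEFAULT_LOW_FREQ = 300  # default value taken from the line quoted above


def low_freq_with_or(low_freq=None):
    # Pattern from the quoted line: every falsy value (None, 0, 0.0)
    # is replaced by the default, so an explicit 0 is lost.
    return low_freq or DEFAULT_LOW_FREQ


def low_freq_with_none_check(low_freq=None):
    # Only a missing value (None) falls back to the default; 0 is kept.
    return DEFAULT_LOW_FREQ if low_freq is None else low_freq


print(low_freq_with_or(0))             # 300
print(low_freq_with_none_check(0))     # 0
print(low_freq_with_none_check(None))  # 300
```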

The second problem is a calculation error.

freq_index = np.floor((coefficients + 1) * hertz / sampling_freq).astype(int)

coefficients is equal to fftpoints/2 + 1, so a bin index computed from it cannot cover the complete frequency band. The calculation should use fftpoints instead of coefficients.
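The effect can be checked with the values from the comparison script (n_fft = 256, samplerate = 16000, high_freq = 8000). This is a rough sketch: `hz_to_bin` is a hypothetical helper that mirrors the quoted formula with `math.floor` in place of NumPy, and substituting fftpoints directly for coefficients is an assumption about the shape of the fix.

```python
import math


def hz_to_bin(hertz, n_points, sampling_freq):
    # Same formula as the quoted line, with n_points standing in for
    # either coefficients (original) or fftpoints (proposed).
    return math.floor((n_points + 1) * hertz / sampling_freq)


n_fft = 256
coefficients = n_fft // 2 + 1  # 129 bins, indices 0..128
samplerate = 16000
high_freq = 8000

bin_from_coefficients = hz_to_bin(high_freq, coefficients, samplerate)
bin_from_fftpoints = hz_to_bin(high_freq, n_fft, samplerate)

print(bin_from_coefficients)  # 65
print(bin_from_fftpoints)     # 128
```

With coefficients, the highest band edge lands on bin 65 of 129, which is consistent with the filterbank covering only about half of the band; with fftpoints it lands on bin 128, the last bin.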

As shown in my code, I have fixed the two bugs above and hope you will review and merge the changes. Thank you!

arfon · 2021-11-05T10:32:07Z

@arfon

???

yorange1 · 2021-11-05T10:46:05Z

@arfon

???

Sorry, I made a mistake.

yorange1 · 2021-11-12T16:29:57Z

Hope to get your review, thank you very much! @astorfi

Alex-EEE · 2023-01-21T00:00:16Z

@yorange1 I came across your fix b/c I just noticed the same issue myself. Been trying to get ahold of @astorfi via his emails that I can find, and on here. No response.

I'm thinking of starting a PEP 541 to take over the speechpy pip package. It lets you take over a package if the admin is MIA.

Would you be interested in being co-admin with me?

yorange1 added 2 commits November 5, 2021 11:20

Fix mel filterbanks freq_index compute error.

af0e45a

Fix mel filterbanks freq band edge initialization error.

02a748e
