
more efficient resample module #908

Closed

small-yellow-duck opened this issue Sep 14, 2020 · 8 comments
@small-yellow-duck

small-yellow-duck commented Sep 14, 2020

The resampling function from Kaldi that torchaudio currently uses has some inefficient for loops and padding steps. I've put together an efficient module and evaluated its performance in this notebook:
https://www.kaggle.com/smallyellowduck/fast-audio-resampling-layer-in-pytorch (the code is in the notebook)

Edit: here are two separate comparisons of the resampling time, excluding the file-load time:
Comparison 1: the 'kaiser_best' settings in librosa vs the 'kaiser_best' settings in the efficient pytorch resampler (these should be the same setup)

librosa: 51 s
efficient pytorch resampler: 9 s

Comparison 2: the default settings in torchaudio vs window='hann', num_zeros=6 in the efficient pytorch resampler (these should be the same setup)

torchaudio: 10 s
efficient pytorch resampler: 1 s

The performance improvement is most substantial when the input and output sample rates are not whole-number multiples of each other.
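For intuition, here is a rough sketch of why (assuming a polyphase design, as in Kaldi's LinearResample): the resampler precomputes one FIR filter per output phase, and the number of phases is output_sr / gcd(input_sr, output_sr), so per-phase looping costs blow up exactly when the two rates share only a small common factor.

    import math

    # Phase count for a polyphase resampler (a sketch; the exact
    # internals of the module are in the notebook linked above).
    for in_sr, out_sr in [(22050, 44100), (44100, 48000)]:
        phases = out_sr // math.gcd(in_sr, out_sr)
        print(f"{in_sr} -> {out_sr}: {phases} filter phases")
    # 22050 -> 44100: 2 filter phases (whole-number multiple)
    # 44100 -> 48000: 160 filter phases (where loops hurt most)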

I think it would be good for torchaudio to switch to the more efficient resample module.

Before making a PR, perhaps other people have feedback about what the API for the module should look like? I have largely tried to follow the API of the resample method in librosa. Any additional comments? @vincentqb

    # Class name and base class are assumed here; the snippet in the
    # notebook is truncated.
    class Resample(torch.nn.Module):
        def __init__(self,
                     input_sr, output_sr, dtype,
                     num_zeros=64, cutoff_ratio=0.95, filter='kaiser', beta=14.0):
            """
            This creates an object that can apply a symmetric FIR filter
            based on torch.nn.functional.conv1d.

            Args:
              input_sr:  The input sampling rate, AS AN INTEGER.
              output_sr:  The output sampling rate, AS AN INTEGER.
              dtype:  The torch dtype to use for computations.
              num_zeros: The number of zeros per side in the (sinc*hanning-window)
                  filter function.  More is more accurate, but 64 is already
                  quite a lot.
              cutoff_ratio: The filter rolloff point as a fraction of the
                  Nyquist frequency.
              filter: one of ['kaiser', 'kaiser_best', 'kaiser_fast', 'hann']
              beta: shape parameter for the 'kaiser' filter
            """
            super().__init__()  # init the base class
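For illustration, usage might look like this (a sketch only: the Resample name shown above and the (batch, time) input shape are assumptions, not a settled API):

    import torch

    # Hypothetical usage of the module sketched above; the expected
    # input shape is an assumption, not part of the proposal.
    resampler = Resample(input_sr=44100, output_sr=16000,
                         dtype=torch.float32, filter='kaiser_best')
    waveform = torch.randn(1, 44100)   # one second of audio at 44.1 kHz
    resampled = resampler(waveform)    # roughly (1, 16000)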
@mthrok
Collaborator

mthrok commented Sep 14, 2020

Hi @small-yellow-duck

Thanks for the suggestion and the code. This is very cool, and I would be very happy to replace the existing resampling code, which we have been having a difficult time testing.

I looked at the test code of the implementation, and it seems to be at parity with librosa. Do the additional filter types also give the same results as librosa? (This is a question out of curiosity, not a requirement for adding it to torchaudio.)

@small-yellow-duck
Author

small-yellow-duck commented Sep 14, 2020

Ooh, I agree, it would be smart to check that librosa and the new resampler give the same results. I will check that out.

My understanding is that filter='hann' and num_zeros=6 should reproduce the behaviour of the current torchaudio resample module, but I haven't actually verified that either. Let me report back. :-)

I also saw another thread where there was a discussion about whether the input and output sample rates should be floats or ints - that should probably be considered for this module as well.

@small-yellow-duck
Author

small-yellow-duck commented Sep 15, 2020

I've updated the notebook to compare the signals generated by the existing torchaudio resample operation and the efficient torch implementation... and I have to admit that I'm not totally sure what level of agreement I should expect to see.

On a signal with an RMS of 0.0088, the mean absolute difference between the resampled signals is ~1e-5 (kaiser_best) or ~5e-5 (torchaudio settings). On the positive side, the agreement is better for the librosa vs efficient-torch comparison with the 'kaiser_best' settings than for the torchaudio vs efficient-torch comparison (which should be the case, because the 'kaiser_best' kernel has dim = 1 + 64*2 whereas the torchaudio-settings kernel has dim = 1 + 6*2).
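For reference, the comparison is along these lines (a sketch; my_resampler and the file name are placeholders):

    import torchaudio

    # Sketch of the agreement check described above (names hypothetical).
    waveform, sr = torchaudio.load("sample.wav")
    reference = torchaudio.transforms.Resample(sr, 16000)(waveform)
    candidate = my_resampler(waveform)  # the efficient module, same settings
    rms = reference.pow(2).mean().sqrt()
    mad = (reference - candidate).abs().mean()
    print(f"rms={rms.item():.4f}  mean abs diff={mad.item():.2e}")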

Any further thoughts, @mthrok? I'm mostly paranoid that there is some parameter that I think matches the torchaudio method but actually doesn't.

Can you point me to the existing tests that you want to run for the resampler?

@mthrok
Collaborator

mthrok commented Sep 15, 2020

> I also saw another thread where there was a discussion about whether the input and output sample rates should be floats or ints - that should probably be considered for this module as well.

I do not have a strong opinion on waiting for the outcome of that discussion; one option is to just add this new resampler now and change the sampling-rate type in separate work. Or we can just bet that the float type wins and use float from the beginning. However, I saw that the original implementation requires an integer sample rate. Can your implementation handle a float type natively, or does it have to cast to int internally? If the algorithm itself is restricted to integer sample rates, I think it's okay to cast a float sample rate to an integer internally, as long as that is stated in the docstring.

> I've updated the notebook to compare the signals generated by the existing torchaudio resample operation and the efficient torch implementation... and I have to admit that I'm not totally sure what level of agreement I should expect to see.

Cool, let's figure out what we can do together ;)

> I'm mostly paranoid that there is some parameter that I think matches the torchaudio method but actually doesn't.

Don't worry, as long as the implementation is reasonable, we can find a place to settle down.

> Can you point me to the existing tests that you want to run for the resampler?

The existing test looks like this: test and its data. The current resampling follows the C++ implementation of the Kaldi code, which is not exposed as a standalone routine, so it is hard to generate reference input/output patterns.
The test relies on the dumped files, so it is not extensible (to other sample rates), and the code that generated these files has been lost. This is why I would love to replace the code with something more testable; I thought that if the new implementation is comparable with an existing framework (like librosa), that would make it easy to ensure that the function is correct.

> On a signal with an RMS of 0.0088, the mean absolute difference between the resampled signals is ~1e-5 (kaiser_best) or ~5e-5 (torchaudio settings). On the positive side, the agreement is better for the librosa vs efficient-torch comparison with the 'kaiser_best' settings than for the torchaudio vs efficient-torch comparison (which should be the case, because the 'kaiser_best' kernel has dim = 1 + 64*2 whereas the torchaudio-settings kernel has dim = 1 + 6*2).

I think that if your implementation matches librosa, that's enough; it does not need to be comparable against the current torchaudio. We typically use PyTorch's assertEqual, which works in a similar way to NumPy's assert_allclose (it compares two Tensors and checks that the maximum difference is within an absolute or relative tolerance). 1e-5 looks okay to me.
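For example, the check could look like this (the file names are placeholders):

    import numpy as np

    # Tolerance-based comparison in the style described above.
    expected = np.load("librosa_resampled.npy")
    actual = np.load("torch_resampled.npy")
    np.testing.assert_allclose(actual, expected, atol=1e-5)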

I would like to try your implementation; would you like to put it in Python format (not a notebook)? If you are familiar with how git and PRs work, you can create a PR, where we can discuss the details of the implementation. If you are not, or do not have time, let me know and I can port your code.

@vincentqb
Contributor

@small-yellow-duck -- This is exciting! Thanks for opening this discussion :)

> I also saw another thread where there was a discussion about whether the input and output sample rates should be floats or ints - that should probably be considered for this module as well.

> I do not have a strong opinion on waiting for the outcome of that discussion; one option is to just add this new resampler now and change the sampling-rate type in separate work. Or we can just bet that the float type wins and use float from the beginning. However, I saw that the original implementation requires an integer sample rate. Can your implementation handle a float type natively, or does it have to cast to int internally? If the algorithm itself is restricted to integer sample rates, I think it's okay to cast a float sample rate to an integer internally, as long as that is stated in the docstring.

I expect the discussion around float/integer to have minimal impact on this pull request. My expectation is that the new algorithm also only works for integers -- is that correct? If so, then let's just continue assuming integers.

We currently assume integers everywhere. If and when we decide to move to float, we'll have to do migration work all over the place, so we don't really need to worry about this right now. For this particular algorithm, if we ever move to float, we might decide to standardize on taking the integer part of the float, as mentioned by @mthrok. We'll cross that bridge when we get there :)

@small-yellow-duck
Author

Made a PR: #911

I think I should be able to get better agreement with the librosa resample function, but librosa relies on the resample routine in the resampy package, which uses some pre-generated filters (and there is no way to know for sure how they were generated - see https://github.com/bmcfee/resampy/blob/ccb85575403663e17697bcde8e25765b022a9e0f/resampy/filters.py#L6).

Maybe it's better to aim for agreement with the resample routine in scipy?
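For what it's worth, scipy.signal.resample_poly applies a polyphase FIR filter, which seems like the closest scipy analogue to this approach (scipy.signal.resample is FFT-based instead). A quick sketch:

    import numpy as np
    from scipy.signal import resample_poly

    # Polyphase FIR resampling in scipy: 44100 Hz -> 16000 Hz reduces
    # by gcd(44100, 16000) = 100, giving up=160, down=441.
    x = np.random.randn(44100).astype(np.float32)
    y = resample_poly(x, up=160, down=441)
    print(y.shape)  # (16000,)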

@mthrok
Collaborator

mthrok commented Jan 19, 2021

Hi @small-yellow-duck

We merged #1087, which makes the resampling faster. The original implementation was a mostly verbatim translation of Kaldi's C++ source code, which performs a lot of element-wise Tensor access via indexing (like tensor[i]). This was not a good fit for how PyTorch works, so #1087 vectorized the internals and simplified the resampling operation to a convolution. This is very similar to the suggestion discussed here.
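Roughly, the idea is the following (a sketch with placeholder filters, not the actual #1087 kernel construction):

    import torch
    import torch.nn.functional as F

    # One FIR kernel per output phase, applied as a single strided
    # conv1d instead of per-sample Python loops.
    orig_freq, new_freq = 4, 3              # toy integer rates
    width = 8                               # hypothetical filter half-width
    kernels = torch.randn(new_freq, 1, 2 * width)  # placeholder filters
    waveform = torch.randn(1, 1, 100)       # (batch, channel, time)
    out = F.conv1d(waveform, kernels, stride=orig_freq, padding=width)
    # out: (1, new_freq, time // orig_freq + 1); interleaving the
    # new_freq channels along the time axis yields the resampled signal.
    print(out.shape)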

@mthrok
Collaborator

mthrok commented Aug 3, 2021

Closing as this is addressed in #1487.
If you would like to provide feedback, please comment in #1487.

@mthrok mthrok closed this as completed Aug 3, 2021