more efficient resample module #908
Thanks for the suggestion and the code. This is very cool, and I would be very happy to replace the existing resampling code, which we are having a difficult time testing. I looked at the test code of the implementation, and it seems it's at parity with librosa.
Ooh, I agree, it would be smart to check that librosa and the new resampler are giving the same results. I will check that out. My understanding is that filter=hann and num_zeros=6 should reproduce the behaviour of the current torchaudio resample module, but I haven't actually verified that either. Let me report back. :-) I also saw another thread with a discussion about whether the input and output sample rates should be floats or ints; that should probably be considered for this module as well.
I've updated the notebook to compare the signals generated by the existing torchaudio resample operation and the efficient torch implementation... and I have to admit I'm not totally sure what level of agreement to expect. On a signal with an RMS of 0.0088, the mean absolute difference between the resampled signals is ~1e-5 (kaiser_best) or ~5e-5 (torchaudio settings). On the positive side, the agreement is better for librosa vs. efficient torch with the 'kaiser_best' settings than for torchaudio vs. efficient torch, which should be the case because the kernel for 'kaiser_best' has dim=1+64*2 whereas the kernel for torchaudio has dim=1+6*2. Any further thoughts? @mthrok I'm mostly paranoid that there is some parameter that I think matches the torchaudio method but actually doesn't. Can you point me to the existing tests that you want to run for the resampler?
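The comparison I'm running is essentially this sketch (assuming librosa with resampy installed and a recent torchaudio; the signal and rates here are just illustrative):

```python
import numpy as np
import torch
import torchaudio
import librosa

sr_in, sr_out = 22050, 16000
t = np.arange(sr_in) / sr_in
x = (0.0125 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)  # rms ~ 0.0088

# librosa / resampy with the 'kaiser_best' filter
y_librosa = librosa.resample(x, orig_sr=sr_in, target_sr=sr_out, res_type="kaiser_best")

# existing torchaudio resampler
y_torch = torchaudio.transforms.Resample(sr_in, sr_out)(torch.from_numpy(x)).numpy()

n = min(len(y_librosa), len(y_torch))
print("mean abs diff:", np.abs(y_librosa[:n] - y_torch[:n]).mean())
```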
I do not have a strong opinion on how to reflect the outcome of that discussion (wait till the discussion concludes); one way to do it is to just add this new resampling now and change the sampling rate type in separate work. Or we can just bet that ...
Cool, let's figure out what we can do together ;)
Don't worry, as long as the implementation is reasonable, we can find a place to settle down.
The existing test looks like this: test, and its data. The current resampling follows the C++ implementation of the Kaldi code, which is not exposed as a standalone function, so it's hard to generate reference input/output patterns.
I think if your implementation matches librosa, that's enough; it does not need to be comparable against the current torchaudio. We typically use PyTorch's testing utilities for this. I would like to try your implementation; could you put it in a Python file (not a notebook)? If you are familiar with how git and PRs work, you can create a PR, where we can discuss the details of the implementation. If you are not, or do not have time, let me know and I can port your code.
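A librosa-parity test might look something like this sketch; the tolerances and the use of torch.testing.assert_close here are illustrative assumptions, not the project's actual test code:

```python
import unittest
import torch
import torchaudio
import librosa

class TestResampleParity(unittest.TestCase):
    def test_matches_librosa(self):
        sr_in, sr_out = 16000, 8000
        x = torch.rand(1, sr_in) * 2 - 1  # one channel of noise in [-1, 1)
        expected = librosa.resample(
            x[0].numpy(), orig_sr=sr_in, target_sr=sr_out, res_type="kaiser_best"
        )
        result = torchaudio.transforms.Resample(sr_in, sr_out)(x)[0]
        n = min(len(expected), len(result))
        torch.testing.assert_close(
            result[:n], torch.from_numpy(expected[:n]).to(result.dtype),
            atol=1e-4, rtol=1e-4,
        )

if __name__ == "__main__":
    unittest.main()
```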
@small-yellow-duck -- This is exciting! Thanks for opening this discussion :)
I expect the discussion around float/integer to have minimal impact on this pull request. My expectation is that the new algorithm also only works for integers -- is that correct? If so, then let's just continue assuming integers. We currently assume integers everywhere. If and when we decide to move to float, we'll have to do migration work all over the place, so we don't really need to worry about this right now. For this particular algorithm, if we ever move to float, we might decide to standardize around taking the integer part of the float, as mentioned by @mthrok. We'll cross that bridge when we get there :)
Made a PR: #911. I think I should be able to get better agreement with the librosa resample function, but librosa relies on the resample routine in the resampy package, which has some pre-generated filters (and no way to know for sure how they were generated; see https://github.com/bmcfee/resampy/blob/ccb85575403663e17697bcde8e25765b022a9e0f/resampy/filters.py#L6). Maybe it's better to aim for agreement with the resample routine in scipy?
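If scipy became the reference, the comparison could be as simple as this sketch; scipy.signal.resample_poly applies a Kaiser-windowed polyphase filter whose parameters are documented:

```python
import math
import numpy as np
from scipy.signal import resample_poly

sr_in, sr_out = 22050, 16000
g = math.gcd(sr_in, sr_out)

x = np.sin(2 * np.pi * 440.0 * np.arange(sr_in) / sr_in).astype(np.float32)
y_ref = resample_poly(x, up=sr_out // g, down=sr_in // g)  # reference output
print(y_ref.shape)  # (16000,)
```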
We merged #1087, which makes the resampling faster. The original implementation was a mostly verbatim translation of Kaldi's C++ source code, which performs a lot of element-wise Tensor access with indexing (like ...).
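The gist of that kind of speed-up, sketched very loosely here (this is not the merged code; it follows the standard windowed-sinc polyphase recipe and simplifies edge handling), is to precompute one filter per output phase and evaluate them all in a single conv1d call instead of indexing sample by sample:

```python
import math
import torch
import torch.nn.functional as F

def resample_sketch(waveform, orig_freq, new_freq, lowpass_filter_width=6):
    # waveform: 1-D tensor of shape (time,)
    # Reduce the ratio, then build one Hann-windowed sinc kernel per output phase.
    gcd = math.gcd(orig_freq, new_freq)
    orig_freq, new_freq = orig_freq // gcd, new_freq // gcd

    base_freq = min(orig_freq, new_freq) * 0.99  # stay just below Nyquist
    width = math.ceil(lowpass_filter_width * orig_freq / base_freq)

    # Time offsets of every tap for every phase: shape (new_freq, kernel_width)
    idx = torch.arange(-width, width + orig_freq, dtype=torch.float64)[None] / orig_freq
    t = (idx - torch.arange(new_freq, dtype=torch.float64)[:, None] / new_freq) * base_freq
    t = t.clamp_(-lowpass_filter_width, lowpass_filter_width)

    window = torch.cos(t * math.pi / lowpass_filter_width / 2) ** 2  # Hann taper
    t = t * math.pi
    kernel = torch.where(t == 0, torch.tensor(1.0, dtype=torch.float64), t.sin() / t)
    kernel = (kernel * window * base_freq / orig_freq).to(waveform.dtype)

    # One strided conv1d computes every phase at every frame at once.
    x = F.pad(waveform[None, None], (width, width + orig_freq))
    out = F.conv1d(x, kernel[:, None], stride=orig_freq)  # (1, new_freq, frames)
    out = out.transpose(1, 2).reshape(1, 1, -1)           # interleave the phases
    target_len = math.ceil(new_freq * waveform.shape[-1] / orig_freq)
    return out[0, 0, :target_len]
```

For example, resample_sketch(torch.randn(22050), 22050, 16000) should return a tensor of length 16000.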
The resampling function in Kaldi that torchaudio is currently using has some inefficient for loops and padding steps. I've put together an efficient module and evaluated the performance in this notebook:
https://www.kaggle.com/smallyellowduck/fast-audio-resampling-layer-in-pytorch (code is in the notebook)
Edit: I've made two separate comparisons of the resampling time, excluding the file load time:
Comparison 1: 'kaiser_best' settings in librosa vs 'kaiser_best' settings in the efficient PyTorch resampler (should be the same setup)
Comparison 2: default settings in torchaudio vs window='hann', num_zeros=6 in the efficient PyTorch resampler (should be the same setup)
The performance improvement is most substantial when the input sample rate and output sample rate are not whole-number multiples of each other.
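One way to see why, sketched here with illustrative rate pairs: the number of distinct filter phases a polyphase resampler must compute grows as new_freq // gcd(orig_freq, new_freq), so awkward ratios mean many more phases:

```python
import math

for orig, new in [(16000, 8000), (22050, 16000), (44100, 48000)]:
    g = math.gcd(orig, new)
    print(f"{orig} -> {new}: {new // g} filter phases")
# 16000 -> 8000: 1 filter phases
# 22050 -> 16000: 320 filter phases
# 44100 -> 48000: 160 filter phases
```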
I think it would be good for torchaudio to switch to the more efficient resample module.
Before making a PR, perhaps other people have feedback about what the API for the module should look like? I have largely tried to follow the API for the resample method in librosa. Any other additional comments? @vincentqb
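For concreteness, one possible shape for that API, mirroring librosa's keyword style; every name and default below is a placeholder suggestion from this thread, not a settled interface:

```python
import torch

def resample(waveform: torch.Tensor,
             orig_freq: int,
             new_freq: int,
             window: str = "hann",
             num_zeros: int = 6) -> torch.Tensor:
    """Resample `waveform` from `orig_freq` Hz to `new_freq` Hz.

    window:    taper applied to the sinc interpolation kernel
    num_zeros: sinc zero-crossings retained on each side of the kernel center
    """
    raise NotImplementedError  # interface sketch only
```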