torchaudio.load to optionally accept a target sample_rate (and maybe backend=) #2586

vadimkantorov · 2022-07-27T23:53:25Z

🚀 The feature

E.g. the OPUS format supports resampling as part of reading. There is no standard, uniform way to set the sample rate at decoding time.

E.g. sox always sets it to 48khz: https://github.com/dmkrepo/libsox/blob/master/src/opus.c#L114 (unofficial repo),
while the original opusdec first tries to copy it from the original source sample rate stored in the stream header: https://github.com/xiph/opus-tools/blob/master/src/opusdec.c#L897

Fixing sox to do what opusdec does should probably be a feature request to sox and to ffmpeg. But torchaudio should probably support passing a forced sample_rate, using the decoder's built-in resampling if the decoder supports it.

It may also be a good idea to directly accept a backend= argument. This would avoid maintaining the backend as a global variable and eliminate the need for dataloader worker init code that sets the backend. (Personally, I think the global variable should even be phased out in favor of an explicit argument with a default value.)
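A minimal sketch of what the proposed signature could look like, assuming nothing about torchaudio internals: `load`, `Backend`, the `backends` registry and the injected `resample` callable are all hypothetical names used only for illustration, and samples are plain Python lists to keep it dependency-free. The backend is an explicit argument with a default instead of a global, and the target rate is handed to the decoder when it can resample by itself, otherwise it is applied after decoding.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# A decoded clip as (samples, sample_rate); plain lists keep the sketch dependency-free.
Audio = Tuple[List[float], int]


@dataclass(frozen=True)
class Backend:
    # decode(path, requested_rate) -> Audio; requested_rate is None for "native rate".
    decode: Callable[[str, Optional[int]], Audio]
    # True if the backend can honour a requested rate itself (e.g. a streaming resampler).
    resamples_natively: bool


def load(path: str, sample_rate: Optional[int] = None, backend: str = "ffmpeg", *,
         backends: Dict[str, Backend],
         resample: Callable[[List[float], int, int], List[float]]) -> Audio:
    """Hypothetical loader: explicit backend argument, optional target sample rate."""
    chosen = backends[backend]  # no global state: the caller says which backend to use
    requested = sample_rate if chosen.resamples_natively else None
    samples, rate = chosen.decode(path, requested)
    if sample_rate is not None and rate != sample_rate:
        # Fallback for backends that cannot resample while decoding (e.g. soundfile).
        samples, rate = resample(samples, rate, sample_rate), sample_rate
    return samples, rate
```

Because the backend is an ordinary argument, it can be bound once with `functools.partial` and shipped to DataLoader workers as part of the dataset, with no worker_init_fn needed to reset a global.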

Motivation, pitch

N/A

Alternatives

No response

Additional context

No response


vadimkantorov · 2022-10-12T12:06:44Z

It seems to me that libsndfile (which underlies soundfile) also does not support passing a custom sample rate to configure the decoder (e.g. the OPUS decoder). So to support this feature (primarily for OPUS or similar codecs that resample to the target sample rate as part of decoding), custom bindings to libopus (e.g. as in https://github.com/jlaine/opuslib/blob/master/opuslib/api/decoder.py) would probably be useful. If no per-frame calls are needed and a shared library can be assumed, even ctypes bindings like the ones at this link would probably do.
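For reference, a ctypes binding along those lines can stay small. This is a sketch, not a tested wrapper: it assumes a shared libopus is discoverable via `ctypes.util.find_library("opus")`, that Opus packets have already been demuxed from the Ogg container, and it ignores pre-skip and output gain. `load_libopus` and `OpusPacketDecoder` are hypothetical helper names; the C prototypes in the comments are those of libopus's `opus_decoder_create`, `opus_decode` and `opus_decoder_destroy`. The requested rate is passed straight to the decoder, which accepts only 8, 12, 16, 24 or 48 kHz.

```python
import ctypes
import ctypes.util
from typing import List, Optional

# Output rates accepted by opus_decoder_create (per the libopus API documentation).
OPUS_DECODER_RATES = (8000, 12000, 16000, 24000, 48000)


def load_libopus() -> Optional[ctypes.CDLL]:
    """Return the shared libopus library, or None if it cannot be found."""
    name = ctypes.util.find_library("opus")
    if name is None:
        return None
    lib = ctypes.CDLL(name)
    # OpusDecoder *opus_decoder_create(opus_int32 Fs, int channels, int *error);
    lib.opus_decoder_create.argtypes = [ctypes.c_int32, ctypes.c_int, ctypes.POINTER(ctypes.c_int)]
    lib.opus_decoder_create.restype = ctypes.c_void_p
    # int opus_decode(OpusDecoder *st, const unsigned char *data, opus_int32 len,
    #                 opus_int16 *pcm, int frame_size, int decode_fec);
    lib.opus_decode.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_int32,
                                ctypes.POINTER(ctypes.c_int16), ctypes.c_int, ctypes.c_int]
    lib.opus_decode.restype = ctypes.c_int
    # void opus_decoder_destroy(OpusDecoder *st);
    lib.opus_decoder_destroy.argtypes = [ctypes.c_void_p]
    lib.opus_decoder_destroy.restype = None
    return lib


class OpusPacketDecoder:
    """Hypothetical helper: decodes raw Opus packets directly at `sample_rate`."""

    def __init__(self, lib, sample_rate: int, channels: int):
        # Validate before touching the library: libopus rejects other rates.
        if sample_rate not in OPUS_DECODER_RATES:
            raise ValueError(f"libopus cannot decode to {sample_rate} Hz")
        if channels not in (1, 2):
            raise ValueError("opus_decoder_create supports 1 or 2 channels")
        self.lib, self.sample_rate, self.channels = lib, sample_rate, channels
        err = ctypes.c_int(0)
        self._st = lib.opus_decoder_create(sample_rate, channels, ctypes.byref(err))
        if err.value != 0 or not self._st:  # 0 is OPUS_OK
            raise RuntimeError(f"opus_decoder_create failed with error {err.value}")
        # 120 ms is the longest Opus frame, so this buffer fits any single packet.
        self._max_frame = sample_rate * 120 // 1000
        self._pcm = (ctypes.c_int16 * (self._max_frame * channels))()

    def decode(self, packet: bytes) -> List[int]:
        n = self.lib.opus_decode(self._st, packet, len(packet), self._pcm, self._max_frame, 0)
        if n < 0:
            raise RuntimeError(f"opus_decode failed with error {n}")
        return self._pcm[: n * self.channels]  # interleaved int16 samples per channel

    def close(self) -> None:
        if self._st:
            self.lib.opus_decoder_destroy(self._st)
            self._st = None
```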

Created a question about this in libsndfile: libsndfile/libsndfile#886

vadimkantorov · 2022-10-19T12:15:52Z

Also, it's unclear whether, currently, the global backend selection should be done in worker_init_fn...

vadimkantorov · 2022-10-19T19:51:01Z

It also seems that pysoundfile has issues reading opus, so the problem of correct resampling is a bit more pressing: bastibe/python-soundfile#252. Is torchaudio using pysoundfile or libsndfile directly?

vadimkantorov · 2023-06-27T22:59:44Z

It also appears that ffmpeg doesn't let the user directly downsample to the target sample_rate during opus decoding: https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/libopusdec.c#L65 - it always sets sample_rate to 48khz. Ideally, we should be able to set it directly to the required sample_rate (and to read the original sample_rate from the header if the target sample_rate is unset). So maybe it makes sense to link torchaudio directly to libopus to support this mode?
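Reading the original rate from the header only needs the OpusHead identification packet defined in RFC 7845, section 5.1. A sketch, assuming a plain Ogg Opus stream; `parse_opus_head` and `find_opus_head` are hypothetical helpers, and the byte scan in `find_opus_head` is a shortcut standing in for proper Ogg page parsing.

```python
import struct
from typing import NamedTuple, Optional


class OpusHead(NamedTuple):
    version: int
    channels: int
    pre_skip: int
    input_sample_rate: Optional[int]  # this sketch maps a stored 0 to None ("not recorded")
    output_gain: int                  # signed Q7.8 dB value
    mapping_family: int


def parse_opus_head(packet: bytes) -> OpusHead:
    """Parse the 19-byte fixed part of the OpusHead packet (little-endian fields)."""
    if len(packet) < 19 or packet[:8] != b"OpusHead":
        raise ValueError("not an OpusHead packet")
    # version u8, channels u8, pre-skip u16, input rate u32, output gain s16, mapping family u8
    version, channels, pre_skip, rate, gain, family = struct.unpack_from("<BBHIhB", packet, 8)
    return OpusHead(version, channels, pre_skip, rate or None, gain, family)


def find_opus_head(data: bytes) -> OpusHead:
    """Naive shortcut: look for the OpusHead magic near the start of an .opus file."""
    i = data.find(b"OpusHead", 0, 4096)
    if i < 0:
        raise ValueError("OpusHead not found")
    return parse_opus_head(data[i:])
```

RFC 7845 describes this field as the rate of the original input before encoding, not the playback rate, so using it as the default target is a policy choice (the one opusdec makes) rather than something the format mandates.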

Also, the builtin ffmpeg decoder seems to have severe performance problems when resampling is needed: https://video.stackexchange.com/questions/36610/opus-decoding-in-ffmpeg-how-to-pass-target-sample-rate-and-ensure-libopus-decod

mthrok · 2023-06-28T20:46:11Z

Hi - Just wanted to let you know that I read the messages, but I don't have the time to properly craft the reply to all the details.

Regarding the resampling, looking at their CLI code, they seem to use FFT-based downsampling. I am not an expert here, but according to https://signalsprocessed.blogspot.com/2016/08/audio-resampling-in-python.html, this downsampling method is considered unsuitable for general audio processing.

vadimkantorov · 2023-06-28T20:54:47Z

I don't think the opus_compare program is relevant here, is it? I think it's just a test utility, and the relevant bits are in the decoding codebase.

My question about opus is about letting the user directly set the target sample_rate of the opus decoder structure.

And my point about the general torchaudio.load API is that it should accept sample_rate, either to pass directly to the decoder (as with opus or other decoders that support it) or to resample inside torchaudio.load if it's specified.

It's a very common need to force a given sample_rate (by resampling if needed) out of the audio loading function...

mthrok · 2023-06-28T21:21:00Z

> I don't think the opus_compare program is relevant here.

Would you point to the part about direct downsampling within the libopus library that you are talking about? Source, CLI help message or whatever. Decoders are generally only responsible for decoding, and resampling should not be part of it. If opus does this, it's something special and I first need to understand what it is.

vadimkantorov · 2023-06-28T21:56:41Z

Yes, opus is special in this respect. By default it decodes to 48khz (which is what the ffmpeg bindings do) or to whatever sample_rate is stored in the opus file header (which is what opusdec does), but actually it can decode to any sample rate directly at decoding time:

Here's how ffmpeg asks for 48khz: https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/libopusdec.c#L65

Similar code in opusdec does the same (e.g. see the --rate option of opusdec, though one should really check the code); essentially it just sets the .sample_rate field of the decoder structure.
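Per the libopus API documentation, the decoder can only emit 8, 12, 16, 24 or 48 kHz, so a `sample_rate=` argument for OPUS would have to split into "let the decoder do it" and "decode, then resample". A sketch of that decision; `plan_opus_decode`, `target_rate` and `header_rate` are hypothetical names. For a rate libopus cannot produce, it picks the smallest native rate at or above the target, so nothing below the target's Nyquist frequency is discarded, and returns the remaining ratio for a separate resampler (such as the speex resampler opusdec uses, or torchaudio's own). As far as can be told, this roughly mirrors the playback-rate procedure RFC 7845 recommends.

```python
from fractions import Fraction
from typing import Optional, Tuple

# Output rates the libopus decoder supports natively (each one divides 48 kHz evenly).
OPUS_NATIVE_RATES = (8000, 12000, 16000, 24000, 48000)


def plan_opus_decode(target_rate: Optional[int],
                     header_rate: Optional[int] = None) -> Tuple[int, Optional[Fraction]]:
    """Return (decoder_rate, resample_ratio) for a requested output rate.

    resample_ratio is None when the decoder can produce the rate directly;
    otherwise it is the reduced new/old factor left for a separate resampler.
    """
    # Explicit target first, then the rate stored in the header, then 48 kHz.
    wanted = target_rate or header_rate or 48000
    if wanted in OPUS_NATIVE_RATES:
        return wanted, None
    if wanted > 48000:
        return 48000, Fraction(wanted, 48000)
    # Smallest native rate that still covers everything below the target's Nyquist.
    decoder_rate = min(r for r in OPUS_NATIVE_RATES if r >= wanted)
    return decoder_rate, Fraction(wanted, decoder_rate)
```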

vadimkantorov · 2023-06-28T21:59:21Z

Also note that libopus is extremely easy to interface with, as demonstrated by the ffmpeg bindings above or by https://github.com/jlaine/opuslib/blob/master/opuslib/api/decoder.py. Opus is quite widespread now and the library is compact, so it might make sense to compile against it directly as well.

mthrok · 2023-06-29T00:03:55Z

> Yes, opus is special in this respect. By default it decodes to 48khz (which is what the ffmpeg bindings do) or to whatever sample_rate is stored in the opus file header (which is what opusdec does), but actually it can decode to any sample rate directly at decoding time:

> Here's how ffmpeg asks for 48khz: https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/libopusdec.c#L65

I already knew this. My question is: where in libopus does the resampling happen? I am asking this because

> Similar code in opusdec does the same (e.g. see the --rate option of opusdec

If the resampling is implemented in the opusdec CLI, then binding libopus won't help.

> essentially it just sets the .sample_rate field of the decoder structure

And this sounds more like overriding than resampling.

mthrok · 2023-06-29T00:47:52Z

Okay, reading through the libopusdec code, it seems that the following function does indeed perform both decoding and resampling.

https://github.com/xiph/opus/blob/9fc8fc4cf432640f284113ba502ee027268b0d9f/src/opus_decoder.c#L751

However, the structure of the opus_decode_native function looks strange: it recursively calls itself over the buffer with a different frame size parameter. This structure, and the fact that resampling only works for sample rates that evenly divide 48k Hz, suggest that the so-called downsampling is actually decimation, which would explain why it is fast.
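To make the decimation-versus-resampling distinction concrete: plain decimation keeps every M-th sample, so anything above the new Nyquist frequency folds back into the band (aliasing), whereas a proper downsampler low-pass filters first. The sketch below uses a textbook Hamming-windowed sinc FIR purely for illustration; it is not what libopus, speex or torchaudio actually implement, and all function names are hypothetical.

```python
import math
from typing import List


def lowpass_taps(cutoff: float, num_taps: int = 101) -> List[float]:
    """Hamming-windowed sinc low-pass FIR; cutoff is in cycles/sample (0 < cutoff < 0.5)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        sinc = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unit gain at DC


def decimate_naive(x: List[float], factor: int) -> List[float]:
    """Keep every `factor`-th sample; content above the new Nyquist aliases."""
    return x[::factor]


def decimate_filtered(x: List[float], factor: int, num_taps: int = 101) -> List[float]:
    """Low-pass at the new Nyquist, then keep every `factor`-th output (full-overlap region only)."""
    taps = lowpass_taps(0.5 / factor, num_taps)
    out = []
    for start in range(0, len(x) - num_taps + 1, factor):
        # The filter is symmetric, so this dot product equals convolution.
        out.append(sum(t * s for t, s in zip(taps, x[start:start + num_taps])))
    return out


def rms(x: List[float]) -> float:
    return math.sqrt(sum(v * v for v in x) / len(x))
```

For a 10 kHz tone sampled at 48 kHz and reduced by 6 to 8 kHz, `decimate_naive` keeps the full tone energy (aliased down to 2 kHz), while `decimate_filtered` suppresses it almost entirely and still passes a 1 kHz tone.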

vadimkantorov · 2023-06-29T01:32:23Z

Also, there are some mentions of the speex resampler in opusdec.c: https://github.com/xiph/opus-tools/blob/master/src/opusdec.c#L1157 ...

mthrok · 2023-06-29T01:39:15Z

> Also, there are some mentions of the speex resampler in opusdec.c: https://github.com/xiph/opus-tools/blob/master/src/opusdec.c#L1157 ...

Reading https://hydrogenaud.io/index.php/topic,113655.0.html, it does not seem that the speex resampler contributes to the performance you are looking for.

Also see https://trac.ffmpeg.org/ticket/5240 for why FFmpeg does not recover the original sample rate; that decision makes sense.

mthrok · 2023-06-29T01:43:20Z

Overall, I don't think it's worthwhile for torchaudio to bind libopus; the benefit does not seem to outweigh the cost. (Please know that nowadays I am maintaining this almost alone, alongside all my other work.)

And if we were to add it, I would add a switch in torchaudio.load that checks whether the source is OPUS and branches to a specific code path. However, the project you pointed to has already wrapped libopus, so you can already do that outside of torchaudio.load. Also, torchaudio is not aiming to be the fastest decoding library, so I recommend simply using that library for OPUS.

vadimkantorov · 2023-06-29T08:02:51Z

Hmm. Overall, yes: it seems that opus resamples using the speex resampler when a rate other than 48khz is requested at decoding.

So in the context of torchaudio I would say:

- If decoding is done in chunks, it can be important to downsample each block right away instead of keeping the whole 48khz decoded PCM in memory. 48khz PCM can eat a lot of RAM, especially for huge multi-hour files; for these, 8khz vs. 48khz makes a real difference memory-wise.
- If torchaudio supports reading opus via ffmpeg, it should have performance tests for the builtin and libopus decoders to guard against bugs like this one: https://video.stackexchange.com/questions/36610/opus-decoding-in-ffmpeg-how-to-pass-target-sample-rate-and-ensure-libopus-decod, and in general to compare against opusdec, as giant speech datasets are commonly stored in opus these days.
- It still might make sense to accept a target sample_rate as part of the torchaudio.load API, as resampling to a single target sample rate is often needed by users.
- In general, I don't know which resampler torchaudio uses, or whether the speex C code from libopus could be useful.
- I don't know if anyone needs an ffmpeg-less build of torchaudio, but in that case building directly against libopus/libflac might make sense, as they are quite small, self-contained libraries in terms of code footprint.
- About ffmpeg not restoring the original sample rate: I think it's actually not a good thing, and as you pointed out, it's mostly about fitting ffmpeg's architecture. E.g. if torchaudio could first read the original sample rate from the opus header and then call ffmpeg / resample to it, it would provide the behavior a user expects.

mthrok · 2023-06-29T16:15:16Z

> It still might make sense to accept a target sample_rate as part of the torchaudio.load API, as resampling to a single target sample rate is often needed by users

This one I totally agree with. I actually tried to add this, but got unreasonable pushback, so I could not do it. #816

I am okay with bringing this back.

> If decoding is done in chunks, it can be important to downsample each block right away instead of keeping the whole 48khz decoded PCM in memory. 48khz PCM can eat a lot of RAM, especially for huge multi-hour files; for these, 8khz vs. 48khz makes a real difference memory-wise.

> In general, I don't know which resampler torchaudio uses, or whether the speex C code from libopus could be useful.

FFmpeg and sox have their own resampling implementations that work in a streaming fashion, but soundfile does not. So when adding a target sample rate, consistency of the functionality across backends is an issue. (Well, we could start by saying that the soundfile backend does not support this.)

> About ffmpeg not restoring the original sample rate: I think it's actually not a good thing, and as you pointed out, it's mostly about fitting ffmpeg's architecture. E.g. if torchaudio could first read the original sample rate from the opus header and then call ffmpeg / resample to it, it would provide the behavior a user expects.

> If torchaudio supports reading opus via ffmpeg, it should have performance tests for the builtin and libopus decoders to guard against bugs like this one: https://video.stackexchange.com/questions/36610/opus-decoding-in-ffmpeg-how-to-pass-target-sample-rate-and-ensure-libopus-decod, and in general to compare against opusdec, as giant speech datasets are commonly stored in opus these days.

Reading the standard, the treatment of the rate is vague: https://datatracker.ietf.org/doc/html/rfc7845.html#section-5.1. At least I see why FFmpeg always falls back to 48k Hz, even if it might feel strange to users. (I also thought it was strange at first, and still do, but I understand how the FFmpeg developers think about it.) So I don't think it's a bug; still, everything becoming 48k Hz is surprising and goes against the principle of least surprise. But at the same time, libopus is only a reference implementation, so we don't need to stick to its extra behaviors that are not defined in the standard.

> I don't know if anyone needs an ffmpeg-less build of torchaudio, but in that case building directly against libopus/libflac might make sense, as they are quite small, self-contained libraries in terms of code footprint.

I see your point, but for OPUS, I think one can work around this by using the Python wrapper you referred to. When you know that all the audio in your dataset is OPUS, there should be no problem using it. I hear that conversion from a NumPy ndarray to a Torch Tensor is quite fast.

vadimkantorov · 2023-06-29T16:51:14Z

About adding optional target sample_rate for torchaudio.load: I would say it's okay to add these kinds of high-level improvements for user convenience.

If users are interested in a particular backend, they can use it directly (btw pysoundfile still does not support opus unless some patches are applied). And yes, of course one can directly use ctypes libopus wrappers, but it's just less convenient and more boilerplate.

For soundfile, torchaudio could use its own builtin resampler to downsample. Currently, one usually has to do this kind of boilerplate post-processing anyway.

vadimkantorov changed the title ~~torchaudio.load to optionally accept sample_rate~~ torchaudio.load to optionally accept sample_rate (and maybe backend=) Jul 28, 2022

mthrok added module: IO needs triage labels Jul 29, 2022

nateanl assigned hwangjeff Aug 2, 2022

nateanl added triaged and removed needs triage labels Aug 2, 2022

vadimkantorov mentioned this issue Oct 12, 2022

Pass user-chosen sample rate to the decoder while audio decoding (usecase: OPUS decoding) and resample/error-out if the decoder does not support this option libsndfile/libsndfile#886

Open

vadimkantorov changed the title ~~torchaudio.load to optionally accept sample_rate (and maybe backend=)~~ torchaudio.load to optionally accept a target sample_rate (and maybe backend=) Oct 19, 2022

yairl mentioned this issue Feb 15, 2024

Perform in-place resampling during read_audio. snakers4/silero-vad#421

Merged
