Add sample_rate conversion option to `sox_io_backend.load` #816

mthrok · 2020-07-23T15:07:57Z

This PR adds sample_rate option to sox_io_backend.load function which performs resampling when loading audio data from file.

Inspired by this comment

On Python interface this change is backward compatible, but Torchscript registration schema is changed, (see register.cpp), therefore previously dumped Torchscript object that includes call to sox_io_backend.load will stop working. -> BC breaking.
~~Added backward compatibility. TODO: test -> #838~~

vincentqb · 2020-08-07T22:23:12Z

torchaudio/csrc/sox_io.cpp


  std::vector<std::vector<std::string>> effects;
+  if (sample_rate != -1) {
+    effects.emplace_back(
+        std::vector<std::string>{"rate", std::to_string(sample_rate)});


So this is adding to the sox_effects chain a resampling transform, right? I feel the user could want many other transform, and that would add many flags. Could it work instead to point out that sox_effects_chain can combine those operations already? (as you point out here)

The underlying implementation is same (which is why we can add sample rate conversion to load function), but the intended use of frontend functions are different. If users want to use filtering they can certainly do so by using sox effects. This is for convenience, and not for compatibility. Changing sampling rate is such a popular use case, like using the same dataset to train ASR models for different environments (realtime vs batch or on-device vs server etc...). So having convenient and fast way to change sampling rate is valuable.

Users also have torchaudio.transforms.Resample :) Are any of these two (Resample and sox_effects) measurably slower than this implementation?

I'm not keen on adding parameters like these to load functions when there are already simple ways of doing so. On top of adding parameters to the signature, this also does add layers of backward-compatibility etc to the codebase.

How about other backends like soundfile?

Users also have torchaudio.transforms.Resample :) Are any of these two (Resample and sox_effects) measurably slower than this implementation?

The difference is 120x

script

import time import torchaudio torchaudio.set_audio_backend('sox_io') n_rep = 100 sr = 8000 t0 = time.monotonic() for _ in range(n_rep): torchaudio.load('test/torchaudio_unittest/assets/steam-train-whistle-daniel_simon.wav', sample_rate=sr) t1 = time.monotonic() print((t1-t0) / 100) t0 = time.monotonic() for _ in range(n_rep): wave, sample_rate = torchaudio.load('test/torchaudio_unittest/assets/steam-train-whistle-daniel_simon.wav') torchaudio.compliance.kaldi.resample_waveform(wave, sample_rate, sr) t1 = time.monotonic() print((t1-t0) / 100)

$ python foo.py 0.015911198877729474 0.18906223367899655

I'm not keen on adding parameters like these to load functions when there are already simple ways of doing so. On top of adding parameters to the signature, this also does add layers of backward-compatibility etc to the codebase.

How about other backends like soundfile?

This is BC compatible in both Python and C++ Torchscript schema. For other backend, we can add the argument and raise an error saying sample_rate is not supported in this backend until we actually add the implementation. Which is the same approach as the coming backend clean up on "sox_io" and "soundfile" backends.

Another reason why it is a good idea to add sampling rate option to loading function is when extracting a portion of audio file. Like in #882 there are occasions where users want to load only a portion of audio, based on timestamps. Having the correct sampling rate is critical in this case to calculate the target frame positions, and this addition will guarantee that users get the sampling rate they requested even if the underlying audio file has a different sample rate.

In my experience with ASR trainings, the most common mistakes are wrong sampling rate and wrong number of channels. When training models for different purposes, it would really beneficial to load audio while changing sampling rate and extracting a portion like one will do with STM files, without pre-converting audio files and storing on the file system.

One more consideration: could someone want to use a specific algorithm for resampling, instead of the default one? How would the behavior of the resampling be guaranteed across different backends?

When training models for different purposes, it would really beneficial to load audio while changing sampling rate and extracting a portion like one will do with STM files, without pre-converting audio files and storing on the file system.

We can already do that with in-memory sox effects, right?

One more consideration: could someone want to use a specific algorithm for resampling, instead of the default one? How would the behavior of the resampling be guaranteed across different backends?

I do not think it is necessary to make that guarantee into a specification, because each backend has different limitations. If we make that guarantee into specification (except loading the data as is), we limit ourselves to only take the smallest subset of functionalities of all the backends.
Also, so far backends are for the purpose of platform-portability, and it is not meant to be swapped very often for the same installation.
If users can use the Resample transform for that purpose. I think, leveraging the backend's specific capability to provide more efficient way of perform it has added value.

We can already do that with in-memory sox effects, right?

Yes, but this way it's more efficient when one wants to trim the audio, because sox will discard the un-needed portion.

vincentqb · 2020-08-25T16:57:45Z

torchaudio/csrc/sox_io.h

 c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal> load_audio_file(
    const std::string& path,
    const int64_t frame_offset = 0,
    const int64_t num_frames = -1,
    const bool normalize = true,
    const bool channels_first = true);

+// ver. 1 sample_rate is added
+c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal> load_audio_file_v1(


cc @mruberry -- in case you would like to comment on how to deprecate torchscript APIs

It looks like you're adding a new argument with a default value. In that case you should be able to load older serialized Torchscript without issue, and I don't think you need to the old signature.

@mruberry It seems not the case. I removed the BC code and BC test fails. https://app.circleci.com/pipelines/github/pytorch/audio/3270/workflows/0f877fa3-6256-4ef5-8d84-124684d17a97/jobs/93108

@mruberry It seems not the case. I removed the BC code and BC test fails. https://app.circleci.com/pipelines/github/pytorch/audio/3270/workflows/0f877fa3-6256-4ef5-8d84-124684d17a97/jobs/93108

Sorry, @mthrok, I feel like I misled you but that's what I was told. Empirically it seems to not be the case and I'll have to go back and investigate. I suppose you can keep the old signature and remap it into the new one like you originally were if you want to continue being able to load older serialized Torchscript.

Sorry, @mthrok, I feel like I misled you but that's what I was told. Empirically it seems to not be the case and I'll have to go back and investigate. I suppose you can keep the old signature and remap it into the new one like you originally were if you want to continue being able to load older serialized Torchscript.

Do you have an estimate of when it would work without code for backward-compatibility?
I am wondering if I should wait if it's soon. That way we can avoid adding a new schema.

vincentqb · 2020-08-28T18:12:15Z

torchaudio/csrc/sox_io.h

@@ -28,7 +28,8 @@ c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal> load_audio_file(
    const int64_t frame_offset = 0,
    const int64_t num_frames = -1,
    const bool normalize = true,
-    const bool channels_first = true);
+    const bool channels_first = true,
+    const int64_t sample_rate = -1);


Where is the behavior of the -1 value documented?

It is omitted because I judged that the code is self-explanatory.

This reverts commit abbb52e.

* Update dcgan_faces_tutorial.py Previous comment might have been slightly misleading. A bit easier to understand, if made explicit, that gradients of errD_real and errD_fake w.r.t. parameters of netD get added up/accumulated because of the successive backward calls without a zero_grad() in between. * Update dcgan_faces_tutorial.py Co-authored-by: Brian Johnson <[email protected]> Co-authored-by: holly1238 <[email protected]>

mthrok changed the title ~~[BC Breaking] Add sample_rate conversion option to sox_io_backend.load~~ Add sample_rate conversion option to sox_io_backend.load Jul 23, 2020

mthrok force-pushed the load-sample-rate branch 5 times, most recently from 5b91caf to 66499d4 Compare July 30, 2020 19:06

mthrok mentioned this pull request Jul 30, 2020

Add test to validate torchscript backward compatibility #838

Merged

mthrok force-pushed the load-sample-rate branch 12 times, most recently from 898213f to 4b767ee Compare August 6, 2020 18:48

mthrok force-pushed the load-sample-rate branch from 4b767ee to 8a0b921 Compare August 7, 2020 19:15

mthrok marked this pull request as ready for review August 7, 2020 19:45

mthrok requested a review from vincentqb August 7, 2020 19:46

vincentqb reviewed Aug 7, 2020

View reviewed changes

mthrok force-pushed the load-sample-rate branch from 8a0b921 to 5cf9bf9 Compare August 11, 2020 15:15

mthrok force-pushed the load-sample-rate branch from 5cf9bf9 to caf15aa Compare August 19, 2020 19:44

vincentqb reviewed Aug 25, 2020

View reviewed changes

mthrok force-pushed the load-sample-rate branch from caf15aa to abbb52e Compare August 28, 2020 17:18

vincentqb reviewed Aug 28, 2020

View reviewed changes

mthrok added 2 commits August 31, 2020 17:38

Add sample_rate option to sox_io_backend.load

566414c

Remove BC code

47aee13

Revert "Remove BC code"

1e722d8

This reverts commit abbb52e.

mthrok force-pushed the load-sample-rate branch from f324375 to 1e722d8 Compare August 31, 2020 17:39

mthrok mentioned this pull request Sep 6, 2020

Compose sox_effects and torchaudio.transforms #889

Closed

facebook-github-bot added the CLA Signed label Oct 30, 2020

mthrok closed this Nov 3, 2020

mthrok mentioned this pull request Jun 29, 2023

torchaudio.load to optionally accept a target sample_rate (and maybe backend=) #2586

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add sample_rate conversion option to `sox_io_backend.load` #816

Add sample_rate conversion option to `sox_io_backend.load` #816

mthrok commented Jul 23, 2020 •

edited

Loading

vincentqb Aug 7, 2020 •

edited

Loading

mthrok Aug 14, 2020

vincentqb Aug 17, 2020

mthrok Aug 17, 2020 •

edited

Loading

mthrok Aug 21, 2020

vincentqb Aug 28, 2020 •

edited

Loading

mthrok Aug 28, 2020

vincentqb Aug 25, 2020

mruberry Aug 26, 2020

mthrok Aug 28, 2020

mruberry Aug 28, 2020

mthrok Aug 28, 2020

vincentqb Aug 28, 2020 •

edited

Loading

mthrok Aug 28, 2020

Add sample_rate conversion option to sox_io_backend.load #816

Add sample_rate conversion option to sox_io_backend.load #816

Conversation

mthrok commented Jul 23, 2020 • edited Loading

vincentqb Aug 7, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mthrok Aug 17, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vincentqb Aug 28, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vincentqb Aug 28, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Add sample_rate conversion option to `sox_io_backend.load` #816

Add sample_rate conversion option to `sox_io_backend.load` #816

mthrok commented Jul 23, 2020 •

edited

Loading

vincentqb Aug 7, 2020 •

edited

Loading

mthrok Aug 17, 2020 •

edited

Loading

vincentqb Aug 28, 2020 •

edited

Loading

vincentqb Aug 28, 2020 •

edited

Loading