-
Notifications
You must be signed in to change notification settings - Fork 664
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add sample_rate conversion option to sox_io_backend.load
#816
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -23,13 +23,23 @@ struct SignalInfo : torch::CustomClassHolder { | |
|
||
c10::intrusive_ptr<SignalInfo> get_info(const std::string& path); | ||
|
||
// ver. 0 | ||
c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal> load_audio_file( | ||
const std::string& path, | ||
const int64_t frame_offset = 0, | ||
const int64_t num_frames = -1, | ||
const bool normalize = true, | ||
const bool channels_first = true); | ||
|
||
// ver. 1 sample_rate is added | ||
c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal> load_audio_file_v1( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. cc @mruberry -- in case you would like to comment on how to deprecate torchscript APIs There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It looks like you're adding a new argument with a default value. In that case you should be able to load older serialized Torchscript without issue, and I don't think you need to the old signature. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @mruberry It seems not the case. I removed the BC code and BC test fails. https://app.circleci.com/pipelines/github/pytorch/audio/3270/workflows/0f877fa3-6256-4ef5-8d84-124684d17a97/jobs/93108 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Sorry, @mthrok, I feel like I misled you but that's what I was told. Empirically it seems to not be the case and I'll have to go back and investigate. I suppose you can keep the old signature and remap it into the new one like you originally were if you want to continue being able to load older serialized Torchscript. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Do you have an estimate of when it would work without code for backward-compatibility? |
||
const std::string& path, | ||
const int64_t frame_offset = 0, | ||
const int64_t num_frames = -1, | ||
const bool normalize = true, | ||
const bool channels_first = true, | ||
const int64_t sample_rate = -1); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Where is the behavior of the -1 value documented? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is omitted because I judged that the code is self-explanatory. |
||
|
||
void save_audio_file( | ||
const std::string& file_name, | ||
const c10::intrusive_ptr<torchaudio::sox_utils::TensorSignal>& signal, | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is adding to the sox_effects chain a resampling transform, right? I feel the user could want many other transform, and that would add many flags. Could it work instead to point out that sox_effects_chain can combine those operations already? (as you point out here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The underlying implementation is same (which is why we can add sample rate conversion to load function), but the intended use of frontend functions are different. If users want to use filtering they can certainly do so by using sox effects. This is for convenience, and not for compatibility. Changing sampling rate is such a popular use case, like using the same dataset to train ASR models for different environments (realtime vs batch or on-device vs server etc...). So having convenient and fast way to change sampling rate is valuable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users also have
torchaudio.transforms.Resample
:) Are any of these two (Resample
andsox_effects
) measurably slower than this implementation?I'm not keen on adding parameters like these to load functions when there are already simple ways of doing so. On top of adding parameters to the signature, this also does add layers of backward-compatibility etc to the codebase.
How about other backends like soundfile?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference is 120x
script
This is BC compatible in both Python and C++ Torchscript schema. For other backend, we can add the argument and raise an error saying
sample_rate is not supported in this backend
until we actually add the implementation. Which is the same approach as the coming backend clean up on "sox_io" and "soundfile" backends.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another reason why it is a good idea to add sampling rate option to loading function is when extracting a portion of audio file. Like in #882 there are occasions where users want to load only a portion of audio, based on timestamps. Having the correct sampling rate is critical in this case to calculate the target frame positions, and this addition will guarantee that users get the sampling rate they requested even if the underlying audio file has a different sample rate.
In my experience with ASR trainings, the most common mistakes are wrong sampling rate and wrong number of channels. When training models for different purposes, it would really beneficial to load audio while changing sampling rate and extracting a portion like one will do with STM files, without pre-converting audio files and storing on the file system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more consideration: could someone want to use a specific algorithm for resampling, instead of the default one? How would the behavior of the resampling be guaranteed across different backends?
We can already do that with in-memory sox effects, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think it is necessary to make that guarantee into a specification, because each backend has different limitations. If we make that guarantee into specification (except loading the data as is), we limit ourselves to only take the smallest subset of functionalities of all the backends.
Also, so far backends are for the purpose of platform-portability, and it is not meant to be swapped very often for the same installation.
If users can use the Resample transform for that purpose. I think, leveraging the backend's specific capability to provide more efficient way of perform it has added value.
Yes, but this way it's more efficient when one wants to trim the audio, because sox will discard the un-needed portion.