Pass pre-computed `info` to `torchaudio.load()` for file-like objects #1442

hbredin · 2021-04-08T11:59:21Z

Motivation

Support for file-like objects is a great addition to the library. Thanks!
However, if one needs to call both info and load in a row, one must rewind the object between the two calls:

import torchaudio

with open('audio.wav', 'rb') as fp:
    torchaudio.info(fp)
    fp.seek(0)  # rewind so that load() starts from the beginning of the file
    torchaudio.load(fp)

Without fp.seek(0), we get this error:

torchaudio/backend/sox_io_backend.py in load(filepath, frame_offset, num_frames, normalize, channels_first, format)
    145     if not torch.jit.is_scripting():
    146         if hasattr(filepath, 'read'):
--> 147             return torchaudio._torchaudio.load_audio_fileobj(
    148                 filepath, frame_offset, num_frames, normalize, channels_first, format)
    149         filepath = os.fspath(filepath)

RuntimeError: Error loading audio file: failed to open file.

I presume this happens because torchaudio.info consumes the first few bytes of the file to read the header and torchaudio.load does not know that...
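
To illustrate the presumed mechanism without depending on torchaudio internals, here is a minimal sketch that uses the standard-library `wave` module as a stand-in parser. It is an assumption that torchaudio behaves analogously; the sketch only shows that parsing a header advances the shared file position, and that a second parse fails unless the object is rewound.

```python
import io
import wave

# Build a small in-memory WAV file: 100 frames of 16-bit mono silence.
buf = io.BytesIO()
with wave.open(buf, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)
    writer.setframerate(8000)
    writer.writeframes(b"\x00\x00" * 100)
buf.seek(0)

# First pass: parsing the header advances the shared file position.
with wave.open(buf, "rb") as reader:
    num_frames = reader.getnframes()
position_after_header = buf.tell()

# Second pass without rewinding: the parser starts mid-file and fails.
try:
    wave.open(buf, "rb")
    second_open_failed = False
except (wave.Error, EOFError):
    second_open_failed = True

# Third pass after rewinding: parsing works again.
buf.seek(0)
with wave.open(buf, "rb") as reader:
    frames_after_rewind = reader.getnframes()
```

The stand-in reads only the header here, whereas a real backend may read further ahead, so the exact position after the first pass should not be relied upon.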

Pitch

Would be nice to be able to provide torchaudio.load with the output of info so that it somehow knows that it has already passed the header section of the file.

with open('audio.wav', 'rb') as fp:
   info = torchaudio.info(fp)
   torchaudio.load(fp, info=info)

This would also allow multiple subsequent calls to torchaudio.load, which might come in handy if we want to process the input file as a stream...

cc @mogwai

mthrok · 2021-04-08T14:14:00Z

Hi @hbredin

Thanks for the pitch. I like the idea, but let me describe some technical difficulties in achieving this.

The info function can consume more than the minimum required for reading the header.
In the info function, there are cases where more of the byte string is consumed than just the header data. This is because we fetch enough data to detect the audio format before we start reading the header info. Some formats have a header smaller than the pre-fetch size, and the remaining data is discarded. Therefore, even if all the metadata that the internal implementation requires is provided, the load function cannot necessarily function correctly.
For the soundfile backend, we do not have control over the low-level details, as we just use pysoundfile.
I could be wrong, but can you do it with bare soundfile? Let me know if so. I am curious to see how it can be done.

Having said that, since I worked on this file-like object support, I am also interested in adding streaming capability.
My current thought is that we should provide a separate API for streaming and probably bind ffmpeg or something. The natural choice of API is an iterable that exposes the necessary metadata at construction time and returns a waveform chunk on each iteration. That way, the internal state can be preserved over multiple function calls.

# NOTE: This is just an illustration.
stream = torchaudio.load_stream(src, frames_per_chunk, sample_rate, ...)
print(stream.format)
print(stream.bits_per_sample)
for chunk in stream:
    ...
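
For readers who want something runnable, the iterable design described above can be sketched with the standard-library `wave` module. This is only an assumption-laden illustration: `WavChunkStream` is a hypothetical name, it handles WAV only, and it yields raw bytes rather than tensors. It is not the torchaudio API.

```python
import io
import wave


class WavChunkStream:
    """Hypothetical iterable: metadata at construction, chunks on iteration."""

    def __init__(self, src, frames_per_chunk):
        # The header is parsed exactly once, here.
        self._reader = wave.open(src, "rb")
        self.frames_per_chunk = frames_per_chunk
        self.sample_rate = self._reader.getframerate()
        self.num_channels = self._reader.getnchannels()
        self.bits_per_sample = 8 * self._reader.getsampwidth()

    def __iter__(self):
        # The reader keeps its position between chunks, so no rewinding
        # or re-reading of the header is needed.
        while True:
            chunk = self._reader.readframes(self.frames_per_chunk)
            if not chunk:
                return
            yield chunk
```

Because the reader object holds the position, the caller never has to rewind the source, which is the property the proposed design is after.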

What is your need for streaming? I want to do something about streaming in torchaudio, but at the moment, we do not have an immediate need or request. So if you can describe your needs and what you would like to achieve in RFC #1072, we can better spec out a new API.

hbredin · 2021-04-12T11:29:23Z

Thanks a lot for the very detailed answer. I now understand why this is tricky. I just merged a PR in pyannote.audio that simply seeks back to 0 whenever needed. This is kind of ugly (it won't work for unseekable streams) and sub-optimal (because the header will be read again and again) but it does the trick.
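
For reference, the seek-back workaround can be sketched as a small helper. This is not the code from the PR: `info_then_load` is a hypothetical name, `info_fn` and `load_fn` are parameters standing in for torchaudio.info and torchaudio.load, and buffering unseekable streams in memory is an extra assumption of this sketch.

```python
import io


def info_then_load(fileobj, info_fn, load_fn):
    """Call info_fn and then load_fn on the same file-like object."""
    seekable = getattr(fileobj, "seekable", lambda: False)()
    if not seekable:
        # Assumption: the whole stream fits in memory, so it can be
        # copied into a seekable buffer.
        fileobj = io.BytesIO(fileobj.read())
    start = fileobj.tell()
    info = info_fn(fileobj)
    # Rewind to where we started; the header is parsed again by load_fn.
    fileobj.seek(start)
    return info, load_fn(fileobj)
```

With torchaudio installed, the intended call would be `info_then_load(fp, torchaudio.info, torchaudio.load)`. The header is still read twice, so this only removes the ugliness at the call site.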

Regarding streaming, this is something that I would like to support in pyannote.audio eventually (e.g. for live online speaker diarization). The illustrative API above looks like something I could use. Will try to contribute to the mentioned RFC.

Closing this issue. Feel free to re-open but I consider my original issue solved (by the aforementioned PR).

mthrok · 2022-02-08T08:50:20Z

Hi @hbredin

I added a new, prototype streaming API, which can support the use case described here.
If you have time, can you try it out and let us know your thoughts?

https://pytorch.org/audio/main/tutorials/streaming_api_tutorial.html

hbredin · 2022-02-11T08:00:16Z

Thanks @mthrok -- will look into it... at some point (but cannot promise any ETA).

hbredin mentioned this issue Apr 8, 2021

feat: Support for file like objects pyannote/pyannote-audio#640

Merged

hbredin closed this as completed Apr 12, 2021

hbredin mentioned this issue Apr 12, 2021

RFC: Streaming Inference / Application #1072

Closed

hbredin mentioned this issue Mar 7, 2022

Investigate torchaudio streaming API pyannote/pyannote-audio#914

Closed
