Pass pre-computed info to torchaudio.load() for file-like objects #1442

Closed
hbredin opened this issue Apr 8, 2021 · 4 comments

Comments

@hbredin

hbredin commented Apr 8, 2021

Motivation

Support for file-like objects is a great addition to the library. Thanks!
However, if one needs to call both info and load in a row on the same object, one must rewind the object between the two calls:

import torchaudio

with open('audio.wav', 'rb') as fp:
    torchaudio.info(fp)
    fp.seek(0)  # rewind: info() has already advanced the file position
    torchaudio.load(fp)

Without fp.seek(0), we get this error:

torchaudio/backend/sox_io_backend.py in load(filepath, frame_offset, num_frames, normalize, channels_first, format)
    145     if not torch.jit.is_scripting():
    146         if hasattr(filepath, 'read'):
--> 147             return torchaudio._torchaudio.load_audio_fileobj(
    148                 filepath, frame_offset, num_frames, normalize, channels_first, format)
    149         filepath = os.fspath(filepath)

RuntimeError: Error loading audio file: failed to open file.

I presume this happens because torchaudio.info consumes the first few bytes of the file to read the header and torchaudio.load does not know that...
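
The same behaviour can be reproduced without torchaudio. The following is a minimal sketch that uses Python's standard-library wave module as a stand-in for torchaudio.info and torchaudio.load (an assumption made purely for illustration; the sox backend works differently internally). The helper names make_wav_bytes and read_sample_rate are hypothetical. Parsing the header moves the cursor of the file object, so a second parse fails until the object is rewound.

```python
import io
import wave


def make_wav_bytes(num_frames=1600, sample_rate=16000):
    """Build a small mono 16-bit WAV file in memory (all-zero samples)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)
        writer.setframerate(sample_rate)
        writer.writeframes(b"\x00\x00" * num_frames)
    return buf.getvalue()


def read_sample_rate(fp):
    """Parse a WAV header from fp, starting at its current position."""
    with wave.open(fp, "rb") as reader:
        return reader.getframerate()


fp = io.BytesIO(make_wav_bytes())
first = read_sample_rate(fp)
position_after_header = fp.tell()  # no longer 0: the header was consumed

try:
    read_sample_rate(fp)  # second parse starts in the middle of the file
    second_parse_failed = False
except (wave.Error, EOFError):
    second_parse_failed = True

fp.seek(0)  # rewinding restores a valid starting point
again = read_sample_rate(fp)
```

The second call fails because the parser expects the header at the current position, which is the same class of failure as the RuntimeError above.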

Pitch

It would be nice to be able to pass the output of info to torchaudio.load, so that load knows the header section of the file has already been consumed.

with open('audio.wav', 'rb') as fp:
   info = torchaudio.info(fp)
   torchaudio.load(fp, info=info)

This would also allow multiple subsequent calls to torchaudio.load, which might come in handy if we want to process the input file as a stream...

cc @mogwai

@mthrok
Collaborator

mthrok commented Apr 8, 2021

Hi @hbredin

Thanks for the pitch. I like the idea, but let me describe some technical difficulties in achieving this.

  1. The info function may consume more data than the minimum required to read the header.
    In the info function, there are cases where more bytes are consumed than the header contains. This is because we fetch enough data to detect the audio format before we start reading the header. Some formats have a header smaller than the pre-fetch size, and the remaining pre-fetched data is discarded. Therefore, even if all the metadata that the internal implementation requires is provided, the load function cannot necessarily function correctly.
  2. For the soundfile backend, we do not have control over the low-level details, as we just use pysoundfile.
    I could be wrong, but can you do it with bare soundfile? Let me know if so. I am curious to see how it can be done.
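
The first difficulty can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the pre-fetch size, the detect_format helper, and the detection logic are hypothetical and are not taken from torchaudio's implementation. The point is only that, after a fixed-size pre-fetch, the cursor sits past the end of a short header, so knowing the header metadata does not tell a later reader where the audio data resumes.

```python
import io

PREFETCH_SIZE = 4096  # hypothetical pre-fetch size, chosen for illustration
WAV_HEADER_SIZE = 44  # size of a canonical PCM WAV header


def detect_format(fp, prefetch_size=PREFETCH_SIZE):
    """Toy format detection: pre-fetch a fixed block and inspect magic bytes."""
    block = fp.read(prefetch_size)
    if block[:4] == b"RIFF" and block[8:12] == b"WAVE":
        return "wav"
    return "unknown"


# A fake file: a 44-byte header-shaped prefix followed by 10240 payload bytes.
header = b"RIFF" + b"\x00" * 4 + b"WAVE" + b"\x00" * (WAV_HEADER_SIZE - 12)
payload = bytes(range(256)) * 40
fp = io.BytesIO(header + payload)

fmt = detect_format(fp)
# Payload bytes between the end of the header and the cursor were fetched
# by the detection step and are no longer available to the next reader.
discarded_payload = fp.tell() - WAV_HEADER_SIZE
```

In this sketch 4052 payload bytes are lost to any reader that continues from the current position, which is why passing metadata alone would not be enough.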

Having said that, since I worked on this file-like object support, I am also curious about adding streaming capability.
My current thought is that we should provide a separate API for streaming, probably binding ffmpeg or something similar. The natural choice of API would be an iterable that returns the necessary metadata at construction time and returns waveform chunks on iteration. That way, the internal state can be preserved across multiple function calls.

# NOTE: This is just an illustration.
stream = torchaudio.load_stream(src, frames_per_chunk, sample_rate, ...)
print(stream.format)
print(stream.bits_per_sample)
for chunk in stream:
    ...
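
For readers who want something runnable, the iterable design described above can be sketched with Python's standard-library wave module. This is not torchaudio's API: the class name WavChunkStream and its attributes are hypothetical, it handles only WAV input, and it yields raw bytes rather than tensors. It shows the key property of the proposal, namely that the header is parsed once at construction and the reader keeps its position between chunks.

```python
import io
import wave


class WavChunkStream:
    """Hypothetical iterable reader: metadata at construction, chunks on iteration."""

    def __init__(self, src, frames_per_chunk):
        self._reader = wave.open(src, "rb")  # the header is parsed once, here
        self.frames_per_chunk = frames_per_chunk
        self.sample_rate = self._reader.getframerate()
        self.num_channels = self._reader.getnchannels()
        self.bits_per_sample = 8 * self._reader.getsampwidth()

    def __iter__(self):
        return self

    def __next__(self):
        # The reader remembers its position, so no rewinding is needed.
        chunk = self._reader.readframes(self.frames_per_chunk)
        if not chunk:
            raise StopIteration
        return chunk


# Build a mono 16-bit WAV with 2500 frames in memory to exercise the class.
buf = io.BytesIO()
with wave.open(buf, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)
    writer.setframerate(16000)
    writer.writeframes(b"\x00\x00" * 2500)
buf.seek(0)

stream = WavChunkStream(buf, frames_per_chunk=1000)
chunk_sizes = [len(chunk) for chunk in stream]  # sizes in bytes
```

With 2 bytes per frame, the last chunk is shorter than the others because only 500 frames remain.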

What is your need for streaming? I want to do something about streaming in torchaudio, but at the moment we do not have an immediate need or request. So if you can describe your need and what you would like to achieve in RFC #1072, we can better spec out a new API.

@hbredin
Author

hbredin commented Apr 12, 2021

Thanks a lot for the very detailed answer. I now understand why this is tricky. I just merged a PR in pyannote.audio that simply seeks back to 0 whenever needed. This is kind of ugly (it won't work for unseekable streams) and sub-optimal (because the header will be read again and again) but it does the trick.
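
A minimal sketch of the seek-back workaround follows, with one possible fallback for unseekable sources. This is not the code from the pyannote.audio PR: the helpers ensure_seekable and call_from_start and the OneWayStream test double are hypothetical names introduced for illustration. The fallback copies the stream into memory, so it only suits finite inputs and does not help with live streams.

```python
import io


def ensure_seekable(fp):
    """Return fp if it can be rewound, else an in-memory copy of its remaining bytes."""
    seekable = getattr(fp, "seekable", None)
    if callable(seekable) and seekable():
        return fp
    return io.BytesIO(fp.read())


def call_from_start(fp, func, *args, **kwargs):
    """Rewind fp to offset 0, then call func(fp, ...)."""
    fp.seek(0)
    return func(fp, *args, **kwargs)


class OneWayStream(io.RawIOBase):
    """Test double for an unseekable source such as a pipe or socket."""

    def __init__(self, data):
        self._inner = io.BytesIO(data)

    def readable(self):
        return True

    def seekable(self):
        return False

    def readinto(self, b):
        return self._inner.readinto(b)


fp = ensure_seekable(OneWayStream(b"header+samples"))
head = call_from_start(fp, lambda f: f.read(6))      # e.g. an info-like call
everything = call_from_start(fp, lambda f: f.read())  # e.g. a load-like call
```

In real use, the two lambdas would be replaced by torchaudio.info and torchaudio.load, each of which would then re-read the header from offset 0.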

Regarding streaming, this is something that I would like to support in pyannote.audio eventually (e.g. for live online speaker diarization). The illustrative API above looks like something I could use. Will try to contribute to the mentioned RFC.

Closing this issue. Feel free to re-open but I consider my original issue solved (by the aforementioned PR).

@mthrok
Collaborator

mthrok commented Feb 8, 2022

Hi @hbredin

I added a new, prototype streaming API, which can support the use case described here.
If you have time, can you try it out and let us know your thoughts?

https://pytorch.org/audio/main/tutorials/streaming_api_tutorial.html

@hbredin
Author

hbredin commented Feb 11, 2022

Thanks @mthrok -- will look into it... at some point (but cannot promise any ETA).
