Sharing My Projects 2021 H1 #1154

mthrok · 2021-01-05T18:36:49Z

For visibility, better communication with the community, and to get feedback from you, I am sharing the projects I am working on for the first half of 2021.

The next release of PyTorch / torchaudio is scheduled for around the first week of March. Changes that land on the master branch before the middle of February can go into this release.

I/O

Backend update

Continuing from the previous release, I am working on refining the I/O modules. The main tracker is #903. Other important feedback I am tracking is ✅ #1094.

✅ In-memory encoding/decoding support

The main tracker is #1115. See the description of #1108 for the examples of this feature.
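To illustrate what "in-memory" means here, the following is a minimal sketch that uses only the Python standard library (`wave`, `io.BytesIO`), not torchaudio. The helper names `encode_wav` and `decode_wav` are hypothetical and chosen for this example; the actual torchaudio examples are in the description of #1108.

```python
import io
import math
import struct
import wave


def encode_wav(samples, sample_rate):
    """Encode floats in [-1, 1] as 16-bit mono WAV bytes without touching disk."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        writer.setframerate(sample_rate)
        pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
        writer.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return buffer.getvalue()


def decode_wav(data):
    """Decode 16-bit mono WAV bytes back into floats and a sample rate."""
    with wave.open(io.BytesIO(data), "rb") as reader:
        sample_rate = reader.getframerate()
        frames = reader.readframes(reader.getnframes())
    pcm = struct.unpack(f"<{len(frames) // 2}h", frames)
    return [value / 32767 for value in pcm], sample_rate


# Round trip: a 440 Hz tone is encoded to bytes and decoded again in memory.
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]
encoded = encode_wav(tone, 8000)
decoded, rate = decode_wav(encoded)
```

The encoded bytes never reach the file system, which is the property that makes it possible to apply codecs to audio data directly.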

Prototyping More portable / performant loading.

We are considering getting rid of the notion of “backend”. It forces the code base to be stateful, and there is still a discrepancy in the number of codecs supported between Windows and Unix-like OSs.
In #1000, I am planning to add a loading function that dispatches to a decoder based on the file format, with a choice of portable decoding libraries. At some point, I would like to redirect the torchaudio.load function to this new function, but that is not planned yet.
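As a rough illustration of format-based dispatch without a global backend, here is a minimal sketch in plain Python. Everything in it (`register_decoder`, `load`, the placeholder decoders) is hypothetical and is not the design in #1000; the decoders return strings only to keep the example self-contained.

```python
from pathlib import Path

# Maps a lowercase file extension to the function that decodes that format.
_DECODERS = {}


def register_decoder(*extensions):
    """Register the decorated function as the decoder for the given extensions."""
    def decorator(func):
        for extension in extensions:
            _DECODERS[extension.lower()] = func
        return func
    return decorator


@register_decoder(".wav")
def _decode_wav(path):
    # Placeholder: a real decoder would return audio data and a sample rate.
    return f"wav decoder would read {path}"


@register_decoder(".mp3")
def _decode_mp3(path):
    # Placeholder: a real decoder would call into a portable MP3 library.
    return f"mp3 decoder would read {path}"


def load(path):
    """Pick a decoder from the file extension; no global state is consulted."""
    extension = Path(path).suffix.lower()
    try:
        decoder = _DECODERS[extension]
    except KeyError:
        raise ValueError(f"unsupported format: {extension!r}") from None
    return decoder(path)
```

Because the decoder is chosen per call from the file itself, callers do not need to set a backend beforehand, which is the statefulness concern raised above.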

Libtorchaudio (CMake-based build)

As we promote the use of TorchScript for a better “research to production” experience, we would like to expand the use of torchaudio to environments outside of Python, such as C++ and mobile. For this, we need to be able to build the C++ extension outside of Python.

✅ The PR "Switch to cmake for build" #1068 is the starting point; it allows building the libtorchaudio extension without Python. This PR is almost complete, except that I do not know how to fix the issue where conda packaging gets stuck on macOS (any help or suggestion is appreciated).
After #1068 lands, we can add examples of running torchaudio in C++, like the prototype here.

Augmentation

With the support for in-memory decoding/encoding (#1115), we can apply codecs to audio tensors directly. I am thinking of building a codec-based augmentation feature, and I would like to ask the community to help bring it into torchaudio.

✅ Public hearing
In "RFC: Applying codecs as data augmentation" #1146, I am asking for opinions on codec-based augmentation.
At the moment, it looks like the variety of codecs is an important aspect of this feature. I have mostly settled on binding ffmpeg, but this is a stretch goal.
✅ API design
✅ Implementation
Once the API is finalized, I plan to create an issue asking for help with its implementation.
Extension
Once we land the API, we can start thinking about ffmpeg integration.
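
To make the idea of codec-based augmentation concrete, here is a minimal sketch that passes samples through a lossy encode/decode round trip. It uses 8-bit mu-law companding written in plain Python as a stand-in codec; the function names are hypothetical, and this is neither the API discussed in #1146 nor an ffmpeg binding.

```python
import math

MU = 255  # 8-bit mu-law: codes range from 0 to 255


def mulaw_encode(sample):
    """Compress a float in [-1, 1] and quantize it to an integer code in [0, 255]."""
    sample = max(-1.0, min(1.0, sample))
    compressed = math.copysign(
        math.log1p(MU * abs(sample)) / math.log1p(MU), sample
    )
    return int(round((compressed + 1) / 2 * MU))


def mulaw_decode(code):
    """Expand an integer code in [0, 255] back to a float close to [-1, 1]."""
    compressed = 2 * code / MU - 1
    return math.copysign(
        math.expm1(abs(compressed) * math.log1p(MU)) / MU, compressed
    )


def apply_codec(samples):
    """Augment audio by degrading it with an encode/decode round trip."""
    return [mulaw_decode(mulaw_encode(s)) for s in samples]


# The output has the same length as the input but carries quantization error.
clean = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]
degraded = apply_codec(clean)
```

A model trained on such degraded audio sees the distortions a codec introduces, which is why supporting a wide variety of codecs matters for this feature.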

Planning for supporting native Complex Tensor

Historically, torchaudio mocked complex tensors with an extra dimension that holds the real part and the imaginary part (called the pseudo complex type in the following). PyTorch is now adding native complex dtypes, and we would like to adopt them.
We would greatly appreciate your feedback on the usage of complex tensors. Torchaudio sits between PyTorch, where the complex tensor is implemented, and you, the downstream users, so collecting the feedback and passing it to PyTorch core is very important for us.

The initial action item is to figure out the release schedule, and what support for the native complex type we add and how, while migrating away from the pseudo complex type. -> [Migration] Torchaudio Complex Tensor Support and Migration #1337
Once we figure that out, we will start drafting and adding extra tests (JIT, gradient, distributed, nn.Module adoption, etc.).
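
The difference between the pseudo complex layout and a native complex type can be sketched in plain Python, with nested lists standing in for tensors and the built-in `complex` standing in for a native complex dtype. The helper names are hypothetical; in PyTorch itself, `torch.view_as_complex` and `torch.view_as_real` convert between the two layouts.

```python
import math


def to_native(pseudo):
    """Convert [[real, imag], ...] pairs into native complex numbers."""
    return [complex(real, imag) for real, imag in pseudo]


def to_pseudo(native):
    """Convert native complex numbers back into [[real, imag], ...] pairs."""
    return [[z.real, z.imag] for z in native]


def magnitude_pseudo(pseudo):
    """Magnitude computed by hand from the trailing (real, imag) dimension."""
    return [math.hypot(real, imag) for real, imag in pseudo]


def magnitude_native(native):
    """Magnitude computed with the operation the complex type already provides."""
    return [abs(z) for z in native]


pseudo = [[3.0, 4.0], [0.0, -2.0]]
native = to_native(pseudo)
```

With the pseudo layout, every operation has to know that the last dimension means (real, imag); with a native type, that knowledge lives in the dtype.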

Streaming support

This is a very early-stage project. Many audio applications work in real time, but so far there is little (if any) out-of-the-box support for that in PyTorch or torchaudio. We are currently collecting opinions to define the problems we can work on for better streaming support. It could be I/O, performance, caching, an example application, a specific model, or anything else. If you have an opinion, please comment on #1072.

WFST-based ASR model

Last year, new libraries such as GTN and K2 were announced. It will be interesting to have an example based on WFST models.

vincentqb mentioned this issue Jan 25, 2021

Roadmap ahead for torchaudio #1196

Closed

mthrok pushed a commit to mthrok/audio that referenced this issue Feb 26, 2021

Fix typo "asynchronizely" -> "asynchronously" (pytorch#1154)

f1e682e

mthrok closed this as completed Jul 22, 2021
