Sharing My Projects 2021 H1 #1154

Closed
mthrok opened this issue Jan 5, 2021 · 0 comments

For visibility, better communication with the community, and to get feedback from you, I am sharing the projects I am working on in the first half of 2021.

The next release of PyTorch / torchaudio is scheduled for around the first week of March. Changes that land on the master branch before the middle of February could go into this release.

I/O

Backend update

Continuing from the previous release, I am working on refining the I/O modules. The main tracker is #903. Other important feedback I am tracking is ✅ #1094.

✅ In-memory encoding/decoding support

The main tracker is #1115. See the description of #1108 for the examples of this feature.

Prototyping more portable / performant loading

We are considering getting rid of the notion of “backend”. It forces the code base to be stateful, and there is still a discrepancy in the number of codecs supported between Windows and *nix OSs.
In #1000, I am planning to add a loading function that dispatches to a decoder based on the format, with a choice of portable decoding libraries. At some point, I am thinking of redirecting the torchaudio.load function to this function, but there is no concrete plan for that yet.

Libtorchaudio (CMake-based build)

As we promote the use of TorchScript for a better “research to production” experience, we would like to expand the use of torchaudio outside of Python, to environments such as C++ and mobile. For this, we need to be able to build the C++ extension outside of Python.

  • ✅ PR Switch to cmake for build #1068 is the starting point, which allows building the libtorchaudio extension without Python. This PR is almost complete, except that I do not know how to fix the issue where conda packaging gets stuck on macOS. (Any help/suggestion is appreciated.)
  • After Switch to cmake for build #1068, we can add examples to run torchaudio in C++, like the prototype here.

Augmentation

With the support of in-memory decoding/encoding #1115, we can apply codecs to audio tensors directly. I am thinking of building a codec-based augmentation feature, and would like to ask the community to help bring this into torchaudio.

  1. ✅ Public hearing
    In RFC: Applying codecs as data augmentation #1146, I am asking for opinions on codec-based augmentation.
    At the moment, it looks like the variety of codecs is an important aspect of this feature. I have mostly settled on binding ffmpeg, but this is a stretch goal.
  2. ✅ API design
  3. ✅ Implementation
    Once the API is finalized, I plan to create an issue, asking for help in its implementation.
  4. Extension
    Once we land the API, we can start thinking about ffmpeg integration.
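
As a rough illustration of what codec-based augmentation means, the sketch below passes a waveform through a lossy encode/decode round trip and keeps the degraded result. It does not reflect the API discussed in #1146, which was not finalized at the time of writing: `apply_codec` is a hypothetical helper, the waveform is a plain Python list rather than a tensor, and a hand-written 8-bit mu-law compander stands in for a real codec library.

```python
import math

MU = 255  # companding constant used by 8-bit mu-law


def mulaw_encode(x, mu=MU):
    """Compress a sample in [-1, 1] and quantize it to an integer code in [0, mu]."""
    x = max(-1.0, min(1.0, x))
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int(round((compressed + 1.0) / 2.0 * mu))


def mulaw_decode(q, mu=MU):
    """Map a quantized code back to a sample in [-1, 1]."""
    compressed = 2.0 * q / mu - 1.0
    return math.copysign(math.expm1(abs(compressed) * math.log1p(mu)) / mu, compressed)


def apply_codec(waveform, encode, decode):
    """Hypothetical augmentation helper: encode/decode round trip on every sample."""
    return [decode(encode(s)) for s in waveform]


# 10 ms of a 440 Hz sine at 8 kHz, used as a stand-in for real audio.
waveform = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(80)]
augmented = apply_codec(waveform, mulaw_encode, mulaw_decode)

# The round trip is lossy, which is what makes it useful as augmentation.
max_error = max(abs(a - b) for a, b in zip(waveform, augmented))
```

Passing `encode` and `decode` as arguments keeps the helper independent of any one codec, which matters if the variety of codecs is the main value of the feature.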

Planning for supporting native Complex Tensor

Historically, torchaudio mocked complex tensors with an extra dimension that holds the real part and the imaginary part (referred to below as the pseudo complex type). Now PyTorch is adding native complex dtypes and we would like to adopt them.
It would be greatly appreciated if you could give feedback on the usage of complex tensors. Torchaudio sits between PyTorch, where the complex tensor is implemented, and you, the downstream users, so collecting feedback and passing it on to PyTorch core is very important for us.

  • The initial action item is to figure out the release schedule, what support we add for the native complex type and how, and how to migrate away from the pseudo complex type. -> [Migration] Torchaudio Complex Tensor Support and Migration #1337
  • Once we figure that out, we will start drafting and adding extra tests (JIT, gradient, distributed, nn.Module adoption, etc.).
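
To show the difference between the two representations, here is a small sketch in plain Python, with nested lists standing in for tensors. A pseudo complex value is a `[real, imag]` pair in a trailing dimension of size 2, while a native complex value is a single element. The helper names are made up for this illustration. In PyTorch itself, `torch.view_as_complex` and `torch.view_as_real` perform the analogous conversion on tensors.

```python
def pseudo_to_native(pairs):
    """Convert [[real, imag], ...] into a list of native complex numbers."""
    return [complex(re, im) for re, im in pairs]


def native_to_pseudo(values):
    """Convert native complex numbers back into [[real, imag], ...]."""
    return [[z.real, z.imag] for z in values]


def pseudo_multiply(a, b):
    """Element-wise complex product written out by hand on [real, imag] pairs."""
    return [
        [ar * br - ai * bi, ar * bi + ai * br]
        for (ar, ai), (br, bi) in zip(a, b)
    ]


a = [[1.0, 2.0], [0.0, -1.0]]   # 1+2j and -1j in pseudo complex form
b = [[3.0, -1.0], [2.0, 2.0]]   # 3-1j and 2+2j in pseudo complex form

# With the pseudo type, every operation needs a hand-written formula.
pseudo_product = pseudo_multiply(a, b)

# With the native type, the ordinary multiplication operator is enough.
native_product = [x * y for x, y in zip(pseudo_to_native(a), pseudo_to_native(b))]
```

Both paths give the same values, but only the pseudo path needs a custom implementation per operation, which is the maintenance cost the migration aims to remove.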

Streaming support

This is a very early-stage project. Many audio applications work in real time, but so far there is little (if any) out-of-the-box support for that in PyTorch or torchaudio. We are currently collecting opinions to define the problem we can work on for better streaming support. It could be I/O, performance, caching, an example application, a specific model, or anything else. If you have an opinion, please comment on #1072.

WFST-based ASR model

Last year, new libraries such as GTN and K2 were announced. It would be interesting to have an example based on WFST models.
