For visibility, better communication with the community, and to get feedback from you, I am sharing the projects I am working on for this half of 2021.
The next release of PyTorch / torchaudio is scheduled for around the first week of March. Changes that land on the master branch before mid-February could go into this release.
I/O
Backend update
Continuing from the previous release, I am working on refining the I/O modules. The main tracker is #903. Another important piece of feedback I am tracking is ✅ #1094.
✅ In-memory encoding/decoding support
The main tracker is #1115. See the description of #1108 for examples of this feature.
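To illustrate what "in-memory" means here, the sketch below encodes a waveform to WAV bytes and decodes it back without touching the filesystem. It is not the torchaudio API (see #1108 for that). As an assumption for illustration, it uses only Python's standard wave module with an io.BytesIO buffer, and the helper names encode_wav and decode_wav are hypothetical.

```python
import io
import math
import wave
from array import array
from typing import List, Tuple


def encode_wav(samples: List[float], sample_rate: int) -> bytes:
    """Encode mono float samples in [-1, 1] as 16-bit PCM WAV bytes, entirely in memory."""
    # Scale to 16-bit signed integers, clipping anything outside the valid range.
    ints = array("h", (max(-32768, min(32767, round(s * 32767))) for s in samples))
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(sample_rate)
        w.writeframes(ints.tobytes())  # wave expects native byte order here
    return buffer.getvalue()


def decode_wav(data: bytes) -> Tuple[List[float], int]:
    """Decode 16-bit mono WAV bytes from memory into float samples and the sample rate."""
    with wave.open(io.BytesIO(data), "rb") as r:
        sample_rate = r.getframerate()
        ints = array("h")
        ints.frombytes(r.readframes(r.getnframes()))
    return [i / 32767 for i in ints], sample_rate


# Usage: a 0.1 s, 440 Hz tone is encoded and decoded without any file on disk.
sr = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * t / sr) for t in range(sr // 10)]
blob = encode_wav(tone, sr)
decoded, decoded_sr = decode_wav(blob)
```

The same bytes could just as well come from a network response or a database blob, which is the main practical benefit of decoding from memory.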
Prototyping more portable / performant loading
We are considering getting rid of the notion of a “backend”. It forces the code base to be stateful, and there is still a discrepancy in the number of codecs supported on Windows and on Unix-like OSs.
In #1000, I am planning to add a loading function that dispatches to a decoder based on the audio format, choosing from portable decoding libraries. At some point, I am thinking of redirecting the torchaudio.load function to this new function, but that is not planned at this point.
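A format-based dispatch can be sketched as follows. This is a minimal illustration under assumptions, not the design in #1000: the format is detected from the data's magic bytes, and a fixed table maps each format to a decoder, so there is no global backend for users to switch. The names load, sniff_format, register_decoder and DECODERS are hypothetical, and the only registered decoder is a toy 16-bit WAV reader built on the standard wave module; in practice each entry would wrap a portable decoding library.

```python
import io
import wave
from array import array
from pathlib import Path
from typing import Callable, Dict, List, Optional, Tuple, Union

# Hypothetical decoder signature: encoded bytes -> (samples, sample_rate).
Decoder = Callable[[bytes], Tuple[List[int], int]]

# Format -> decoder table. Callers never select a "current backend" before loading.
DECODERS: Dict[str, Decoder] = {}


def register_decoder(fmt: str) -> Callable[[Decoder], Decoder]:
    """Decorator that registers a decoder for one container format."""
    def wrap(fn: Decoder) -> Decoder:
        DECODERS[fmt] = fn
        return fn
    return wrap


def sniff_format(data: bytes) -> str:
    """Guess the container format from the magic bytes at the start of the data."""
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if data[:4] == b"fLaC":
        return "flac"
    if data[:4] == b"OggS":
        return "ogg"
    # MP3: either an ID3 tag or an MPEG frame sync (11 set bits).
    if data[:3] == b"ID3" or (len(data) > 1 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0):
        return "mp3"
    raise ValueError("unrecognized audio format")


@register_decoder("wav")
def decode_wav(data: bytes) -> Tuple[List[int], int]:
    """Toy 16-bit PCM WAV decoder standing in for a portable decoding library."""
    with wave.open(io.BytesIO(data), "rb") as r:
        if r.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array("h")
        samples.frombytes(r.readframes(r.getnframes()))
        return samples.tolist(), r.getframerate()


def load(src: Union[str, Path, bytes], fmt: Optional[str] = None) -> Tuple[List[int], int]:
    """Hypothetical loader: detect the format (unless given), then dispatch on it."""
    data = src if isinstance(src, bytes) else Path(src).read_bytes()
    fmt = fmt or sniff_format(data)
    decoder = DECODERS.get(fmt)
    if decoder is None:
        raise ValueError(f"no decoder registered for format {fmt!r}")
    return decoder(data)


# Usage: a three-sample WAV built in memory is routed to decode_wav by its header.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(array("h", [0, 1000, -1000]).tobytes())
samples, rate = load(buf.getvalue())
```

Because the choice of decoder depends only on the input, the same call behaves identically on every platform as long as a decoder for that format is available.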
Libtorchaudio (CMake-based build)
As we promote the use of TorchScript for a better “research to production” experience, we would like to expand the use of torchaudio beyond Python, to environments such as C++ and mobile. For this, we need to be able to build the C++ extension without Python.
✅ PR “Switch to cmake for build” #1068 is the starting point; it allows building the libtorchaudio extension without Python. This PR is almost complete, except that I do not know how to fix the issue of conda packaging getting stuck on macOS (any help or suggestions are appreciated).
Augmentation
With the support for in-memory decoding/encoding (#1115), we can apply codecs to audio tensors directly. I am thinking of adding a codec-based augmentation feature, and would like to ask the community to help bring it into torchaudio.
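To make the idea concrete, the sketch below uses a lossy encode/decode round trip as a random augmentation. As a stand-in for real codecs (the RFC discusses binding ffmpeg), it uses continuous 8-bit μ-law companding with μ = 255, because that needs nothing beyond the standard library. The function names and the probability parameter p are hypothetical, not a torchaudio API.

```python
import math
import random
from typing import List, Optional

MU = 255  # the mu value used by G.711 mu-law


def mulaw_encode(x: float, mu: int = MU) -> int:
    """Compress a sample in [-1, 1] with the mu-law curve and quantize to a code in [0, mu]."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return round((y + 1) / 2 * mu)


def mulaw_decode(code: int, mu: int = MU) -> float:
    """Expand an integer code back to a sample in [-1, 1]."""
    y = 2 * code / mu - 1
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


def apply_mulaw_codec(samples: List[float], p: float = 0.5,
                      rng: Optional[random.Random] = None) -> List[float]:
    """Hypothetical augmentation: with probability p, run the waveform through an
    8-bit mu-law encode/decode round trip, which adds codec-like quantization error."""
    draw = rng.random() if rng is not None else random.random()
    if draw >= p:
        return list(samples)
    return [mulaw_decode(mulaw_encode(s)) for s in samples]


# Usage: force the codec (p=1.0) on a 440 Hz tone sampled at 8 kHz.
tone = [0.5 * math.sin(2 * math.pi * 440 * t / 8000) for t in range(800)]
augmented = apply_mulaw_codec(tone, p=1.0, rng=random.Random(0))
```

Swapping the μ-law round trip for a real codec (MP3, Vorbis, AMR, and so on) would keep the same shape: encode to bytes in memory, decode back, and return a waveform of the same length.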
✅ Public hearing
In “RFC: Applying codecs as data augmentation” (#1146), I am asking for opinions on codec-based augmentation.
At the moment, it looks like the variety of supported codecs is an important aspect of this feature. I have mostly settled on binding ffmpeg, but this is an extended goal.
✅ API design
✅ Implementation
Once the API is finalized, I plan to create an issue asking for help with its implementation.
Extension
Once we land the API, we can start thinking about ffmpeg integration.
Planning support for native complex tensors
Historically, torchaudio emulated complex tensors with an extra dimension that holds the real and imaginary parts (referred to as the pseudo complex type below). Now PyTorch is adding native complex dtypes, and we would like to adopt them.
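The difference between the two representations can be sketched without PyTorch. In PyTorch itself, torch.view_as_complex and torch.view_as_real convert between a real tensor whose last dimension has size 2 and a complex tensor; the sketch below mimics that with plain Python lists and the built-in complex type, so all helper names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def pseudo_to_native(pairs: Sequence[Sequence[float]]) -> List[complex]:
    """Pseudo complex layout (trailing dimension of size 2: real, imag) -> native complex."""
    return [complex(re, im) for re, im in pairs]


def native_to_pseudo(values: Sequence[complex]) -> List[List[float]]:
    """Native complex -> pseudo complex [real, imag] pairs."""
    return [[z.real, z.imag] for z in values]


# With the pseudo layout, every complex operation unpacks the last dimension by hand.
def magnitude_pseudo(pairs: Sequence[Sequence[float]]) -> List[float]:
    return [math.sqrt(re * re + im * im) for re, im in pairs]


def phase_pseudo(pairs: Sequence[Sequence[float]]) -> List[float]:
    return [math.atan2(im, re) for re, im in pairs]


# With a native complex type, the same operations are built in.
def magnitude_native(values: Sequence[complex]) -> List[float]:
    return [abs(z) for z in values]


def phase_native(values: Sequence[complex]) -> List[float]:
    return [cmath.phase(z) for z in values]


# Usage: three spectrogram-like bins stored in the pseudo layout.
spec = [[3.0, 4.0], [0.0, -1.0], [1.0, 0.0]]
native = pseudo_to_native(spec)
```

The conversion is lossless in both directions, so adopting native complex dtypes is mainly a question of which layout the public functions accept and return.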
We would greatly appreciate your feedback on how you use complex tensors. Torchaudio sits between PyTorch core, where complex tensors are implemented, and you, the downstream users, so collecting feedback and passing it on to PyTorch core is very important for us.
Once we figure out the direction, we will start drafting the changes and adding extra tests (JIT, gradient, distributed, nn.Module adoption, etc.).
Streaming support
This is a very early-stage project. Many audio applications work in real time, but so far there is little (if any) out-of-the-box support for that in PyTorch or torchaudio. We are currently collecting opinions to define the problem we can work on for better streaming support. It could be I/O, performance, caching, an example application, a specific model, or anything else. If you have an opinion, please comment on #1072.
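Since the scope is still open, the following only illustrates one ingredient mentioned above, caching between calls. It assumes a hypothetical StreamingFramer that accepts chunks of arbitrary size and emits fixed-length, overlapping frames as soon as enough samples have arrived. It produces the same frames as offline framing of the complete signal, which is one property a streaming version of an offline pipeline would typically be expected to preserve.

```python
from typing import Iterable, List


class StreamingFramer:
    """Hypothetical helper: accepts audio chunks of arbitrary size and emits
    fixed-length, overlapping frames as soon as enough samples have arrived.
    Samples not yet fully consumed stay in a small buffer between calls."""

    def __init__(self, frame_length: int, hop_length: int) -> None:
        if not 0 < hop_length <= frame_length:
            raise ValueError("require 0 < hop_length <= frame_length")
        self.frame_length = frame_length
        self.hop_length = hop_length
        self._buffer: List[float] = []

    def push(self, chunk: Iterable[float]) -> List[List[float]]:
        """Append a chunk and return every frame that is now complete."""
        self._buffer.extend(chunk)
        frames = []
        while len(self._buffer) >= self.frame_length:
            frames.append(self._buffer[: self.frame_length])
            # Advance by one hop; the overlapping samples stay buffered.
            del self._buffer[: self.hop_length]
        return frames


# Usage: chunks arrive with irregular sizes, as they might from a microphone.
framer = StreamingFramer(frame_length=4, hop_length=2)
frames = []
for chunk in ([0, 1, 2], [3], [4, 5, 6, 7, 8], [9]):
    frames.extend(framer.push(chunk))
```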
WFST-based ASR model
Last year, new libraries such as GTN and K2 were announced. It would be interesting to have an example based on WFST models.