proposal: x/mobile audio #13432
CL https://golang.org/cl/17262 mentions this issue. |
It looks great! How about adding
I think we should; any kind of audio processing would more than likely be done in float values, so using float right away seems to be a better move IMHO (not a deal breaker though).
We can implement the decoders in pure Go code for when codecs are not available. I have partially implemented such decoders and while they might not be as fast as dedicated/optimized codecs, I think they should work well enough. |
L117 small typo, |
Question, why would we have a generic |
I created a quick and dirty but conformant AIFF decoder to get a feel for the API: https://github.com/mattetti/exp/blob/master/audio/aiff/decoder.go I think it might be missing an API to get frames out of a clip. The current proposal goes all the way to getting the clip and frame info, but we can't get the actual PCM data (which would probably be some sort of What do you think? |
Thanks for the feedback!
The AudioFormat would represent the coding format of the original source, not the Clip's. The decoders won't work with Clips but with arbitrary encoded streams of bytes, converting them into a Clip. The decoder can return the audio format of the original source. Consider the following decoder function:
This unfortunately comes with a performance cost. We need to find a way to support float values but also allow integer-only values for faster processing for those who don't care about float-level precision.
We probably won't have enough human resources to reimplement decoders in Go, therefore using the available codecs is going to be our initial step. The proposal is not against a decoder being implemented in vanilla Go though.
It was a sample decoder implementation; I was not proposing it to be in the audio package. DecodeBytes is a better name in any case; I will update the proposal.
We had an earlier debate about providing APIs that work with frames rather than byte slices. The previous API was along the lines of what's below.
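The frame-based sample referred to here was not preserved in the thread; the following is a rough reconstruction of the shape being discussed. `Frame`, `FrameReader`, and the per-channel float64 representation are assumptions, not the actual earlier API.

```go
package main

import (
	"fmt"
	"io"
)

// Frame is one sample across all channels; one float64 per channel
// is an assumed representation for this sketch.
type Frame []float64

// FrameReader sketches the earlier frame-based API: callers pull
// decoded frames one at a time instead of filling byte slices.
type FrameReader interface {
	ReadFrame() (Frame, error)
}

// silence is a toy FrameReader producing n frames of stereo silence.
type silence struct{ n int }

func (s *silence) ReadFrame() (Frame, error) {
	if s.n == 0 {
		return nil, io.EOF
	}
	s.n--
	return Frame{0, 0}, nil
}

func main() {
	var r FrameReader = &silence{n: 2}
	for {
		f, err := r.ReadFrame()
		if err != nil {
			break
		}
		fmt.Println(f) // prints [0 0] for each frame
	}
}
```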
There are two arguments against the frame-based APIs:
|
My only concern is that a frame from a wav file can contain data using different encodings, and non-PCM formats will definitely not have PCM frames. For instance, if one were to write an mp3 decoder, would the decoder have to provide a clip with an |
Correct. You can read only PCM-formatted data from clips. A decoder is responsible for decoding any input down to PCM-formatted frames. I am going to update the document to make it clearer. |
Updates golang/go#13432. Change-Id: I718006e8f039c476d456c1276c54132bd66d9410 Reviewed-on: https://go-review.googlesource.com/17262 Reviewed-by: Burcu Dogan <[email protected]>
Some work has already been done in this domain by the audio package of Azul3d, which is intended to define interfaces analogous to those of the image package. Perhaps some ideas could be stolen from its design. @slimsag is the original developer, and he may have some insight into problems encountered when designing the interface, and how they were solved by Azul3d.
There exist at least two FLAC decoders in pure Go. The second FLAC package is conceptually a back-end for decoding FLAC audio files, for which one or more front-ends may be implemented. Currently a front-end has been implemented for the Azul3d audio abstraction, and it should be trivial to implement another front-end for the Go mobile audio abstraction. Analogously to the image package, decoding support is registered by adding import _ "azul3d.org/audio/flac.v0" to a program's main package. Below follows a short example program for decoding FLAC audio files with Azul3d:

```go
package main

import (
	"flag"
	"fmt"
	"log"
	"os"

	"azul3d.org/audio.v1"
	_ "azul3d.org/audio/flac.v0" // Add FLAC decoding support.
)

func main() {
	flag.Parse()
	for _, path := range flag.Args() {
		if err := parseFLAC(path); err != nil {
			log.Fatal(err)
		}
	}
}

func parseFLAC(path string) error {
	// Open file.
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	// Create decoder.
	dec, _, err := audio.NewDecoder(f)
	if err != nil {
		return err
	}

	// Decode the audio stream in chunks of 1024 samples,
	// reusing the same buffer across reads.
	samples := make(audio.PCM32Samples, 1024)
	for {
		n, err := dec.Read(samples)
		if err != nil {
			if err == audio.EOS {
				return nil
			}
			return err
		}
		fmt.Println(samples[:n])
	}
}
```

I hope this may give some ideas and help improve the API and design of the Go mobile audio package. Cheers /u |
Thanks for the input. The proposal is currently on hold and require more work on a variety of topics:
|
@mattetti and I have been working on audio during GopherCon and will update the proposal soon with our most recent ideas. |
Seeking within the audio source is not always a trivial problem; e.g. in cases where the underlying source is encoded with a VBR compression algorithm, optimistic seeking and slurping the entire data source may be required. Separation of the Seek method allows us to explain the requirements to the implementors more easily. Since different encoders will have different strategies to Seek, it is also easy to document the cost and the underlying algorithm to the users. Also, the separation helps us document the requirements of implementing an efficient Clip.Frames. We are expecting decoders to have a cursor on the data source and move forward as new Frames calls arrive. We only expect them to modify the cursor if seeking is required. Without a separate Seek method, implementors need to check whether the offset matches their current internal cursor and seek if it does not. Separation saves them figuring out the conditional case for a requirement of seeking. Change-Id: I0b15d56df9457a462953aaaf915445e268462f97 Reviewed-on: https://go-review.googlesource.com/24702 Reviewed-by: David Crawshaw <[email protected]>
@mewmew we decided to reduce the scope of the proposal to an interface allowing reading and writing of PCM data. We are currently working on various test implementations covering multiple use cases (audio analysis, processing and playback) with a few different formats and platforms. The big challenge is defining an interface that is multi-purpose yet flexible enough to support various kinds of codecs. As we started implementing our ideas, we started finding issues/challenges and went back to the drawing board. As soon as we have decent implementations validating our design, we will submit an updated proposal. |
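The reduced scope described above can be pictured as a pair of PCM-level interfaces. This is a sketch under assumed names (`PCMReader`, `PCMWriter`, `ramp`); none of these appeared in the actual proposal.

```go
package main

import "fmt"

// PCMReader and PCMWriter sketch the reduced scope: moving decoded
// PCM samples rather than encoded bytes. All names are assumptions.
type PCMReader interface {
	// ReadPCM fills buf with samples and reports how many were written.
	ReadPCM(buf []float64) (n int, err error)
}

type PCMWriter interface {
	WritePCM(buf []float64) (n int, err error)
}

// ramp is a toy PCMReader emitting a rising test signal.
type ramp struct{ v float64 }

func (r *ramp) ReadPCM(buf []float64) (int, error) {
	for i := range buf {
		buf[i] = r.v
		r.v += 0.25
	}
	return len(buf), nil
}

func main() {
	var r PCMReader = &ramp{}
	buf := make([]float64, 4)
	n, _ := r.ReadPCM(buf)
	fmt.Println(n, buf) // → 4 [0 0.25 0.5 0.75]
}
```

With this shape, a decoder, an effect, and an encoder can all meet at the same PCM boundary regardless of the codec on either side.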
Sounds great! Glad to hear that you are fleshing out these details. I know quite a few Gophers in the community that would be happy to stress test the API of these interfaces. Are the preliminary interfaces available in a public repository by any chance? Edit: To provide some additional background. The audio project of the Azul3D engine would be happy to see where and how these interfaces fit into the "generic" audio package, which is analogous to the image package of the standard Go library. My brother and I would also be interested in looking into how difficult it would be to add support for FLAC decoding, targetting these interfaces. Then there are quite a few music players, audio visualizers and other such software in the community which would like to use the audio API. So you have a few stress testers lined up :) |
On hold for #17244. |
Hi, a bit late to the party but I have a small extra data point: the Opus API. (I maintain a wrapper around xiph.org's standard implementations, libopus and libopusfile: https://github.com/hraban/opus. The API mirrors the C API, in Go flavor.) One note I didn't see explicitly covered (sorry if I missed it) is the difference between a wrapper format like Ogg and an actual encoding like Opus or Vorbis. If I get a .opus file, that's not raw opus data; that's an ogg stream with opus data inside it. However, if I pass a bunch of PCM frames to an Opus encoder, I don't get an opus stream: I get opus bytes. This is particularly relevant if you don't control the source; if something is feeding you raw (unwrapped) bytes of an encoded audio stream, you can't pass it to a decoder expecting it to be wrapped in Ogg or whatever. On floats: they're great when you're mixing audio because they don't clip. You can mix up, down, sideways, and scale back without special checks or losing data. They have their place, and it's nice to have an extra FramesFloat32() option which decodes to float32. The Opus decoder supports this natively. And a final question: is the byte slice returned by |
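The mixing point can be illustrated with a short sketch: float samples may exceed [-1, 1] mid-mix without losing information, and can be scaled back afterwards. The `mix` and `normalize` helpers below are illustrative only, not part of any proposed API.

```go
package main

import "fmt"

// mix sums several equal-length float32 streams sample by sample;
// intermediate values may exceed [-1, 1] without losing information,
// which is exactly where integer PCM would clip.
func mix(streams ...[]float32) []float32 {
	out := make([]float32, len(streams[0]))
	for _, s := range streams {
		for i, v := range s {
			out[i] += v
		}
	}
	return out
}

// normalize scales the mix back into [-1, 1] only if it peaks outside.
func normalize(buf []float32) {
	var peak float32
	for _, v := range buf {
		if v > peak {
			peak = v
		} else if -v > peak {
			peak = -v
		}
	}
	if peak > 1 {
		for i := range buf {
			buf[i] /= peak
		}
	}
}

func main() {
	a := []float32{0.75, -0.5}
	b := []float32{0.5, -0.75}
	m := mix(a, b) // [1.25 -1.25]: would clip as int16, fine as float
	normalize(m)
	fmt.Println(m) // → [1 -1]
}
```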
Update: with the help of @rakyll and @campoy I started working on a more generic audio buffer interface that could be used by codecs, analyzers, effects, etc. The code isn't done, but the idea would be to create a proposal around the minimal useful interface: https://github.com/go-audio/audio @hraban I'd love to hear your perspective on the |
It's pretty Go-like, but still fairly C-like in places. Ditch the underscore variables, ditch the ALL_CAPS errors and APPLICATION_VOIP etc. (https://github.com/golang/go/wiki/CodeReviewComments#mixed-caps), don't have exported variables of unexported type (opusError, opusFileError), use shorter variable names when it's obvious ( |
@bradfitz thanks for the great feedback. I've made the changes to the API. I'll look at the private variables another time. @mattetti I've been thinking about your API and I realised a mistake in my assumptions: PCM doesn't always have to be 16 bit. However, on the flip side, it almost always is. I'm generally very eager to have these codecs be as efficient as possible, because for my specific use case I was trying to handle as many as I could (mixing multiple audio streams; it was CPU bound). This is why I'm very keen on "hot potato" buffer handling, meaning that I wanted to allocate two circular buffers when a client connects (a []float32 for PCM and a []byte for encoded data), and keep reusing them and passing them to the codec. xiph.org's libopus(file) API allows for this very nicely, and I was never doing any copying or manual converting. If you're going to be making a fundamental audio API for all Go programs to use, I'd definitely urge you to allow implementations to support float PCM and int (of any width) PCM without copying. That said, I'll have to take a closer look another time, and actually try to implement it to see where it gets me. I only know Vorbis and Opus by xiph.org, and it's very clear they learned from their (numerous) mistakes in the former when designing the API for the latter. |
@hraban you are absolutely right that performance is important in a lot of cases; that's why I implemented two buffers: an int and a float buffer. You can easily implement a byte buffer and exchange the data between the two. I used float64 instead of float32 mainly because there is little overhead and because for signal processing, we might as well use float64. That said, we could add a float32 buffer if that made sense. The buffers are designed to be reused so we don't have to reallocate them. I believe I provided a few examples with the PCM codecs (in most cases, we don't want to load an entire file in memory and prefer to read chunks at a time, dropping the PCM data into a reused buffer which can then be processed before being reused). |
What I mean is you want to feed your buffers to the codec and tell it to decode directly into them. A float64 buffer won't fly with libopus nor libvorbis†, because they use float32. That means you will be doing at least one conversion loop there ( Of course, libopus and libvorbis are just two libraries written in C; they're not gospel. Once you get a pure Go implementation you can do what you want... I'm just not sure that even for a pure Go implementation it would be wise to go float64. I don't think you will ever need that precision, and it comes at quite a steep cost: halving the performance of everything capped by memory: cache, RAM bandwidth. I'm coming at this from the perspective of a server-side app dealing with a lot of simultaneous audio streams. Think TeamSpeak. For this I can tell you: you want that data in and out ASAP, and every byte you can squeeze out of a session's footprint is room for another session to cram in. If you're having to convert between your codec's float32 array and an internal float64 array, that's immediately a cost you'd rather forego. Same for int vs int16. For mobile, I have no idea. Never touched mobile dev in my life. † I can't mention libvorbis here without clarifying that they don't actually support this *pcm buffer passing: they allocate a buffer internally which you have to copy out of before your next frame decode. Still, that's a memcpy rather than a for loop with type conversion. And if you really wanted, you could have your libvorbis wrapper do the same, using unsafe. If you really, really wanted. Hopefully not, but the point is it's nice to leave people the choice. When it comes to audio, every cycle counts. |
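The conversion cost being described is the extra O(n) pass a float64-only buffer forces when the codec speaks float32. The helper names `widen` and `narrow` below are illustrative; real wrappers would aim to decode straight into the caller's slice and skip this loop entirely.

```go
package main

import "fmt"

// widen copies a codec's native float32 output into a float64 buffer.
// This is the extra per-sample pass (plus double the memory traffic)
// that a float64-only buffer API imposes on float32 codecs.
func widen(dst []float64, src []float32) {
	for i, v := range src {
		dst[i] = float64(v)
	}
}

// narrow is the mirror pass on the way back into the encoder.
func narrow(dst []float32, src []float64) {
	for i, v := range src {
		dst[i] = float32(v)
	}
}

func main() {
	pcm32 := []float32{0.5, -0.25, 1}
	pcm64 := make([]float64, len(pcm32))
	widen(pcm64, pcm32)
	fmt.Println(pcm64) // → [0.5 -0.25 1]

	back := make([]float32, len(pcm64))
	narrow(back, pcm64)
	fmt.Println(back) // → [0.5 -0.25 1]
}
```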
@hraban those are fair points, many DAWs and audio plugins do process the audio signal in float64, and Go's std library is quite partial to this format. That said, I agree that float32 is very common and I added it to the WIP proposal. That change required that I update the interface to:

```go
type Buffer interface {
	// PCMFormat is the format of buffer (describing the buffer content/format).
	PCMFormat() *Format

	// NumFrames returns the number of frames contained in the buffer.
	NumFrames() int

	// AsFloatBuffer returns a float64 buffer from this buffer.
	AsFloatBuffer() *FloatBuffer

	// AsFloat32Buffer returns a float32 buffer from this buffer.
	AsFloat32Buffer() *Float32Buffer

	// AsIntBuffer returns an int buffer from this buffer.
	AsIntBuffer() *IntBuffer

	// Clone creates a clean clone that can be modified without
	// changing the source buffer.
	Clone() Buffer
}
```

Because if an API says it would take a generic audio.Buffer interface, it might need to convert it to float32. Now an API like yours might only accept audio.Float32Buffer to avoid any lookup or potential conversion code. Here is the definition of the audio.Float32Buffer:

```go
// Float32Buffer is an audio buffer with its PCM data formatted as float32.
type Float32Buffer struct {
	// Format is the representation of the underlying data format
	Format *Format

	// Data is the buffer PCM data as floats
	Data []float32
}
```

As you can see, the buffer is just a wrapper around a slice. The buffer has a few convenience methods to convert its data, but it is designed so you can reuse it as often as you want. For instance, you could allocate a float buffer of a certain size and keep decoding into it, send it over to a transform method, and get the buffer back to then re-encode it. My goal here is to define a common API that all audio libraries could use. @rakyll and I really quickly realized this was really hard because of the various use cases and concerns involved. But wouldn't it be amazing if we could use the same interface and chain all those audio libs together? I do believe this change should address your concern, but please let me know if that's not the case. Suggestions welcome too! |
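A self-contained sketch of the kind of conversion those `As*Buffer` methods describe, using simplified stand-ins for the go-audio types (the trimmed-down `Format` and `IntBuffer` fields, and the assumption of 16-bit int samples, are mine, not the package's):

```go
package main

import "fmt"

// Format is a simplified stand-in for the go-audio Format struct.
type Format struct {
	NumChannels int
	SampleRate  int
}

// IntBuffer holds PCM data as ints, here assumed to be 16-bit values.
type IntBuffer struct {
	Format *Format
	Data   []int
}

// Float32Buffer mirrors the proposal: a Format plus a raw slice.
type Float32Buffer struct {
	Format *Format
	Data   []float32
}

// AsFloat32Buffer converts 16-bit int samples to [-1, 1) floats:
// the kind of convenience conversion the Buffer interface describes.
func (b *IntBuffer) AsFloat32Buffer() *Float32Buffer {
	out := &Float32Buffer{
		Format: b.Format,
		Data:   make([]float32, len(b.Data)),
	}
	for i, v := range b.Data {
		out.Data[i] = float32(v) / 32768
	}
	return out
}

func main() {
	src := &IntBuffer{
		Format: &Format{NumChannels: 1, SampleRate: 44100},
		Data:   []int{0, 16384, -32768},
	}
	f := src.AsFloat32Buffer()
	fmt.Println(f.Data) // → [0 0.5 -1]
}
```

A caller that only accepts `*Float32Buffer` pays this conversion once at the boundary; a caller reusing the same buffer across decode iterations avoids reallocating either slice.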
I like where this is going. There is certainly room for a unified audio codec API done right. There are many similarities, and if all you care about is handling some PCM data and just codec'ing at the edges, the specific codec used should be completely pluggable. I'll try to implement this interface in the opus lib and see if I run into anything else. Well, besides PCM int16 :) Cheers!
|
Closing this proposal in favor of #18497; let's keep the conversation on the new proposal.
In the scope of the Go mobile project, an audio package that supports decoding and playback is a top priority. The current status of audio support under x/mobile is limited to OpenAL bindings and an experimental high-level audio player that is backed by OpenAL.
The current experimental audio package fails to
In order to address these concerns, I am proposing core abstractions and a minimal set of features based on the proposed abstractions to provide decoding and playback support.
See the proposal document for further details.