proposal: audio package #18497
Comments
The first "here" link is dead.
@nhooyr thanks, fixed.
To paraphrase someone from the Python community:
Why does this need to be added to the standard library? Why can't this be something like github.com/splice/audio?
@davecheney you are right, and it already exists: https://github.com/go-audio/audio. But here is the thing: if we want to unify the audio libraries out there, we need to make the common interface standard and official. Note that this package does not provide codecs or audio tools; it's a package used by third-party package developers implementing a shared interface. Of course, all packages in the std lib could have been external and could still be moved outside, but that greatly increases the friction and reduces the chances of consolidation. If you go through the proposal, I hope it's clear that my goal is to provide common ground for all libraries. As you can imagine, if a new developer coming from company X wants to implement time stretching in Go, she might start by looking at what's available in the standard library and go from there. If nothing is mentioned, or she doesn't do the right Google search, she might end up building a solution that wouldn't work well with others. What I am trying to say is that the main value of having this package in the standard library is to have a shared entry point for all libraries. I fully agree that audio libraries implementing codecs, effects and other things should live on their own. That's also why I set up the go-audio organization :)
But why does it need to live in the standard library? All interface declarations with the same method sets are equal and interchangeable.
It's more than an interface; there are concrete implementations.
As I understand it, the majority of the standard library exists to support building Go and the go tool. @bradfitz would be the best person to answer this, he has done the archeology.
From my point of view, adding a package to the standard library is the kiss of death. The package's API is condemned to grow via accretion in unnatural ways, which we are seeing with the database/sql package.
Note: I am specifically not taking aim at the work of those adding features to the database/sql package, simply noting that if that package lived independently from the stdlib it would have been a straightforward matter to bump the major revision of the package to add support for the context arguments added in 1.8.
…On Tue, 3 Jan 2017, 18:38 Matt Aimonetti ***@***.***> wrote:
It's more than an interface, there are concrete implementations.
I'm not sure how to answer your question in a way that would satisfy you. Can you maybe help me understand why the packages we currently have are part of the standard library? Since anything that doesn't define the language itself can be external, there must be reasons that made us choose to have them as part of the standard library.
I'm worried about the Buffer implementation having to allocate. Maybe there's some way to wire multiple buffers together in a way that doesn't involve allocation, but I'm not sure how. Maybe you have some example of a real-time audio synth working on 1 ms or smaller buffers? The list of 6 conversion functions should probably work on slices rather than on single values.
I am underqualified to assess whether this should live in the standard library. That said, tangentially, do you think the "multi-dimensional slices" proposal could help reconcile the various buffer types? Also, having a "buffer protocol"-like interface would perhaps be more stdlib-worthy than concrete implementations.
@egonelbre good idea, I will post examples showing how to reuse the slice. The short version is that you have a slice and you don't have to reallocate. In what context are you concerned? In other words, how are you getting the input data that makes you think such an approach would require reallocation? Regarding the conversion functions, can you say more about your suggestion?
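As a preview of the reuse pattern described above, here is a minimal sketch; readBlock and process are hypothetical stand-ins for a real input source and effect, not anything from the proposal. The block slice is allocated once and then refilled and processed in place every iteration, so the steady-state loop performs no allocations.

```go
package main

// readBlock fills dst from the input source (stand-in for a real reader).
func readBlock(dst []float64) { /* fill dst from the input source */ }

// process applies a simple gain in place, reusing the same backing array.
func process(block []float64) {
	for i := range block {
		block[i] *= 0.5
	}
}

func main() {
	block := make([]float64, 512) // allocated once, outside the hot loop
	for i := 0; i < 1000; i++ {
		readBlock(block)
		process(block)
	}
}
```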
@sbinet yes, matrices would improve things since frames would be nicer to write/read and more performant. Regarding the bytes-like buffer protocol, I am not quite sure what that means concretely. When doing DSP, you have to convert the bytes into sample values, process them, and convert them back into bytes at the end. But you'd want to keep the samples in a processable format until the end. The audio buffer approach is used in CoreAudio, the Cocoa framework for macOS/iOS. The hard part here is to find the balance between performance and convenience. We want to avoid reallocations but we also want a nice API.
I'm mainly concerned about real-time audio. A good example would be something like:
Bonus points for:
Basically, I'm missing the big picture of how the Buffer should be used in practice.
No problem, I have a generator package and effect libraries I developed against a slightly different version of the proposal; I will port them over shortly. I don't have an output-to-device example though, but I guess output to UI would do it.
I mean that when you are using IntToIeeeFloat, I assume most of the time you are working on an []int rather than a single int. When people write the for loops manually for a particular conversion, it makes it more difficult to "upgrade" everyone to optimized versions of the conversions (e.g. asm-optimized loops). (Alternatively, I misunderstood the purpose of the conversion functions.)
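A sketch of what slice-based conversion helpers could look like; the convert package and function names here are illustrative assumptions, not the proposal's API. Centralizing the loop in one place is what lets an optimized (e.g. assembly) implementation be swapped in later without touching callers.

```go
package convert

// IntsToFloats converts integer samples into float64 samples.
// dst must be at least len(src) long; the filled prefix is returned.
func IntsToFloats(dst []float64, src []int) []float64 {
	dst = dst[:len(src)]
	for i, v := range src {
		dst[i] = float64(v)
	}
	return dst
}

// FloatsToInts performs the inverse conversion, truncating toward zero.
func FloatsToInts(dst []int, src []float64) []int {
	dst = dst[:len(src)]
	for i, v := range src {
		dst[i] = int(v)
	}
	return dst
}
```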
So far nothing has been decided for #17244, which would impact where this lives. I suggest reviewing that issue and chiming in there. Outside of determining the import path for the audio packages, I think it would be great to have a set of go-to audio packages in the Go community. Also related: #13432. (Also, I really like @rakyll's suggestion of special interest groups.)
I think adding a new experimental package to the std is a non-starter. The first step should be adding the proposed package as x/media/audio.
I'll be focusing on the technical details of the proposal and answering the technical concerns that were and will be brought up. The decision of where this package should live is a separate issue; my last comment on the topic is that we have generic packages such as
In the meantime, I would very much appreciate feedback on the technical proposal more than on where the package should live. For comments on the std lib and x packages, please comment on Andrew's proposal mentioned above.
x/whatever is just a path. "eXtra" maybe? "x/exp" is experimental. But for the rest, yeah, probably best to keep this related to just "what a standard audio package should look like" regardless of where it lives.
@mattetti wrt the buffer protocol, I had played a bit with the persistency aspects here:
I think having interfaces in the standard library has some advantages over defining interfaces in third-party packages. People are much more likely to go out of their way to use them and maintain compatibility than they are when it's an interface in some third-party library. (In fact, unless there's a clear third-party market leader in a domain, users are unlikely to even find the interface if it's not part of the standard library. As a Go user, I've implemented image.Image and io.Reader on types, but I've never gone out of my way to find other libraries' interfaces to implement and don't even know where I'd get started.) The idea behind audio.Buffer being in the standard library seems reasonable to me, in order to improve interoperability of packages that do audio processing. That said, I don't really know much about audio processing, but having read your proposal I have some questions:
FWIW, the standard library contributors have neither the bandwidth nor the technical skills to maintain an audio package. I would wait for #17244 to be resolved to see if x/ might be a good domain. Otherwise, a special workgroup would be the best option for media/audio projects. A standard interface might be good to have, if possible, for PCM buffers. I naively believe that a community can converge on a standard interface without contributing it to the standard library, though. If the work is owned by a specialist group under the Go ecosystem, there is less chance that it will be limited by the resources of a company or the Go project. Not every idea in the standard library wins hearts automatically; the sql package is an example. The standard library is the worst place to contribute big interfaces, and audio requires at least one or two big interfaces :( Also, the burden of having to maintain an API forever once it is contributed to the standard library is a big contract.
@sbinet correct me if I'm wrong, but in your ndim implementation you can't reuse the slice since it's not exported, which in my case would mean a big overhead when dealing with gigabytes of data. @kardianos good point about the meaning of x; I wrongly associated it with being experimental. @driusan I obviously agree and appreciate your support. I leave it to the core team to define where it should live.
I think this is a different issue and one that can be addressed separately (your suggestion for specialist groups is a potential solution). But I agree that this is at the heart of #17244 and it has to be addressed.
He's talking about 'providing an official package for interoperating about (but not necessarily doing) X', which is exactly what this proposal is all about.
@egonelbre I use
@egonelbre here is a real-time synth example using stdin to change the oscillator frequency and portaudio to output the audio to the sound card: https://github.com/go-audio/generator/blob/master/examples/realtime/main.go Press keys to change the played note (don't forget to press
I believe the float64 to float32 truncation/copy is very cheap and not a performance concern (especially compared to allocating and populating a new slice), but I might be wrong. If that's your concern, would you mind benchmarking that for us?
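For anyone who wants to measure that cost, a minimal benchmark sketch along these lines would do; it is a plain testing benchmark (file name and package are assumptions), not part of the proposal.

```go
// bench_test.go
package audio

import "testing"

// BenchmarkFloat64ToFloat32Copy measures a straight float64 -> float32 copy
// for a typical 512-sample block, reusing a preallocated destination slice.
func BenchmarkFloat64ToFloat32Copy(b *testing.B) {
	src := make([]float64, 512)
	dst := make([]float32, len(src))
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		for j, v := range src {
			dst[j] = float32(v)
		}
	}
}
```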
Maybe I'm missing something, but I don't see the purpose of carrying around the BitDepth (and to some degree Endianness) on the Format field of the buffers, particularly the FloatBuffer. Once you're working with the data in floats, neither of those is really relevant except when converting back to an integer format again. That seems like a separate concern rather than a property of the buffer.
@kisielk this is a fair critique. The reason those values were added was for codecs, so we could round-trip without having to check or store metadata anywhere else. I just checked.
Let me try to play with changing that and see what it breaks and how to work around it. I'll report back in a couple of days.
Edit: @kisielk after playing with the suggestion, I updated the proposal and implementations. Thanks! Let me know if you're coming to NAMM, I owe you one!
I only skimmed https://github.com/go-audio/generator/blob/master/examples/realtime/main.go but, AFAICT, it (and its imports) doesn't actually use the audio.Buffer interface type, only the audio.FloatBuffer struct type. Similarly, the AsFloat32Buffer method is never called anywhere, even though you convert float64 samples to float32. As @egonelbre said, I'm missing the big picture of how the proposed standard type (the Buffer interface type) and its methods should be used in practice.
This is drifting off-topic, but I also left some quick code review comments on go-audio/generator@aed9f58 based on my skim.
@nigeltao the example I linked to was a real-time example for @egonelbre, who was worried about performance. The proposal is for the interface and the implementations. I will add more examples with decoders and encoders and why the interface is useful. I do however expect that the concrete types might be used often to avoid conversion, but the buffer can always be converted if needed.
@mattetti, @egonelbre can correct me if I'm misunderstanding his concerns, but I think I have the same concern: that the AsXxxBuffer methods have to allocate on every call. You posted an example that doesn't allocate, but it also doesn't use the AsXxxBuffer methods, so I don't think it directly addresses the concern, and I'm still a little worried about performance. In any case, you said you'll add more examples later for why the interface is useful. I look forward to those examples (and there's no need to hurry on that).
@nigeltao it might be interesting to look at the previous iteration, where I only had one concrete type: https://github.com/mattetti/audio/blob/master/pcm_buffer.go#L43, but talking with @campoy I came to the conclusion that having separate concrete types was a better approach. Maybe the
Yes, i.e. AsXxx() would only be usable for offline processing. I feel like the whole discussion will eventually converge on something like this:
It might even make sense to drop float32 or float64, but I'm not feeling confident enough to do so at this moment. But that API is pretty much the minimum you can get away with for general-purpose audio processing.
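The list referred to above did not survive in this thread, but based on the later remarks about a Node interface handling only float32/float64, a minimal API of that shape might look like the following sketch; the names Node, Process32, Process64 and Gain are illustrative assumptions, not the actual proposal.

```go
package audio

// Node is a processing stage that works on interleaved sample blocks
// in place, in either float32 or float64 precision.
type Node interface {
	Process32(samples []float32)
	Process64(samples []float64)
}

// Gain is a trivial Node implementation that scales every sample.
type Gain struct{ Amount float64 }

func (g Gain) Process32(samples []float32) {
	a := float32(g.Amount)
	for i := range samples {
		samples[i] *= a
	}
}

func (g Gain) Process64(samples []float64) {
	for i := range samples {
		samples[i] *= g.Amount
	}
}
```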
The issue is unpredictable timing, not necessarily slowness, although you still want things to be as fast as possible. If you have a lot of things with unpredictable timing, it's impossible to ensure that the "bad timings" don't happen together, which means you start to get intermittent "glitches" and no easy way of debugging what happened. For more, see http://www.rossbencina.com/code/real-time-audio-programming-101-time-waits-for-nothing Obviously, at the end of audio graphs, doing some conversion is unavoidable if you want to support different types; or you have to do a lot of work to avoid it.
Commenting on the technical side of the proposal:
The names
Exposing 16-bit data without copying would be quite important for processing audio input in real time, as some devices only provide that. Thus having
Is the idea to handle endianness in the libraries and not expose it to the user (e.g. typically samples are native-endian, WAV is little-endian, AIFF is big-endian)?
Looking at the code, it converts floating-point and integer samples to each other by simply casting them. But floating-point samples are in [-1, 1] and integer samples (e.g. 16-bit) are in [-32768, 32767], so the conversions seem quite wrong at first glance. E.g. http://blog.bjornroche.com/2013/05/the-abcs-of-pcm-uncompressed-digital.html has an introduction to uncompressed audio formats for those interested in such topics.
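To make the scaling point concrete, here is one common convention for converting between int16 and float64 samples. This is an illustrative sketch (the scale package and function names are made up), not code from the proposal, and other scaling conventions exist.

```go
package scale

// Int16ToFloat64 maps int16 samples in [-32768, 32767] onto roughly [-1, 1).
// dst must be at least len(src) long.
func Int16ToFloat64(dst []float64, src []int16) {
	const toFloat = 1.0 / 32768.0
	for i, v := range src {
		dst[i] = float64(v) * toFloat
	}
}

// Float64ToInt16 maps [-1, 1] back to int16, clamping out-of-range values.
func Float64ToInt16(dst []int16, src []float64) {
	for i, v := range src {
		s := v * 32767.0
		switch {
		case s > 32767:
			s = 32767
		case s < -32768:
			s = -32768
		}
		dst[i] = int16(s)
	}
}
```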
From the discussions on different forums, I gather that only using
So, there doesn't seem to be a consensus in the audio world. I think we can easily support both if there are compatibility wrappers for automatically handling both 32/64, given that you have one implementation. But it also creates more API surface.
With regards to the previous
This is in some ways a dup of #13423 and #16353. If we're not willing to add a place in the subrepos then we're likely not willing to add a place in the standard library. The right way to start is to create a package somewhere else (github.com/go-audio is great) and get people to use it. Once you have experience with the API being good, then it might make sense to promote it to a subrepo or eventually the standard library (the same basic path that context followed). More generally, all of these are on hold for #17244, which in turn is essentially blocked on understanding the long-term plan for package management. I intend to focus on that for the next six months. I'll mark this as Proposal-Hold to match the others.
@egonelbre @taruti don't forget that the std lib is extremely biased towards float64 (https://golang.org/pkg/math/) and int in general. While I agree that we should support float32 and int16, I think it does make things way more complicated for internal processing (by that I mean within the Go world). I need to wrap my head around Egon's interface suggestion; for some reason some aspects make me feel quite uncomfortable, which means I need to dig deeper. @rsc that's fair; as a small side note, I think #13432 should be revoked (right, @rakyll?). But in either case that shouldn't prevent the people interested from continuing the discussion about a unified audio interface and the development of core libraries. This discussion is certainly a catalyst and we might be able to put together a special interest group leading the way.
I agree that most of the Go stdlib is biased towards float64, but I think it would be a mistake to not make float32 support first class in the API. Memory is at a premium in most audio devices and you're halving your available storage for little to no benefit by restricting yourself to float64.
@mattetti to get a concrete idea, this is what I'm proposing: https://github.com/egonelbre/exp/tree/master/audio. Note, I'm unable to compile portaudio on Windows at the moment, so I'm not sure whether I've made some terrible mistakes somewhere. I tried to make the code as realistic as possible.
The support for int16 would be more of a convenience implementation, i.e. the Node interface wouldn't contain the API for handling int16, only float32 and float64. Most audio should work as a
With regards to float32/float64 support, yeah, I'm not sure what the correct approach is either. I think it's fine to say that "some particular situation needs some other implementation" and always pick float32/float64. But we should at least be aware of who we are/aren't targeting and whether we match their needs.
PS: my proposed design deals only with processing buffers, not necessarily with how you actually load/hold/stream your PCM data. For all intents and purposes you could store it in 4-bit integers if you wanted to; but when it comes to processing, you handle float32/float64.
@egonelbre sure, but even in the case of processing buffers this is a concern. Say you are designing a sample-based synthesizer (e.g. an Akai MPC) and your project has an audio pool it is working with. You'll want to store those samples in memory in the native format of your DSP path so you don't have to waste time doing conversions every time you read from your audio pool. Also remember that in-memory size also affects caching.
@egonelbre @kisielk @nigeltao @taruti should we continue this discussion at go-audio/audio#3? We can then circle back and offer an updated proposal once the core team figures out how to deal with #17244.
The
I think for processing, there are arguably smaller units than the proposed Buffer, and that's the discrete signal, represented simply as
I see the proposed buffer type as an extension of the discrete signal, and the metadata it can provide will be important in syncing multiple sources, but anything beyond that is probably doing too much (all the
Being Go as it is, and much like the math lib, you're simply going to have to commit to float64 with the occasional utility method. The extra complexity otherwise may see no use and may become an unnecessary burden. I agree a working group might be more fitting to hash out these details. Then, language such as "this might be critical for real-time processing" can be written less ambiguously. But this has happened before with various input going into (for example) https://github.com/azul3d/engine/tree/master/audio
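Assuming the comment above means the discrete signal is just a plain slice of float64 samples (the exact wording was lost), here is a minimal sketch of operating directly on such a signal; the signal package and rms helper are made up for illustration.

```go
package signal

import "math"

// rms returns the root-mean-square level of a block of samples,
// treating the discrete signal as nothing more than a []float64.
func rms(samples []float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	var sum float64
	for _, s := range samples {
		sum += s * s
	}
	return math.Sqrt(sum / float64(len(samples)))
}
```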
As for a math32 library, I'm not sure if it's necessary. It's slow to call (64-bit) math.Sin inside your inner loop. Instead, I'd expect to pre-compute a global sine table, such as "var sineTable = [4096]float32{ etc }". Compute that table at "go generate" time, and you don't need the math package (or a math32 package) at run time.
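A rough illustration of the lookup-table idea: the suggestion above is to generate the table at "go generate" time, while this sketch fills it at init time for brevity; the inner-loop lookup is the same either way. The osc package name and sineAt helper are assumptions.

```go
package osc

import "math"

const tableSize = 4096 // power of two so the index can be masked

var sineTable [tableSize]float32

func init() {
	// In the go-generate variant this loop would emit Go source instead.
	for i := range sineTable {
		sineTable[i] = float32(math.Sin(2 * math.Pi * float64(i) / tableSize))
	}
}

// sineAt returns an approximation of sin(2*pi*phase) for phase in [0, 1),
// using only a table lookup in the hot path.
func sineAt(phase float64) float32 {
	return sineTable[int(phase*tableSize)&(tableSize-1)]
}
```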
Hi all, just wanted to mention http://zikichombo.org/blog/launch/ I think we've got a lot of work ahead of us w.r.t. audio and Go, but also a lot of opportunities.
The first linked repo is completely down and the second one hasn't had a commit in 5 years. Something something "the standard library is where packages go to die" @davecheney
This proposal suggests a new standard audio package. The goal of this package is not to offer a library to consume audio data, but instead to define a common interface that could be used by all audio libraries. This common interface would be audio.Buffer, and concrete implementations would be provided. According to the guidelines, I wrote a design document explaining things in detail; it can be found here.
The proposal implementation in code is available here.
Note that this proposal is a follow-up to an earlier proposal: #13432
I do realize that adding a new standard package is a big deal, and that's why I kept the scope as small as possible and also used the implementation against non-trivial real code. The added value of such a package might not be immediately obvious to programmers unfamiliar with audio digital signal processing, because end users would not see a direct benefit. However, this would be the cornerstone of an entire ecosystem. Making audio libraries compatible with each other and offering a happy path means that Go developers would be able to chain a bunch of libraries and create their own audio processing chain without much effort. It's somewhat comparable to the benefits we all see in io.Reader.
Audio is everywhere, and most available audio source code is written in C, but that doesn't have to be the case. With the rise of machine learning and language processing, developers wanting to play with audio data have the choice between hard-to-use C code and terribly inefficient Python code. Go is a great alternative, and we have an opportunity to consolidate and optimize future efforts by defining a common interface.
My goal is to start the conversation now and in the case of acceptance, I would submit the code for review during the 1.9 cycle.
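For readers who have not opened the linked design document, the following is a rough sketch of the general shape discussed in this thread: an audio.Buffer interface with AsXxxBuffer conversions and concrete buffer types. The exact method set and fields here are reconstructed from the discussion and should not be taken as the authoritative proposal.

```go
package audio

// Format describes the PCM layout shared by all buffer implementations.
type Format struct {
	NumChannels int
	SampleRate  int
}

// Buffer is the common currency that audio packages would exchange.
type Buffer interface {
	PCMFormat() *Format
	NumFrames() int
	AsFloatBuffer() *FloatBuffer
	AsFloat32Buffer() *Float32Buffer
	AsIntBuffer() *IntBuffer
}

// FloatBuffer holds float64 samples and implements Buffer.
type FloatBuffer struct {
	Format *Format
	Data   []float64
}

// Float32Buffer and IntBuffer are the other concrete implementations;
// only their fields are shown here.
type Float32Buffer struct {
	Format *Format
	Data   []float32
}

type IntBuffer struct {
	Format *Format
	Data   []int
}

func (b *FloatBuffer) PCMFormat() *Format { return b.Format }

func (b *FloatBuffer) NumFrames() int {
	if b.Format == nil || b.Format.NumChannels == 0 {
		return 0
	}
	return len(b.Data) / b.Format.NumChannels
}

func (b *FloatBuffer) AsFloatBuffer() *FloatBuffer { return b }

// AsFloat32Buffer allocates a new buffer on every call; this is the
// allocation cost debated in the comments above.
func (b *FloatBuffer) AsFloat32Buffer() *Float32Buffer {
	out := &Float32Buffer{Format: b.Format, Data: make([]float32, len(b.Data))}
	for i, v := range b.Data {
		out.Data[i] = float32(v)
	}
	return out
}

// AsIntBuffer casts without rescaling; whether conversions should rescale
// (see the int16/float discussion above) is one of the open questions.
func (b *FloatBuffer) AsIntBuffer() *IntBuffer {
	out := &IntBuffer{Format: b.Format, Data: make([]int, len(b.Data))}
	for i, v := range b.Data {
		out.Data[i] = int(v)
	}
	return out
}
```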