Updates golang/go#13432. Change-Id: I718006e8f039c476d456c1276c54132bd66d9410 Reviewed-on: https://go-review.googlesource.com/17262 Reviewed-by: Burcu Dogan <[email protected]>
Showing 1 changed file with 273 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,273 @@ | ||
# Proposal: Audio for Mobile | ||
|
||
Author: Burcu Dogan | ||
|
||
With input from David Crawshaw, Hyang-Ah Kim and Andrew Gerrand. | ||
|
||
Last updated: November 30, 2015 | ||
|
||
Discussion at https://golang.org/issue/13432. | ||
|
||
## Abstract | ||
|
||
This proposal suggests core abstractions to support audio decoding | ||
and playback on mobile devices. | ||
|
||
## Background | ||
|
||
In the scope of the Go mobile project, an audio package that supports | ||
decoding and playback is a top priority. The current status of audio | ||
support under x/mobile is limited to OpenAL bindings and an experimental | ||
high-level audio player that is backed by OpenAL. | ||
|
||
The experimental audio package fails to | ||
- provide high-level abstractions to represent audio and audio processors, | ||
- implement a memory-efficient playback model, | ||
- implement decoders (e.g. an mp3 decoder), | ||
- support live streaming or other networked audio sources. | ||
|
||
In order to address these concerns, I am proposing core abstractions and | ||
a minimal set of features based on the proposed abstractions to provide | ||
decoding and playback support. | ||
|
||
## Proposal | ||
|
||
I (Burcu Dogan) surveyed the top iOS and Android apps for audio features. | ||
The survey revealed three major categories of apps with substantially | ||
different requirements. A good audio package shouldn't address the | ||
different classes of requirements with isolated audio APIs, but must introduce | ||
common concepts and types that could be the backbone of both high- and low- | ||
level audio packages. This is how we will enable users to expand their audio | ||
capabilities by partially delegating their work to lower-level layers of the | ||
audio package without having to rewrite their entire audio stack. | ||
|
||
### Features considered | ||
This section briefly explains the features required to support the common | ||
audio requirements of mobile applications. The abstractions we introduce | ||
today should be extensible enough to cover most of the features listed below | ||
in the long run. | ||
|
||
#### Playback | ||
Single or multi-channel playback with player controls such as play, pause, | ||
stop, etc. Games use a looping sample as the background music, so looping | ||
functionality is also essential. Multiple playback instances are needed: most | ||
games require a background audio track and one-shot audio effects in the | ||
foreground. | ||
|
||
#### Decoding | ||
Codec library and decoding support. Most radio-like apps and music players | ||
need to play a variety of audio sources. Codec support on par with | ||
AudioUnit on iOS and OpenMAX on Android would be good to have. | ||
|
||
#### Remote streaming | ||
Audio players, radios and tools that stream audio need to be able to work | ||
with remote audio sources. HTTP Live Streaming works on both platforms but | ||
used to be inefficient on Android devices. | ||
|
||
#### Synchronization and composition | ||
- Synchronization between channels/players | ||
- APIs that allow developers to schedule the playback, frame-level timers | ||
- Mixers, multiple channels need to be multiplexed into a single device buffer | ||
- Music software apps that require audio composition and filtering features | ||
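
To make the mixer item concrete, here is a minimal sketch of multiplexing two channels into one buffer by summing samples with saturation. It assumes 16-bit signed samples that are already decoded into `[]int16` and share the same sample rate and channel layout; `Mix16` is a hypothetical name and is not part of the proposed API.

```go
package main

import "math"

// Mix16 is a hypothetical helper that mixes two streams of 16-bit signed
// PCM samples into one. The streams may differ in length; the shorter one
// is treated as silence once it runs out.
func Mix16(a, b []int16) []int16 {
	n := len(a)
	if len(b) > n {
		n = len(b)
	}
	out := make([]int16, n)
	for i := range out {
		// Sum in 32 bits so the addition itself cannot overflow.
		var sum int32
		if i < len(a) {
			sum += int32(a[i])
		}
		if i < len(b) {
			sum += int32(b[i])
		}
		// Saturate instead of wrapping around, which would be audible
		// as a loud click.
		if sum > math.MaxInt16 {
			sum = math.MaxInt16
		} else if sum < math.MinInt16 {
			sum = math.MinInt16
		}
		out[i] = int16(sum)
	}
	return out
}
```

Hard clipping is the simplest policy; a real mixer would more likely scale the inputs by a per-channel gain before summing.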
|
||
#### Playlist features | ||
Music players and radios require playlisting features, so that users can | ||
queue and unqueue tracks on the player. Players also need shuffling and | ||
repeating features. | ||
|
||
More information on the classification of the audio apps based on the features | ||
listed above is available in Appendix: Audio Apps Classification. | ||
|
||
### Goals | ||
|
||
#### Short-term goals | ||
|
||
- Playback of generated data (such as a PCM sine wave). | ||
- Playback of an audio asset. | ||
- Playback from streaming network sources. | ||
- Core interfaces to represent decoders. | ||
- Initial decoder implementations, ideally delegating the decoding to the | ||
system codecs (OpenMAX for Android and AudioUnit for iOS). | ||
- Basic play functions such as play (looping and one-shot), stop, pause, | ||
gain control. | ||
- Prefetching before user invokes playback. | ||
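
As an illustration of the first goal, the sketch below generates the kind of data a player would be handed: a sine wave encoded as linear PCM. It assumes mono, 16-bit signed, little-endian samples; `SineWavePCM` is a hypothetical helper, not a function from the proposal.

```go
package main

import (
	"encoding/binary"
	"math"
)

// SineWavePCM is a hypothetical helper that returns `frames` samples of a
// sine wave at freq Hz, encoded as mono 16-bit signed little-endian PCM.
func SineWavePCM(freq float64, sampleRate, frames int) []byte {
	buf := make([]byte, 2*frames) // 2 bytes per 16-bit sample
	for i := 0; i < frames; i++ {
		t := float64(i) / float64(sampleRate) // time of sample i in seconds
		v := math.Sin(2 * math.Pi * freq * t) // amplitude in [-1, 1]
		s := int16(v * math.MaxInt16)         // scale to the 16-bit range
		binary.LittleEndian.PutUint16(buf[2*i:], uint16(s))
	}
	return buf
}
```

One second of a 440 Hz tone at 44.1 kHz would be `SineWavePCM(440, 44100, 44100)`, which yields 88200 bytes.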
|
||
#### Longer-term goals | ||
- Multi-channel playback (playing multiple streams at the same time) | ||
- Multi-channel synchronization and an internal clock | ||
- Composition and filtering (mixing of multiple signals, low-pass filter, | ||
reverb, etc.) | ||
- Tracklisting features to queue and unqueue multiple sources on a player; | ||
playback features such as prefetching the next song | ||
|
||
### Non-goals | ||
- Audio capture. Recording and encoding audio is not on the roadmap initially. | ||
Both could be added to the package later without changing the existing API surface. | ||
- Dependency on the visual frame rate. This feature requires the audio | ||
scheduler to work in cooperation with the graphics layer and is currently not | ||
on our radar. | ||
|
||
### Core abstractions | ||
|
||
This section proposes the core interfaces and abstractions to represent audio, | ||
audio sources and decoding primitives. The goal of introducing and agreeing on | ||
the core abstractions is to be able to extend the audio package later with | ||
the features considered above without breaking the APIs. | ||
|
||
#### Clip | ||
The audio package will represent audio data as linear PCM formatted in-memory | ||
audio chunks. A fundamental interface, Clip, will define how to consume audio | ||
data and how audio attributes (such as bit depth and sample rate) are reported | ||
to the consumers of an audio media source. | ||
|
||
Clip is an io.ReadSeeker and must be considered as a small window into the | ||
underlying audio data. | ||
|
||
``` | ||
// FrameInfo represents the frame-level information. | ||
type FrameInfo struct { | ||
	// Channels is the number of audio channels | ||
	// (e.g. 1 for mono, 2 for stereo). | ||
	Channels int | ||
	// BitDepth is the number of bits used to represent | ||
	// a single sample. | ||
	BitDepth int | ||
	// SampleRate is the number of samples to be played | ||
	// each second. | ||
	SampleRate int64 | ||
} | ||
// Clip is an io.ReadSeeker that represents linear PCM formatted audio. | ||
// Clip can seek and read from a section and allows users to | ||
// consume a small section of the underlying audio data. | ||
// | ||
// FrameInfo returns the basic frame-level information about the clip audio. | ||
// | ||
// Size returns the total number of bytes of the underlying audio data. | ||
// TODO(jbd): Support cases where size is unknown? | ||
type Clip interface { | ||
	io.ReadSeeker | ||
	FrameInfo() FrameInfo | ||
	Size() int64 | ||
} | ||
``` | ||
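
Because a Clip reports only a byte size and a FrameInfo, consumers have to derive time-based quantities themselves. The sketch below shows that arithmetic. It assumes BitDepth is a multiple of 8 and that Channels, BitDepth and SampleRate are all non-zero; `BytesPerFrame`, `Duration` and `OffsetAt` are hypothetical helpers, not part of the proposed API.

```go
package main

import "time"

// FrameInfo mirrors the struct proposed above.
type FrameInfo struct {
	Channels   int
	BitDepth   int
	SampleRate int64
}

// BytesPerFrame returns the size of one frame, i.e. one sample for every
// channel. Hypothetical helper; assumes BitDepth is a multiple of 8.
func BytesPerFrame(info FrameInfo) int64 {
	return int64(info.Channels) * int64(info.BitDepth/8)
}

// Duration returns the playing time of size bytes of PCM data.
// Hypothetical helper; very long clips (tens of hours) would overflow
// the intermediate multiplication.
func Duration(info FrameInfo, size int64) time.Duration {
	frames := size / BytesPerFrame(info)
	return time.Duration(frames) * time.Second / time.Duration(info.SampleRate)
}

// OffsetAt returns the byte offset of the frame that starts at or just
// before d, suitable as an argument to Seek with io.SeekStart.
func OffsetAt(info FrameInfo, d time.Duration) int64 {
	frame := int64(d) * info.SampleRate / int64(time.Second)
	return frame * BytesPerFrame(info)
}
```

For 16-bit stereo audio at 44.1 kHz a frame is 4 bytes, so 176400 bytes correspond to one second of playback.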
|
||
#### Decoders | ||
A decoder takes arbitrary input and is responsible for outputting a clip. | ||
|
||
``` | ||
// Decode reads from r and converts the input | ||
// to a PCM clip output. | ||
func Decode(r io.ReadSeeker) (Clip, error) { | ||
	panic("not implemented") | ||
} | ||
// DecodeWAVBytes decodes the given WAV byte slice | ||
// into a PCM clip output. An error is returned if any of the decoding | ||
// steps fails (e.g. FrameInfo cannot be determined from the WAV header). | ||
func DecodeWAVBytes(data []byte) (Clip, error) { | ||
	panic("not implemented") | ||
} | ||
``` | ||
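
The first step of something like DecodeWAVBytes is recovering FrameInfo from the WAV header. The sketch below walks the RIFF chunks of an in-memory WAV file and returns the frame information together with the raw PCM payload. It is an assumption about how that step might look, not the proposed implementation: `parseWAV` is a hypothetical name, and only integer PCM (format tag 1) is handled.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// FrameInfo mirrors the struct proposed above.
type FrameInfo struct {
	Channels   int
	BitDepth   int
	SampleRate int64
}

// parseWAV is a hypothetical helper that extracts FrameInfo and the PCM
// payload from a WAV file held in memory.
func parseWAV(data []byte) (FrameInfo, []byte, error) {
	var info FrameInfo
	if len(data) < 12 || string(data[0:4]) != "RIFF" || string(data[8:12]) != "WAVE" {
		return info, nil, errors.New("wav: missing RIFF/WAVE header")
	}
	haveFmt := false
	for off := 12; off+8 <= len(data); {
		id := string(data[off : off+4])
		size := int(binary.LittleEndian.Uint32(data[off+4:]))
		body := off + 8
		if size < 0 || size > len(data)-body {
			return info, nil, errors.New("wav: truncated chunk")
		}
		switch id {
		case "fmt ":
			if size < 16 {
				return info, nil, errors.New("wav: fmt chunk too short")
			}
			if binary.LittleEndian.Uint16(data[body:]) != 1 { // 1 = integer PCM
				return info, nil, errors.New("wav: only integer PCM is handled")
			}
			info.Channels = int(binary.LittleEndian.Uint16(data[body+2:]))
			info.SampleRate = int64(binary.LittleEndian.Uint32(data[body+4:]))
			info.BitDepth = int(binary.LittleEndian.Uint16(data[body+14:]))
			haveFmt = true
		case "data":
			if !haveFmt {
				return info, nil, errors.New("wav: data chunk before fmt chunk")
			}
			return info, data[body : body+size], nil
		}
		// Chunks are padded to an even number of bytes.
		off = body + size + size%2
	}
	return info, nil, errors.New("wav: no data chunk")
}
```

Unknown chunks are skipped rather than rejected, since WAV files commonly carry metadata chunks alongside the audio data.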
|
||
#### Clip sources | ||
Any arbitrary valid audio data source can be converted into a clip. | ||
|
||
``` | ||
// NewBufferClip converts a buffer to a Clip. | ||
func NewBufferClip(buf []byte, info FrameInfo) Clip { | ||
	panic("not implemented") | ||
} | ||
// NewRemoteClip converts the HTTP live streaming media | ||
// source into a Clip. | ||
func NewRemoteClip(url string) (Clip, error) { | ||
	panic("not implemented") | ||
} | ||
``` | ||
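
A buffer-backed clip is the simplest source to reason about. As a minimal sketch, and only one possible way to do it, NewBufferClip could wrap the standard library's `bytes.Reader`, which already provides Read, Seek and Size. The `bufferClip` type is a hypothetical name introduced for this example.

```go
package main

import (
	"bytes"
	"io"
)

// FrameInfo and Clip mirror the types proposed above.
type FrameInfo struct {
	Channels   int
	BitDepth   int
	SampleRate int64
}

type Clip interface {
	io.ReadSeeker
	FrameInfo() FrameInfo
	Size() int64
}

// bufferClip is a hypothetical in-memory Clip. Embedding *bytes.Reader
// promotes its Read, Seek and Size methods, so only FrameInfo is left
// to implement.
type bufferClip struct {
	*bytes.Reader
	info FrameInfo
}

// FrameInfo returns the frame-level information given at construction.
func (c *bufferClip) FrameInfo() FrameInfo { return c.info }

// NewBufferClip sketches the proposed constructor. The buffer is assumed
// to already contain linear PCM data matching info; it is not copied.
func NewBufferClip(buf []byte, info FrameInfo) Clip {
	return &bufferClip{Reader: bytes.NewReader(buf), info: info}
}
```

Because the buffer is not copied, callers should not modify it while the clip is in use.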
|
||
#### Players | ||
|
||
A player plays a series of clips back-to-back and provides basic control | ||
functions (play, stop, pause, seek, etc.). | ||
|
||
Note: Currently, the x/mobile/exp/audio package provides an experimental and | ||
highly immature player. With the introduction of the new core interfaces, we | ||
will break the API surface in order to adopt the new abstractions. | ||
|
||
``` | ||
// NewPlayer returns a new Player. It initializes the underlying | ||
// audio devices and the related resources. | ||
// A player can play multiple clips back-to-back. Players will begin | ||
// prefetching the next clip to provide a smooth and uninterrupted | ||
// playback. | ||
func NewPlayer(c ...Clip) (*Player, error) | ||
``` | ||
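
Since a Clip is an io.ReadSeeker, a player could implement looping playback by rewinding the clip whenever it reaches the end. The sketch below illustrates that idea on a plain io.ReadSeeker; `loopReader` and `LoopBytes` are hypothetical names, and how the proposed Player would actually feed the audio device is not specified here.

```go
package main

import (
	"bytes"
	"io"
)

// loopReader is a hypothetical wrapper that replays src from the start
// every time it reaches the end.
type loopReader struct {
	src io.ReadSeeker
}

// Read reads from the source and rewinds it on io.EOF, so callers see an
// endless stream. An empty source still reports io.EOF to avoid spinning.
func (l *loopReader) Read(p []byte) (int, error) {
	n, err := l.src.Read(p)
	if err != io.EOF {
		return n, err
	}
	// End of the clip: rewind so the next read starts from the beginning.
	if _, serr := l.src.Seek(0, io.SeekStart); serr != nil {
		return n, serr
	}
	if n > 0 || len(p) == 0 {
		return n, nil
	}
	// Nothing was read before EOF; read again from the start.
	n, err = l.src.Read(p)
	if n == 0 && err == io.EOF {
		return 0, io.EOF // empty source
	}
	if err == io.EOF {
		err = nil // the next call rewinds again
	}
	return n, err
}

// LoopBytes returns a reader that loops over b forever.
func LoopBytes(b []byte) io.Reader {
	return &loopReader{src: bytes.NewReader(b)}
}
```

One-shot playback needs no wrapper: the player reads the clip until io.EOF and stops.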
|
||
## Compatibility | ||
|
||
No compatibility issues. | ||
|
||
## Implementation | ||
|
||
The current scope of the implementation will be restricted to meet the | ||
requirements listed in the "Short-term goals" section. | ||
|
||
The interfaces will be contributed by Burcu Dogan. The implementation of the | ||
decoders and playback is a team effort and requires additional planning. | ||
|
||
The audio package has no dependencies on upcoming Go releases and therefore | ||
doesn't have to fit in the Go release cycle. | ||
|
||
## Open issues | ||
|
||
- WAV and AIFF both support float PCM values even though the use of float | ||
values is unpopular. Should we consider supporting float values? | ||
- Decoding on desktop. The package will use the system codec libraries | ||
provided by Android and iOS on mobile devices. It is not possible to provide | ||
feature parity for desktop environments in the scope of decoding. | ||
- Playback on desktop. The playback may directly use AudioUnit on iOS, and | ||
libmedia (or stagefright) on Android. The media libraries on the desktop are | ||
highly fragmented and cross-platform libraries are third-party dependencies. | ||
It is unlikely that we can provide an audio package that works out of the box | ||
on desktop if we don't write an audio backend for each platform. | ||
- Hardware acceleration. Should we allow users to bypass the decoders and | ||
stream to the device buffer in the longer term? The scope of the audio package | ||
is primarily mobile devices (which support hardware acceleration on a | ||
case-by-case basis). But if the package is to cover platforms beyond mobile, | ||
we should consider this case. | ||
|
||
## Appendix: Audio Apps Classification | ||
|
||
The classification of the audio apps is based on the survey results mentioned | ||
above. This section summarizes which features are highly related to each other. | ||
|
||
### Class A | ||
Class A mostly represents games that need to play a background sound (in | ||
looping mode or not) and occasionally need to play one-shot audio | ||
effects. | ||
- Single channel player with looping audio | ||
- Buffering audio files entirely in memory is efficient enough, audio files | ||
are small | ||
- Timing of the playback doesn’t have to be precise, latency is negligible | ||
|
||
### Class B | ||
Class B represents games with advanced audio. Most apps that fit in this | ||
category are using advanced audio engines as their audio backend. | ||
- Multi-channel player | ||
- Synchronization between channels/players | ||
- APIs that allow developers to schedule the playback, such as frame-level | ||
timers | ||
- Low latency, timing of the playback needs to be precise | ||
- Mixers, multiple channels need to be multiplexed into a single device buffer | ||
- Music software apps require audio composition, filtering, etc | ||
|
||
### Class C | ||
Class C represents the media players. | ||
- Remote streaming | ||
- Playlisting features, multitrack playback features such as prefetching and crossfading | ||
- High-level player controls such as looping and shuffling | ||
- Good decoder support |