design: audio for mobile
Updates golang/go#13432.

Change-Id: I718006e8f039c476d456c1276c54132bd66d9410
Reviewed-on: https://go-review.googlesource.com/17262
Reviewed-by: Burcu Dogan <[email protected]>
rakyll committed Dec 8, 2015
1 parent 54f41a7 commit 3e77f0d

design/13432-mobile-audio.md (new file, 273 additions)

# Proposal: Audio for Mobile

Author: Burcu Dogan

With input from David Crawshaw, Hyang-Ah Kim and Andrew Gerrand.

Last updated: November 30, 2015

Discussion at https://golang.org/issue/13432.

## Abstract

This proposal suggests core abstractions to support audio decoding
and playback on mobile devices.

## Background

In the scope of the Go mobile project, an audio package that supports
decoding and playback is a top priority. The current status of audio
support under x/mobile is limited to OpenAL bindings and an experimental
high-level audio player that is backed by OpenAL.

The experimental audio package fails to
- provide high level abstractions to represent audio and audio processors,
- implement a memory-efficient playback model,
- implement decoders (e.g. an mp3 decoder),
- support live streaming or other networking audio sources.

In order to address these concerns, I am proposing core abstractions and
a minimal set of features based on the proposed abstractions to provide
decoding and playback support.

## Proposal

I (Burcu Dogan) surveyed the top iOS and Android apps for audio features.
The survey revealed three major categories with substantially different
requirements. A good audio package shouldn't address the different classes of
requirements with isolated audio APIs, but must introduce common concepts and
types that can be the backbone of both high- and low-level audio packages.
This is how we will enable users to expand their audio capabilities by
partially delegating their work to lower-level layers of the audio package
without having to rewrite their entire audio stack.

### Features considered
This section briefly explains the features required to support the common
audio needs of mobile applications. The abstractions we introduce today
should be extensible enough to meet a majority of the features listed below
in the long run.

#### Playback
Single or multi-channel playback with player controls such as play, pause,
and stop. Games use a looping sample as the background music, so looping
functionality is also essential. Multiple playback instances are needed: most
games require a background audio track and one-shot audio effects in the
foreground.

#### Decoding
Codec library and decoding support. Most radio-like apps and music players
need to play a variety of audio sources. Codec support on par with AudioUnit
on iOS and OpenMAX on Android is good to have.

#### Remote streaming
Audio players, radios and tools that stream audio need to be able to work
with remote audio sources. HTTP Live Streaming works on both platforms but
used to be inefficient on Android devices.

#### Synchronization and composition
- Synchronization between channels/players
- APIs that allow developers to schedule the playback, frame-level timers
- Mixers, multiple channels need to be multiplexed into a single device buffer
- Music software apps that require audio composition and filtering features

#### Playlist features
Music players and radios require playlisting features, so that users can
queue and unqueue tracks on the player. The player also needs shuffling and
repeating features.

More information on the classification of the audio apps based on the features
listed above is available in the Appendix: Audio Apps Classification.

### Goals

#### Short-term goals

- Playback of generated data (such as a PCM sine wave).
- Playback of an audio asset.
- Playback from streaming network sources.
- Core interfaces to represent decoders.
- Initial decoder implementations, ideally delegating the decoding to the
system codecs (OpenMAX for Android and AudioUnit for iOS).
- Basic play functions such as play (looping and one-shot), stop, pause,
gain control.
- Prefetching before user invokes playback.

#### Longer-term goals
- Multi-channel playback (playing multiple streams at the same time)
- Multi-channel synchronization and an internal clock
- Composition and filtering (mixing of multiple signals, low-pass filter,
reverb, etc.)
- Tracklisting features to queue and unqueue multiple sources on a player;
playback features such as prefetching the next song

### Non-goals
- Audio capture. Recording and encoding audio are not on the roadmap
initially. Both could be added to the package without touching the existing
API surface.
- Dependency on the visual frame rate. This feature requires the audio
scheduler to work in cooperation with the graphics layer and is currently not
on our radar.

### Core abstractions

This section proposes the core interfaces and abstractions to represent
audio, audio sources, and decoding primitives. The goal of introducing and
agreeing on the core abstractions is to be able to extend the audio package
in light of the features listed above without breaking the APIs.

#### Clip
The audio package will represent audio data as linear PCM formatted in-memory
audio chunks. A fundamental interface, Clip, will define how to consume audio
data and how audio attributes (such as bit depth and sample rate) are reported
to the consumers of an audio media source.

Clip is an io.ReadSeeker and should be thought of as a small window into the
underlying audio data.

```
// FrameInfo represents the frame-level information.
type FrameInfo struct {
	// Channels is the number of audio channels
	// (e.g. 1 for mono, 2 for stereo).
	Channels int

	// BitDepth is the number of bits used to represent
	// a single sample.
	BitDepth int

	// SampleRate is the number of samples played
	// each second.
	SampleRate int64
}

// Clip is an io.ReadSeeker that represents linear PCM formatted audio.
// Clip can seek and read from a section and allows users to
// consume a small section of the underlying audio data.
//
// FrameInfo returns the basic frame-level information about the clip audio.
//
// Size returns the total number of bytes of the underlying audio data.
// TODO(jbd): Support cases where size is unknown?
type Clip interface {
	io.ReadSeeker
	FrameInfo() FrameInfo
	Size() int64
}
```
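
As an illustration of how these attributes fit together, a consumer could
derive the playback duration of a clip from its size and frame information.
The helper below is only a sketch and not part of the proposed API; it assumes
whole frames of interleaved integer PCM and uses the standard time package.

```
// clipDuration is a sketch (not part of the proposed API) that derives
// the playback duration of a Clip from its size and frame information.
func clipDuration(c Clip) time.Duration {
	info := c.FrameInfo()
	// Size of a single frame in bytes: one sample per channel.
	frameSize := int64(info.Channels) * int64(info.BitDepth/8)
	frames := c.Size() / frameSize
	return time.Duration(frames) * time.Second / time.Duration(info.SampleRate)
}
```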

#### Decoders
A decoder takes arbitrary input and is responsible for producing a Clip.

```
// Decode reads from the given io.ReadSeeker and converts the input
// to a PCM Clip.
func Decode(r io.ReadSeeker) (Clip, error) {
	panic("not implemented")
}

// DecodeWAVBytes decodes the given WAV byte slice into a PCM Clip.
// An error is returned if any of the decoding steps fail
// (e.g. FrameInfo cannot be determined from the WAV header).
func DecodeWAVBytes(data []byte) (Clip, error) {
	panic("not implemented")
}
```
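
A minimal usage sketch, assuming the functions above and a local WAV file (a
mobile app would more likely read from a bundled asset); the file name is
hypothetical and error handling uses log.Fatal for brevity.

```
f, err := os.Open("sample.wav") // hypothetical file name
if err != nil {
	log.Fatal(err)
}
defer f.Close()

// *os.File satisfies io.ReadSeeker, so it can be decoded directly.
clip, err := Decode(f)
if err != nil {
	log.Fatal(err)
}
log.Printf("decoded %d bytes of PCM, %+v", clip.Size(), clip.FrameInfo())
```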

#### Clip sources
Any valid audio data source can be converted into a Clip.

```
// NewBufferClip converts a buffer to a Clip.
func NewBufferClip(buf []byte, info FrameInfo) Clip {
	panic("not implemented")
}

// NewRemoteClip converts an HTTP live streaming media
// source into a Clip.
func NewRemoteClip(url string) (Clip, error) {
	panic("not implemented")
}
```
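
Tying this back to the short-term goal of playing generated data, the sketch
below fills a buffer with one second of a sine wave as 16-bit little-endian
mono PCM and wraps it with NewBufferClip. The 44.1 kHz sample rate and the
encoding details are assumptions for illustration; it uses the standard math
and encoding/binary packages.

```
// genSineClip is a sketch that generates one second of a sine wave at the
// given frequency as 16-bit little-endian mono PCM and wraps it in a Clip.
func genSineClip(freq float64) Clip {
	const sampleRate = 44100
	buf := make([]byte, 2*sampleRate) // 2 bytes per 16-bit sample
	for i := 0; i < sampleRate; i++ {
		v := int16(math.Sin(2*math.Pi*freq*float64(i)/sampleRate) * math.MaxInt16)
		binary.LittleEndian.PutUint16(buf[2*i:], uint16(v))
	}
	return NewBufferClip(buf, FrameInfo{
		Channels:   1,
		BitDepth:   16,
		SampleRate: sampleRate,
	})
}
```

A player could then consume the returned clip like any other clip.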

#### Players

A player plays a series of clips back-to-back and provides basic control
functions (play, stop, pause, seek, etc.).

Note: Currently, the x/mobile/exp/audio package provides an experimental and
highly immature player. With the introduction of the new core interfaces, we
will break its API surface in order to bless the new abstractions.

```
// NewPlayer returns a new Player. It initializes the underlying
// audio devices and the related resources.
// A player can play multiple clips back-to-back. Players will begin
// prefetching the next clip to provide smooth and uninterrupted
// playback.
func NewPlayer(c ...Clip) (*Player, error)
```
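
A usage sketch under the assumption that Player exposes methods such as Play
and Close; the proposal names only the capabilities (play, stop, pause, seek),
not the method set, and introClip and loopClip are hypothetical clips obtained
from the constructors above.

```
p, err := NewPlayer(introClip, loopClip) // hypothetical clips
if err != nil {
	log.Fatal(err)
}
defer p.Close() // Close is an assumed method for releasing audio resources
p.Play()        // Play is an assumed method name
```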

## Compatibility

No compatibility issues.

## Implementation

The current scope of the implementation will be restricted to meet the
requirements listed in the "Short-term goals" section.

The interfaces will be contributed by Burcu Dogan. The implementation of the
decoders and playback is a team effort and requires additional planning.

The audio package has no dependencies on upcoming Go releases and therefore
doesn't have to fit into the Go release cycle.

## Open issues

- WAV and AIFF both support float PCM values even though the use of float
values is unpopular. Should we consider supporting float values?
- Decoding on desktop. The package will use the system codec libraries
provided by Android and iOS on mobile devices. It is not possible to provide
feature parity for desktop environments in the scope of decoding.
- Playback on desktop. Playback may directly use AudioUnit on iOS, and
libmedia (or stagefright) on Android. The media libraries on desktop are
highly fragmented, and cross-platform libraries are third-party dependencies.
It is unlikely that we can provide an audio package that works out of the box
on desktop if we don't write an audio backend for each platform.
- Hardware acceleration. Should we allow users to bypass the decoders and
stream to the device buffer in the longer term? The scope of the audio package
is primarily mobile devices (which support hardware acceleration on a
case-by-case basis). But if the package grows beyond mobile, we should
consider this case.

## Appendix: Audio Apps Classification

The classification of the audio apps is based on the survey results mentioned
above. This section summarizes which features are highly related to each
other.

### Class A
Class A mostly represents games that need to play a background sound (looping
or not) and occasionally need to play one-shot audio effects.
- Single-channel player with looping audio
- Buffering audio files entirely in memory is efficient enough; audio files
are small
- Timing of the playback doesn’t have to be precise; latency is negligible

### Class B
Class B represents games with advanced audio. Most apps that fit in this
category use advanced audio engines as their audio backend.
- Multi-channel player
- Synchronization between channels/players
- APIs that allow developers to schedule the playback, such as frame-level
timers
- Low latency, timing of the playback needs to be precise
- Mixers, multiple channels need to be multiplexed into a single device buffer
- Music software apps require audio composition, filtering, etc.

### Class C
Class C represents the media players.
- Remote streaming
- Playlisting features, and multitrack playback features such as prefetching
and cross-fading
- High-level player controls such as looping and shuffling
- Good decoder support
