
Expose MediaSession.activate() and define any implicit activation in terms of it #50

Closed
foolip opened this issue Jun 11, 2015 · 28 comments


@foolip
Member

foolip commented Jun 11, 2015

It has become apparent that our current model, which requires that one of the media session's participants start using the audio output device, is actually slightly more restrictive than iOS, and of course also Android. Both platforms make it possible to request audio focus, thus interrupting other apps, before beginning to produce any audio output. The iOS restriction is that you will not become the "now playing" app until you start producing audio, which is what gives you control over the playback controls in the drawer.

I propose something like this:

partial interface MediaSession {
  Promise<void> activate();
  Promise<void> deactivate();
}

Using with media elements:

var session = new MediaSession();
session.setMetadata({title: 'Punk Rock'});
var audio = new Audio('music');
audio.session = session;
audio.oncanplay = function() {
  session.activate().then(function() {
    audio.play();
  });
};

Using with Web Audio:

var session = new MediaSession();
session.setMetadata({title: 'Synth Pop'});
var context = new AudioContext(session);
// context is started inactive because session was not activated yet
// prepare the context for playback
session.activate().then(function() {
  context.resume();
});

It's clear that you activate and then play. This is the actual order in the current model as well, but the activation is implicit by attempting to play.

This is not to the exclusion of implicit activation by media elements, but that could be defined in terms of the web-facing API. For Web Audio I can't see a reason to do implicit activation at all.
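To make the proposed semantics concrete, here is a minimal JavaScript sketch of activate()/deactivate() as promise-returning methods. It assumes a single platform-wide audio focus that passes from one session to the next. `ToyFocusManager`, `ToySession` and `onInterrupt()` are hypothetical names for illustration only; they are not part of the proposal or of any shipping API.

```javascript
// Hypothetical model: a single platform-wide audio focus with at most one holder.
class ToyFocusManager {
  constructor() {
    this.holder = null;
  }

  // Give focus to `session`, interrupting whichever session held it before.
  request(session) {
    const previous = this.holder;
    if (previous && previous !== session) {
      previous.active = false;
      previous.onInterrupt();
    }
    this.holder = session;
  }

  release(session) {
    if (this.holder === session) this.holder = null;
  }
}

// Hypothetical stand-in for the proposed MediaSession interface.
class ToySession {
  constructor(name, focus) {
    this.name = name;
    this.focus = focus;
    this.active = false;
    this.interrupted = false;
  }

  onInterrupt() {
    this.interrupted = true;
  }

  // Mirrors Promise<void> activate(): takes focus without playing anything.
  activate() {
    this.focus.request(this);
    this.active = true;
    this.interrupted = false;
    return Promise.resolve();
  }

  // Mirrors Promise<void> deactivate(): hands focus back.
  deactivate() {
    this.focus.release(this);
    this.active = false;
    return Promise.resolve();
  }
}

const platformFocus = new ToyFocusManager();
const music = new ToySession('music', platformFocus);
const game = new ToySession('game', platformFocus);

music.activate()
  .then(() => game.activate()) // interrupts music before the game plays anything
  .then(() => console.log(platformFocus.holder.name, music.interrupted)); // game true
```

What the sketch tries to capture is that, under this proposal, activation by itself is what interrupts the previous focus holder; no audio output is needed for that.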

@doomdavve
Contributor

I like how this provides the ability to activate your session for receiving remote control events before starting playback. If we extend the MediaSession with the ability to probe the system for other playing audio, we can properly support the use case where you can set up your app with lock screen UI, meta-data and all and then wait for the play command, provided no other audio is playing on your system. On iOS the meta-data wouldn't show until you start playing, but otherwise it should work fine.

@richtr
Member

richtr commented Jun 11, 2015

My issue with an 'activate-then-play' model vs. our current model of 'activate-on-play' (+ the proposed AudioContext activate-on-play behavior proposed in #48 (comment)) can be summarized with the following diagram:

[Diagram: activate-then-play (top) vs. activate-on-play (bottom)]

With the activate-then-play approach (top diagram above) a web developer could .activate() a session on page load or way before they actually intend to play media on the page. Then they could wait, perhaps indefinitely, for the user to actually start playing the media content in the page. As soon as .activate() is called it will interrupt any other media content playing in any other tab but it does not replace that with other audio. Why did my music player pause() when it has not been replaced by anything else?

The worst part of the proposal is that an "activate-then-play" approach actually encourages web developers to activate() their media session too early: If you play a media element via its controls interface you don't have time to activate before the media starts playing (there is no suitable hook or event before a media element starts playing via its controls to call activate() so you are forced to activate() your media session on e.g. page load and before the user starts playing any media).

In the "activate-on-play" approach (bottom diagram above) it is not possible to interrupt another tab or application's media from playing out until media actually begins playing out in the current tab. That enables a smooth handover to happen between media content and means other web apps do not interrupt, e.g. my currently playing music, until it is unavoidable (because we now need the audio channel and any platform media key bindings available for the new content).

Unless there is an actual use case for allowing what could be an indeterminate amount of time to pass between session.activate() and a call to either new AudioContext(session) or audio.play(), I don't get why this is being considered. The differences between native and web are large enough for this to be considered an anti-pattern on the web (i.e. web developers will struggle to call activate() just before media actually starts playing out, because the web platform lacks any appropriate hooks/events to do that in certain cases).

What is the use case for that potentially indeterminate gap between .activate() and actual audio output being needed?
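One hypothetical way to quantify that gap: how long another app's media stays paused with nothing audible replacing it, given when focus is taken and when the new playback starts. `unreplacedInterruption()` and the timestamps below are made up for illustration; they are not measurements.

```javascript
// Hypothetical helper: seconds during which other media is interrupted but
// nothing new is audible yet.
function unreplacedInterruption({ focusTakenAt, playbackStartsAt }) {
  if (playbackStartsAt < focusTakenAt) {
    throw new RangeError('playback cannot start before focus is taken');
  }
  return playbackStartsAt - focusTakenAt;
}

// Activate-then-play: activate() on page load (t = 0), user presses play at t = 90.
const thenPlay = unreplacedInterruption({ focusTakenAt: 0, playbackStartsAt: 90 });
// Activate-on-play: focus is taken exactly when playback starts (t = 90).
const onPlay = unreplacedInterruption({ focusTakenAt: 90, playbackStartsAt: 90 });
// Activate-then-play where the user never presses play at all.
const neverPlays = unreplacedInterruption({ focusTakenAt: 0, playbackStartsAt: Infinity });

console.log(thenPlay, onPlay, neverPlays); // 90 0 Infinity
```

Under activate-on-play the gap is zero by construction; under activate-then-play it is whatever the page makes it, and unbounded if the user never plays.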

@doomdavve
Contributor

Rich, did you see the use-case described in the comment one above yours or did you disagree with it in any way?

I see no reason for the potential gap actually being a problem. If so, we would see them in native apps as well.

@foolip
Member Author

foolip commented Jun 11, 2015

Lest anyone get the wrong impression from the (very nice) illustration, the activation trigger isn't "non-zero samples are being sent to the audio output" but rather "the audio output is in use, potentially playing silence". In any model, it will thus be possible to simply play silence with Web Audio to get all the power.

This issue isn't motivated by use cases, but by matching the power of native APIs as far as possible. Even having worked in this area for months, I'm not comfortable betting that I know better than the authors of the native APIs and that there will in fact not be any important use cases.

@foolip
Member Author

foolip commented Jun 11, 2015

Here's a use case that isn't that far-fetched: you have an immersive experience like the opening sequence of a game, but it starts in silence with only visuals. After some time, a media element is used to ramp up some music. It would be silly if one has to use Web Audio in silence or pad the media element with silence instead of simply activating the session early and playing the music when the time comes.

@richtr
Member

richtr commented Jun 11, 2015

Rich, did you see the use-case described in the comment one above yours or did you disagree with it in any way?

I don't disagree with the use case, per se. If you want to grab media key controls at the expense of anything currently playing elsewhere without replacing it with any media of your own immediately then you could still do that. You would confuse a lot of users as to why their music has stopped and their headphone buttons no longer resume that music in another tab or app but...you could still do it. See below...

It would be silly if one has to use Web Audio in silence or pad the media element with silence instead of simply activating the session early and playing the music when the time comes.

If you really wanted to you could achieve the same thing by running the first code snippet in #48 (comment). That gives you exactly the same thing, admittedly with silence still being output to the speakers. But...then you could just .suspend() the AudioContext object again and you get to keep the media focus without any silent audio overhead.

That's not the main use case though in my opinion. The main use case is late binding of the media focus activation (and, thus, delaying the interruption of any other currently playing media to the last possible moment) to enable smooth transitions between web apps and native apps without a web page idly sitting on media focus, perhaps indefinitely. Activate-on-play optimizes for the late binding use case. Activate-then-play optimizes for the early binding use case.

@doomdavve
Contributor

I don't disagree with the use case, per se. If you want to grab media key controls at the expense of anything currently playing elsewhere without replacing it with any media of your own immediately then you could still do that. See below...

My use-case depended on MediaSession growing the ability to tell if there is another active MediaSession or other music playing. There are APIs for this on both iOS and Android. So no "at the expense of anything currently playing" is involved.

Given this, the (web)app may want to set up a state where it's ready to play, show lock-screen controls for that and just await a remote control event or a play from elsewhere to start the playlist. It would do so by first checking for other currently playing audio and, if there is none, calling activate() to start receiving events and to be able to set metadata.

Granted, the app could do this with a silent media element or WebAudio as well in the implicit activation scenario, but it's far from elegant and the playback state in the remote controls would be wrong. It would show a playing instead of a paused state.
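A sketch of that flow, assuming MediaSession gained a way to ask whether other audio is playing. The closest native equivalents are `AVAudioSession`'s `isOtherAudioPlaying` on iOS and `AudioManager.isMusicActive()` on Android; the `platform.isOtherAudioPlaying()` call and the stub objects below are hypothetical stand-ins, not a proposed API.

```javascript
// Hypothetical "ready to play" setup. platform.isOtherAudioPlaying() is an
// assumed capability; MediaSession in this proposal has no such method.
async function prepareForRemotePlay(session, platform, metadata) {
  if (platform.isOtherAudioPlaying()) {
    // Someone else is playing: don't take focus, wait for an explicit play.
    return 'deferred';
  }
  session.setMetadata(metadata); // metadata for the lock-screen UI
  await session.activate();      // take focus without producing any audio
  return 'ready';                // now await a remote-control "play" event
}

// Hypothetical stubs standing in for a session and for two device states.
function makeStubSession() {
  return {
    metadata: null,
    active: false,
    setMetadata(metadata) { this.metadata = metadata; },
    activate() { this.active = true; return Promise.resolve(); },
  };
}
const quietDevice = { isOtherAudioPlaying: () => false };
const busyDevice = { isOtherAudioPlaying: () => true };

prepareForRemotePlay(makeStubSession(), quietDevice, { title: 'Punk Rock' })
  .then((state) => console.log(state)); // ready
```

As noted further down the thread, on iOS the metadata would still not be shown until playback actually starts.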

@foolip
Member Author

foolip commented Jun 11, 2015

It would be silly if one has to use Web Audio in silence or pad the media element with silence instead of simply activating the session early and playing the music when the time comes.

If you really wanted to you could achieve the same thing by running the first code snippet in #48 (comment).

Requiring web developers to poke at the audio output (in silence) before they actually want to play works, but I too would characterize it as "far from elegant" or indeed "silly."

@richtr
Member

richtr commented Jun 11, 2015

@doomdavve wrote:

Granted, the app could do this with a silent media element or WebAudio as well in the implicit activation scenario, but it's far from elegant

@foolip wrote:

Requiring web developers to poke at the audio output (in silence) before they actually want to play works, but I too would characterize it as "far from elegant" or indeed "silly."

Activate-then-play is really the exact same thing, just the other way around. But, crucially, what the current model, activate-on-play, does is to optimize for the 80% use case: interrupt any other currently playing media at the last possible moment when it is immediately replaced by new playing media content.

If any web page activates a new MediaSession when it loads and thus interrupts my music from playing and then never plays any media of its own, as a user I'm not going to be happy (see a selection of real user complaints about this listed below @ #50 (comment)). If it means I then press my headphone button and now, seemingly randomly, some new media starts playing from this new web page instead of the music I was just listening to, I'm not going to be happy for the same reasons.

@foolip
Member Author

foolip commented Jun 11, 2015

Nothing stands in the way of catering to the 80% use case and making that API harder to make mistakes with. I mean "define any implicit activation in terms of it" and not "never do implicit activation." I don't think implicit activation for Web Audio looks helpful, but for media elements it does, at the very least as part of a declarative syntax like kind=transient.

@richtr
Member

richtr commented Jun 12, 2015

I see no reason for the potential gap actually being a problem. If so, we would see them in native apps as well.

I did some searching and the activate-then-play approach used in native platforms is causing UX problems between native apps:

Related Google search: https://www.google.com/search?q=%22stops+when+i+open%22

https://community.spotify.com/t5/Help-iOS-iPhone-iPad/Spotify-stops-playing-when-switching-to-other-apps-on-iPad/td-p/500764
https://ecommerce.shopify.com/c/shopify-point-of-sale/t/music-interruption-155477
http://forums.overclockers.co.uk/showthread.php?p=22274524
https://twitter.com/Supahboih/status/174768579936387072
http://www.economist.com/comment/755759#comment-755759
http://forums.macrumors.com/threads/app-switching-from-another-app-to-safari-music-stops-in-ios-7.1647070/
https://discussions.apple.com/thread/4379571
https://discussions.apple.com/thread/3797989
https://discussions.apple.com/thread/5450076

What some applications are doing is requesting audio focus early, well before any media actually starts playing, rather than, as users expect, immediately after the user has signalled an intent to actually play media in that app. Users are posting to forums that their music or podcasts are pausing 'seemingly randomly' when opening another app.

For a full explanation of what these applications are doing, see my previous comment @ #50 (comment).

@richtr
Member

richtr commented Jun 12, 2015

A couple of other comments:

I like how this provides the ability to activate your session for receiving remote control events before starting playback.

IIUC, calling .activate() will not give you access to remote control events. On iOS you still need actively playing audio to receive these events.

My use-case depended on MediaSession growing the ability to tell if there is another active MediaSession or other music playing. There are APIs for this on both iOS and Android.

I think we should consider an API that lets a web developer detect if anything else is playing on the system. The particular use case I have in mind is: if I'm playing music in a backgrounded app or tab then don't play my web game's background music - only play e.g. its sound effects (just keep playing the background music that was already running when the web page was loaded).

@doomdavve
Contributor

IIUC, calling .activate() will not give you access to remote control events. On iOS you need actively playing audio to receive these events.

No, there is no such connection. The only connection I've been able to find is that meta data isn't updated/fetched from app until you start playing.

@richtr
Member

richtr commented Jun 12, 2015

IIUC, calling .activate() will not give you access to remote control events. On iOS you need actively playing audio to receive these events.

No, there is no such connection. The only connection I've been able to find is that meta data isn't updated/fetched from app until you start playing.

From the iOS Remote Control Events documentation:

To receive remote control events, do the following:
  * Register handlers for each action you support. [...] 
  * Begin playing audio. Your app must be the “Now Playing” app. An app does not receive remote control events until it begins playing audio.

Is the iOS Remote Control Events documentation wrong?

@doomdavve
Contributor

On iOS you need actively playing audio to receive these events.

...

Is the iOS Remote Control Events documentation wrong?

I re-did my experiments and you do need to be the "Now Playing" app to receive the events. It's not clear to me exactly how the "Now Playing" app is defined, but I can tell that you don't have to be actively playing audio. It does seem that you must have been playing audio at some point in the past, though, even if your app has been restarted in the meantime. (The latter part confused me for some time.)

Side note: One other particular issue that confused me for a long time was that prepareToPlay() on the AVAudioPlayer grabbed audio focus behind my back. (This might be an indication we should be careful where we do implicit invocation in general.) I think this issue was part of the problems described in the URLs you posted.

@foolip
Member Author

foolip commented Jun 12, 2015

I think we are getting a bit off track in this issue. This is about how to activate a session, and neither iOS nor Android require playback for activation. Making activation explicit matches the actual order of events and gives web developers more power that would otherwise be hidden inside of media elements and Web Audio. More power leaves more room for mistakes, and it's a price worth paying.

In #50 (comment) we don't know if the cause is sloppy use of explicit activation or accidental implicit activation, which is what happened in http://stackoverflow.com/questions/24153917/how-can-i-avoid-interrupting-audio-that-is-already-playing-when-my-app-launches and for @doomdavve too.

@foolip
Member Author

foolip commented Jun 15, 2015

@richtr
Member

richtr commented Jun 15, 2015

We (@foolip, @doomdavve and I) discussed the following use case today:

var el = document.createElement("audio");
var session = new MediaSession("content");

el.src = "music.mp3";
el.controls = true;
el.session = session;

document.body.appendChild(el);

If a media element is played via its el.controls play button then it is very difficult to activate() the session object immediately before playback starts (if that is what a web developer intends to do). We don't have a good web-exposed hook/event/state at which to activate session immediately prior to the media element starting to load and play via the el.controls play button. This lack of appropriate hooks encourages web developers to request audio focus via session.activate() earlier than they may want or need to (you are almost forced to request audio focus, and therefore interrupt any currently playing audio elsewhere in the system, too early here).

We could aim to fix this by adding a beforePlay-like event to HTMLMediaElement or give developers the ability to listen for el.controls 'click' events at which point they would then be able to call session.activate() at the time they really intended.

Activate-on-play (implicit media session activation) solves this issue without needing to expose any additional hooks or events to web developers.


tl;dr it's very difficult to activate() a media session at the last possible moment before the user starts playing a media element via its controls interface without using an implicit media session activation model. This encourages web developers to activate() a media session before the user has shown any intent to play a media resource, which may not be the web developer's actual intention. It's very difficult to do the right thing without implicit activation-on-play in this example.
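For comparison, a sketch of what a beforePlay-like hook, as floated above, might let a page do: call activate() only once the user has pressed play on the built-in controls. HTMLMediaElement has no such event; `ToyMediaElement` and `onbeforeplay` are invented for illustration, and the session is a stub.

```javascript
// Hypothetical: a media element that runs a "beforeplay" callback (which does
// NOT exist in HTMLMediaElement) and waits for it before starting playback.
class ToyMediaElement {
  constructor(session) {
    this.session = session;
    this.paused = true;
    this.onbeforeplay = null; // hypothetical hook
  }

  // Called when the user presses the built-in controls' play button.
  async play() {
    if (this.onbeforeplay) await this.onbeforeplay();
    this.paused = false;
  }
}

// Stub session: records whether activate() has been called.
const session = {
  active: false,
  activate() { this.active = true; return Promise.resolve(); },
};
const el = new ToyMediaElement(session);

// Activate at the last possible moment: only once the user has asked to play.
el.onbeforeplay = () => el.session.activate();
```

Implicit activation-on-play gives the same timing without the page having to wire up anything, which is the point being made above.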

@annevk
Member

annevk commented Jun 15, 2015

Couldn't you just define that as doing the activation and then playing?

@annevk
Member

annevk commented Jun 15, 2015

To be clear, it seems fine to me to have both explicit and implicit activation. And I don't really see a conflict.

@foolip
Member Author

foolip commented Jun 15, 2015

Right, while it's in principle possible to have only explicit activation, I don't propose that. If we have explicit activation, then implicit activation should happen as you're ready to play (buffered enough, canplay fired) but before you actually play (currentTime increasing). An "important" detail here is the state of HTMLMediaElement.paused in the interim and what happens if activation fails, but sorting that out is very doable, the simplest thing being simply pausing again.
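A minimal sketch of implicit activation defined in terms of activate(), along the lines just described: play() makes `paused` false immediately, time only starts advancing once the element is ready and the session is active, and a failed activation simply pauses again. `ToyElement`, `ready`, `playing` and `makeSession()` are hypothetical simplifications (`ready` stands in for canplay having fired, `playing` for currentTime increasing), not the HTML algorithms.

```javascript
class ToyElement {
  constructor(session) {
    this.session = session;
    this.paused = true;
    this.playing = false;           // stands in for "currentTime is increasing"
    this.ready = Promise.resolve(); // stands in for "canplay has fired"
  }

  async play() {
    this.paused = false;            // paused flips immediately, as with play()
    await this.ready;               // wait until enough data is buffered
    if (!this.session.active) {
      try {
        await this.session.activate(); // implicit activation via the public API
      } catch {
        this.paused = true;         // simplest failure handling: pause again
        return false;
      }
    }
    this.playing = true;            // only now does playback actually start
    return true;
  }
}

// Hypothetical session whose activation either succeeds or is denied.
function makeSession(grantFocus) {
  return {
    active: false,
    activate() {
      if (!grantFocus) return Promise.reject(new Error('focus denied'));
      this.active = true;
      return Promise.resolve();
    },
  };
}

const granted = new ToyElement(makeSession(true));
granted.play().then((started) => console.log(started, granted.paused, granted.playing)); // true false true
```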

@mounirlamouri
Member

Right, while it's in principle possible to have only explicit activation, I don't propose that. If we have explicit activation, then implicit activation should happen as you're ready to play (buffered enough, canplay fired) but before you actually play (currentTime increasing). An "important" detail here is the state of HTMLMediaElement.paused in the interim and what happens if activation fails, but sorting that out is very doable, the simplest thing being simply pausing again.

While reading this thread, that's exactly what I wanted to propose: let people use .activate() if they want to and have us automatically activate if the media element or the audio context have an unactivated associated media session. FWIW, I think the current Chrome default behaviour is to activate when the media is ready to be played so it should follow the rules that you have listed here @foolip.

richtr added a commit that referenced this issue Jun 25, 2015
…ants'

This commit removes the implicit reliance on media element-only objects from the generic media session algorithms in the spec. The specification now defines 'audio-producing participants', which provide a better abstraction point on which we can add additional audio-producing object types (such as e.g. Web Audio). This commit does not enable e.g. Web Audio to obtain media focus yet, pending the outcome of #48 and #50.

This commit fixes #49 pending review.
@foolip
Member Author

foolip commented Jun 29, 2015

An open question is if there should be any restrictions on when activate() may be called. Three possible answers:

  1. None
  2. Require page visibility: http://w3c.github.io/page-visibility/#sec-document-interface
  3. Require a user gesture: https://html.spec.whatwg.org/#allowed-to-show-a-popup

I'm inclined to start with no restrictions and add them only if we see any abuse that can't be dealt with by other means, but don't feel super strongly either way.
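The three options can be sketched as a hypothetical gate that activate() would consult before taking focus. `mayActivate()` and `ActivationPolicy` are invented names; `visible` stands for the page-visibility check (document.visibilityState === 'visible') and `userGesture` for the popup-style user-gesture check linked above.

```javascript
// Hypothetical policy values matching options 1-3 above.
const ActivationPolicy = Object.freeze({
  NONE: 'none',
  VISIBLE: 'visible',
  USER_GESTURE: 'user-gesture',
});

// Returns whether an activate() call made in `context` would be allowed.
function mayActivate(policy, { visible, userGesture }) {
  switch (policy) {
    case ActivationPolicy.NONE:
      return true;
    case ActivationPolicy.VISIBLE:
      return visible;
    case ActivationPolicy.USER_GESTURE:
      return userGesture;
    default:
      throw new Error(`unknown policy: ${policy}`);
  }
}

// A hidden tab calling activate() from a timer, with no user gesture:
const backgroundCall = { visible: false, userGesture: false };
console.log(
  mayActivate(ActivationPolicy.NONE, backgroundCall),
  mayActivate(ActivationPolicy.VISIBLE, backgroundCall),
  mayActivate(ActivationPolicy.USER_GESTURE, backgroundCall),
); // true false false
```

Reusing the play()/autoplay restrictions, as suggested in the next comment, would amount to one more policy in such a gate.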

@mounirlamouri
Member

CC/ @avayvod @jakearchibald

We think that we should have the same restrictions as .play()/autoplay. If you can activate with these, there is no reason why the methods couldn't behave the same way. How to translate that into spec words might be hard though ;)

@richtr
Member

richtr commented Jul 7, 2015

In #50 (comment) we showed that, in certain situations, the web platform lacks any appropriate hooks for invoking activate() at the right time (i.e. just before media begins playing). Implicit activation takes care of that for us in such scenarios and there is no need for web developers to activate a media session before something then begins playing. Indeed, if activate() is not called then implicit activation is used anyway to obtain the platform media focus and media key access.

With that in mind, explicit activation via activate() then seems to be a convenience function on top of this enforced implicit activation. It simply makes interruption of currently playing media content easier without needing to promise to replace it with anything else.

When nuisance.com interrupts my currently playing music by invoking session.activate() then my music will then be paused indefinitely. To start listening to it again I must navigate back to its associated app (or tab) and begin playing it again. I now have to be careful when I go back to my web browser as I have just learnt that I need to avoid the web page that just interrupted my music without my consent (I didn't explicitly play any media content on that web page), lest it happen again.

Let's not make it a feature of this API to disrupt a user's ongoing music arbitrarily at any time unless you're committing to actually replacing it with something else immediately. Also, let's not make users choose between playing music on their phones and browsing the web at the same time (at the risk of constant unsolicited interruptions); or only being able to do one of these activities at a time (to avoid such potentially unsolicited interruptions from pausing my music).

@annevk
Member

annevk commented Jul 7, 2015

That only makes sense if you restrict playing things by default, which I don't think we were planning on doing?

@annevk
Member

annevk commented Jul 7, 2015

Also, please consider telephony and Web Audio here. They need to work well.

@foolip
Member Author

foolip commented Jul 7, 2015

I have it on my TODO to prepare a PR for activate(), perhaps discussing an actual proposal will help move this along.

foolip added a commit to foolip/mediasession that referenced this issue Sep 3, 2015
This is mostly to get rid of the reference to the top-level browsing
context, and to say that a media session can have zero members.

Related to w3c#100 and
w3c#50
foolip added a commit to foolip/mediasession that referenced this issue Sep 18, 2015
 * Let MediaSession.deactivate() return a promise.
 * Drop implicit activation and deactivation for AudioContext.

Fixes w3c#50
foolip added a commit to foolip/mediasession that referenced this issue Sep 22, 2015
Also:
 * Let MediaSession.deactivate() return a promise.
 * Drop implicit activation and deactivation for AudioContext.

Fixes w3c#50
foolip added a commit to foolip/mediasession that referenced this issue Sep 25, 2015
Also:
 * Let MediaSession.deactivate() return a promise.
 * Drop implicit activation and deactivation for AudioContext.

Fixes w3c#50
5 participants