Collect media keys use cases #1
That would be great, @foolip!
So, while writing the README I tried to consolidate the use cases from:
If I missed anything, let me know. Let's move to doing PRs from this moment forward.
I've created a new team for this repo and added @richtr. Rich, do you want to take a stab at integrating our use cases? I think that having detailed descriptions of the desired end-user experience, in some order of priority, would help.
@marcoscaceres, the extensibility section looks good; I especially like the presenter use case. Thanks.
Unless it's possible to receive remote control events when no audio is playing, I suspect the presenter use case may be out of scope. On iOS you must have audio playing to receive remote control events. There are of course clever workarounds (e.g. https://stackoverflow.com/questions/10885047/receive-remote-control-events-without-audio), which could work equally well on the web platform for the presenter use case.
On Thursday, February 5, 2015, Rich Tibbett wrote:
That's why I said "when possible". It probably wouldn't make sense in the
OK, so another reason this should not be exclusively hard-bound to
Having the ability to route the events to an independent object would be a start. Also, it seems jPlayer, and the sites that rely on it, would benefit tremendously from this functionality. It is used on:
jPlayer can compose playlists of both audio and video. I've pinged @maboa to comment.
With @richtr I've been arguing for this position: if the "focus-holder" is a standalone object, then it will be easier to implement, and it will be possible to make it work with the Web Audio API and even Flash. However, there are two hurdles:
It would be fantastic if you could reach out to some Apple folks. To be clear, what @domenic and @tabatkins suggested with regard to extensibility is that the API be able to work with both imperative and declarative models. This could mean, for instance (and I'm totally speculating here), that if you are building an audio/video player, you should be expected to hand over a
No, I don't think we should give that up (as above), and I have no strong opinions right now; I just want to make sure we explore many options (fail fast, land on something awesome). I don't yet have a coherent model in my head of how this is going to work, but I think it's good to see if we can sensibly generalize this. If OSs won't let us, then we can live with that. But I think there is enough evidence of, and demand for, general remote control events that we should consider it. Having said that, we may again throw the problem at DOM3 (as these really are key events), but that hasn't yielded much happiness so far.
I've emailed @eric-carlson and @jernoble, directing them to this issue.
I don't think these are really key events in every respect. In particular, when one media player starts playing and causes another to pause, some event is delivered to that other player which isn't caused by input on that page.
Note that this isn't the case if I'm running both Spotify and iTunes on OS X. We probably wouldn't expect the browser to pause Spotify or iTunes. Note that I'm talking about desktop browsers here. However, it makes total sense on an OS that has a kind of centralized media player, such as Android and iOS.
Sorry, I'm not following this bit; can you please clarify what you mean?
OK, different platforms have different conventions. It is possible to play audio in multiple apps at the same time on Android, though, so it isn't just a question of a platform limitation. Here's an example, on any platform you like: I'm listening to music in a browser tab. Then I start to watch a movie or join a video conference in another tab. When that starts, which may not involve me touching any input device, the music should pause. In order to pause, it must either be forcibly paused or receive some event kindly asking it to pause.
Actually, not "should pause"; that doesn't always make sense. It should be possible to have it pause when it is no longer the audio focusee.
I was going to say that in a browsing environment we have a choice about how this should work (@tobie has some opinions about this too). It might be that if a tab opts into receiving the media-key events, part of the API contract is that you will get paused if some other tab gains media focus.
Excited to see this spec! In general, I'm satisfied with iOS and there's really not much to do here. It might be different if apps are using Advanced Audio, but for normal video and audio tags, the low-hanging fruit is Android and desktop OSs (and probably other mobile OSs). You'd have to do a lot of work on those before it's even worth considering iOS, IMO.

I should note that there are general problems with media keys on standard platforms, control devices, and streaming protocols like Google Cast (Chromecast) and AirPlay. The most glaring omission is the typical lack of a skip feature, i.e. you can switch to the next/previous track but not move within the track. The omission probably comes from the fact that these standards were born around music, and it becomes much more of an issue with videos and podcasts. Other missing controls include subtitle toggle and speed control. Having noted those anomalies for the sake of completeness, my suggestion would be to ignore them, and instead focus on letting web apps do what native apps can do rather than trying to leapfrog native capability.

Also, as I mentioned, a lot of this is baked in at the hardware level: headphone controls, keyboards, steering wheels. They simply lack those extra buttons or any ability to customise. It could certainly be supported in some cases, e.g. using function keys, but again, that's not the low-hanging fruit for web apps.

(Just to clarify the jPlayer comment above: Player FM no longer uses jPlayer. It did its job well for a long time, but the main value prop for us initially was Flash fallback, which is no longer a major concern, and we opted to DIY the player when it came time for a site redesign. There's an internal player API which detaches view from model and would make it trivial to support a Media Keys standard. I hope to do so.)
@mahemoff, what do you mean about iOS? Does it already work such that you can have Web apps playing music respond to the headphone button and pause when something else starts playing?
@foolip, when something else starts playing, it will take over the iOS media controls. Unlike on desktop OSs, you can't play multiple things at once.
I like the idea of a media focus concept that is distinct from, and independent of, the main focus. I'd naively define it as: the focused browsing context, unless there's a background browsing context which has captured media focus (e.g. by playing media). There can, of course, only be one media-focused browsing context at a time.
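To make that naive definition concrete, here is a minimal TypeScript sketch, assuming browsing contexts are identified by plain string IDs. `MediaFocusTracker` and its methods are hypothetical names for illustration, not a proposed or existing browser API.

```typescript
// Hypothetical model of the "media focus" definition above; not a real browser API.
// Browsing contexts are identified by plain string IDs for illustration.
class MediaFocusTracker {
  private inputFocused: string | null = null; // the ordinary (main) focus
  private captor: string | null = null; // context that captured media focus, if any

  setInputFocus(contextId: string | null): void {
    this.inputFocused = contextId;
  }

  // Capturing replaces any previous captor, so at most one context holds it.
  captureMediaFocus(contextId: string): void {
    this.captor = contextId;
  }

  releaseMediaFocus(contextId: string): void {
    if (this.captor === contextId) this.captor = null;
  }

  // The captor wins (even in the background); otherwise media focus
  // simply follows input focus.
  mediaFocused(): string | null {
    return this.captor ?? this.inputFocused;
  }
}
```

For example, a background tab that starts playing music would call `captureMediaFocus` and keep media focus while the user switches between other tabs. Once it releases media focus, media focus falls back to whichever tab has input focus.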
Thanks, @Jon889. I suppose enforced pausing must then also be incorporated somehow; a spec cannot say "if this event is not handled, playback will continue in parallel with the new audio stream" or similar.
@foolip, what do you mean by enforced pausing?
About media focus: it feels like that may be beyond the scope of this spec. It's a complicated area. Apps can declare themselves to play in the background or not. Audio focus can be transient or not. Even within transient audio events, some can indicate that "ducking" (playing in parallel at reduced volume) is okay, and some can insist on playing exclusively. Is all this in scope for a spec on media controls?
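The distinctions in this comment (permanent vs. transient focus, and transient requests that allow ducking) can be modeled as a small arbiter. The TypeScript sketch below is loosely inspired by Android's audio focus request types. Every name in it (`AudioFocusArbiter`, `FocusHolder`, the request and loss kinds) is hypothetical and only illustrates the behaviour described; it is not a real platform API.

```typescript
// Hypothetical audio-focus model, loosely inspired by Android's focus request
// types; all names are illustrative, not a real API.
type FocusRequestKind = "permanent" | "transient" | "transient-may-duck";
type FocusLossAction = "stop" | "pause" | "duck";

interface FocusHolder {
  id: string;
  onFocusLoss(action: FocusLossAction): void;
  onFocusRegained(): void;
}

class AudioFocusArbiter {
  // Top of the stack holds focus; entries below were interrupted by a
  // transient request and get focus back when it is abandoned.
  private stack: FocusHolder[] = [];

  private top(): FocusHolder | undefined {
    return this.stack[this.stack.length - 1];
  }

  request(holder: FocusHolder, kind: FocusRequestKind): void {
    if (this.top() === holder) return; // already holds focus
    this.stack = this.stack.filter((h) => h !== holder);
    if (kind === "permanent") {
      // Everyone else loses focus for good and will not be resumed.
      for (const h of this.stack) h.onFocusLoss("stop");
      this.stack = [];
    } else {
      // Transient: the current holder pauses (or ducks) and waits to resume.
      const current = this.top();
      if (current) current.onFocusLoss(kind === "transient-may-duck" ? "duck" : "pause");
    }
    this.stack.push(holder);
  }

  abandon(holder: FocusHolder): void {
    const wasTop = this.top() === holder;
    this.stack = this.stack.filter((h) => h !== holder);
    const next = this.top();
    if (wasTop && next) next.onFocusRegained();
  }

  currentHolder(): string | null {
    const h = this.top();
    return h ? h.id : null;
  }
}
```

In this model, a navigation prompt requesting `"transient-may-duck"` makes the music duck and then regain focus afterwards. A video call requesting `"permanent"` stops the music outright.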
@Jon889, I mean stopping playback regardless of what the page does with whatever events are delivered to it.
@mahemoff, if what iOS does is in fact good and sufficient, then we should stop right now and have this fixed in operating systems and browsers without introducing any new APIs to the Web. For simple playing and pausing in response to button presses, I agree that one can do it, and in fact Opera for Android has such a feature. Forward and back buttons can't work without the cooperation of the page, however. I think it's necessary to include some concept of audio focus, because without it, where do we deliver the events when nothing is playing? The last thing that was playing should continue playing.
Also, what exactly does "playing" mean in the context of, e.g., a slideshow or a slide deck while presenting? (Hence the suggested notion of media focus capture.)
@mahemoff, I agree that what iOS does gets us most of the way there, and it works quite nicely. However, as @foolip said, forward/back still needs to work in coordination with the page (they obviously don't work in the example below). Also, what currently ends up on the lock screen is a bit sad (i.e., the URL instead of the media title, no artwork, etc.):

[screenshot not preserved]

Where the equivalent in the web app is:

[screenshot not preserved]

So while maybe play/stop are transient things, just forward and back. So, I also agree with @mahemoff that just getting parity with native would be great; we might not need to do anything with regard to media focus on mobile if it's mostly just handled at the OS level.
As @foolip says, we support media notification features in Opera Mobile for Android today. I don't think we've explored this feature as far as Apple have in iOS, but there is obviously a lot we can do already without additional hooks or APIs being added to the web platform. We could try to enable that iOS behavior everywhere, but that assumes (a) all A/V content is created equal and (b) we will enforce only one media element playing out at any given time across platforms. To balance this, we are therefore proposing to let web developers opt in to that behavior by declaring

We could later on add lower-level APIs here, but web developers would probably appreciate us exposing consistent, simple media key behavior across platforms. That's why we drafted HTML Media Focus. So there's also a question of what the scope of this work should be. Would it be useful if we could initially support something very similar to the iOS behavior across platforms, or should we target a lower-level integration from the start?
@marcoscaceres Yes, again it's a question of scope, if you consider media metadata to be part of this. It's certainly a lot less complex than audio focus, and I'd be happy to support it. @richtr This is a very sensible approach: let developers opt into it and support it automagically. Great wins: no implementation effort required, and all web apps act consistently. The hard part, standardising the media API, is already done, so it should be easy enough to build on that.
As for metadata, my podcast player (DoggCatcher) has a lock screen UI where the back and forward buttons actually skip back or forward 10 seconds, and a notification UI with the same, plus a "mark episode as done and skip to next" button. I'm very skeptical that anything less than full control over the rendering of those UIs, and communication with the original page, is going to be good enough. Metadata would amount to a declarative solution, and I don't think it can be on parity with native platforms.
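The DoggCatcher behaviour described here, where back/forward seek by 10 seconds instead of changing track, is an interpretation that only the page can make. A hypothetical page-side sketch of it follows. The command names and the `PlayerState` shape are assumptions for illustration, not a proposed API.

```typescript
// Hypothetical page-side handler; command names and PlayerState are
// illustrative assumptions, not a proposed API.
type RemoteCommand = "play" | "pause" | "previous" | "next";

interface PlayerState {
  currentTime: number; // seconds
  duration: number; // seconds
  paused: boolean;
}

const SKIP_SECONDS = 10;

// Podcast-style interpretation: previous/next seek within the episode
// instead of changing tracks. Returns a new state; the input is not mutated.
function handlePodcastCommand(state: PlayerState, cmd: RemoteCommand): PlayerState {
  switch (cmd) {
    case "play":
      return { ...state, paused: false };
    case "pause":
      return { ...state, paused: true };
    case "previous":
      return { ...state, currentTime: Math.max(0, state.currentTime - SKIP_SECONDS) };
    case "next":
      return { ...state, currentTime: Math.min(state.duration, state.currentTime + SKIP_SECONDS) };
  }
}
```

A music player would map the same two commands to track changes instead. This is why a purely declarative, metadata-only solution cannot express both behaviours.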
@foolip re use cases: filed richtr/html-media-focus#4
@foolip Whatever you see on the Android lock screen is standard OS UI. There's an artwork image and title associated with the current episode, which is what Android renders. Your comment about skipping within and skipping between tracks makes me think you're referring to the app's own full-screen player, as it's not possible to do both on Android. The app can respond to events (so it can mark an episode played, for example) and interpret them, but it can't change the UI. In other words, a browser can automatically implement the Android lock screen for any audio/video element as long as:
OK, maybe the lock screen layout is standard. It's apparently up to the app how to handle the previous/next events, though. If the iOS lock screen is equally declarative, then it sounds like having metadata would be the only way to deal with it. The notification tray UI is not standard. I know this because we had to write our own in Opera for Android. This is also how DoggCatcher can have a special button in this UI that isn't in the lock screen UI. Not sure what the app's full-screen player would be; AFAICT DoggCatcher never enters fullscreen.
A brief clarification on the iOS model. Generally speaking, each app has a single shared instance of an iOS WebKit uses These AVAudioSessions are kept in a system-wide "most-recently-active" list. There is another per-app object called
"It's apparently up to the app how to handle the previous/next events, though" "The notification tray UI is not standard." "Not sure what the app's full-screen player would be"
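The system-wide "most-recently-active" list in @jernoble's description of the iOS model also answers the earlier question of where to deliver events when nothing is playing: route them to the most recently active session, even if it has since paused. Below is a hypothetical TypeScript model of that routing. The session IDs and method names are illustrative only; this is not the AVAudioSession API.

```typescript
// Hypothetical model of a most-recently-active list; not the AVAudioSession API.
class MostRecentlyActiveList {
  private order: string[] = []; // index 0 = most recently active session

  // Activating a session moves it to the front of the list.
  activate(sessionId: string): void {
    this.order = [sessionId, ...this.order.filter((id) => id !== sessionId)];
  }

  // E.g. when the owning page or app goes away.
  remove(sessionId: string): void {
    this.order = this.order.filter((id) => id !== sessionId);
  }

  // Remote commands go to the most recently active session, even if it is
  // currently paused; null if no session has been activated.
  commandTarget(): string | null {
    return this.order.length > 0 ? this.order[0] : null;
  }
}
```

With this routing, pressing "play" on a headset after the music has been paused still reaches the music session, because nothing more recent has become active.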
@jernoble, it's really interesting that iOS supports parallel audio and video like that. I think in most cases that would not be desirable behaviour; it would usually be better to treat them as competing for the same focus.
@mahemoff And usually, it does treat them as "competing for focus". Most media playback is going to be simple video or music playback, which would get a
To continue with my iOS model explanation, the lock screen layout on iOS is not "fixed" per se. If the app with the most-recently-active session signs up to handle "togglePlayPause", "seekForward", and "skipBackwards" commands, the lock screen will display "⟲►↠" (i.e., a podcast-like set of controls). If instead it supports "dislike", "like", "togglePlayPause", and "nextTrack", it will display "★►→" (i.e., a Spotify-like set of controls).
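The idea here is that the lock screen layout is derived from the set of commands the app signs up for. The sketch below shows that derivation. The command names are the ones quoted in the comment. The selection rules are a hypothetical simplification that covers only those two examples, plus an assumed fallback; they are not the actual iOS logic.

```typescript
// Command names are quoted from the comment above; the layout rules are a
// hypothetical simplification covering only those two examples, not iOS's logic.
type LockScreenCommand =
  | "togglePlayPause"
  | "seekForward"
  | "skipBackwards"
  | "like"
  | "dislike"
  | "nextTrack";

type LockScreenLayout = "podcast" | "rating" | "transport";

function chooseLayout(commands: readonly LockScreenCommand[]): LockScreenLayout {
  const has = (c: LockScreenCommand) => commands.indexOf(c) !== -1;
  if (has("seekForward") || has("skipBackwards")) return "podcast"; // e.g. ⟲►↠
  if (has("like") || has("dislike")) return "rating"; // e.g. ★►→
  return "transport"; // assumed fallback: plain play/pause and track buttons
}
```

The point of the design is that the app never describes the UI directly. It only declares which commands it handles, and the system decides how to render them.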
Hey, thanks for launching this initiative and pointing me to it; much appreciated. Before releasing a draft and/or MVP, we need to answer these questions, IMHO:
Metadata, UI customisations, and implementation details will be easier to consider once we have consensus on a basic set of user interactions like these. |
Flash and Silverlight already have access to their platform's remote commands by virtue of being plugins. There's nothing stopping Flash from adding support for hardware media controls and providing it as an API within its own runtime.
@jernoble, I was not aware of this. I tried finding information about this API (for Flash), but couldn't find anything :( Do you have a link?
This statement holds true. However, given the decline of both technologies, and commitments by both Microsoft and Adobe to the Web (plus the end of life of Silverlight), I'm betting this would be unlikely. Also, if we standardize this in browsers, it might encourage more media apps that rely on Flash to move to the Web.
Thanks for that lock screen explanation, @jernoble. It sounds like even though it isn't fixed, it's still far simpler than doing a custom Web-based UI. BTW, are you happy with this model, or would you want something different if you were to implement a media keys/focus API in WebKit for iOS? If supporting the iOS lock screen model is a goal, then there aren't too many ways of designing an API. We would need (1) a way to sign up to handle a set of lock screen actions (play/pause/skip/like/etc.), (2) some place to deliver the events when those actions are taken, and possibly (3) a way to query which actions are supported, for feature detection.
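The three requirements at the end of this comment can be sketched as one small registry. Everything below (`RemoteActionRegistry`, the action names, the method names) is hypothetical and only illustrates the shape: (1) sign up for actions, (2) deliver them, (3) query support for feature detection.

```typescript
// Hypothetical registry illustrating requirements (1)-(3); not a real API.
type LockScreenAction = "play" | "pause" | "skipForward" | "skipBackward" | "like";
type ActionHandler = () => void;

class RemoteActionRegistry {
  // Actions this (hypothetical) platform can show on its lock screen.
  constructor(private readonly platformActions: readonly LockScreenAction[]) {}

  private handlers: { [action: string]: ActionHandler | undefined } = {};

  // (3) Feature detection: can this action be offered on this platform?
  supports(action: LockScreenAction): boolean {
    return this.platformActions.indexOf(action) !== -1;
  }

  // (1) Sign up to handle an action (null unregisters); false if unsupported.
  setHandler(action: LockScreenAction, handler: ActionHandler | null): boolean {
    if (!this.supports(action)) return false;
    this.handlers[action] = handler ?? undefined;
    return true;
  }

  // Actions that currently have a handler, i.e. what the UA would display.
  displayedActions(): LockScreenAction[] {
    return this.platformActions.filter((a) => this.handlers[a] !== undefined);
  }

  // (2) Called by the UA when the user presses a lock screen control.
  dispatch(action: LockScreenAction): boolean {
    const handler = this.handlers[action];
    if (!handler) return false;
    handler();
    return true;
  }
}
```

Returning `false` from `setHandler` for unsupported actions lets a page detect support and register in one step. A separate `supports` query is kept for pages that want to adjust their own UI first.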
Yes, my point was to think about providing a mechanism that is independent of playback technology, which acts as a bridge and gives application authors the means to handle playback themselves with whatever technology they use. Using an existing event-based API (e.g. adding new key codes) or providing a new API would be sufficient to achieve this.
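The bridge described in this comment maps naturally onto an adapter interface. The platform delivers abstract commands, and the page decides how to drive whatever playback technology it uses. Below is a hypothetical sketch. None of these names is an existing or proposed API, and `FakeBackend` is an in-memory stand-in for a real player.

```typescript
// Hypothetical adapter-style bridge; all names are illustrative.
// What the page must provide, regardless of playback technology.
interface PlaybackBackend {
  play(): void;
  pause(): void;
  isPlaying(): boolean;
}

type BridgeCommand = "play" | "pause" | "toggle";

// The bridge knows nothing about <audio>, Web Audio or plugins; it only
// forwards abstract commands to whichever backend the page attached.
class MediaCommandBridge {
  private backend: PlaybackBackend | null = null;

  attach(backend: PlaybackBackend): void {
    this.backend = backend;
  }

  deliver(cmd: BridgeCommand): void {
    const b = this.backend;
    if (!b) return; // nothing attached: ignore the command
    if (cmd === "play" || (cmd === "toggle" && !b.isPlaying())) b.play();
    else b.pause();
  }
}

// In-memory stand-in; a media element, Web Audio graph or plugin wrapper
// would implement the same interface.
class FakeBackend implements PlaybackBackend {
  private playing = false;
  play(): void { this.playing = true; }
  pause(): void { this.playing = false; }
  isPlaying(): boolean { return this.playing; }
}
```

The trade-off raised in the next comment applies here. Because the bridge only forwards commands, it cannot guarantee that a backend actually obeys them.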
Is it important for the UA to be able to enforce media playback, pausing, seeking and resuming, or should we expect that web developers will always do the right thing and pause, seek and resume their media when they are told to via a decoupled API? A strongly-bound API would enable a UA to enforce logical behavior. A loosely-bound API is likely to lead to a poor and inconsistent user experience across different web pages.
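The difference between the two bindings can be made concrete. In the hypothetical sketch below, a strongly-bound UA pauses the media itself and then notifies the page. A loosely-bound UA only notifies the page and relies on it to comply. `ControlledMedia` and `requestPause` are illustrative names, not a proposed API.

```typescript
// Hypothetical model contrasting the two bindings; names are illustrative.
type PauseBinding = "strong" | "loose";

interface ControlledMedia {
  paused: boolean;
  // Page-provided callback: a well-behaved page pauses here, a buggy one may not.
  onPauseRequested: () => void;
}

function requestPause(media: ControlledMedia, binding: PauseBinding): void {
  if (binding === "strong") {
    // Strongly bound: the UA pauses the media itself, so the outcome is guaranteed.
    media.paused = true;
  }
  // In both models the page is told, so it can update its own UI or state.
  media.onPauseRequested();
}
```

Under the loose binding, a page whose callback does nothing simply keeps playing. That is the inconsistent experience the comment warns about.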
Oh, I don't believe they've added this API yet, but they could. On OS X, at least, Flash & Silverlight run as native plugins, with all the same access to system APIs as the browser. (Modulo sandboxing, of course.) So they have the same access to system-level remote control commands as browser vendors do. This may not be the case with Chrome and its PPAPI-based plugins, however.
I agree, but that also makes the case against including plugins in the scope of this feature. They're not going to do the work necessary to support this feature anyway.
Except that authors who use Flash are going to want this support inside their Flash application, not out in the DOM in JavaScript. Flash does not currently support a "play()" or "pause()" API on Flash objects from JavaScript. I'm doubtful they'd add one just for this feature.
Assuming for the moment that is a goal, wouldn't this be as simple as:
(1) Defining a new message for each remote command we want to support?
(2) The event would be fired at the "focused" media element.
(3) I'm not sure this is necessary if we're just using event names. You would listen for all the events you would like to handle, and the UA would decide what subset of those to display on the lock screen. Alternatively, clients could do feature detection by checking, e.g.,
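Here is a minimal model of this three-step proposal, assuming hypothetical event names (these are not real DOM events). The page listens for the remote commands it handles. The UA fires them at the focused media element. The lock screen shows only the commands the platform supports and the page listens for. `FocusedMediaElement` and `controlsToDisplay` are illustrative stand-ins.

```typescript
// Hypothetical event names and element model; not real DOM events.
type RemoteEventName = "remoteplay" | "remotepause" | "remotenexttrack" | "remoteseekforward";

class FocusedMediaElement {
  private listeners: { [name: string]: Array<() => void> | undefined } = {};

  // (1) The page listens for each remote command it wants to handle.
  addListener(name: RemoteEventName, fn: () => void): void {
    const list = this.listeners[name] ?? [];
    list.push(fn);
    this.listeners[name] = list;
  }

  hasListener(name: RemoteEventName): boolean {
    return (this.listeners[name]?.length ?? 0) > 0;
  }

  // (2) The UA fires the event at the focused element; returns how many
  // listeners ran.
  fire(name: RemoteEventName): number {
    const list = this.listeners[name] ?? [];
    list.forEach((fn) => fn());
    return list.length;
  }
}

// (3) No separate registration step: the UA shows only the controls that the
// platform supports and the page is listening for.
function controlsToDisplay(
  el: FocusedMediaElement,
  platformControls: readonly RemoteEventName[],
): RemoteEventName[] {
  return platformControls.filter((n) => el.hasListener(n));
}
```

Using listener presence as the signal removes the need for an explicit "sign up" call. It does, however, mean that adding a listener has a visible side effect on the lock screen.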
As much as I would love this to be true, the reality is that the biggest content providers still rely on Flash and other technologies for playback. Providing an API that doesn't allow authors to trigger playback on such technologies would lead to an unusable implementation, and authors would most certainly ignore it.
This is to support legacy browsers which do not support native <audio>, <video>, and Web Audio features. And these legacy browsers are never going to add APIs for remote control events.
SoundCloud, YouTube, and Rdio all currently work without Flash installed (in Safari on OS X). Spotify is not a web app; it's a native app. And Deezer is not available in the US (edit: so I can't check whether it works without Flash installed).
Not really; this is to support features that browsers aren't providing yet, or with which authors aren't satisfied yet, e.g. adaptive streaming, live streaming, protected content, etc.
They do, partially: YouTube still serves Flash by default to Firefox and relies on Flash for live streaming, Flash is currently used by SoundCloud to support content over RTMP, etc.
Spotify has a web app: play.spotify.com. And Deezer is available in the US, not to mention that this isn't an argument to begin with, as the Web crosses borders and a spec we come up with must work independently of your geographical position.
Those features are all present in the web platform. We shouldn't spec additional features to work around browsers who haven't implemented existing features.
YouTube uses Media Source Extensions to serve live streaming video on platforms which support it. So the solution is for UAs to implement (or improve) those existing features.
Ah, and I see play.spotify.com requires Flash. Well, as I said earlier, they can convince Adobe to add remote control support to the Flash runtime. As I explained in my edit, my only point about Deezer was that I couldn't verify whether it worked without Flash installed.
@jernoble, yes, I think it's about that simple, and that sounds fine; constraining the possible design isn't necessarily a bad thing :) We would have to look at how lock screen controls on Android work. At the very least it also supports adding album art as the background, but I don't know if that's a declarative thing or if everything is custom UI.
As for Flash, I previously argued with @richtr that, all else being equal, it would be good to have an API that stands on its own and could be used to control media elements, Web Audio, or even Flash. However, all else does not seem to be equal, as at least iOS seems to require that playback begins before granting audio focus, if that's the right terminology. I don't know how to blend that with Flash, where all the browser can tell is that audio is being produced. If that were sufficient, then it would be sufficient for non-Flash content as well, and we wouldn't need any new API surface. @jernoble, do you think tying into the HTMLMediaElement playing event or similar would be a good match for implementing on top of iOS, or how would you deal with this?
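One reading of the question at the end of this comment is that focus should be granted only once playback has actually begun, as described for iOS. That suggests a focus manager driven by media `playing` and `pause` notifications rather than by an explicit request. Below is a hypothetical sketch. The element IDs and method names are illustrative. In a browser, `onPlaying`/`onPause` would be wired to each media element's `playing` and `pause` events.

```typescript
// Hypothetical focus manager gated on actual playback; names are illustrative.
class PlaybackGatedFocus {
  private focused: string | null = null; // element currently playing with focus
  private lastFocused: string | null = null; // most recent element that ever played

  // On "playing": the element takes focus. Returns the previous holder (if a
  // different element) so the UA can pause or notify it.
  onPlaying(elementId: string): string | null {
    const previous = this.focused;
    this.focused = elementId;
    this.lastFocused = elementId;
    return previous !== elementId ? previous : null;
  }

  // On "pause": the element stops holding focus but remains the target for
  // remote commands, so a "play" button can resume it.
  onPause(elementId: string): void {
    if (this.focused === elementId) this.focused = null;
  }

  // An element that has never started playing can never become the target.
  commandTarget(): string | null {
    return this.focused ?? this.lastFocused;
  }
}
```

Because focus follows observed playback rather than an explicit request, the same logic could in principle cover any source the browser can see producing audio. It offers no way to claim focus before playback starts, though, which is exactly the presenter use case raised earlier.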
Closing this issue, as the use cases discussed here are covered in the README. Please send further use cases in the form of bugs or pull requests.
https://github.com/richtr/html-media-focus
@richtr, do you want me to invite you to the whatwg org so that you can fiddle with this repo too? If that's OK by you, @marcoscaceres?