2022-10-10
There currently exists a Web API for putting an HTMLVideoElement into a
Picture-in-Picture window (HTMLVideoElement.requestPictureInPicture()). Because
that API accepts only a single video element, it limits a website's ability to
provide a custom picture-in-picture (PiP) experience. We want to expand upon
that functionality by giving websites the ability to open a picture-in-picture
(i.e., always-on-top) window with a blank document that can be populated with
arbitrary HTMLElements instead of only a single HTMLVideoElement.
This new window will be much like a blank same-origin window opened via the
existing window.open() API, with some differences:
- The PiP window will float on top of other windows.
- The PiP window will never outlive the opening window.
- The website cannot set the position of the PiP window.
- The PiP window cannot be navigated (any window.history or window.location
  calls that change to a new document will close the PiP window).
- The PiP window cannot open more windows.
[Exposed=Window]
partial interface Navigator {
  [SameObject, SecureContext] readonly attribute DocumentPictureInPicture
    documentPictureInPicture;
};
[Exposed=Window]
interface DocumentPictureInPicture : EventTarget {
  // Opens a new picture-in-picture window.
  Promise<DocumentPictureInPictureSession> requestWindow(
    optional DocumentPictureInPictureOptions options = {});
  // Accessor for the currently open picture-in-picture window, if one exists.
  readonly attribute DocumentPictureInPictureSession session;
  // Event fired when a picture-in-picture window is opened.
  attribute EventHandler onenterpictureinpicture;
};
dictionary DocumentPictureInPictureOptions {
  // The desired aspect ratio of the picture-in-picture window when it opens.
  // An initial aspect ratio of 0.0 implies that the website does not care to
  // set an initial aspect ratio and the UA can determine a size.
  float initialAspectRatio = 0.0;
  // True if the UA should force the aspect ratio of the picture-in-picture
  // window to remain constant when the user resizes the window.
  boolean lockAspectRatio = false;
  // True if the UA should copy the stylesheets on the current document into the
  // picture-in-picture document.
  boolean copyStyleSheets = false;
};
[Exposed=Window]
interface DocumentPictureInPictureSession {
  [SameObject] readonly attribute Window? window;
  Promise<undefined> setAspectRatio(float aspectRatio);
  Promise<undefined> setLockAspectRatio(boolean lockAspectRatio);
};
[Exposed=Window]
interface DocumentPictureInPictureEvent : Event {
  constructor(DOMString type, DocumentPictureInPictureEventInit eventInitDict);
  [SameObject] readonly attribute DocumentPictureInPictureSession
    documentPictureInPictureSession;
};
dictionary DocumentPictureInPictureEventInit : EventInit {
  required DocumentPictureInPictureSession documentPictureInPictureSession;
};
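The options dictionary above treats an initialAspectRatio of 0.0 as "let the UA pick a size". A page that derives the ratio from an element's dimensions may want to fall back to that value when the element has no usable size (for example, a height of 0 would otherwise produce Infinity). The following is only a minimal sketch of that idea; buildPipOptions and its parameter names are hypothetical and not part of the proposed API.

```javascript
// Hypothetical helper (not part of the proposed API): builds an object shaped
// like DocumentPictureInPictureOptions from an element's width and height.
function buildPipOptions(width, height, { lock = false, copyStyles = false } = {}) {
  // A zero or missing dimension yields NaN or Infinity, so fall back to 0.0,
  // which the dictionary defines as "the UA can determine a size".
  const ratio = width / height;
  const usable = Number.isFinite(ratio) && ratio > 0;
  return {
    initialAspectRatio: usable ? ratio : 0.0,
    lockAspectRatio: lock,
    copyStyleSheets: copyStyles,
  };
}

// Intended use in a page (assumes the API described in this document):
//   const player = document.querySelector('#player');
//   navigator.documentPictureInPicture.requestWindow(
//       buildPipOptions(player.clientWidth, player.clientHeight, { lock: true }));
```

Keeping the calculation in a plain function makes the fallback easy to check without opening a window.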
- Allow a website to display arbitrary HTMLElements in an always-on-top window.
- Be simple for web developers to use and understand. Note that while allowing websites to call requestPictureInPicture() on any element would be the simplest way, for reasons described below, this isn't feasible.
- This API is not attempting to handle placeholder content for elements that are moved out of the page (that is the responsibility of the website to handle).
- This API is not attempting to allow websites to open always-on-top widgets that outlive the webpage (the PiP window will close when the webpage is closed).
While the existing Picture-in-Picture API for HTMLVideoElement allows a website to provide a Picture-in-Picture video experience, it is very limited in what inputs the window can take and in the look-and-feel of those inputs. With a full Document in Picture-in-Picture, the website can provide custom controls and inputs (e.g. captions, playlists, time scrubber, liking/disliking videos, etc.) to improve the user's PiP video experience.
It is common for users to leave the tab during a video conferencing session for various reasons (e.g. presenting another tab to the call or multitasking) while still wishing to see the call, so it's a prime use case for Picture-in-Picture. As above, the current experience a video conferencing website can provide via the HTMLVideoElement PiP API is limited in style and input. With a full Document in Picture-in-Picture, the website can easily combine multiple video streams into a single PiP window and provide custom controls like sending a message, muting another user, raising a hand, etc.
<body>
  <div id="player-container">
    <div id="player">
      <video id="video" src="foo.webm"></video>
      <!-- More player elements here. -->
    </div>
  </div>
  <input type="button" onclick="enterPiP();" value="Enter PiP" />
</body>
// Handle to the picture-in-picture session.
let pipSession = null;
function enterPiP() {
  const player = document.querySelector('#player');
  const playerContainer = document.querySelector('#player-container');
  const pipOptions = {
    initialAspectRatio: player.clientWidth / player.clientHeight,
    lockAspectRatio: true,
    copyStyleSheets: true
  };
  navigator.documentPictureInPicture.requestWindow(pipOptions).then((_pipSession) => {
    pipSession = _pipSession;
    // Style remaining container to imply the player is in PiP.
    playerContainer.classList.add('pip-mode');
    // Add player to the PiP window's document.
    pipSession.window.document.body.append(player);
    // Listen for the PiP closing event to put the video back.
    pipSession.window.addEventListener('unload', onLeavePiP.bind(pipSession), { once: true });
  });
}
// Called when the PiP window has closed.
function onLeavePiP() {
  // Ignore events from a session that is no longer the current one.
  if (this !== pipSession) {
    return;
  }
  // Remove PiP styling from the container.
  const playerContainer = document.querySelector('#player-container');
  playerContainer.classList.remove('pip-mode');
  // Add the player back to the main window.
  const player = pipSession.window.document.querySelector('#player');
  playerContainer.append(player);
  pipSession = null;
}
The window attribute of the DocumentPictureInPictureSession object provides
access to the DOM of the PiP window:
const video = pipSession.window.document.querySelector('#video');
video.loop = true;
As part of creating an improved picture-in-picture experience, websites will often want to customize buttons and controls that need to respond to user input events such as clicks.
const pipDocument = pipSession.window.document;
const video = pipDocument.querySelector('#video');
const muteButton = pipDocument.createElement('button');
muteButton.textContent = 'Toggle mute';
muteButton.addEventListener('click', () => {
video.muted = !video.muted;
});
pipDocument.body.append(muteButton);
The website may decide to close the PiP window without the user explicitly
clicking on the window's close button. It can do this by calling the close()
method on the Window object of the DocumentPictureInPictureSession:
// This will close the PiP window and trigger our existing onLeavePiP()
// listener.
pipSession.window.close();
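The IDL above declares the session's window attribute as nullable (Window?), so code that closes the PiP window from several places may want to guard against a missing session or window. The following is a minimal sketch of such a guard; closePipIfOpen is a hypothetical helper name, and this document does not state when window becomes null.

```javascript
// Hypothetical helper (not part of the proposed API): closes the PiP window
// if the session still has one. Returns true when close() was called.
function closePipIfOpen(session) {
  // `window` is declared as `Window?` in the IDL, so check before using it.
  const pipWindow = session ? session.window : null;
  if (!pipWindow) {
    return false;
  }
  pipWindow.close();
  return true;
}

// Intended use in a page: closePipIfOpen(pipSession);
```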
Sometimes the website will want to change the aspect ratio after the PiP window
is open (e.g., because a new video is playing with a different aspect ratio).
The website can change it via the setAspectRatio() method on the
DocumentPictureInPictureSession:
const newVideo = document.createElement('video');
newVideo.id = 'video';
newVideo.src = 'newvideo.webm';
newVideo.addEventListener('loadedmetadata', async (_) => {
  const aspectRatio = newVideo.videoWidth / newVideo.videoHeight;
  const player = pipSession.window.document.querySelector('#player');
  const oldVideo = pipSession.window.document.querySelector('#video');
  // Remove the old video itself; calling remove() on the player would detach
  // the whole player from the PiP document.
  oldVideo.remove();
  player.append(newVideo);
  await pipSession.setAspectRatio(aspectRatio);
});
newVideo.load();
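The session interface also exposes setLockAspectRatio(), which has no example in this document. The following is a minimal sketch, under the assumption that a page wants to resize the window to match a video and then keep that ratio fixed while the user resizes it. matchVideoAspectRatio is a hypothetical helper; it takes the session and video as parameters so the logic does not depend on page globals.

```javascript
// Hypothetical helper (not part of the proposed API): matches the PiP window's
// aspect ratio to a video and then locks it.
async function matchVideoAspectRatio(session, video) {
  // videoWidth and videoHeight are 0 until the video's metadata has loaded,
  // so skip the update rather than pass NaN or Infinity to the session.
  if (!video.videoWidth || !video.videoHeight) {
    return false;
  }
  await session.setAspectRatio(video.videoWidth / video.videoHeight);
  await session.setLockAspectRatio(true);
  return true;
}

// Intended use in a page, inside a 'loadedmetadata' listener:
//   await matchVideoAspectRatio(pipSession, newVideo);
```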
When the PiP window is closed for any reason (either because the website
initiated it or the user closed it), the website will often want to get the
elements back out of the PiP window. The website can perform this in an event
handler for the unload event on the window associated with the
DocumentPictureInPictureSession object. This is shown in the onLeavePiP()
handler in the example code above and is copied below:
// Called when the PiP window has closed.
function onLeavePiP() {
  // Ignore events from a session that is no longer the current one.
  if (this !== pipSession) {
    return;
  }
  // Remove PiP styling from the container.
  const playerContainer = document.querySelector('#player-container');
  playerContainer.classList.remove('pip-mode');
  // Add the player back to the main window.
  const player = pipSession.window.document.querySelector('#player');
  playerContainer.append(player);
  pipSession = null;
}
Why not extend the HTMLVideoElement.requestPictureInPicture() idea to allow it to be called on any HTMLElement?
Any API where the UA takes elements out of the page and then reinserts them ends up with tricky questions about what to show in the current document while those elements are gone (Do elements shift around? Is there a placeholder? What magic needs to happen when things resize? etc.). By leaving it up to websites to move their own elements, the API contract between the UA and website is much clearer and simpler to understand.
The main reason we decided to have an API completely separate from
window.open() is to make it easier for websites to detect it (since in most
cases, falling back to a standard window would be undesirable and websites
would rather use HTMLVideoElement PiP instead). Additionally, it works
differently enough from window.open() (e.g., never outliving the opener) that
having it separate makes sense.
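The detection described above could be sketched as follows. This is only an illustration: choosePipMode is a hypothetical helper, and its nav and doc parameters stand in for the page's navigator and document objects. It checks for the documentPictureInPicture attribute proposed in this document first, then for the existing Picture-in-Picture API's document.pictureInPictureEnabled flag.

```javascript
// Hypothetical helper (not part of the proposed API): picks the richest PiP
// mode available. `nav` and `doc` stand in for `navigator` and `document`.
function choosePipMode(nav, doc) {
  // Prefer the document PiP API proposed here when the UA exposes it.
  if (nav && 'documentPictureInPicture' in nav) {
    return 'document';
  }
  // Otherwise fall back to the existing HTMLVideoElement PiP API.
  if (doc && doc.pictureInPictureEnabled) {
    return 'video';
  }
  return 'none';
}

// Intended use in a page:
//   const mode = choosePipMode(navigator, document);
```

Because the new API lives on its own attribute, the first check is a plain property test and never opens a regular window by accident.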
Giving websites less control over the size/position of the window will help
prevent, e.g., phishing attacks where a website pops a small always-on-top
window over an input element to steal your password.
Surface Element was a proposal where the website would wrap PiP-able content in advance with a new type of iframe-like element that could be pulled out into a separate window when requested. This had some downsides including always requiring the overhead of a separate document (even in the most common case of never entering picture-in-picture).
We also considered a similar approach to the one in this document, but with no
input allowed in the DOM (only allowlisted controls from a predetermined list,
in a similar fashion to the existing HTMLVideoElement PiP). One issue with this
approach is that it really didn't help websites do much more than they already
can today, since a website can draw anything in a canvas element and PiP a video
with the canvas as a source. Having HTMLElements that can actually be
interacted with is what makes the Document Picture-in-Picture feature worth
implementing.
Many thanks to Frank Liberato, Mark Foltz, Klaus Weidner, François Beaufort, Charlie Reis, Joe DeBlasio, Domenic Denicola, and Yiren Wang for their comments and contributions to this document and to the discussions that have informed it.