<pre class='metadata'>
Title: MediaStreamTrack Insertable Media Processing using Streams
Shortname: mediacapture-transform
Level: None
Status: ED
Group: webrtc
Repository: w3c/mediacapture-transform
TR: https://www.w3.org/TR/mediacapture-transform/
URL: https://w3c.github.io/mediacapture-transform/
Editor: Harald Alvestrand, w3cid 24610, Google https://google.com, [email protected]
Editor: Guido Urdaneta, w3cid 84810, Google https://google.com, [email protected]
Abstract: This document defines an API surface for manipulating the bits on
Abstract: {{MediaStreamTrack}}s carrying raw data.
Markup Shorthands: css no, markdown yes
</pre>
<pre class=anchors>
url: https://w3c.github.io/webcodecs/#videoframe; text: VideoFrame; type: interface; spec: WEBCODECS
url: https://w3c.github.io/webcodecs/#videoencoder; text: VideoEncoder; type: interface; spec: WEBCODECS
url: https://streams.spec.whatwg.org/#readablestream-controller; text: [[controller]]; for: ReadableStream; type: dfn; spec: STREAMS
</pre>
<pre class=link-defaults>
spec:infra; type:dfn; text:queue
spec:streams; type:interface; text:WritableStream
</pre>
# Introduction # {#introduction}
The [[WEBRTC-NV-USE-CASES]] document describes several functions that
can only be achieved by access to media (requirements N20-N22),
including, but not limited to:
* Funny Hats
* Machine Learning
* Virtual Reality Gaming
These use cases further require that processing can be done in worker
threads (requirements N23-N24).
This specification gives an interface based on [[WEBCODECS]] and [[STREAMS]] to
provide access to such functionality.
This specification provides access to raw media,
which is the output of a media source such as a camera, microphone, screen capture,
or the decoder part of a codec, and the input to the
encoder part of a codec. The processed media can be consumed by any destination
that can take a MediaStreamTrack, including HTML &lt;video&gt; tags,
RTCPeerConnection, canvas or MediaRecorder.
This specification explicitly aims to support the following use cases:
- *Video processing*: This is the "Funny Hats" use case, where the input is a single video track and the output is a transformed video track.
- *Custom video sink*: In this use case, the purpose is not producing a processed {{MediaStreamTrack}}, but to consume the media in a different way. For example, an application could use [[WEBCODECS]] and [[WEBTRANSPORT]] to create an {{RTCPeerConnection}}-like sink, but using different codec configuration and networking protocols.
- *Multi-source processing*: In this use case, two or more tracks are combined into one. For example, a presentation containing a live weather map and a camera track with the speaker can be combined to produce a weather report application.
Note: There is no WG consensus on whether or not audio use cases should be supported.
Note: The WG expects that the Streams spec will adopt the solutions outlined in
[the relevant explainer](https://github.com/whatwg/streams/blob/main/streams-for-raw-video-explainer.md) to solve some issues with the current Streams specification.
# Specification # {#specification}
This specification shows the IDL extensions for [[MEDIACAPTURE-STREAMS]].
It defines some new objects that inherit the {{MediaStreamTrack}} interface, and
can be constructed from a {{MediaStreamTrack}}.
The API consists of two elements. One is a track sink that is
capable of exposing the unencoded media frames from the track to a ReadableStream.
The other one is the inverse of that: it provides a track source that takes
media frames as input.
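The following non-normative sketch shows how the two elements are typically connected in a
DedicatedWorker. The `myTransform` function is an assumed, application-provided frame transform;
complete examples appear in [[#examples]].
<pre class="example" highlight="js">
// Non-normative sketch (worker context): `track` is a transferred video MediaStreamTrack,
// and myTransform is an assumed application-provided transform function.
const processor = new MediaStreamTrackProcessor({track}); // track sink: exposes frames
const generator = new VideoTrackGenerator();               // track source: consumes frames
await processor.readable
  .pipeThrough(new TransformStream({transform: myTransform}))
  .pipeTo(generator.writable);
// generator.track carries the processed media while the pipe is running.
</pre>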
<!-- ## Extension operation ## {#operation} -->
## MediaStreamTrackProcessor ## {#track-processor}
A {{MediaStreamTrackProcessor}} allows the creation of a
{{ReadableStream}} that can expose the media flowing through
a given {{MediaStreamTrack}}. If the {{MediaStreamTrack}} is a video track,
the chunks exposed by the stream will be {{VideoFrame}} objects.
This makes {{MediaStreamTrackProcessor}} effectively a sink in the
<a href="https://www.w3.org/TR/mediacapture-streams/#the-model-sources-sinks-constraints-and-settings">
MediaStream model</a>.
A {{MediaStreamTrackProcessor}} internally contains a circular queue
that allows buffering incoming media frames delivered by the track it
is connected to. This buffering allows the {{MediaStreamTrackProcessor}}
to temporarily hold frames waiting to be read from its associated {{ReadableStream}}.
The application can influence the maximum size of the queue via a parameter
provided in the {{MediaStreamTrackProcessor}} constructor. However, the
maximum size of the queue is decided by the UA and can change dynamically,
but it will not exceed the size requested by the application.
If the application does not provide a maximum size parameter, the UA is free
to decide the maximum size of the queue.
When a new frame arrives at the
{{MediaStreamTrackProcessor}}, if the queue has reached its maximum size,
the oldest frame will be removed from the queue, and the new frame will be
added to the queue. This means that for the particular case of a queue
with a maximum size of 1, if there is a queued frame, it will always be
the most recent one.
The UA is also free to remove any frames from the queue at any time. The UA
may remove frames in order to save resources or to improve performance in
specific situations. In all cases, frames that are not dropped
must be made available to the {{ReadableStream}} in the order in which
they arrive at the {{MediaStreamTrackProcessor}}.
A {{MediaStreamTrackProcessor}} makes frames available to its
associated {{ReadableStream}} only when a read request has been issued on
the stream. The idea is to avoid the stream's internal buffering, which
does not give the UA enough flexibility to choose the buffering policy.
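A brief, non-normative illustration of the buffering behavior described above; the
`maxBufferSize` value of 4 is an arbitrary assumption.
<pre class="example" highlight="js">
// With the default maxBufferSize of 1, only the most recent frame is kept if the
// application reads slowly. Requesting a larger buffer lets the reader fall slightly
// behind without losing frames, subject to UA clamping.
const realtimeProcessor = new MediaStreamTrackProcessor({track});
const bufferedProcessor = new MediaStreamTrackProcessor({track: track.clone(), maxBufferSize: 4});
</pre>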
### Interface definition ### {#track-processor-interface}
<pre class="idl">
[Exposed=DedicatedWorker]
interface MediaStreamTrackProcessor {
constructor(MediaStreamTrackProcessorInit init);
readonly attribute ReadableStream readable;
};
dictionary MediaStreamTrackProcessorInit {
required MediaStreamTrack track;
[EnforceRange] unsigned short maxBufferSize;
};
</pre>
Note: There is WG consensus that the interface should be exposed on DedicatedWorker.
There is no WG consensus on whether or not the interface should be exposed on Window.
Note: There is consensus in the WG that creating a MediaStreamTrackProcessor from a MediaStreamTrack of kind "video" should exist.
There is no WG consensus on whether or not creating a MediaStreamTrackProcessor from a MediaStreamTrack of kind "audio" should be supported.
### Internal slots ### {#internal-slots-processor}
<dl>
<dt><dfn attribute for=MediaStreamTrackProcessor>`[[track]]`</dfn></dt>
<dd>Track whose raw data is to be exposed by the {{MediaStreamTrackProcessor}}.</dd>
<dt><dfn attribute for=MediaStreamTrackProcessor>`[[maxBufferSize]]`</dfn></dt>
<dd>The maximum number of media frames to be buffered by the {{MediaStreamTrackProcessor}}
as specified by the application. It may have no value if the application does
not provide it. Its minimum valid value is 1.</dd>
<dt><dfn attribute for=MediaStreamTrackProcessor>`[[queue]]`</dfn></dt>
<dd>A [=queue=] used to buffer media frames not yet read by the application</dd>
<dt><dfn attribute for=MediaStreamTrackProcessor>`[[numPendingReads]]`</dfn></dt>
<dd>An integer whose value represents the number of read requests issued by the
application that have not yet been handled.
</dd>
<dt><dfn attribute for=MediaStreamTrackProcessor>`[[isClosed]]`</dfn></dt>
<dd>A boolean whose value indicates whether the {{MediaStreamTrackProcessor}} is closed.
</dd>
</dl>
### Constructor ### {#constructor-processor}
<dfn constructor for=MediaStreamTrackProcessor title="MediaStreamTrackProcessor(init)">
MediaStreamTrackProcessor(|init|)
</dfn>
1. If |init|.{{MediaStreamTrackProcessorInit/track}} is not a valid {{MediaStreamTrack}},
throw a {{TypeError}}.
1. Let |maxBufferSize| be 1.
1. If |init|.{{MediaStreamTrackProcessorInit/maxBufferSize}} has an integer value greater than 1, run the following substeps:
    1. Set |maxBufferSize| to |init|.{{MediaStreamTrackProcessorInit/maxBufferSize}}.
    1. The user agent MAY decide to clamp |maxBufferSize| to a lower value, but no lower than 1.
    <p class="note">
    Clamping |maxBufferSize| can be useful for some sources, such as cameras, which may only
    be able to use a limited number of {{VideoFrame}} objects at any given time.
    </p>
1. Let |processor| be a new {{MediaStreamTrackProcessor}} object.
1. Set |processor|.`[[track]]` to |init|.{{MediaStreamTrackProcessorInit/track}}.
1. Set |processor|.`[[maxBufferSize]]` to |maxBufferSize|.
1. Set |processor|.`[[queue]]` to an empty [=queue=].
1. Set |processor|.`[[numPendingReads]]` to 0.
1. Set |processor|.`[[isClosed]]` to false.
1. Return |processor|.
### Attributes ### {#attributes-processor}
<dl>
<dt><dfn attribute for=MediaStreamTrackProcessor>readable</dfn></dt>
<dd>Allows reading the frames delivered by the {{MediaStreamTrack}} stored
in the `[[track]]` internal slot. When this attribute is accessed for the first time,
it MUST be initialized with the following steps:
1. Initialize [=this=].{{MediaStreamTrackProcessor/readable}} to be a new {{ReadableStream}}.
2. <a dfn for="ReadableStream">Set up</a> [=this=].{{MediaStreamTrackProcessor/readable}} with its [=ReadableStream/set up/pullAlgorithm=] set to [=processorPull=] with [=this=] as parameter, [=ReadableStream/set up/cancelAlgorithm=] set to [=processorCancel=] with [=this=] as parameter, and [=ReadableStream/set up/highWaterMark=] set to 0.
The <dfn>processorPull</dfn> algorithm is given a |processor| as input. It is defined by the following steps:
1. Increment the value of the |processor|.`[[numPendingReads]]` by 1.
2. [=Queue a task=] to run the [=maybeReadFrame=] algorithm with |processor| as parameter.
3. Return [=a promise resolved with=] undefined.
The <dfn>maybeReadFrame</dfn> algorithm is given a |processor| as input. It is defined by the following steps:
1. If |processor|.`[[queue]]` is [=queue/empty=], abort these steps.
1. If |processor|.`[[numPendingReads]]` equals zero, abort these steps.
1. Let |frame| be the result of [=queue/dequeueing=] a frame from |processor|.`[[queue]]`.
1. [=ReadableStream/Enqueue=] |frame| in |processor|.{{MediaStreamTrackProcessor/readable}}.
1. Decrement |processor|.`[[numPendingReads]]` by 1.
1. Go to step 1.
The <dfn>processorCancel</dfn> algorithm is given a |processor| as input.
It is defined by running the following steps:
1. Run the [=processorClose=] algorithm with |processor| as parameter.
2. Return [=a promise resolved with=] undefined.
The <dfn>processorClose</dfn> algorithm is given a |processor| as input.
It is defined by running the following steps:
1. If |processor|.`[[isClosed]]` is true, abort these steps.
2. Disconnect |processor| from |processor|.`[[track]]`. The mechanism to do this is UA specific and the result is that |processor| is no longer a sink of |processor|.`[[track]]`.
3. [$ReadableStreamDefaultControllerClose|Close$] |processor|.{{MediaStreamTrackProcessor/readable}}.[=ReadableStream/[[controller]]=].
4. [=list/Empty=] |processor|.`[[queue]]`.
5. Set |processor|.`[[isClosed]]` to true.
</dd>
</dl>
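The following non-normative sketch illustrates the pull-based delivery described above,
and how cancelling the stream runs the close steps; the names are illustrative.
<pre class="example" highlight="js">
const processor = new MediaStreamTrackProcessor({track});
const reader = processor.readable.getReader();
// Each read() issues a read request; a frame is only moved out of the internal
// queue in response to such a request (the processorPull / maybeReadFrame steps).
const {value: frame, done} = await reader.read();
if (!done) {
  // ... use the frame ...
  frame.close();
}
// Cancelling runs the processorCancel / processorClose steps, disconnecting
// the processor from its track.
await reader.cancel();
</pre>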
### Handling interaction with the track ### {#processor-handling-interaction-with-track}
When the `[[track]]` of a {{MediaStreamTrackProcessor}} |processor| delivers a
frame to |processor|, the UA MUST execute the [=handleNewFrame=] algorithm
with |processor| as parameter.
The <dfn>handleNewFrame</dfn> algorithm is given a |processor| as input.
It is defined by running the following steps:
1. If |processor|.`[[queue]]` has |processor|.`[[maxBufferSize]]` elements, run the following steps:
    1. Let |droppedFrame| be the result of [=queue/dequeueing=] from |processor|.`[[queue]]`.
    1. Run the [=Close VideoFrame=] algorithm with |droppedFrame|.
2. [=queue/Enqueue=] the new frame media data in |processor|.`[[queue]]`.
3. [=Queue a task=] to run the [=maybeReadFrame=] algorithm with |processor| as parameter.
At any time, the UA MAY [=list/remove=] any frame from |processor|.`[[queue]]`.
The UA may decide to remove frames from |processor|.`[[queue]]`, for example,
to prevent resource exhaustion or to improve performance in certain situations.
<p class="note">
The application may detect that frames have been dropped by noticing that there
is a gap in the timestamps of the frames.
</p>
When the `[[track]]` of a {{MediaStreamTrackProcessor}} |processor|
[=track|ends=], the [=processorClose=] algorithm must be
executed with |processor| as parameter.
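As noted above, dropped frames show up as gaps in the frame timestamps. A non-normative
sketch of such detection follows; the assumed 30 fps cadence (about 33333 microseconds between
frames) is an illustrative assumption about the source.
<pre class="example" highlight="js">
const reader = processor.readable.getReader();
const expectedDeltaMicros = 33333; // assumed ~30 fps source
let lastTimestamp = null;
while (true) {
  const {value: frame, done} = await reader.read();
  if (done) break;
  if (lastTimestamp !== null &&
      frame.timestamp - lastTimestamp > expectedDeltaMicros * 1.5) {
    console.warn('Possible dropped frame(s) before timestamp', frame.timestamp);
  }
  lastTimestamp = frame.timestamp;
  frame.close();
}
</pre>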
## VideoTrackGenerator ## {#video-track-generator}
A {{VideoTrackGenerator}} allows the creation of a video source for a
{{MediaStreamTrack}} in the
<a href="https://www.w3.org/TR/mediacapture-streams/#the-model-sources-sinks-constraints-and-settings">
MediaStream model</a> that generates its frames from a Stream of {{VideoFrame}} objects. It has two readonly
attributes: a {{VideoTrackGenerator/writable}} {{WritableStream}} and a
{{VideoTrackGenerator/track}} {{MediaStreamTrack}}.
The {{VideoTrackGenerator}} is the underlying sink of its
{{VideoTrackGenerator/writable}} attribute. The {{VideoTrackGenerator/track}} attribute
is the output. Further tracks connected to the same {{VideoTrackGenerator}} can be
created using the {{MediaStreamTrack/clone}} method on the
{{VideoTrackGenerator/track}} attribute.
The {{WritableStream}} accepts {{VideoFrame}} objects.
When a {{VideoFrame}} is written to {{VideoTrackGenerator/writable}},
the frame's `close()` method is automatically invoked, so that its internal
resources are no longer accessible from JavaScript.
Note: There is consensus in the WG that a source capable of generating a MediaStreamTrack of kind "video" should exist.
There is no WG consensus on whether or not a source capable of generating a MediaStreamTrack of kind "audio" should exist.
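A non-normative sketch of feeding frames to a {{VideoTrackGenerator}}; the OffscreenCanvas
source and the timestamp choice are illustrative assumptions.
<pre class="example" highlight="js">
const generator = new VideoTrackGenerator();
const writer = generator.writable.getWriter();
const canvas = new OffscreenCanvas(640, 480);
// ... draw into the canvas ...
const frame = new VideoFrame(canvas, {timestamp: performance.now() * 1000});
await writer.write(frame); // the frame is closed automatically after the write
// generator.track (and any clones of it) deliver the written frames.
</pre>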
### Interface definition ### {#video-generator-interface}
<pre class="idl">
[Exposed=DedicatedWorker]
interface VideoTrackGenerator {
constructor();
readonly attribute WritableStream writable;
attribute boolean muted;
readonly attribute MediaStreamTrack track;
};
</pre>
Note: There is WG consensus that this interface should be exposed on DedicatedWorker.
There is no WG consensus on whether or not it should be exposed on Window.
### Internal slots ### {#internal-slots}
<dl>
<dt><dfn attribute for=VideoTrackGenerator>`[[track]]`</dfn></dt>
<dd>The {{MediaStreamTrack}} output of this source</dd>
<dt><dfn attribute for=VideoTrackGenerator>`[[isMuted]]`</dfn></dt>
<dd>A boolean whose value indicates whether this source and all the
{{MediaStreamTrack}}s it sources, are currently {{MediaStreamTrack/muted}} or not.
</dd>
</dl>
### Constructor ### {#video-generator-constructor}
<dfn constructor for=VideoTrackGenerator title="VideoTrackGenerator()">
VideoTrackGenerator()
</dfn>
1. Let |generator| be a new {{VideoTrackGenerator}} object.
1. Let |track| be a newly [$create a MediaStreamTrack|created$] {{MediaStreamTrack}} with <var>source</var> set to |generator| and <var>tieSourceToContext</var> set to <code>false</code>.
1. Initialize |generator|.{{VideoTrackGenerator/track}} to |track|.
1. Return |generator|.
### Attributes ### {#video-generator-attributes}
<dl>
<dt><dfn attribute for=VideoTrackGenerator>writable</dfn></dt>
<dd>Allows writing video frames to the {{VideoTrackGenerator}}. When this attribute
is accessed for the first time, it MUST be initialized with the following steps:
1. Initialize [=this=].{{VideoTrackGenerator/writable}} to be a new {{WritableStream}}.
2. <a dfn for="WritableStream">Set up</a> [=this=].{{VideoTrackGenerator/writable}}, with its [=WritableStream/set up/writeAlgorithm=] set to [=writeFrame=] with [=this=] as parameter, with [=WritableStream/set up/closeAlgorithm=] set to [=closeWritable=] with [=this=] as parameter and [=WritableStream/set up/abortAlgorithm=] set to [=closeWritable=] with [=this=] as parameter.
The <dfn>writeFrame</dfn> algorithm is given a |generator| and a |frame| as input. It is defined by running the following steps:
1. If |frame| is not a {{VideoFrame}} object, return [=a promise rejected with=] a {{TypeError}}.
1. If the value of |frame|’s {{platform object/[[Detached]]}} internal slot is true, return [=a promise rejected with=] a {{TypeError}}.
1. If |generator|.`[[isMuted]]` is false, for each live track sourced from |generator|, named |track|, run the following steps:
    1. Let |clone| be the result of running the [=Clone videoFrame=] algorithm with |frame|.
    1. Send |clone| to |track|.
1. Run the [=Close VideoFrame=] algorithm with |frame|.
1. Return [=a promise resolved with=] undefined.
<p class="note">
When the media data is sent to a track, the UA may apply processing
(e.g., cropping and downscaling) to ensure that the media data sent
to the track satisfies the track's constraints. Each track may receive a
different version of the media data depending on its constraints.
</p>
The <dfn>closeWritable</dfn> algorithm is given a |generator| as input.
It is defined by running the following steps.
1. For each track `t` sourced from |generator|, [=track|end=] `t`.
2. Return [=a promise resolved with=] undefined.
</dd>
<dt><dfn attribute for=VideoTrackGenerator>muted</dfn></dt>
<dd>Mutes the {{VideoTrackGenerator}}. The getter steps are to return
[=this=].`[[isMuted]]`. The setter steps, given a value |newValue|, are as follows:
1. If |newValue| is equal to [=this=].`[[isMuted]]`, abort these steps.
1. Set [=this=].`[[isMuted]]` to |newValue|.
1. Unless one has been queued already this run of the event loop, [=queue a task=] to run the following steps:
    1. Let |settledValue| be [=this=].`[[isMuted]]`.
    1. For each live track sourced by [=this=], [=queue a task=] to [$set a track's muted state$] to |settledValue|.
</dd>
<dt><dfn attribute for=VideoTrackGenerator>track</dfn></dt>
<dd>The {{MediaStreamTrack}} output. The getter steps are to return
[=this=].`[[track]]`.
</dd>
</dl>
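A brief, non-normative illustration of the {{VideoTrackGenerator/muted}} attribute.
<pre class="example" highlight="js">
generator.track.addEventListener('mute', () => console.log('output muted'));
generator.track.addEventListener('unmute', () => console.log('output unmuted'));
generator.muted = true;   // frames written while muted are not forwarded to tracks
// ... writes to generator.writable still succeed, but the tracks receive nothing ...
generator.muted = false;  // delivery resumes
</pre>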
### Specialization of MediaStreamTrack behavior ### {#video-generator-as-track}
A {{VideoTrackGenerator}} acts as the source for one or more {{MediaStreamTrack}}s.
This section adds clarifications on how a {{MediaStreamTrack}} sourced from a
{{VideoTrackGenerator}} behaves.
#### stop #### {#video-generator-stop}
The {{MediaStreamTrack/stop}} method stops the track. When the last track
sourced from a {{VideoTrackGenerator}} ends, that {{VideoTrackGenerator}}'s
{{VideoTrackGenerator/writable}} is [=WritableStream/closing|closed=].
#### Constrainable properties #### {#generator-constrainable-properties}
The following constrainable properties are defined for any {{MediaStreamTrack}}s sourced from
a {{VideoTrackGenerator}}:
<table>
<thead>
<tr>
<th>
Property Name
</th>
<th>
Values
</th>
<th>
Notes
</th>
</tr>
</thead>
<tbody>
<tr id="def-constraint-width">
<td>
width
</td>
<td>
{{ConstrainULong}}
</td>
<td>
As a setting, this is the width, in pixels, of the latest
frame received by the track.
As a capability, `max` MUST reflect the
largest width a {{VideoFrame}} may have, and `min`
MUST reflect the smallest width a {{VideoFrame}} may have.
</td>
</tr>
<tr id="def-constraint-height">
<td>
height
</td>
<td>
{{ConstrainULong}}
</td>
<td>
As a setting, this is the height, in pixels, of the latest
frame received by the track.
As a capability, `max` MUST reflect the largest height
a {{VideoFrame}} may have, and `min` MUST reflect
the smallest height a {{VideoFrame}} may have.
</td>
</tr>
<tr id="def-constraint-frameRate">
<td>
frameRate
</td>
<td>
{{ConstrainDouble}}
</td>
<td>
As a setting, this is an estimate of the frame rate based on frames
recently received by the track.
As a capability `min` MUST be zero and
`max` MUST be the maximum frame rate supported by the system.
</td>
</tr>
<tr id="def-constraint-aspect">
<td>
aspectRatio
</td>
<td>
{{ConstrainDouble}}
</td>
<td>
As a setting, this is the aspect ratio of the latest frame
delivered by the track;
this is the width in pixels divided by height in pixels as a
double rounded to the tenth decimal place. As a capability,
`min` MUST be the
smallest aspect ratio supported by a {{VideoFrame}}, and `max` MUST be
the largest aspect ratio supported by a {{VideoFrame}}.
</td>
</tr>
<tr id="def-constraint-resizeMode">
<td>
resizeMode
</td>
<td>
{{ConstrainDOMString}}
</td>
<td>
As a setting, this string should be one of the members of
{{VideoResizeModeEnum}}. The value "{{VideoResizeModeEnum/none}}"
means that the frames output by the MediaStreamTrack are unmodified
versions of the frames written to the
{{VideoTrackGenerator/writable}} backing
the track, regardless of any constraints.
The value "{{VideoResizeModeEnum/crop-and-scale}}" means
that the frames output by the MediaStreamTrack may be cropped and/or
downscaled versions
of the source frames, based on the values of the width, height and
aspectRatio constraints of the track.
As a capability, the values "{{VideoResizeModeEnum/none}}" and
"{{VideoResizeModeEnum/crop-and-scale}}" both MUST be present.
</td>
</tr>
</tbody>
</table>
The {{MediaStreamTrack/applyConstraints}} method applied to a video {{MediaStreamTrack}}
sourced from a {{VideoTrackGenerator}} supports the properties defined above.
It can be used, for example, to resize frames or adjust the frame rate of the track.
Note that these constraints have no effect on the {{VideoFrame}} objects
written to the {{VideoTrackGenerator/writable}} of a {{VideoTrackGenerator}},
just on the output of the track on which the constraints have been applied.
Note also that, since a {{VideoTrackGenerator}} can in principle produce
media data with any setting for the supported constrainable properties,
an {{MediaStreamTrack/applyConstraints}} call on a track
backed by a {{VideoTrackGenerator}} will generally not fail with
{{OverconstrainedError}} unless the given constraints
are outside the system-supported range, as reported by
{{MediaStreamTrack/getCapabilities}}.
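A non-normative sketch of applying the constraints above to a track backed by a
{{VideoTrackGenerator}}; the specific values are illustrative.
<pre class="example" highlight="js">
const capabilities = generator.track.getCapabilities();
try {
  await generator.track.applyConstraints({
    width: 640,
    height: 360,
    frameRate: 15,
    resizeMode: 'crop-and-scale',
  });
} catch (e) {
  // Expected only if the request falls outside the system-supported ranges
  // reported in `capabilities`.
  console.error('applyConstraints failed:', e.name);
}
</pre>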
#### Events and attributes #### {#generator-events-attributes}
Events and attributes work the same as for any {{MediaStreamTrack}}.
It is relevant to note that if the {{VideoTrackGenerator/writable}}
stream of a {{VideoTrackGenerator}} is closed, all the live
tracks connected to it are ended and the `ended` event is fired on them.
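A non-normative sketch of the relationship between the {{VideoTrackGenerator/writable}}
stream and track lifetime.
<pre class="example" highlight="js">
generator.track.addEventListener('ended', () => {
  // All live tracks sourced from the generator end when writable closes.
});
const writer = generator.writable.getWriter();
await writer.close(); // ends generator.track and any clones, firing 'ended' on them
// Conversely, stopping the last live track sourced from the generator closes writable.
</pre>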
# Examples # {#examples}
## Video Processing ## {#video-processing}
Consider a face recognition function `detectFace(videoFrame)` that returns a face position
(in some format), and a manipulation function `blurBackground(videoFrame, facePosition)` that
returns a new VideoFrame similar to the given `videoFrame`, but with the
non-face parts blurred. The example also shows the video before and after
effects on video elements.
<pre class="example" highlight="js">
// main.js
const stream = await navigator.mediaDevices.getUserMedia({video:true});
const videoBefore = document.getElementById('video-before');
const videoAfter = document.getElementById('video-after');
videoBefore.srcObject = stream.clone();
const [track] = stream.getVideoTracks();
const worker = new Worker('worker.js');
worker.postMessage({track}, [track]);
const {data} = await new Promise(r => worker.onmessage = r);
videoAfter.srcObject = new MediaStream([data.track]);
// worker.js
self.onmessage = async ({data: {track}}) => {
  const source = new VideoTrackGenerator();
  self.postMessage({track: source.track}, [source.track]);
  const {readable} = new MediaStreamTrackProcessor({track});
  const transformer = new TransformStream({
    async transform(frame, controller) {
      const facePosition = await detectFace(frame);
      const newFrame = blurBackground(frame, facePosition);
      frame.close();
      controller.enqueue(newFrame);
    }
  });
  await readable.pipeThrough(transformer).pipeTo(source.writable);
};
</pre>
## Multi-consumer post-processing with constraints ## {#multi-consumer-constraints}
A common use case is to remove the background from live camera video fed into a
video conference, with a live self-view showing the result. It's desirable for
the self-view to have a high frame rate even if the frame rate used for actual
sending may dip lower due to back pressure from bandwidth constraints. This can
be achieved by applying constraints to a track clone, avoiding having to process
twice.
<pre class="example" highlight="js">
// main.js
const stream = await navigator.mediaDevices.getUserMedia({video:true});
const [track] = stream.getVideoTracks();
const worker = new Worker('worker.js');
worker.postMessage({track}, [track]);
const {data} = await new Promise(r => worker.onmessage = r);
const selfView = document.getElementById('video-self');
selfView.srcObject = new MediaStream([data.track.clone()]); // 60 fps
await data.track.applyConstraints({width: 320, height: 200, frameRate: 30});
const pc = new RTCPeerConnection(config);
pc.addTrack(data.track); // 30 fps
// worker.js
self.onmessage = async ({data: {track}}) => {
  const source = new VideoTrackGenerator();
  self.postMessage({track: source.track}, [source.track]);
  const {readable} = new MediaStreamTrackProcessor({track});
  const transformer = new TransformStream({transform: myRemoveBackgroundFromVideo});
  await readable.pipeThrough(transformer).pipeTo(source.writable);
};
</pre>
## Multi-consumer post-processing with constraints in a worker ## {#multi-consumer-worker}
Being able to show a higher frame-rate self-view is also relevant when sending
video frames over WebTransport in a worker. The same technique above may be used
here, except constraints are applied to a track clone in the worker.
<pre class="example" highlight="js">
// main.js
const stream = await navigator.mediaDevices.getUserMedia({video:true});
const [track] = stream.getVideoTracks();
const worker = new Worker('worker.js');
worker.postMessage({track}, [track]);
const {data} = await new Promise(r => worker.onmessage = r);
const selfView = document.getElementById('video-self');
selfView.srcObject = new MediaStream([data.track]); // 60 fps
// worker.js
self.onmessage = async ({data: {track}}) => {
  const source = new VideoTrackGenerator();
  const sendTrack = source.track.clone();
  self.postMessage({track: source.track}, [source.track]);
  await sendTrack.applyConstraints({width: 320, height: 200, frameRate: 30});
  const wt = new WebTransport("https://webtransport.org:8080/up");
  const {readable} = new MediaStreamTrackProcessor({track});
  const transformer = new TransformStream({transform: myRemoveBackgroundFromVideo});
  await readable.pipeThrough(transformer)
    // Use the generator plus a processor on the constrained clone as a
    // {writable, readable} pair, so the send path sees the 30 fps track.
    .pipeThrough({
      writable: source.writable,
      readable: new MediaStreamTrackProcessor({track: sendTrack}).readable,
    })
    .pipeThrough(createMyEncodeVideoStream({
      codec: "vp8",
      width: 640,
      height: 480,
      bitrate: 1000000,
    }))
    .pipeThrough(new TransformStream({transform: mySerializer}))
    .pipeTo(await wt.createUnidirectionalStream()); // 30 fps
};
</pre>
<div class="note">
<p>The above example avoids using the `tee()` function to serve multiple
consumers, due to its issues with real-time streams.</p>
<p>For brevity, the example also over-simplifies using a WebCodecs wrapper to
encode and send video frames over a single WebTransport stream (incurring
head-of-line blocking).</p>
</div>
# Implementation advice # {#implementation-advice}
This section is informative.
## Use with multiple consumers ## {#multi-consumers}
There are use cases where the programmer may desire that a single stream of frames
is consumed by multiple consumers.
Examples include the case where the result of a background blurring function should
be both displayed in a self-view and encoded using a {{VideoEncoder}}.
For cases where both consumers are consuming unprocessed frames, and synchronization
is not desired, instantiating multiple {{MediaStreamTrackProcessor}} objects is a robust solution.
For cases where both consumers intend to convert the result of a processing step into a
{{MediaStreamTrack}}
using a {{VideoTrackGenerator}}, for example when feeding a processed stream
to both a &lt;video&gt; tag and an {{RTCPeerConnection}}, attaching the resulting {{MediaStreamTrack}}
to multiple sinks may be the most appropriate mechanism.
For cases where the downstream processing takes frames, not streams, the frames can
be cloned as needed and sent off to the downstream processing; "clone" is a cheap operation.
When the stream is the output of some processing, and both branches need a Stream object
to do further processing, one needs a function that produces two streams from one stream.
However, the standard tee() operation is problematic
in this context:
* It defeats the backpressure mechanism that guards against excessive queueing
* It creates multiple links to the same buffers, meaning that the question of which
consumer gets to destroy() the buffer is a difficult one to address
Therefore, tee() should only be used on streams containing media when these implications
are fully understood. Instead, custom stream-splitting logic appropriate to the use case
should be used; an informative sketch follows the list below.
* If both branches require the ability to dispose of the frames, clone() the frame
and enqueue distinct copies in both queues. This corresponds to the function
ReadableStreamTee(stream, cloneForBranch2=true). Then choose one of the
alternatives below.
* If one branch requires all frames, and the other branch tolerates dropped frames,
enqueue buffers in the all-frames-required stream and use the backpressure signal
from that stream to stop reading from the source. If backpressure signal from the
other stream indicates room, enqueue the same frame in that queue too.
* If neither stream tolerates dropped frames, use the combined backpressure signal
to stop reading from the source. In this case, frames will be processed in
lockstep if the buffer sizes are both 1.
* If it is OK for the incoming stream to be stalled only when the underlying
buffer pool allocated to the process is exhausted, standard tee() may be used.
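The following informative sketch implements the second alternative above, combined with
the cloning approach from the first: one branch must see every frame and its backpressure
throttles reads from the source, while the other branch only receives a clone when it has
room. The function and variable names are illustrative and not part of this specification.
<pre class="example" highlight="js">
function splitWithDroppableBranch(source) {
  const reader = source.getReader();
  let branch2Controller;
  const branch2 = new ReadableStream({start(c) { branch2Controller = c; }});
  const branch1 = new ReadableStream({
    async pull(controller) {
      const {value: frame, done} = await reader.read();
      if (done) {
        controller.close();
        branch2Controller.close();
        return;
      }
      // Branch 2 tolerates drops: it only gets a clone when it has room.
      if (branch2Controller.desiredSize > 0) {
        branch2Controller.enqueue(frame.clone());
      }
      // Branch 1 gets every frame; its backpressure gates further reads,
      // because pull() is only called again once the consumer has made room.
      controller.enqueue(frame);
    }
  }, {highWaterMark: 1});
  return [branch1, branch2];
}

// Usage: encoding must see every frame; the self-view may drop frames.
const [forEncoder, forSelfView] = splitWithDroppableBranch(processor.readable);
</pre>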
Note: There are issues filed on the Streams spec where the resolution might affect this section: https://github.com/whatwg/streams/issues/1157, https://github.com/whatwg/streams/issues/1156, https://github.com/whatwg/streams/issues/401, https://github.com/whatwg/streams/issues/1186
# Security and Privacy considerations # {#security-considerations}
This API defines a {{MediaStreamTrack}} source and a {{MediaStreamTrack}} sink.
The security and privacy of the source ({{VideoTrackGenerator}}) relies
on the same-origin policy. That is, the data {{VideoTrackGenerator}} can
make available in the form of a {{MediaStreamTrack}} must be visible to
the document before a {{VideoFrame}} object can be constructed
and pushed into the {{VideoTrackGenerator}}. Any attempt to create
{{VideoFrame}} objects using cross-origin data will fail.
Therefore, {{VideoTrackGenerator}} does not introduce any new
fingerprinting surface.
The {{MediaStreamTrack}} sink introduced by this API ({{MediaStreamTrackProcessor}})
exposes the same data that is exposed by other
{{MediaStreamTrack}} sinks, such as WebRTC peer connections and media elements. The security and privacy
of {{MediaStreamTrackProcessor}} relies on the security and privacy of the
{{MediaStreamTrack}} sources of the tracks to which {{MediaStreamTrackProcessor}}
is connected. For example, camera, microphone and screen-capture tracks
rely on explicit use authorization via permission dialogs (see
[[MEDIACAPTURE-STREAMS]] and [[SCREEN-CAPTURE]]),
while element capture and {{VideoTrackGenerator}}
rely on the same-origin policy.
A potential issue with {{MediaStreamTrackProcessor}} is resource exhaustion.
For example, a site might hold on to too many open {{VideoFrame}} objects
and deplete a system-wide pool of GPU-memory-backed frames. UAs can
mitigate this risk by limiting the number of pool-backed frames a site can
hold. This can be achieved by reducing the maximum number of buffered frames
and by refusing to deliver more frames to {{MediaStreamTrackProcessor/readable}}
once the budget limit is reached. Accidental exhaustion is also mitigated by
automatic closing of {{VideoFrame}} objects once they
are written to a {{VideoTrackGenerator}}.
# Backwards compatibility with earlier proposals # {#backwards-compatibility}
This section is informative.
Previous proposals for this interface had an API like this:
<div class="example">
<pre class="idl">
[Exposed=Window,DedicatedWorker]
interface MediaStreamTrackGenerator : MediaStreamTrack {
constructor(MediaStreamTrackGeneratorInit init);
attribute WritableStream writable; // VideoFrame or AudioData
};
dictionary MediaStreamTrackGeneratorInit {
required DOMString kind;
};
</pre>
</div>
In this earlier proposal, the generator for the MediaStreamTrack was itself an instance of
MediaStreamTrack, rather than containing one.
The VideoTrackGenerator can be shimmed on top of MediaStreamTrackGenerator like this:
<pre class="example">
// Not tested, unlikely to work as written!
class VideoTrackGenerator {
constructor() {
this.innerGenerator = new MediaStreamTrackGenerator({kind: 'video'});
this.writable = this.innerGenerator.writable;
this.track = this.innerGenerator.clone();
}
// Missing: shim for setting of the "muted" attribute.
};
</pre>
Further description of the previous proposals, including considerations involving
processing of audio, can be found in earlier versions of this document.
Note: A link will be placed here pointing to the chrome-96 branch when
we have finished moving repos about.