<pre class='metadata'>
Title: WebVTT: The Web Video Text Tracks Format
H1: WebVTT: The Web Video Text Tracks Format
Shortname: webvtt1
Status: CG-DRAFT
Default Ref Status: current
Prepare For TR: false
Group: texttracks
ED: https://w3c.github.io/webvtt/
TR: https://www.w3.org/TR/webvtt1/
Level: none
Editor: Gary Katsevman, Mux Inc. https://www.mux.com/, [email protected]
Former Editor: Silvia Pfeiffer, NICTA CSIRO https://www.csiro.au/, [email protected]
Former Editor: Simon Pieters, Opera Software AS http://www.opera.com/, [email protected]
Former Editor: Philip Jägenstedt, Opera Software ASA http://www.opera.com/, [email protected]
Former Editor: Ian Hickson, Google http://www.google.com/, [email protected]
!Participate: <a href=https://github.com/w3c/webvtt>GitHub w3c/webvtt</a> (<a href=https://github.com/w3c/webvtt/issues/new>new issue</a>, <a href=https://github.com/w3c/webvtt/issues>open issues</a>, <a href=https://www.w3.org/Bugs/Public/buglist.cgi?product=TextTracks%20CG&component=WebVTT&resolution=--->legacy open bugs</a>)
!Commits: <a href=https://github.com/w3c/webvtt/commits>GitHub w3c/webvtt/commits</a>
!Commits: <a href=https://twitter.com/webvtt>@webvtt</a>
Test Suite: https://github.com/web-platform-tests/wpt/tree/master/webvtt
Abstract: This specification defines WebVTT, the Web Video Text Tracks format. Its main use is for marking up external text track resources in connection with the HTML <track> element.
Abstract: WebVTT files provide captions or subtitles for video content, and also text video descriptions [[MAUR]], chapters for content navigation, and more generally any form of metadata that is time-aligned with audio or video content.
Boilerplate: omit conformance, omit feedback-header
Ignored Terms: unicode-bidi, color, text-combine-upright, text-wrap, lang, class, title
Ignored Vars: seconds-frac, selector, fragment, seen cue
</pre>
<pre class=anchors>
urlPrefix: https://dom.spec.whatwg.org/
type: dfn
text: namespaceURI
urlPrefix: https://html.spec.whatwg.org/multipage/
type: dfn
urlPrefix: infrastructure.html
text: ascii digits
text: split a string on spaces
text: skip whitespace
text: alphanumeric ascii characters
text: space character
text: case-sensitive
urlPrefix: embedded-content.html
text: text track kind
text: text track cue
text: text track list of cues
text: text track
text: list of text tracks
text: media element
text: text track mode
text: text track showing
text: rules for updating the text track rendering
text: text track cue active flag
text: text track cue display state
text: current playback position
text: text track cue identifier
text: text track cue pause-on-exit flag
text: rules for extracting the chapter title
text: text track cue start time
text: text track cue end time
text: unbounded text track cue
text: expose a user interface to the user
text: text track cue order
text: honor user preferences for automatic text track selection
urlPrefix: webappapis.html
text: entry settings object
urlPrefix: syntax.html
text: character references; url: #syntax-charref
text: additional allowed character
text: consume a character reference
type: element-attr
urlPrefix: dom.html
text: title; url: #attr-title
text: lang; url: #attr-lang
text: class; url: #classes
urlPrefix: https://encoding.spec.whatwg.org/
type: dfn
text: utf-8 decode
</pre>
<pre class=link-defaults>
spec:dom; type:interface; text:Document
spec:css-ruby-1; type:value; text:ruby-base
spec:css-color-4; type:property; text:color
spec:css-fonts-3; type:property; text:font-style
spec:css-fonts-3; type:property; text:font-weight
spec:css-ruby-1; type:value; text:ruby
spec:css-ruby-1; type:value; text:ruby-text
spec:css2; type:selector; text::lang()
spec:css-flexbox-1; type:value; text:inline-flex
spec:selectors-3; type:selector; text:::before
spec:selectors-3; type:selector; text:::after
spec:css-display-3; type:property; text:display
spec:html; type:element; text:style
spec:css-align-3; type:value; for:justify-content; text:flex-end
</pre>
<pre class=biblio>
{
"MAUR": {
"authors": [ "Shane McCarron", "Michael Cooper", "Mark Sadecki" ],
"href": "http://www.w3.org/TR/media-accessibility-reqs/",
"title": "Media Accessibility User Requirements",
"status": "WD",
"publisher": "W3C"
}
}
</pre>
<style>
samp {
font-family: inherit;
background-color: black; /* fallback if rgba() is not supported */
background-color: rgba(0, 0, 0, 0.7);
outline: 0.18em solid rgba(0, 0, 0, 0.7);
color: white;
}
[data-algorithm]:not(.heading) {
padding-left: 2em;
}
</style>
<h2 id=introduction>Introduction</h2>
<p><i>This section is non-normative.</i></p>
<p>The <dfn>WebVTT</dfn> (Web Video Text Tracks) format is intended for marking up external text
track resources in connection with the HTML <track> element.</p>
<p>WebVTT files provide captions or subtitles for video content, and also text video descriptions
[[MAUR]], chapters for content navigation, and more generally any form of metadata that is
time-aligned with audio or video content.</p>
<p>The majority of the current version of this specification is dedicated to describing how to use
WebVTT files for captioning or subtitling. There is minimal information about chapters and
time-aligned metadata and nothing about video descriptions at this stage.</p>
<p>In this section we provide some example WebVTT files as an introduction.</p>
<h3 id=introduction-caption>A simple caption file</h3>
<p><i>This section is non-normative.</i></p>
<p>The main use for WebVTT files is captioning or subtitling video content. Here is a sample file
that captions an interview:</p>
<pre>
WEBVTT

00:11.000 --> 00:13.000
<v Roger Bingham>We are in New York City

00:13.000 --> 00:16.000
<v Roger Bingham>We're actually at the Lucern Hotel, just down the street

00:16.000 --> 00:18.000
<v Roger Bingham>from the American Museum of Natural History

00:18.000 --> 00:20.000
<v Roger Bingham>And with me is Neil deGrasse Tyson

00:20.000 --> 00:22.000
<v Roger Bingham>Astrophysicist, Director of the Hayden Planetarium

00:22.000 --> 00:24.000
<v Roger Bingham>at the AMNH.

00:24.000 --> 00:26.000
<v Roger Bingham>Thank you for walking down here.

00:27.000 --> 00:30.000
<v Roger Bingham>And I want to do a follow-up on the last conversation we did.

00:30.000 --> 00:31.500 align:right size:50%
<v Roger Bingham>When we e-mailed—

00:30.500 --> 00:32.500 align:left size:50%
<v Neil deGrasse Tyson>Didn't we talk about enough in that conversation?

00:32.000 --> 00:35.500 align:right size:50%
<v Roger Bingham>No! No no no no; 'cos 'cos obviously 'cos

00:32.500 --> 00:33.500 align:left size:50%
<v Neil deGrasse Tyson><i>Laughs</i>

00:35.500 --> 00:38.000
<v Roger Bingham>You know I'm so excited my glasses are falling off here.
</pre>
<p>A WebVTT file generally consists of a sequence of text segments, each associated with a time
interval and called a cue (<a lt="WebVTT cue">definition</a>). Beyond captioning and subtitling,
WebVTT can be used for time-aligned metadata, typically to deliver name-value pairs in
cues. WebVTT can also be used for delivering chapters, which help with contextual navigation around
an audio/video file. Finally, WebVTT can be used for the delivery of text video descriptions, which
is text that describes the visual content of time-intervals and can be synthesized to speech to help
vision-impaired users understand context.</p>
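<p>The timing line of each cue in the sample above can be read with a few lines of code. The
following is a minimal, non-normative sketch, assuming timestamps of the form
<code>mm:ss.ttt</code> or <code>hh:mm:ss.ttt</code> as they appear in the sample. The function names
<code>timestamp_to_seconds</code> and <code>parse_timing_line</code> are hypothetical, and the
sketch is not the parsing algorithm defined by this specification.</p>

```python
import re

# Hypothetical helper names; this only covers the timestamp shapes shown in the sample.
_TIMESTAMP = re.compile(r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})")


def timestamp_to_seconds(text: str) -> float:
    """Convert 'mm:ss.ttt' or 'hh:mm:ss.ttt' into a number of seconds."""
    match = _TIMESTAMP.fullmatch(text)
    if match is None:
        raise ValueError(f"not a WebVTT timestamp: {text!r}")
    hours, minutes, seconds, millis = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_timing_line(line: str) -> tuple[float, float, str]:
    """Split 'start --> end [settings]' into start, end and the raw settings string."""
    start, arrow, rest = line.partition("-->")
    parts = rest.split(None, 1)
    if not arrow or not parts:
        raise ValueError(f"not a cue timing line: {line!r}")
    settings = parts[1].strip() if len(parts) > 1 else ""
    return timestamp_to_seconds(start.strip()), timestamp_to_seconds(parts[0]), settings
```

<p>For instance, <code>parse_timing_line("00:30.000 --> 00:31.500 align:right size:50%")</code>
yields the start time, the end time, and the cue settings left unparsed as a string.</p>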
<p class=note>This version of WebVTT focuses on solving the captioning and subtitling use cases.
More specification work is possible for the other use cases. A decision on what type of use case a
WebVTT file is being used for is made by the software that is using the file. For example, when used
with an HTML file through a <track> element, the <a lt="text track kind">kind</a> attribute
defines how the WebVTT file is to be interpreted.</p>
<p>The following subsections provide an overview of some of the key features of the WebVTT file
format, particularly when in use for captioning and subtitling.</p>
<h3 id=introduction-multiple-lines>Caption cues with multiple lines</h3>
<p><i>This section is non-normative.</i></p>
<p>Line breaks in cues are honored. User agents will also insert extra line breaks if necessary to
fit the cue in the cue's width. In general, therefore, authors are encouraged to write cues all on
one line except when a line break is definitely necessary.</p>
<div class="example">
<p>These captions on a public service announcement video demonstrate line breaking:</p>
<pre>
WEBVTT

00:01.000 --> 00:04.000
Never drink liquid nitrogen.

00:05.000 --> 00:09.000
— It will perforate your stomach.
— You could die.

00:10.000 --> 00:14.000
The Organisation for Sample Public Service Announcements accepts no liability for the content of this advertisement, or for the consequences of any actions taken on the basis of the information provided.
</pre>
<p>The first cue is simple; it will probably just display on one line. The second will take two
lines, one for each speaker. The third will wrap to fit the width of the video, possibly taking
multiple lines. For example, the three cues could look like this:</p>
<!-- 50 -->
<pre>
<samp>Never drink liquid nitrogen.</samp>

<samp>— It will perforate your stomach.</samp>
<samp>— You could die.</samp>

<samp>The Organisation for Sample Public Service</samp>
<samp>Announcements accepts no liability for the</samp>
<samp>content of this advertisement, or for the</samp>
<samp>consequences of any actions taken on the</samp>
<samp>basis of the information provided.</samp>
</pre>
<p>If the width of the cues is smaller, the first two cues could wrap as well, as in the following
example. Note how the second cue's explicit line break is still honored, however:</p>
<!-- 25 -->
<pre>
<samp>Never drink</samp>
<samp>liquid nitrogen.</samp>

<samp>— It will perforate</samp>
<samp>your stomach.</samp>
<samp>— You could die.</samp>

<samp>The Organisation for</samp>
<samp>Sample Public Service</samp>
<samp>Announcements accepts</samp>
<samp>no liability for the</samp>
<samp>content of this</samp>
<samp>advertisement, or for</samp>
<samp>the consequences of</samp>
<samp>any actions taken on</samp>
<samp>the basis of the</samp>
<samp>information provided.</samp>
</pre>
<p>Also notice how the wrapping is done so as to keep the line lengths balanced.</p>
</div>
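<p>The interaction between explicit line breaks and automatic wrapping can be illustrated with a
short, non-normative sketch. It assumes a fixed width measured in characters and uses a simple
greedy wrap from Python's <code>textwrap</code> module, so it does not reproduce the balanced line
lengths shown above. The function name <code>wrap_cue_text</code> is hypothetical.</p>

```python
import textwrap


def wrap_cue_text(cue_text: str, width: int) -> list[str]:
    """Greedy wrap: keep every explicit line break, then wrap each line to width."""
    rendered = []
    for line in cue_text.split("\n"):
        # textwrap.wrap returns [] for an empty string, so fall back to one empty line.
        rendered.extend(textwrap.wrap(line, width=width) or [""])
    return rendered
```

<p>Wrapping the second cue with a narrow width still leaves the second speaker's text on its own
line, because the explicit line break is applied before any automatic wrapping.</p>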
<h3 id=styling>Styling captions</h3>
<p><i>This section is non-normative.</i></p>
<p>CSS style sheets that apply to an HTML page that contains a <a element>video</a> element can
target WebVTT cues and regions in the video using the ''::cue'', ''::cue()'', ''::cue-region'' and
''::cue-region()'' pseudo-elements.</p>
<div class="example">
<p>In this example, an HTML page has a CSS style sheet in a <a element>style</a> element that
styles all cues in the video with a gradient background and a text color, as well as changing the
text color for all <a>WebVTT Bold Objects</a> in cues in the video.</p>
<pre>
<!doctype html>
<html>
<head>
<title>Styling WebVTT cues</title>
<style>
video::cue {
background-image: linear-gradient(to bottom, dimgray, lightgray);
color: papayawhip;
}
video::cue(b) {
color: peachpuff;
}
</style>
</head>
<body>
<video controls autoplay src="video.webm">
<track default src="track.vtt">
</video>
</body>
</html>
</pre>
</div>
<p>CSS style sheets can also be embedded in WebVTT files themselves.</p>
<p>Style blocks are placed after any headers but before the first cue, and start with the line
"STYLE". Comment blocks can be interleaved with style blocks.</p>
<p>Blank lines cannot appear in the style sheet. They can be removed or be filled with a space or a
CSS comment (e.g. <code>/**/</code>).</p>
<p>The string "<code>--></code>" cannot be used in the style sheet. If the style sheet is wrapped in
"<code><!--</code>" and "<code>--></code>", then those strings can just be removed. If
"<code>--></code>" appears inside a CSS string, then it can use CSS escaping e.g.
"<code>--\></code>".</p>
<div class="example">
<p>This example shows how cues can be styled with style blocks in WebVTT.</p>
<pre>
WEBVTT

STYLE
::cue {
background-image: linear-gradient(to bottom, dimgray, lightgray);
color: papayawhip;
}
/* Style blocks cannot use blank lines nor "dash dash greater than" */

NOTE comment blocks can be used between style blocks.

STYLE
::cue(b) {
color: peachpuff;
}

hello
00:00:00.000 --> 00:00:10.000
Hello <b>world</b>.

NOTE style blocks cannot appear after the first cue.
</pre>
</div>
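<p>The two restrictions on embedded style sheets described above (no blank lines, and no
"<code>--></code>" string) are easy to check mechanically before a style sheet is placed in a
STYLE block. The following non-normative sketch shows one way to do so; the function name
<code>style_block_problems</code> is hypothetical, and the check covers only these two
restrictions, not CSS validity.</p>

```python
def style_block_problems(css: str) -> list[str]:
    """Report why a style sheet could not be embedded as-is in a WebVTT STYLE block."""
    problems = []
    # A line holding only a space or a CSS comment is fine; a truly empty line is not.
    if any(line == "" for line in css.strip("\r\n").splitlines()):
        problems.append("blank line")
    if "-->" in css:
        problems.append('contains "-->"')
    return problems
```

<p>An empty result means the style sheet passes both checks. A CSS string that uses the escaped
form <code>--\></code> passes, since it no longer contains the forbidden substring.</p>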
<h3 id=introduction-other-features>Other caption and subtitling features</h3>
<p><i>This section is non-normative.</i></p>
<p>WebVTT also supports some less-often used features.</p>
<div class="example">
<p>In this example, the cues have an identifier:</p>
<pre>
WEBVTT

test
00:00.000 --> 00:02.000
This is a test.

123
00:00.000 --> 00:02.000
That's an, an, that's an L!

crédit de transcription
00:04.000 --> 00:05.000
Transcrit par Célestes™
</pre>
<p>This allows a style sheet to specifically target the cues.</p>
<pre>
/* style for cue: test */
::cue(#test) { color: lime; }
</pre>
<p>Due to the syntax rules of CSS, some characters need to be escaped with CSS character escape
sequences. For example, an ID that starts with a number 0-9 needs to be escaped. The ID
<code>123</code> can be represented as "\31 23" (31 refers to the Unicode code point for "1"). See
<a href="https://www.w3.org/International/questions/qa-escapes">Using character escapes in markup
and CSS</a> for more information on CSS escapes.</p>
<pre>
/* style for cue: 123 */
::cue(#\31 23) { color: lime; }
/* style for cue: crédit de transcription */
::cue(#crédit\ de\ transcription) { color: red; }
</pre>
</div>
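<p>Authoring tools that generate selectors from cue identifiers need to apply these escapes
automatically. The following non-normative sketch follows the identifier serialization rules
that browsers expose as <code>CSS.escape()</code>; the Python function name
<code>css_escape_identifier</code> is hypothetical.</p>

```python
def css_escape_identifier(ident: str) -> str:
    """Escape a cue identifier so it can follow '#' in a ::cue() selector."""
    out = []
    for index, char in enumerate(ident):
        code = ord(char)
        is_ascii_digit = char.isascii() and char.isdigit()
        if code == 0:
            out.append("\ufffd")
        elif code <= 0x1F or code == 0x7F or (
            is_ascii_digit and (index == 0 or (index == 1 and ident[0] == "-"))
        ):
            # Control characters and a leading digit are written as a hex code point
            # followed by a space, for example "1" becomes "\31 ".
            out.append(f"\\{code:x} ")
        elif index == 0 and char == "-" and len(ident) == 1:
            out.append("\\-")
        elif code >= 0x80 or char in "-_" or (char.isascii() and char.isalnum()):
            out.append(char)
        else:
            # Any other ASCII character, such as a space, gets a backslash in front.
            out.append("\\" + char)
    return "".join(out)
```

<p>Applied to the three identifiers in the example, the sketch produces <code>test</code>,
<code>\31 23</code> and <code>crédit\ de\ transcription</code>, matching the selectors shown
above.</p>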
<div class="example">
<p>This example shows how classes can be used on elements, which can be helpful for localization or
maintainability of styling, and also how to indicate a language change in the cue text.</p>
<pre>
WEBVTT

04:02.500 --> 04:05.000
J'ai commencé le basket à l'âge de 13, 14 ans

04:05.001 --> 04:07.800
Sur les <i.foreignphrase><lang en>playground</lang></i>, ici à Montpellier
</pre>
</div>
<div class="example">
<p>In this example, each cue says who is talking using voice spans. In the first cue, the span
specifying the speaker is also annotated with two classes, "first" and "loud". In the third cue,
there is also some italic text (not associated with a specific speaker). The last cue is annotated
with just the class "loud".</p>
<pre>
WEBVTT

00:00.000 --> 00:02.000
<v.first.loud Esme>It's a blue apple tree!

00:02.000 --> 00:04.000
<v Mary>No way!

00:04.000 --> 00:06.000
<v Esme>Hee!</v> <i>laughter</i>

00:06.000 --> 00:08.000
<v.loud Mary>That's awesome!
</pre>
<p>Notice that as a special exception, the voice spans don't have to be closed if they cover the
entire cue text.</p>
<p>Style sheets can style these spans:</p>
<pre>
::cue(v[voice="Esme"]) { color: cyan }
::cue(v[voice="Mary"]) { color: lime }
::cue(i) { font-style: italic }
::cue(.loud) { font-size: 2em }
</pre>
</div>
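<p>To make the structure of a start tag such as <code>&lt;v.first.loud Esme></code> concrete,
the following non-normative sketch splits start tags into a tag name, a list of classes, and an
annotation (the voice name for a voice span). It is a regular-expression approximation, not the
cue text tokenizer defined by this specification; the function name
<code>parse_start_tags</code> is hypothetical.</p>

```python
import re

# Tag name, optional ".class" parts, then an optional annotation after whitespace.
_START_TAG = re.compile(r"<([a-z]+)((?:\.[^\s.<>&]+)*)(?:[ \t]+([^>]*))?>")


def parse_start_tags(cue_text: str) -> list[tuple[str, list[str], str]]:
    """Return (tag name, classes, annotation) for each start tag in the cue text."""
    tags = []
    for match in _START_TAG.finditer(cue_text):
        name, class_part, annotation = match.groups()
        classes = [cls for cls in class_part.split(".") if cls]
        tags.append((name, classes, (annotation or "").strip()))
    return tags
```

<p>For the first cue in the example this gives the tag <code>v</code>, the classes
<code>first</code> and <code>loud</code>, and the annotation <code>Esme</code>. End tags such as
<code>&lt;/v></code> are ignored by this sketch.</p>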
<div class="example">
<p>This example shows how to position cues at explicit positions in the video viewport.</p>
<pre>
WEBVTT

00:00:00.000 --> 00:00:04.000 position:10%,line-left align:left size:35%
Where did he go?

00:00:03.000 --> 00:00:06.500 position:90% align:right size:35%
I think he went down this lane.

00:00:04.000 --> 00:00:06.500 position:45%,line-right align:center size:35%
What are you waiting for?
</pre>
<p>Since the cues in these examples are horizontal, the "position" setting refers to a percentage
of the width of the video viewport. If the text were vertical, the "position" setting would refer
to the height of the video viewport.</p>
<p>The "line-left" or "line-right" only refers to the physical side of the box to which the
"position" setting applies, in a way which is agnostic regarding the horizontal or vertical
direction of the cue. It does not affect or relate to the direction or position of the text itself
within the box.</p>
<p>The cues cover only 35% of the video viewport's width - that's the <a lt="WebVTT cue box">cue
box</a>'s "size" for all three cues.</p>
<p>The first cue has its <a lt="WebVTT cue box">cue box</a> positioned at the 10% mark. The
"line-left" and "line-right" within the "position" setting indicate which side of the <a
lt="WebVTT cue box">cue box</a> the position refers to. Since in this case the text is horizontal,
"line-left" refers to the left side of the box, and the cue box is thus positioned between the 10%
and the 45% mark of the video viewport's width, probably underneath a speaker on the left of the
video image. If the cue were vertical, "line-left" positioning would be from the top of the video
viewport's height and the <a lt="WebVTT cue box">cue box</a> would cover 35% of the video
viewport's height.</p>
<p>The text within the first cue's cue box is aligned using the "align" cue setting. For
left-to-right rendered text, "start" alignment is the left of that box, for right-to-left rendered
text the right of the box. So, independent of the directionality of the text, it will stay
underneath that speaker. Note that "center" position alignment of the cue box is the default for
start aligned text, in order to avoid having the box move when the base direction of the text
changes (from left-to-right to right-to-left or vice versa) as a result of translation.</p>
<p>The second cue has its <a lt="WebVTT cue box">cue box</a> right aligned at the 90% mark of the
video viewport width ("right" aligned text right aligns the box). The same effect can be achieved
with "position:55%,line-left", which explicitly positions the cue box. The third cue has center
aligned text within the same positioned cue box as the first cue.</p>
</div>
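<p>The arithmetic behind the positions described in this example can be sketched as follows for
horizontal cues. This is a non-normative illustration: the function names are hypothetical, the
default position alignment is modelled only for "left", "right" and "center" text alignment, and
no clamping to the video viewport is applied.</p>

```python
def default_position_alignment(text_align: str) -> str:
    """Position alignment used in this sketch when none is given explicitly."""
    # "start" and "end" are deliberately left out of this simplified mapping.
    return {"left": "line-left", "right": "line-right", "center": "center"}[text_align]


def cue_box_extent(position: float, size: float, alignment: str) -> tuple[float, float]:
    """Return the (left, right) edges of the cue box as percentages of the viewport width."""
    if alignment == "line-left":
        left = position
    elif alignment == "center":
        left = position - size / 2
    elif alignment == "line-right":
        left = position - size
    else:
        raise ValueError(f"unknown position alignment: {alignment!r}")
    return left, left + size
```

<p>With these definitions the first cue spans 10% to 45%, the second cue spans 55% to 90%, and
the third cue (position 45%, "line-right") again spans 10% to 45%, in line with the description
above.</p>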
<div class="example">
<p>This example shows two regions containing rollup captions for two different speakers. Fred's
cues scroll up in a region in the left half of the video, Bill's cues scroll up in a region on the
right half of the video. Fred's first cue disappears at 12.5sec even though it is defined until
20sec because its region is limited to 3 lines and at 12.5sec a fourth cue appears:</p>
<pre>
WEBVTT

REGION
id:fred
width:40%
lines:3
regionanchor:0%,100%
viewportanchor:10%,90%
scroll:up

REGION
id:bill
width:40%
lines:3
regionanchor:100%,100%
viewportanchor:90%,90%
scroll:up

00:00:00.000 --> 00:00:20.000 region:fred align:left
<v Fred>Hi, my name is Fred

00:00:02.500 --> 00:00:22.500 region:bill align:right
<v Bill>Hi, I'm Bill

00:00:05.000 --> 00:00:25.000 region:fred align:left
<v Fred>Would you like to get a coffee?

00:00:07.500 --> 00:00:27.500 region:bill align:right
<v Bill>Sure! I've only had one today.

00:00:10.000 --> 00:00:30.000 region:fred align:left
<v Fred>This is my fourth!

00:00:12.500 --> 00:00:32.500 region:fred align:left
<v Fred>OK, let's go.
</pre>
<p>Note that regions are only defined for horizontal cues.</p>
</div>
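<p>A region definition block like the ones in this example is a list of
<code>name:value</code> settings. The following non-normative sketch collects them into a
dictionary without interpreting the values; the function name <code>parse_region_block</code> is
hypothetical, and the sketch performs none of the validation that the parsing rules of this
specification require.</p>

```python
def parse_region_block(block: str) -> dict[str, str]:
    """Collect the name:value settings of a REGION block, leaving values as strings."""
    lines = block.strip().splitlines()
    if not lines or lines[0].strip() != "REGION":
        raise ValueError("not a REGION block")
    settings = {}
    for line in lines[1:]:
        # Settings are separated by whitespace or line breaks.
        for token in line.split():
            name, colon, value = token.partition(":")
            if colon and name and value:
                settings[name] = value
    return settings
```

<p>For Fred's region this yields, among others, <code>"id": "fred"</code>,
<code>"lines": "3"</code> and <code>"scroll": "up"</code>.</p>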
<h3 id=introduction-comments>Comments in WebVTT</h3>
<p><i>This section is non-normative.</i></p>
<p>Comments can be included in WebVTT files.</p>
<p>Comments are just blocks that are preceded by a blank line, start with the word
"<code>NOTE</code>" (followed by a space or newline), and end at the first blank line.</p>
<div class="example">
<p>Here, a one-line comment is used to note a possible problem with a cue.</p>
<pre>
WEBVTT

00:01.000 --> 00:04.000
Never drink liquid nitrogen.

NOTE I'm not sure the timing is right on the following cue.

00:05.000 --> 00:09.000
— It will perforate your stomach.
— You could die.
</pre>
</div>
<div class="example">
<p>In this example, the author has written many comments.</p>
<pre>
WEBVTT

NOTE
This file was written by Jill. I hope
you enjoy reading it. Some things to
bear in mind:
- I was lip-reading, so the cues may
not be 100% accurate
- I didn't pay too close attention to
when the cues should start or end.

00:01.000 --> 00:04.000
Never drink liquid nitrogen.

NOTE check next cue

00:05.000 --> 00:09.000
— It will perforate your stomach.
— You could die.

NOTE end of file
</pre>
</div>
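<p>Because comments are ordinary blocks delimited by blank lines, a tool can find them by
splitting the file into blocks and looking at the first line of each. The following
non-normative sketch does this for well-formed files; the function names
<code>split_blocks</code> and <code>is_comment</code> are hypothetical.</p>

```python
import re


def split_blocks(vtt: str) -> list[str]:
    """Split a WebVTT file into blocks separated by one or more blank lines."""
    text = vtt.replace("\r\n", "\n").replace("\r", "\n")
    return [block for block in re.split(r"\n{2,}", text.strip("\n")) if block]


def is_comment(block: str) -> bool:
    """A comment block starts with NOTE followed by a space, a tab, or a line break."""
    first_line = block.split("\n", 1)[0]
    return first_line == "NOTE" or first_line.startswith(("NOTE ", "NOTE\t"))
```

<p>Filtering the blocks with <code>is_comment</code> separates comments from the header and the
cues. A cue whose text merely begins with a word such as "NOTEWORTHY" is not treated as a
comment.</p>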
<h3 id=introduction-chapters>Chapters example</h3>
<p><i>This section is non-normative.</i></p>
<p>A WebVTT file can consist of chapters, which are navigation markers for the video.</p>
<p>Chapters are plain text, typically just a single line.</p>
<div class="example">
<p>In this example, a talk is split into each slide being a chapter.</p>
<pre>
WEBVTT

NOTE
This is from a talk Silvia gave about WebVTT.

Slide 1
00:00:00.000 --> 00:00:10.700
Title Slide

Slide 2
00:00:10.700 --> 00:00:47.600
Introduction by Naomi Black

Slide 3
00:00:47.600 --> 00:01:50.100
Impact of Captions on the Web

Slide 4
00:01:50.100 --> 00:03:33.000
Requirements of a Video text format
</pre>
</div>
<h3 id=introduction-metadata>Metadata example</h3>
<p><i>This section is non-normative.</i></p>
<p>A WebVTT file can consist of time-aligned metadata.</p>
<p>Metadata can be any string and is often provided as a JSON construct.</p>
<p>Note that you cannot provide blank lines inside a metadata block, because the blank line
signifies the end of the WebVTT cue.</p>
<div class="example">
<p>In this example, each cue carries a JSON object as its metadata payload.</p>
<pre>
WEBVTT

NOTE
Thanks to http://output.jsbin.com/mugibo

1
00:00:00.100 --> 00:00:07.342
{
"type": "WikipediaPage",
"url": "https://en.wikipedia.org/wiki/Samurai_Pizza_Cats"
}

2
00:07.810 --> 00:09.221
{
"type": "WikipediaPage",
"url" :"http://samuraipizzacats.wikia.com/wiki/Samurai_Pizza_Cats_Wiki"
}

3
00:11.441 --> 00:14.441
{
"type": "LongLat",
"lat" : "36.198269",
"long": "137.2315355"
}
</pre>
</div>
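<p>A consumer of such a metadata track typically decodes each cue payload itself. The following
non-normative sketch, which assumes a well-formed file whose cue payloads are all JSON, pairs
each cue's timing line with its decoded payload; the function name <code>metadata_cues</code> is
hypothetical.</p>

```python
import json


def metadata_cues(vtt: str) -> list[tuple[str, object]]:
    """Return (timing line, decoded JSON payload) for every cue block in the file."""
    cues = []
    for block in vtt.replace("\r\n", "\n").strip("\n").split("\n\n"):
        lines = block.split("\n")
        for index, line in enumerate(lines):
            if "-->" in line:
                # Everything after the timing line is the cue payload.
                payload = "\n".join(lines[index + 1:])
                cues.append((line, json.loads(payload)))
                break
    return cues
```

<p>Header and comment blocks contain no timing line and are skipped. Since a blank line ends the
cue, a JSON payload that was pretty-printed with blank lines would be cut short and fail to
decode.</p>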
<h2 id=conformance>Conformance</h2>
<p>All diagrams, examples, and notes in this specification are non-normative, as are all sections
explicitly marked non-normative. Everything else in this specification is normative.</p>
<p>The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", "MAY", and "OPTIONAL" in the normative
parts of this document are to be interpreted as described in RFC2119. The key word "OPTIONALLY" in
the normative parts of this document is to be interpreted with the same normative meaning as "MAY"
and "OPTIONAL". For readability, these words do not appear in all uppercase letters in this
specification. [[!RFC2119]]</p>
<p>Requirements phrased in the imperative as part of algorithms (such as "strip any leading space
characters" or "return false and abort these steps") are to be interpreted with the meaning of the
key word ("must", "should", "may", etc) used in introducing the algorithm.</p>
<p>Conformance requirements phrased as algorithms or specific steps may be implemented in any
manner, so long as the end result is equivalent. (In particular, the algorithms defined in this
specification are intended to be easy to follow, and not intended to be performant.)</p>
<h3 id=conformance-classes>Conformance classes</h3>
<p>This specification describes the conformance criteria for user agents (relevant to implementors)
and <a>WebVTT files</a> (relevant to authors and authoring tool implementors).</p>
<p class=note>[[#syntax]] defines what constitutes a valid <a>WebVTT file</a>. Authors need to
follow the requirements therein, and are encouraged to use a conformance checker. [[#parsing]]
defines how user agents are to interpret a file labelled as <a>text/vtt</a>, for both valid and
invalid <a>WebVTT files</a>. The parsing rules are more tolerant to author errors than the syntax
allows, in order to provide for extensibility and to still render cues that have some syntax
errors.</p>
<p class=example>For example, the parser will create two cues even if the blank line between them is
skipped. This is clearly a mistake, so a conformance checker will flag it as an error, but it is
still useful to render the cues to the user.</p>
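<p class=note>The tolerant behavior in the example above can be illustrated with a short
non-normative sketch, in which any line containing "<code>--></code>" starts a new cue whether
or not a blank line precedes it. This is a simplification that ignores the header, cue
identifiers, and cue settings; it is not the algorithm given in [[#parsing]], and the function
name <code>collect_cues</code> is hypothetical.</p>

```python
def collect_cues(lines: list[str]) -> list[tuple[str, list[str]]]:
    """Group lines into (timing line, cue text lines), tolerating a missing blank line."""
    cues = []
    current = None
    for line in lines:
        if "-->" in line:
            # A timing line always starts a new cue, even right after cue text.
            current = (line, [])
            cues.append(current)
        elif line == "":
            current = None
        elif current is not None:
            current[1].append(line)
    return cues
```

<p class=note>Given two cues with no blank line between them, the sketch still returns two cues,
each with its own text. A conformance checker would report the missing blank line as an
error.</p>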
<p>User agents fall into several (possibly overlapping) categories with different conformance
requirements.</p>
<dl>
<dt>User agents that support scripting</dt>
<dd><p>All processing requirements in this specification apply. The user agent must also be a
conforming implementation of the IDL fragments in this specification, as described in the Web IDL
specification. [[!WEBIDL]]</p></dd>
<dt>User agents with no scripting support</dt>
<dd><p>All processing requirements in this specification apply, except those in
[[#dom-construction-rules]] and [[#api]].</p></dd>
<dt><dfn>User agents that do not support CSS</dfn></dt>
<dd><p>All processing requirements in this specification apply, except parts of [[#parsing]] that
relate to stylesheets and CSS, and all of [[#rendering]] and [[#css-extensions]]. The user agent
must instead only render the text inside <a>WebVTT caption or subtitle cue text</a> in an
appropriate manner and specifically support the color classes defined in [[#default-classes]]. Any
other styling instructions are optional.</p> </dd>
<dt><dfn>User agents that do not support a full HTML CSS engine</dfn></dt>
<dd><p>All processing requirements in this specification apply, including the color classes defined
in [[#default-classes]]. However, the user agent will need to apply the CSS related features in
[[#parsing]], [[#rendering]] and [[#css-extensions]] in such a way that the rendered results are
equivalent to what a full CSS supporting renderer produces.</p></dd>
<dt><dfn>User agents that support a full HTML CSS engine</dfn></dt>
<dd><p>All processing requirements in this specification apply. However, only a limited set of CSS
styles is allowed because <a>user agents that do not support a full HTML CSS engine</a> will need
to implement CSS functionality equivalents. User agents that support a full CSS engine must
therefore limit the CSS styles they apply for WebVTT so as to enable identical rendering without
bleeding in extra CSS styles that are beyond the WebVTT specification.</p></dd>
<dt>Conformance checkers</dt>
<dd><p>Conformance checkers must verify that a <a>WebVTT file</a> conforms to the applicable
conformance criteria described in this specification. The term "validator" is equivalent to
conformance checker for the purpose of this specification.</p></dd>
<dt>Authoring tools</dt>
<dd>
<p>Authoring tools must generate conforming <a>WebVTT files</a>. Tools that convert other formats
to <a>WebVTT</a> are also considered to be authoring tools.</p>
<p>When an authoring tool is used to edit a non-conforming <a>WebVTT file</a>, it may preserve the
conformance errors in sections of the file that were not edited during the editing session (i.e.
an editing tool is allowed to round-trip erroneous content). However, an authoring tool must not
claim that the output is conformant if errors have been so preserved.</p>
</dd>
</dl>
<h3 id=unicode-normalization>Unicode normalization</h3>
<p>Implementations of this specification must not normalize Unicode text during processing.</p>
<p class=example>For example, a cue with an identifier consisting of the characters U+0041 LATIN
CAPITAL LETTER A followed by U+030A COMBINING RING ABOVE (a decomposed character sequence), or the
character U+212B ANGSTROM SIGN (a compatibility character), will not match a selector targeting a
cue with an ID consisting of the character U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE (a
precomposed character).</p>
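<p class=note>The following non-normative sketch shows the three character sequences from the
example above. Compared code point by code point, as this specification requires, they are all
different; they would only become equal if an implementation normalized them, which it must not
do. The function name <code>ids_match</code> is hypothetical.</p>

```python
import unicodedata

PRECOMPOSED = "\u00C5"   # LATIN CAPITAL LETTER A WITH RING ABOVE
DECOMPOSED = "A\u030A"   # LATIN CAPITAL LETTER A + COMBINING RING ABOVE
ANGSTROM = "\u212B"      # ANGSTROM SIGN


def ids_match(cue_id: str, selector_id: str) -> bool:
    """Compare identifiers code point by code point, with no Unicode normalization."""
    return cue_id == selector_id


def would_match_if_normalized(cue_id: str, selector_id: str) -> bool:
    """What a non-conforming, NFC-normalizing comparison would report."""
    return unicodedata.normalize("NFC", cue_id) == unicodedata.normalize("NFC", selector_id)
```

<p class=note>Here <code>ids_match</code> reports no match for either the decomposed sequence or
the ANGSTROM SIGN against the precomposed character, whereas the normalizing comparison would
report a match in both cases.</p>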
<h2 id=data-model>Data model</h2>
<!-- Describe metadata, caption/subtitle, chapter & description cues -->
<p class=note>The box model of WebVTT consists of three key elements: the video viewport, cues, and
regions. The video viewport is the rendering area into which cues and regions are rendered. Cues are
boxes consisting of a set of cue lines. Regions are subareas of the video viewport that are used to
group cues together. Cues are positioned either inside the video viewport directly or inside a
region, which is positioned inside the video viewport.</p>
<p class=note>The position of a cue inside the video viewport is defined by a set of cue settings.
The position of a region inside the video viewport is defined by a set of region settings. Cues that
are inside regions can only use a limited set of their cue settings. Specifically, if the cue has a
"vertical", "line" or "size" setting, the cue drops out of the region. Otherwise, the cue's width is
calculated relative to the region's width rather than the viewport's.</p>
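<div class=example>
<p>As a rough sketch of the rule in the note above, and based only on its wording, the following
decides whether a cue stays in its region. The <code>CueSettings</code> shape and the
<code>effectiveRegion</code> function are hypothetical names introduced for illustration; they are
not part of the parser or of any API.</p>

```typescript
// Hypothetical model of the settings written on a cue's timing line.
// A property is present only if the author specified that setting.
interface CueSettings {
  vertical?: string;
  line?: string;
  size?: string;
  region?: string;
}

// Returns the identifier of the region the cue stays in, or undefined if
// the cue is positioned directly inside the video viewport.
function effectiveRegion(settings: CueSettings): string | undefined {
  const dropsOut =
    settings.vertical !== undefined ||
    settings.line !== undefined ||
    settings.size !== undefined;
  return dropsOut ? undefined : settings.region;
}

console.log(effectiveRegion({ region: "fred" }));            // "fred"
console.log(effectiveRegion({ region: "fred", line: "0" })); // undefined
```
</div>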
<h3 id=model-overview>Overview</h3>
<p><i>This section is non-normative.</i></p>
<p>The WebVTT file is a container file for chunks of data that are time-aligned with a video or
audio resource. It can therefore be regarded as a serialisation format for time-aligned data.</p>
<p>A WebVTT file starts with a header and then contains a series of data blocks. If a data block has
a start and end time, it is called a WebVTT cue. A comment is another kind of data block.</p>
<p>Different kinds of data can be carried in WebVTT files. The HTML specification identifies
captions, subtitles, chapters, audio descriptions and metadata as data kinds and specifies which one
is being used in the <a>text track kind</a> attribute of the <a>text track</a> element
[[!HTML]].</p>
<p>A WebVTT file must only contain data of one kind, never a mix of different kinds of data. The
data kind of a WebVTT file is externally specified, such as in an HTML file's <a>text track</a>
element. The environment is responsible for interpreting the data correctly.</p>
<p>WebVTT caption or subtitle cues are rendered as overlays on top of a video viewport or into a
region, which is a subarea of the video viewport.</p>
<h3 id=model-cues>WebVTT cues</h3>
<p>A <dfn>WebVTT cue</dfn> is a <a>text track cue</a> [[!HTML]] that additionally consists of the
following:</p>
<dl>
<dt><dfn lt="cue text">A cue text</dfn></dt>
<dd>
<p>The raw text of the cue, and rules for its interpretation.</p>
</dd>
</dl>
<h3 id=cues>WebVTT caption or subtitle cues</h3>
<p>A <dfn>WebVTT caption or subtitle cue</dfn> is a <a>WebVTT cue</a> that has the following
additional properties allowing the <a>cue text</a> to be rendered and converted to a DOM
fragment:</p>
<dl>
<dt><dfn lt="WebVTT cue box">A cue box</dfn></dt>
<dd>
<p>The cue box of a <a>WebVTT cue</a> is a box within which the text of all lines of the cue is to
be rendered. It is either rendered into the video's viewport or a region inside the viewport if
the cue is part of a region.</p>
<p class="note">The position of the <a lt="WebVTT cue box">cue box</a> within the video viewport's
or region's dimensions depends on the value of the <a>WebVTT cue position</a> and the <a>WebVTT
cue line</a>.</p>
<p class="note">Lines are wrapped within the <a lt="WebVTT cue box">cue box</a>'s <a lt="WebVTT
cue size">size</a> if lines' lengths make this necessary.</p>
</dd>
<dt><dfn lt="WebVTT cue writing direction">A writing direction</dfn></dt>
<dd>
<p>A writing direction, either</p>
<ul>
<li><dfn lt="WebVTT cue horizontal writing direction">horizontal</dfn> (a line extends
horizontally and is offset vertically from the video viewport's top edge, with consecutive lines
displayed below each other),</li>
<li><dfn lt="WebVTT cue vertical growing left writing direction">vertical growing left</dfn> (a
line extends vertically and is offset horizontally from the video viewport's right edge, with
consecutive lines displayed to the left of each other<!-- used for east asian-->), or</li>
<li><dfn lt="WebVTT cue vertical growing right writing direction">vertical growing right</dfn> (a
line extends vertically and is offset horizontally from the video viewport's left edge, with
consecutive lines displayed to the right of each other<!-- used for mongolian -->).</li>
</ul>
<p class=note>The <a lt="WebVTT cue writing direction">writing direction</a> determines whether the
<a lt="WebVTT cue line">line</a>, <a lt="WebVTT cue position">position</a>, and <a lt="WebVTT cue
size">size</a> cue settings are interpreted with respect to the width or the height of the
video.</p>
<p>By default, the <a lt="WebVTT cue writing direction">writing direction</a> is set to <a
lt="WebVTT cue horizontal writing direction">horizontal</a>.</p>
<p class=note>The <a lt="WebVTT cue vertical growing left writing direction">vertical growing
left</a> writing direction could be used for vertical Chinese, Japanese, and Korean, and the <a
lt="WebVTT cue vertical growing right writing direction">vertical growing right</a> writing
direction could be used for vertical Mongolian.</p>
</dd>
<dt><dfn lt="WebVTT cue snap-to-lines flag">A snap-to-lines flag</dfn></dt>
<dd>
<p>A boolean indicating whether the <a lt="WebVTT cue line">line</a> is an integer number of lines
(using the line dimensions of the first line of the cue), or whether it is a percentage of the
dimension of the video. The flag is set to true when lines are counted, and false otherwise.</p>
<p>Cues where the flag is false will be offset as requested, subject to overlap avoidance if
multiple cues are in the same place.</p>
<p>By default, the <a lt="WebVTT cue snap-to-lines flag">snap-to-lines flag</a> is set to
true.</p>
</dd>
<dt><dfn lt="WebVTT cue line">A line</dfn></dt>
<dd>
<p>The <a lt="WebVTT cue line">line</a> defines positioning of the <a lt="WebVTT cue box">cue
box</a>.</p>
<p>The <a lt="WebVTT cue line">line</a> offsets the <a lt="WebVTT cue box">cue box</a> from the
top, the right or left of the video viewport as defined by the <a lt="WebVTT cue writing
direction">writing direction</a>, the <a lt="WebVTT cue snap-to-lines flag">snap-to-lines
flag</a>, or the lines occupied by any other showing tracks.</p>
<p>The <a lt="WebVTT cue line">line</a> is set either as a number of lines, a percentage of the
video viewport height or width, or as the special value <dfn lt="WebVTT cue line
automatic">auto</dfn>, which means the offset is to depend on the other showing tracks.</p>
<p>By default, the <a lt="WebVTT cue line">line</a> is set to <a lt="WebVTT cue line
automatic">auto</a>.</p>
<p>If the <a lt="WebVTT cue writing direction">writing direction</a> is <a lt="WebVTT cue
horizontal writing direction">horizontal</a>, then the <a lt="WebVTT cue line">line</a>
percentages are relative to the height of the video, otherwise to the width of the video.</p>
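<div class=example>
<p>The choice of reference dimension can be illustrated with a small sketch. Converting the
percentage to pixels is an assumption made here for illustration only, and
<code>linePercentageToPixels</code> with its parameters is a hypothetical helper rather than
something this specification defines.</p>

```typescript
type WritingDirection =
  | "horizontal"
  | "vertical growing left"
  | "vertical growing right";

// Hypothetical helper: resolves a line percentage against the video's height
// for horizontal cues and against its width for vertical cues.
function linePercentageToPixels(
  percentage: number,
  direction: WritingDirection,
  videoWidth: number,
  videoHeight: number
): number {
  const reference = direction === "horizontal" ? videoHeight : videoWidth;
  return (percentage / 100) * reference;
}

// For a 1280x720 video, a line of 50%:
console.log(linePercentageToPixels(50, "horizontal", 1280, 720));            // 360
console.log(linePercentageToPixels(50, "vertical growing left", 1280, 720)); // 640
```
</div>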
<p>A <a>WebVTT cue</a> has a <dfn lt="cue computed line">computed line</dfn> whose value is that
returned by the following algorithm, which is defined in terms of the other aspects of the
cue:</p>
<ol algorithm="computed line">
<li>
<p>If the <a lt="WebVTT cue line">line</a> is numeric, the <a>WebVTT cue snap-to-lines flag</a>
of the <a>WebVTT cue</a> is false, and the <a lt="WebVTT cue line">line</a> is negative or
greater than 100, then return 100 and abort these steps.</p>
<p class="note">Although the <a>WebVTT parser</a> will not set the <a lt="WebVTT cue
line">line</a> to a number outside the range 0..100 and also set the <a>WebVTT cue snap-to-lines
flag</a> to false, this can happen when using the DOM API's {{VTTCue/snapToLines}} and
{{VTTCue/line}} attributes.</p>
</li>
<li><p>If the <a lt="WebVTT cue line">line</a> is numeric, return the value of the <a>WebVTT cue
line</a> and abort these steps. (Either the <a>WebVTT cue snap-to-lines flag</a> is true, so any
value, not just those in the range 0..100, is valid, or the value is in the range 0..100 and is
thus valid regardless of the value of that flag.)</p></li>
<li><p>If the <a>WebVTT cue snap-to-lines flag</a> of the <a>WebVTT cue</a> is false, return the
value 100 and abort these steps. (The <a lt="WebVTT cue line">line</a> is the special value <a
lt="WebVTT cue line automatic">auto</a>.)</p></li>
<li><p>Let |cue| be the <a>WebVTT cue</a>.</p></li>
<li><p>If |cue| is not in a <a lt="text track list of cues">list of cues</a> of a <a>text
track</a>, or if that <a>text track</a> is not in the <a>list of text tracks</a> of a <a>media
element</a>, return −1 and abort these steps.</p></li>
<li><p>Let |track| be the <a>text track</a> whose <a lt="text track list of cues">list of
cues</a> the |cue| is in.</p></li>
<li><p>Let |n| be the number of <a>text tracks</a> whose <a>text track mode</a> is <a lt="text
track showing">showing</a> and that are in the <a>media element</a>'s <a>list of text tracks</a>
before |track|.</p></li>
<li><p>Increment |n| by one.</p></li>
<li><p>Negate |n|.</p></li>
<li><p>Return |n|.</p></li>
</ol>
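<div class=example>
<p>The steps above can be sketched as follows. This is a minimal illustration over a simplified
model, not a definitive implementation: <code>CueModel</code>, <code>TextTrackModel</code> and
<code>computedLine</code> are hypothetical names, and the media element's list of text tracks is
assumed to be passed in as a plain array. A cue that is found in none of those tracks is treated as
not being attached to the media element.</p>

```typescript
// Simplified, hypothetical stand-ins for a WebVTT cue and a text track.
interface CueModel {
  line: number | "auto";
  snapToLines: boolean;
}

interface TextTrackModel {
  mode: "disabled" | "hidden" | "showing";
  cues: CueModel[];
}

// mediaTracks models the media element's list of text tracks, in order.
function computedLine(cue: CueModel, mediaTracks: TextTrackModel[]): number {
  if (cue.line !== "auto") {
    // Step 1: an out-of-range percentage is clamped to 100.
    if (!cue.snapToLines && (cue.line < 0 || cue.line > 100)) {
      return 100;
    }
    // Step 2: any other numeric line is returned unchanged.
    return cue.line;
  }
  // Step 3: the line is auto and expressed as a percentage.
  if (!cue.snapToLines) {
    return 100;
  }
  // Steps 4 to 7: count the showing tracks that precede the cue's track.
  let n = 0;
  let found = false;
  for (const track of mediaTracks) {
    if (track.cues.indexOf(cue) !== -1) {
      found = true;
      break;
    }
    if (track.mode === "showing") {
      n += 1;
    }
  }
  if (!found) {
    return -1; // Step 5: the cue is not attached to the media element.
  }
  // Steps 8 to 10: increment, negate, return.
  return -(n + 1);
}

const captions: TextTrackModel = { mode: "showing", cues: [] };
const subtitleCue: CueModel = { line: "auto", snapToLines: true };
const subtitles: TextTrackModel = { mode: "showing", cues: [subtitleCue] };

console.log(computedLine(subtitleCue, [captions, subtitles]));    // -2
console.log(computedLine({ line: 150, snapToLines: false }, [])); // 100
```

<p>In this sketch, a cue in the second of two showing tracks gets a computed line of −2, which
places it one line above the cues of the first track when counting from the bottom of the
viewport.</p>
</div>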