## Introducing VTT
WebVTT (Web Video Text Tracks), formerly known as WebSRT, is a W3C community proposal for synchronized video caption playback. It is a time-indexed file format and it is referenced by HTML5 video **and** audio elements.
As with many assistive technologies, it would be a mistake to assume that captions are only meant as an accessibility accommodation. We can enable captions when the ambient noise is too loud to listen to a recorded presentation, and we can use chapters to navigate through a long lecture video just like on DVD or Blu-ray movies.
Captions can also improve our movies' discoverability. Google indexes the content of our captions. Both YouTube and Google search can report results based on the video captions available for a given file.
WebVTT captions are independent of the audio or video files they are attached to; they are not "hard coded" into the pixels. This also means that creating VTT files requires nothing more than a text editor, although there are more specialized tools to create the captions.
## Browser support
Based on Silvia Pfeiffer's [post to the VTT community group](http://www.w3.org/community/texttracks/2012/08/23/webvtt-support-in-browsers/) dated August, 2012, and updated with new information about Firefox, the following browsers support VTT tracks for video and audio:
| Browser | Version First Supported | Format Supported | Notes |
| --- | --- | --- | --- |
| Internet Explorer | IE 10 Developer Preview 4 | VTT and TTML | [Test Page](http://ie.microsoft.com/testdrive/Graphics/VideoCaptions/); [Documentation](http://html5labs.interoperabilitybridges.com/prototypes/video-captioning/video-captioning/info); [HTML5 Video Caption Maker](http://ie.microsoft.com/testdrive/Graphics/CaptionMaker/); [Timed Text Track Information](http://msdn.microsoft.com/library/ie/bg123962.aspx); [Timed Text Tracks](http://samples.msdn.microsoft.com/iedevcenter/TextTrack/default.html) examples |
| Google Chrome | Version 18 | VTT | Basic tutorial hosted at [HTML5 Rocks](http://www.html5rocks.com/en/tutorials/track/basics/); based on WebKit's implementation |
| Apple Safari | Version 6 | VTT | Based on WebKit's implementation |
| Opera | Since August, 2012 | VTT | Documentation at [dev.opera](http://dev.opera.com/articles/view/an-introduction-to-webvtt-and-track/) |
| Firefox | Nightly | VTT | Tested with version 29.0a1 (12/14/2013); feature enabled by default; see the [Mozilla Developer Documentation](https://developer.mozilla.org/en-US/docs/HTML/WebVTT) for more information; if the size of the video doesn't match the size attributes of the video tag, the video will display on a white/gray background |
## Polyfills and alternatives
I will use one of the many polyfills available for HTML5 video tracks. [Playr](http://www.delphiki.com/html5/playr/) seems to be the most feature-complete of them. The downside is two more files (one CSS and one JavaScript) to download for the video page, but until VTT is widely supported the extra files are worth the effort to create accessible content.
One way to ensure that we only load our polyfill when the browser needs it is to use `Modernizr.load` to conditionally load Playr's CSS and JavaScript when the browser does not support the HTML5 video tag natively.
```
Modernizr.load([
  {
    // Test whether the browser supports HTML5 video
    test: Modernizr.video,
    // Load the corresponding assets for the polyfill you want to use;
    // in this case we are using the Playr polyfill
    nope: ['playr.js', 'playr.css']
  }
]);
```
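The Modernizr test above checks for video support in general, not for text tracks specifically. A narrower check could look something like the sketch below, which looks for the `addTextTrack` method on a video element and the `kind` property on a track element. `supportsTextTracks` is a hypothetical helper name, not part of Modernizr or Playr, and it takes the document as a parameter only so that it can be exercised outside a browser.

```javascript
// Hypothetical helper: returns true when the given document object
// appears to support HTML5 text tracks natively.
function supportsTextTracks(doc) {
  if (!doc || typeof doc.createElement !== 'function') {
    return false;
  }
  var video = doc.createElement('video');
  var track = doc.createElement('track');
  // Browsers with track support expose addTextTrack() on media elements
  // and a kind property on <track> elements
  return typeof video.addTextTrack === 'function' && 'kind' in track;
}

// In a browser, load the polyfill only when the check fails
if (typeof document !== 'undefined' && !supportsTextTracks(document)) {
  // load playr.js and playr.css here
}
```

A browser can support the video tag without supporting tracks, so a check along these lines avoids skipping the polyfill where it is still needed.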
The code below uses plain JavaScript to test whether a browser supports HTML5 video by creating an empty video element and checking for its `canPlayType` method. As in the Modernizr example, the polyfill's files are loaded only when the test fails.
```
var canPlay = false;
var h, plink, pscript;
// Create an empty video element
var v = document.createElement('video');
// If the element exposes canPlayType, the browser supports HTML5 video
if (v.canPlayType) {
  // Set canPlay to true
  canPlay = true;
  // Display an alert telling them so
  alert('Can Play HTML5 video');
}
else {
  // Append Playr CSS and JS to the head of the page to
  // provide a fallback
  h = document.getElementsByTagName('head')[0];
  plink = document.createElement('link');
  plink.setAttribute('rel', 'stylesheet');
  plink.setAttribute('href', 'css/playr.css');
  plink.setAttribute('media', 'screen');
  h.appendChild(plink);
  pscript = document.createElement('script');
  pscript.setAttribute('src', 'js/playr.js');
  h.appendChild(pscript);
}
```
This is the simplest test for video support; a more elaborate version can test for specific formats and write the `<source>` tags only for the supported formats. The example below makes the following assumptions:
- You have encoded a video in all three formats (webm, mp4 and ogg)
- You are testing for support for HTML5 video in general and specific formats
- If HTML5 video is not supported you have a Flash-based fallback solution
```
var canPlay = false;
var source;
// Get the first video element on the page
var v = document.getElementsByTagName('video')[0];
// Optionally add video attributes as needed
// At a minimum set height, width and controls
// as shown below
v.setAttribute('height', '640');
v.setAttribute('width', '480');
v.setAttribute('controls', 'controls');
// If the browser supports canPlayType and can play WebM (VP8 video, Vorbis audio)
if (v.canPlayType && v.canPlayType('video/webm; codecs="vp8, vorbis"').replace(/no/, '')) {
  // Create and append the appropriate source element
  source = document.createElement('source');
  source.setAttribute('src', 'myvideo.webm');
  source.setAttribute('type', 'video/webm; codecs="vp8, vorbis"');
  v.appendChild(source);
  canPlay = true;
}
// Otherwise, if it can play MP4 (H.264 Baseline video, AAC audio)
else if (v.canPlayType && v.canPlayType('video/mp4; codecs="avc1.42E01E, mp4a.40.2"').replace(/no/, '')) {
  // Create and append the appropriate source element
  source = document.createElement('source');
  source.setAttribute('src', 'myvideo.mp4');
  source.setAttribute('type', 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"');
  v.appendChild(source);
  canPlay = true;
}
```
Also note that we're testing for specific audio and video codec combinations. WebM supports a single combination of video and audio codecs, but MP4 supports multiple profiles, not all of which are supported in HTML5 video. See [http://mpeg.chiariglione.org/faq/what-are-different-profiles-supported-mpeg-4-video](http://mpeg.chiariglione.org/faq/what-are-different-profiles-supported-mpeg-4-video) for an introduction to the different profiles supported by MPEG-4.
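The format test can also be separated from the DOM work. The sketch below is one possible way to do that, not a definitive implementation: `pickPlayableSources` is a hypothetical helper that takes a `canPlayType`-style function plus a list of candidate files and returns only the candidates reported as playable. The `fakeCanPlay` function is a stand-in for a browser that only plays MP4, used here so the example runs outside a browser.

```javascript
// Hypothetical helper: given a canPlayType-style function and a list of
// candidate sources, return the candidates the browser claims it can play.
function pickPlayableSources(canPlay, candidates) {
  return candidates.filter(function (candidate) {
    // canPlayType returns '', 'maybe' or 'probably'; some old browsers
    // returned 'no', which is stripped here as in the example above
    var answer = String(canPlay(candidate.type) || '').replace(/no/, '');
    return answer !== '';
  });
}

var candidates = [
  { src: 'myvideo.webm', type: 'video/webm; codecs="vp8, vorbis"' },
  { src: 'myvideo.mp4', type: 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"' }
];

// Stand-in for a browser that only plays MP4. In a real page, pass
// function (type) { return v.canPlayType(type); } for a video element v
function fakeCanPlay(type) {
  return type.indexOf('video/mp4') === 0 ? 'probably' : '';
}

var playable = pickPlayableSources(fakeCanPlay, candidates);
// Logs the file names of the playable candidates (only the MP4 entry here)
console.log(playable.map(function (c) { return c.src; }));
```

Each entry in the returned list can then be turned into a `<source>` element and appended to the video, as in the previous example.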
### Players and Polyfills
Playr is by no means the only polyfill or the only player that supports VTT. It is the one that I found the most feature-complete for what I needed. The selection below represents a set of the players and polyfills available.
- [video.js player](http://videojs.com/docs/tracks/)
- [jwplayer](http://www.longtailvideo.com/addons/plugins/84/Captions)
- [MediaElementJS player](http://mediaelementjs.com/)
- [LeanBack player](http://leanbackplayer.com/)
- [js\_videosub polyfill](http://www.storiesinflight.com/js_videosub/)
- [Captionator polyfill](https://github.com/cgiffard/Captionator)
- [vtt.js](https://github.com/mozilla/vtt.js) by the Mozilla Foundation
## Different types of VTT tracks and their structures
### Captioning Tracks
> Captioning is text that appears on a video, which contains dialogue and audio cues such as music or sound effects that occur off-screen. The purpose of captioning is to make video content accessible to those who are deaf or hard of hearing, and for other situations in which the audio cannot be heard due to noise or a need for silence. Captions can be either open (always visible, aka "burned in") or closed, but closed is more common because it lets each viewer decide whether they want the captions to be turned on or off. From [http://www.cpcweb.com/faq/](http://www.cpcweb.com/faq/)
The simplest and most often used type of text track, captions provide alternative text content for people with hearing disabilities, for people who choose to play the video without audio, and others.
Depending on the player you may have open captions, where the captions are always visible on screen, or closed captions, where you have to manually activate the display of captions. Either open or closed, the captions are independent of the content they are attached to.
```
WEBVTT (1)

railroad (2)
00:00:10.000 --> 00:00:12.500 (3) [Optional Settings] (4)
Left uninspired by the crust of railroad earth (5)

manuscript
00:00:13.200 --> 00:00:16.900
that touched the lead to the pages of your manuscript.
```
**Explanation of the cue above:**
1. WEBVTT must be the first item in the file, on the first line and on a line of its own. Optionally, lines of metadata may follow. This section must be followed by a blank line
2. The name of the cue. This is also optional
3. Immediately below the name of the cue come the start and end times for the cue, separated by `-->` and expressed in hours:minutes:seconds.milliseconds format. **Hours, minutes and seconds must have 2 digits and be padded with zeros if necessary. Milliseconds must have 3 digits and be zero-padded if not long enough**
4. Optional cue settings, separated from the time by one or more SPACE or TAB characters
5. The text for the cue
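The structure described above is regular enough to generate from data. The sketch below shows one way this might be done; `pad`, `formatTimestamp` and `buildVtt` are hypothetical helper names, and the cue objects (`id`, `start`, `end`, `text`, with times in seconds) are an assumed shape rather than part of any VTT API. Timestamps are written with a period before the milliseconds, as in the example file.

```javascript
// Hypothetical helpers for producing WebVTT text; the names are illustrative

// Left-pad a non-negative integer with zeros to the given width
function pad(value, width) {
  var text = String(value);
  while (text.length < width) {
    text = '0' + text;
  }
  return text;
}

// Convert a number of seconds (e.g. 12.5) to an hh:mm:ss.ttt timestamp
function formatTimestamp(seconds) {
  var totalMs = Math.round(seconds * 1000);
  var ms = totalMs % 1000;
  var totalSeconds = Math.floor(totalMs / 1000);
  var s = totalSeconds % 60;
  var m = Math.floor(totalSeconds / 60) % 60;
  var h = Math.floor(totalSeconds / 3600);
  return pad(h, 2) + ':' + pad(m, 2) + ':' + pad(s, 2) + '.' + pad(ms, 3);
}

// Build a complete file: the WEBVTT header, a blank line,
// then the cues separated from each other by blank lines
function buildVtt(cues) {
  var blocks = cues.map(function (cue) {
    var lines = [];
    if (cue.id) {
      lines.push(cue.id); // the cue name is optional
    }
    lines.push(formatTimestamp(cue.start) + ' --> ' + formatTimestamp(cue.end));
    lines.push(cue.text);
    return lines.join('\n');
  });
  return ['WEBVTT'].concat(blocks).join('\n\n') + '\n';
}

console.log(buildVtt([
  { id: 'railroad', start: 10, end: 12.5, text: 'Left uninspired by the crust of railroad earth' },
  { id: 'manuscript', start: 13.2, end: 16.9, text: 'that touched the lead to the pages of your manuscript.' }
]));
```

Rounding to whole milliseconds before splitting the value avoids floating-point artifacts such as 16.9 seconds turning into 899 milliseconds.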
### Subtitles Tracks
Subtitle tracks are similar to caption tracks but, unlike captions, are not meant to address accessibility issues. Subtitle tracks are used primarily to convey the dialogue in a language other than the one being spoken in the video. Take, for example, a Japanese movie where the subtitles translate the dialogue into English.
Subtitles are not expected to convey additional non-verbal cues. They are only meant to provide a translation of the words being spoken, although some delivery formats such as Blu-ray do not follow this recommendation.
**What's the difference between captions and subtitles?**
> The main difference is that subtitles usually only transcribe the spoken dialog, and are mainly aimed at people who are not hearing impaired, but lack fluency in the spoken language. Closed captions are aimed at the deaf and hearing impaired, who need additional non-verbal audio cues (such as "\[GUN SHOT\]" or "\[SPOOKY MUSIC\]") to be transcribed in the text. Closed captions are also useful for situations in which video is being shown but the sound is muted or difficult to hear, such as for a noisy bar, convention floor, video signage & billboards, etc. From [http://www.cpcweb.com/faq/](http://www.cpcweb.com/faq/)
Other than the content of each type of track, HTML5 video structures the track element the same way. In the example below, the only difference is the `kind` attribute of each track.
```