HTML5 video captioning using VTT

Introducing VTT

WebVTT (Web Video Text Tracks), formerly known as WebSRT, is a W3C community proposal for synchronized video caption playback. It is a time-indexed file format and it is referenced by HTML5 video and audio elements.

It would be a mistake to assume that captions, like many assistive technologies, are only a way to provide accessibility accommodations. We can enable captions when the ambient noise is too loud to listen to a recorded presentation, and we can use chapters to navigate through a long lecture video, just as we do with DVD or Blu-ray movies.

Captions can also improve our movies’ discoverability. Google indexes the content of our captions. Both YouTube and Google search can report results based on the video captions available for a given file.

WebVTT files provide captions that are independent of the audio or video files they are attached to; they are not “hard coded” into the pixels. This also means that creating VTT files requires nothing more than a text editor, although there are more specialized tools to create the captions.

Browser support

Based on Silvia Pfeiffer’s post to the VTT community group dated August, 2012, and updated with new information about Firefox, the following browsers support VTT tracks for video and audio:

Browser | Version first supported | Format supported | Notes
Internet Explorer | IE 10 Developer Preview 4 | VTT and TTML |
Google Chrome | Version 18 | VTT | Basic tutorial hosted at HTML5 Rocks. Based on WebKit’s implementation.
Apple Safari | Version 6 | VTT | Based on WebKit’s implementation.
Opera | Since August, 2012 | VTT |
Firefox | Nightly | VTT | Tested with version 29.0a1 (12/14/2013). Feature enabled by default. See the Mozilla Developer Documentation for more information. If the size of the video doesn’t match the size attributes of the video tag, the video will display on a white/gray background.

Polyfills and alternatives

I will use one of the many polyfills available for HTML5 video tracks. Playr seems to be the most feature-complete of them. The downside is two more files (one CSS and one JavaScript) to download for the video page, but until VTT is widely supported the extra files are worth the effort to create accessible content.

One way to load the polyfill only when it is needed is to use Modernizr.load, which conditionally loads Playr’s CSS and JavaScript when the browser does not support the HTML5 video tag natively. Note that this tests for video support in general, not for track support specifically.

Modernizr.load([
  {
    // test whether we support video
    test : Modernizr.video,
    // Load the corresponding assets for the polyfill you want to use
    // in this case we are using the playr polyfill
    nope : ['playr.js', 'playr.css']
  }
]);

The code below uses plain JavaScript to test if a browser supports HTML5 video by creating an empty video element and checking for its canPlayType method. If the method is missing, the code appends Playr’s CSS and JavaScript to the head of the page, which the Modernizr example handles through Modernizr.load.

var canPlay = false;
var h, plink, pscript;

// Create an empty video element
var v = document.createElement('video');
// If the element has a canPlayType method, HTML5 video is supported
if (v.canPlayType) {
  // Set canPlay to true
  canPlay = true;
  // Display an alert telling them so
  alert('Can Play HTML5 video');
}
else {
  // Append Playr CSS and JS to the head of the page to
  // provide a fallback
  h = document.getElementsByTagName('head')[0];
  plink = document.createElement('link');
  // Without rel="stylesheet" the browser ignores the link
  plink.setAttribute('rel', 'stylesheet');
  plink.setAttribute('href', 'css/playr.css');
  plink.setAttribute('media', 'screen');
  h.appendChild(plink);
  pscript = document.createElement('script');
  pscript.setAttribute('src', 'js/playr.js');
  // Append the script element itself, not the string 'pscript'
  h.appendChild(pscript);
}

This is the simplest test for video support; a more elaborate version can include support for specific formats and write the <source> tags only for the supported formats. The example below makes the following assumptions:

  • You have encoded a video in all three formats (webm, mp4 and ogg)
  • You are testing for support for HTML5 video in general and specific formats
  • If HTML5 video is not supported you have a flash-based fallback solution

var canPlay = false;
// Get the first video element on the page
var v = document.getElementsByTagName('video')[0];
// Create the source element we will append for a supported format
var source = document.createElement('source');
// Optionally add video attributes as needed
// At a minimum set height, width and controls
// as shown below
v.setAttribute('height', '640');
v.setAttribute('width', '480');
v.setAttribute('controls', 'controls');

// If the video has canPlayType and can play WebM video
if (v.canPlayType && v.canPlayType('video/webm; codecs="vp8, vorbis"').replace(/no/, '')) {
  // append the appropriate source element
  canPlay = true;
  source.setAttribute('src', 'myvideo.webm');
  source.setAttribute('type', 'video/webm');
  v.appendChild(source);
}
// Otherwise, if it can play MP4 video
else if (v.canPlayType && v.canPlayType('video/mp4; codecs="avc1.42E01E, mp4a.40.2"').replace(/no/, '')) {
  // append the appropriate source element
  canPlay = true;
  source.setAttribute('src', 'myvideo.mp4');
  source.setAttribute('type', 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"');
  v.appendChild(source);
}

Also note that we’re testing for specific audio and video codec combinations. WebM supports a single combination of video and audio codecs but MP4 supports multiple profiles, not all of which are supported in HTML5 video. See http://mpeg.chiariglione.org/faq/what-are-different-profiles-supported-mpeg-4-video for an introduction to the different profiles supported by MPEG4.
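The format test can also be written as a small helper that walks a list of candidate sources and returns the first one the browser reports it can play. This is only a sketch: pickSource and the shape of the candidate objects are hypothetical names chosen for illustration, and the canPlayType function is passed in as a parameter so the helper can be exercised outside a browser.

```javascript
// Return the first candidate whose MIME type the browser claims to support,
// or null when none of them can be played.
// pickSource and the { src, type } objects are illustrative names only.
function pickSource(canPlayType, candidates) {
  for (var i = 0; i < candidates.length; i++) {
    // canPlayType answers "probably", "maybe" or ""
    // (some old browsers answered "no" instead of "")
    var answer = canPlayType(candidates[i].type).replace(/^no$/, '');
    if (answer) {
      return candidates[i];
    }
  }
  return null;
}

// Browser usage; skipped when there is no document (for example in Node)
if (typeof document !== 'undefined') {
  var video = document.createElement('video');
  var best = pickSource(function (type) {
    return video.canPlayType ? video.canPlayType(type) : '';
  }, [
    { src: 'myvideo.webm', type: 'video/webm; codecs="vp8, vorbis"' },
    { src: 'myvideo.mp4', type: 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"' }
  ]);
  console.log(best ? best.src : 'No supported format, use the fallback');
}
```

Passing the test function in keeps the order of the list as the order of preference, so adding an ogg entry is a one-line change.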

Players and Polyfills

Playr is by no means the only polyfill or the only player that supports VTT. It is the one that I found the most feature-complete for what I needed. The selection below represents a set of players and polyfills available.

Different types of VTT tracks and their structures

Captioning Tracks

Captioning is text that appears on a video, which contains dialogue and audio cues such as music or sound effects that occur off-screen. The purpose of captioning is to make video content accessible to those who are deaf or hard of hearing, and for other situations in which the audio cannot be heard due to noise or a need for silence.

Captions can be either open (always visible, aka “burned in”) or closed, but closed is more common because it lets each viewer decide whether they want the captions to be turned on or off.

From http://www.cpcweb.com/faq/

The simplest and most often used type of text track, captions provide alternative text content for people with hearing disabilities, for people who choose to play the video without audio, and others.

Depending on the player you may have open captions, where the captions are always visible on screen, or closed captions, where you have to manually activate the display of captions. Either open or closed, the captions are independent of the content they are attached to.

WEBVTT (1)

railroad (2)
00:00:10.000 --> 00:00:12.500 (3) [Optional Settings] (4)
Left uninspired by the crust of railroad earth (5)

manuscript
00:00:13.200 --> 00:00:16.900
that touched the lead to the pages of your manuscript.

Explanation of the cue above:

  1. WEBVTT must be the first item in the file, on the first line and on a line of its own. Optionally there may be lines of metadata. This section must be followed by a blank line
  2. The name of the cue. This is also optional
  3. Immediately below the name of the cue come the beginning and end times for the cue, expressed in hours:minutes:seconds.milliseconds format. Minutes and seconds must have 2 digits and be padded with zeros if necessary; hours, which may be omitted, must have at least 2 digits. Milliseconds must have 3 digits and be zero padded if not long enough
  4. Optional cue settings, separated from the end time by one or more SPACE or TAB characters
  5. The text for the cue
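The timestamp layout described above can be checked and converted with a few lines of JavaScript. The sketch below is not part of any library; vttTimeToSeconds is a hypothetical helper name, and it assumes the dotted layout used in the cue examples, with the hours part optional.

```javascript
// Convert a WebVTT timestamp ("hh:mm:ss.ttt" or "mm:ss.ttt") to seconds.
// Returns NaN when the string does not match the expected layout.
// vttTimeToSeconds is an illustrative name, not a browser API.
function vttTimeToSeconds(stamp) {
  // Optional hours (two or more digits), then mm:ss.ttt
  var match = /^(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})$/.exec(stamp);
  if (!match) {
    return NaN;
  }
  var hours = match[1] ? parseInt(match[1], 10) : 0;
  var minutes = parseInt(match[2], 10);
  var seconds = parseInt(match[3], 10);
  var millis = parseInt(match[4], 10);
  return hours * 3600 + minutes * 60 + seconds + millis / 1000;
}

console.log(vttTimeToSeconds('00:00:12.500')); // 12.5
console.log(vttTimeToSeconds('00:00:12,500')); // NaN (SRT-style comma)
```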

Subtitles Tracks

Subtitle Tracks are similar to Caption Tracks but are not meant to address accessibility issues as Captions are. Subtitle tracks are used primarily to convey the dialogue in a language other than the one being spoken in the video. Take, for example, a Japanese movie where the subtitles translate the content to English.

Subtitles are not expected to convey additional non-verbal cues. Once again, subtitles are only meant to provide a translation of the words being spoken, although some delivery formats such as Blu-ray do not follow this recommendation.

What’s the difference between captions and subtitles?

The main difference is that subtitles usually only transcribe the spoken dialog, and are mainly aimed at people who are not hearing impaired, but lack fluency in the spoken language. Closed captions are aimed at the deaf and hearing impaired, who need additional non-verbal audio cues (such as “[GUN SHOT]” or “[SPOOKY MUSIC]”) to be transcribed in the text. Closed captions are also useful for situations in which video is being shown but the sound is muted or difficult to hear, such as for a noisy bar, convention floor, video signage & billboards, etc.

From http://www.cpcweb.com/faq/

Other than the content for each type of track, HTML5 video structures the track element the same way. In the example below, apart from the language-specific values, the only difference is the kind attribute of each track.

<!-- This is the captions track -->
<track kind="captions" lang="en" srclang="en" label="English" src="sintel.vtt" />
<!-- This is the subtitles track for Spanish -->
<track kind="subtitles" lang="es" srclang="es" label="Español" src="sintel-es.vtt" />

Chapter Tracks

Chapter tracks help you navigate through the video by associating certain “chapters” with time codes. This will let you navigate to different sections of your video using some sort of visual cue.

In the example below, chapter 1 runs from second 1 to second 10 and is titled Introduction to HTML5.

WEBVTT (1)

Chapter 1 (2)
00:00:01.000 --> 00:00:10.000 (3)
Introduction to HTML5 (4)

Chapter 2
00:00:10.001 --> 00:00:15.000
Introduction to HTML5

Explanation of the chapter above:

  1. WEBVTT must be the first item in the file, on the first line and on a line of its own. It must be followed by a blank line
  2. The name of the chapter
  3. Immediately below the name of the chapter come the beginning and end times, expressed in hours:minutes:seconds.milliseconds format. Minutes and seconds must have 2 digits and be padded with zeros if necessary; hours, which may be omitted, must have at least 2 digits. Milliseconds must have 3 digits and be zero padded if not long enough
  4. The title of the chapter
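Once the chapter cues are loaded, finding the chapter that contains the current playback time is a simple range check. The sketch below assumes plain objects shaped like the browser's cue objects (id, startTime, endTime and text, with times in seconds); findChapter is a hypothetical helper name.

```javascript
// Return the chapter cue that contains the given time, or null if none does.
// findChapter is an illustrative name; the cue objects mimic the
// id / startTime / endTime / text properties of browser cue objects.
function findChapter(chapters, time) {
  for (var i = 0; i < chapters.length; i++) {
    if (time >= chapters[i].startTime && time < chapters[i].endTime) {
      return chapters[i];
    }
  }
  return null;
}

// The two chapters from the example file above
var chapters = [
  { id: 'Chapter 1', startTime: 1, endTime: 10, text: 'Introduction to HTML5' },
  { id: 'Chapter 2', startTime: 10.001, endTime: 15, text: 'Introduction to HTML5' }
];

var current = findChapter(chapters, 12);
console.log(current ? current.id : 'No chapter at this time');
```

In a page, the time would typically come from the video element's currentTime property, and the startTime of a chosen chapter can be assigned back to currentTime to jump to it.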

Description Tracks

Description tracks are used primarily as an assistive technology helper. These tracks will be read by assistive technology devices for people with visual disabilities (blind or low vision). The cues can be arbitrarily long as long as they don’t contain empty lines (an empty line would signal the beginning of a new cue).

WEBVTT - Description for Sintel trailer

Sintel's Search -- beginning of the search
00:00:01.000 --> 00:00:52.000
Woman walks up a mountain
Fights an unknown man
Smoking man (covering full frame) speaks
Little dragon flies towards the woman before a larger dragon snatches it and flies away. The woman screams trying to grab the smaller flying creature

Metadata Tracks

Metadata Tracks are used to convey any additional information (such as base64 encoded images, JSON, additional text or any additional text-based file format) the developer needs to include in the page based on time indexes. A web app can listen for cue events, extract the text of each cue as it fires, parse the data and then use the results to make DOM changes (or perform other JavaScript or CSS tasks) synchronised with media playback.

WEBVTT - Example metadata track containing JSON payload

multiCell
00:01:15.200 --> 00:02:18.800
{
"title": "Multi-celled organisms",
"description": "Multi-celled organisms have different types of cells that perform specialised functions. Most life that can be seen with the naked eye is multi-cellular. These organisms are thought to have evolved around 1 billion years ago with plants, animals and fungi having independent evolutionary paths.",
"src": "multiCell.jpg",
"href": "http://en.wikipedia.org/wiki/Multicellular"
}

insects
00:02:18.800 --> 00:03:01.600
{
"title": "Insects",
"description": "Insects are the most diverse group of animals on the planet, with estimates for the total number of current species ranging from two million to 50 million. The first insects appeared around 400 million years ago, identifiable by a hard exoskeleton, three-part body, six legs, compound eyes and antennae.",
"src": "insects.jpg",
"href": "http://en.wikipedia.org/wiki/Insects"
}

We can then use JavaScript to parse the content of the active cue and act on it.

textTrack.oncuechange = function () {
  // "this" is a textTrack
  var cue = this.activeCues[0]; // assuming there is only one active cue
  // Between cues nothing is active, so there is nothing to parse
  if (!cue) {
    return;
  }
  var obj = JSON.parse(cue.text);
  // do something
};
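Because the payload of a metadata cue is hand-written text, it is worth guarding the parsing step. JSON.parse throws on malformed input, including string values that contain a literal line break, so a small wrapper can return null instead of stopping the handler. parseMetadataCue is a hypothetical helper name used only for this sketch.

```javascript
// Parse the text of a metadata cue as JSON.
// Returns the parsed value, or null when the payload is not valid JSON.
// parseMetadataCue is an illustrative name, not part of the track API.
function parseMetadataCue(cueText) {
  try {
    return JSON.parse(cueText);
  } catch (error) {
    return null;
  }
}

var good = parseMetadataCue('{"title": "Insects", "src": "insects.jpg"}');
console.log(good.title); // Insects

// A raw line break inside a string value is not valid JSON
var bad = parseMetadataCue('{"title": "Multi-celled\norganisms"}');
console.log(bad); // null
```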

Building the tracks

We can build our caption file using the text above as an example, and this is the most common way to caption a video for accessibility.

We can also build multiple caption tracks as well as a variety of other tracks. Most polyfills will support a subset of the full VTT specification. Playr, the polyfill I’ve selected for these examples, supports captions, descriptions and chapter tracks.

Getting the captions to work

Building the tracks

(built with information from http://demosthenes.info/blog/584/Creating-And-Validating-WebVTT-Subtitles)

There are no programs that support VTT as a native captioning format. However, there are plenty of programs that will create SRT captions, a format that is very similar to VTT (we’ll discuss the differences later in this section).

Choose whatever tool will work best for you to generate the SRT file, then follow the instructions below to convert it to a VTT file.

Converting SRT to VTT

Due to their close relationship, conversion from .srt into .vtt is very simple. A typical .srt file will look something like this:

1
00:01:21,700 --> 00:01:24,675
Life on the road is something
I was raised to embrace.

The process is little more than a find-and-replace:

  • Add WEBVTT to the first line of the file
  • Convert the comma before the millisecond mark in every timestamp to a decimal point
  • Add styling markup to the subtitle text if needed
    • Special characters must be escaped as in HTML (&amp;, &gt;, &lt;)
    • You can use CSS classes defined in your CSS file by using <c.XXX>
    • See the section Cue Payload Tags for more information about the specific tags you can use to style your content

The resulting VTT file will look like this:

WEBVTT

Life
01:21.700 --> 01:24.675
Life on the road is something
I was <i>raised</i> to embrace.

Save the file with a .vtt extension and link to it from a <track> element in your video.
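The find-and-replace part of the conversion is easy to script. The function below is a minimal sketch that covers only the first two steps (adding the header and converting the comma in each timestamp); it does not escape special characters or add styling markup. srtToVtt is a hypothetical helper name.

```javascript
// Convert the text of an .srt file into WebVTT text.
// srtToVtt is an illustrative name; it only adds the header and fixes
// the timestamps, it does not escape &, < or > in the cue text.
function srtToVtt(srtText) {
  var body = srtText
    // Normalize Windows and old Mac line endings
    .replace(/\r\n?/g, '\n')
    // Swap the comma before the milliseconds for a decimal point,
    // but only inside timestamps (hh:mm:ss,ttt)
    .replace(/(\d{2}:\d{2}:\d{2}),(\d{3})/g, '$1.$2')
    .trim();
  // The WEBVTT header must be the first line, followed by a blank line
  return 'WEBVTT\n\n' + body + '\n';
}

var srt = '1\n' +
  '00:01:21,700 --> 00:01:24,675\n' +
  'Life on the road is something\n' +
  'I was raised to embrace.\n';

console.log(srtToVtt(srt));
```

The numeric cue identifiers from the SRT file are left in place, since a cue name is optional in VTT and a number is accepted as one.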

Validating A VTT File

It is not hard to make mistakes when creating a VTT track file. Fortunately there is an online validator to help with authoring.

VTT Validator

It is essentially a two step process:

  • Paste the text of your VTT file
  • Select the type of track you’re working on

The results will display automatically.
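For a quick local sanity check before pasting a file into the validator, a short script can catch the two most common mistakes: a missing WEBVTT header and malformed timing lines. This is only a rough sketch and in no way a replacement for the validator; quickVttCheck is a hypothetical helper name and it checks nothing beyond those two rules.

```javascript
// Return a list of problems found in the text of a VTT file.
// quickVttCheck is an illustrative name; it only checks the header line
// and the layout of lines that contain the "-->" arrow.
function quickVttCheck(text) {
  var problems = [];
  var lines = text.replace(/\r\n?/g, '\n').split('\n');
  // The header is "WEBVTT", optionally followed by a space or tab and text
  if (!/^WEBVTT([ \t].*)?$/.test(lines[0])) {
    problems.push('Line 1: file must start with WEBVTT');
  }
  // hh:mm:ss.ttt with the hours part optional
  var stamp = '(?:\\d{2,}:)?[0-5]\\d:[0-5]\\d\\.\\d{3}';
  var timing = new RegExp(
    '^' + stamp + '[ \\t]+-->[ \\t]+' + stamp + '(?:[ \\t]+.*)?$'
  );
  lines.forEach(function (line, index) {
    if (line.indexOf('-->') !== -1 && !timing.test(line)) {
      problems.push('Line ' + (index + 1) + ': malformed cue timing');
    }
  });
  return problems;
}

// An unconverted SRT file fails both checks
console.log(quickVttCheck('1\n00:01:21,700 --> 00:01:24,675\nLife on the road'));
```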

Optional Cue Settings

Cues can also be styled and moved around the screen relative to the borders of the video. The sections below summarize the settings available for cues.

Vertical Alignment

Name: vertical

Values: rl (right to left) – lr (left to right)

What it’s used for: Vertical text alignment for languages like Japanese that can be read from top to bottom

Example: vertical:lr (makes the cue display vertically from left to right)

Line Placement / Top Alignment

Name: line

Value: [-][0 or larger] (negative or positive number) or [0-100]%

What it’s used for: As a number, an absolute reference to the line the cue is to be displayed on. As a percentage, the position relative to the top of the frame.

  • Line numbers are based on the size of the first line of the cue.
  • A negative number counts from the bottom of the frame
  • Positive numbers count from the top

Cue Box Size

Name: size

Value: [0-100]%

What it’s used for: Indicates the size of the cue box. The value is given as a percentage of the width of the frame

Text Align

Name: align

Values: start | middle | end

What it’s used for: Specifies the alignment of the text within the cue. The keywords are relative to the text direction and are the same alignment keywords used in SVG

CSS uses a different terminology; for CSS users, the equivalency is:

  • Start alignment: The cue box’s left side (for horizontal cues) or top side (otherwise) is aligned at the text position.
  • Middle alignment: The cue box is centered at the text position.
  • End alignment: The cue box’s right side (for horizontal cues) or bottom side (otherwise) is aligned at the text position.

Note: if no cue settings are set, the positioning defaults to the middle, at the bottom of the frame.

Cue positioning

Name: position

Value: [0-100]%

What it’s used for: Percentage value indicating the horizontal position of the cue relative to the edge of the frame where the text begins (e.g. the left edge in English). The default value depends on the alignment of the cue:

  • For left aligned or start aligned cues: 0%.
  • For middle aligned cues: 50%.
  • For right aligned or end aligned cues: 100%.

Note: Since the default value of the text track cue text alignment is middle, if there is no text track cue text alignment setting for a cue, the text track cue text position defaults to 50%.

Note: Even for horizontal cues with right-to-left paragraph direction text, the cue box is positioned from the left edge of the video frame. This allows defining a rendering space template which can be filled with either left-to-right or right-to-left paragraph direction text. If you define such a cue box template with start or end aligned text, make sure to control its size unless you want text to flip from one side of the video frame to the other.
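The settings described in this section appear on the timing line as name:value pairs separated by spaces or tabs. As a sketch of how a script might read them, the function below turns the settings part of a timing line into a plain object. parseCueSettings is a hypothetical helper name; it keeps the values as strings and does not validate them against the ranges listed above.

```javascript
// Turn a cue settings string such as "line:10% align:start" into an object.
// parseCueSettings is an illustrative name; unknown setting names and
// pairs without a value are skipped, and values are not range-checked.
function parseCueSettings(settings) {
  var known = ['vertical', 'line', 'position', 'size', 'align'];
  var result = {};
  // Settings are separated by one or more spaces or tabs
  settings.split(/[ \t]+/).forEach(function (pair) {
    var colon = pair.indexOf(':');
    if (colon === -1) {
      return;
    }
    var name = pair.slice(0, colon);
    var value = pair.slice(colon + 1);
    if (known.indexOf(name) !== -1 && value !== '') {
      result[name] = value;
    }
  });
  return result;
}

console.log(parseCueSettings('line:10% align:start size:50%'));
```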

Cue Payload Tags

These are additional tags that allow you to customize the appearance of your cues.

Note: You cannot use payload tags with chapter tracks.

Timestamp Tags (Karaoke Style and Paint On Caption Text)

Using timestamp tags we can build Karaoke Style tracks. You build the track by inserting the correct timestamp where you want the text to change, subject to the following restriction:

  • The timestamp must be greater than the cue’s start timestamp, greater than any previous timestamp in the cue payload, and less than the cue’s end timestamp.

WEBVTT - Example Karaoke Style Track

1
00:16.500 --> 00:18.500
When the moon <00:17.500>hits your eye

2
00:00:18.500 --> 00:00:20.500
Like a <00:19.000>big-a <00:19.500>pizza <00:20.000>pie

3
00:00:20.500 --> 00:00:21.500
That's <00:00:21.000>amore

In the example above:

  • The active text is the text between the timestamp and the next timestamp or to the end of the payload if there is not another timestamp in the payload.
  • Any text before the active text in the payload is previous text.
  • Any text beyond the active text is future text. We can use the previous and future text to create the Karaoke experience.

A possible CSS rule to style the content looks like this.

::cue(:past) {
  color: yellow;
}

::cue(:future) {
  text-shadow: black 0 0 1px;
}
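For players or polyfills that do not style past and future text themselves, the split into previous, active and future text described above can be computed in script. The sketch below is one possible approach and not how any particular player does it. splitKaraokeText and toSeconds are hypothetical helper names, and the text before the first timestamp tag is treated as active from the start of the cue, which is an assumption of this sketch.

```javascript
// Convert "hh:mm:ss.ttt" or "mm:ss.ttt" to seconds
function toSeconds(stamp) {
  var parts = stamp.split(':');
  var seconds = parseFloat(parts.pop());
  var minutes = parseInt(parts.pop(), 10);
  var hours = parts.length ? parseInt(parts.pop(), 10) : 0;
  return hours * 3600 + minutes * 60 + seconds;
}

// Split a cue payload into past, active and future text for a given time.
// cueStart and now are in seconds. Both function names are illustrative.
function splitKaraokeText(payload, cueStart, now) {
  // With a capturing group, split() keeps the timestamps in the result:
  // [text, stamp, text, stamp, text, ...]
  var pieces = payload.split(/<((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})>/);
  var segments = [{ start: cueStart, text: pieces[0] }];
  for (var i = 1; i < pieces.length; i += 2) {
    segments.push({ start: toSeconds(pieces[i]), text: pieces[i + 1] });
  }
  var result = { past: '', active: '', future: '' };
  segments.forEach(function (segment, index) {
    var next = segments[index + 1];
    if (now < segment.start) {
      result.future += segment.text;
    } else if (next && now >= next.start) {
      result.past += segment.text;
    } else {
      result.active += segment.text;
    }
  });
  return result;
}

// Second cue of the example above, 19.6 seconds into the media
console.log(splitKaraokeText(
  'Like a <00:19.000>big-a <00:19.500>pizza <00:20.000>pie', 18.5, 19.6
));
```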

Timestamp tags can also be used for Paint On captions, which are placed independently from each other and don’t erase what was already on the screen. They are written one letter at a time and they appear to ‘paint on’ the screen.

Speaker Semantics

You can use a combination of cue positioning and specific markup on individual cues to further emphasize who is speaking in a given caption or subtitle where appropriate.

WEBVTT - Sintel Caption File With Speaker Semantics

Sage
00:00:12.000 --> 00:00:15.000 align:middle position:10%
<v.gatekeeper>What brings you to the land
of the gatekeepers?

Searching
00:00:18.500 --> 00:00:20.500 align:middle position:80%
<v.sintel>I'm searching for someone.

We can style the speaker semantic classes using CSS. For example we can add a different color for each speaker, something like the example below:

video::cue(v.gatekeeper) {
  color:lime;
}

video::cue(v.sintel) {
  color: #ff00ff;
}

Additional Style tags

The following tags require opening and closing tags.

Class tag

Style the contained text using a CSS class.

Example 14 - Class tag
<c.classname>text</c>

Italics tag

Italicize the contained text.

Example 15 - Italics tag
<i>text</i>

Bold tag

Bold the contained text.

Example 16 - Bold tag
<b>text</b>

Underline tag

Underline the contained text.

Example 17 - Underline tag
<u>text</u>

Ruby tag / Ruby text tag

Used together to display ruby characters (i.e. small annotative characters above other characters). Ruby annotations are primarily used in languages with logographic alphabets (Japanese, Chinese, Korean) where a single character may represent a complete word and where the meaning of the character may not be familiar to the reader.

Ruby characters are small, annotative glosses that can be placed above or to the right of a Chinese character when writing languages with logographic characters such as Chinese or Japanese to show the pronunciation. Typically called just ruby or rubi, such annotations are used as pronunciation guides for characters that are likely to be unfamiliar to the reader.

From Wikipedia

Example 18 - Ruby tag and Ruby text tag
<ruby>WWW<rt>World Wide Web</rt>oui<rt>yes</rt></ruby>

Adding the tracks to the video

Either in a supported browser or using one of the polyfills available (like we’ve chosen to do with Playr) we add <track> elements, one for each language of captions that we make available.

There is one extra attribute we will add to the video to make it work with Playr (class="playr_video" in the code below). The code shows what a video element looks like with an associated caption track for English.

<!DOCTYPE html>
<html>
<head>
  <title>Sample Captioned Video</title>
  <script src="playr.js"></script>
  <link rel="stylesheet" href="playr.css">
</head>
<body>
<video
  id="myvideo"
  controls="controls"
  class="playr_video"
  width="640" height="480"
  poster="http://media.w3.org/2010/05/sintel/poster.png"
>
<!--
  These are the three sources. This should cover most of our
  deployed player base
-->
<source src="//media.w3.org/2010/05/sintel/trailer.mp4" type="video/mp4" />
<source src="//media.w3.org/2010/05/sintel/trailer.webm" type="video/webm" />
<source src="//media.w3.org/2010/05/sintel/trailer.ogv" type="video/ogg" />
<!--
  This is the captions track
-->
<track kind="captions" lang="en" srclang="en" label="English" src="sintel.vtt" />
</video>
</body>
</html>

The working example is located at http://labs.rivendellweb.net/vtt-demo/basic.html and an example without a polyfill (meant to test native browser support) is located at http://labs.rivendellweb.net/vtt-demo/basic-plain.html

The same example without polyfill support, with captions in English and Spanish and the English captions as the default, looks like this. The default attribute will also display the captions automatically.

<!DOCTYPE html>
<html>
<head>
  <title>Sample Captioned Video</title>
</head>
<body>
<video
  id="myvideo"
  controls="controls"
  width="640" height="480"
  poster="http://media.w3.org/2010/05/sintel/poster.png"
>
<!--
  These are the three sources. This should cover most of our
  deployed player base
-->
<source src="//media.w3.org/2010/05/sintel/trailer.mp4" type="video/mp4" />
<source src="//media.w3.org/2010/05/sintel/trailer.webm" type="video/webm" />
<source src="//media.w3.org/2010/05/sintel/trailer.ogv" type="video/ogg" />
<!--
  This is the captions track
-->
<track kind="captions" lang="en" srclang="en" label="English" src="sintel-en.vtt" default />
<track kind="captions" lang="es" srclang="es" label="Spanish" src="sintel-es.vtt" />
</video>
</body>
</html>

The final example contains multiple caption tracks, subtitles in Spanish and descriptions for the video.

<!DOCTYPE html>
<html>
<head>
  <title>Sample Captioned Video</title>
</head>
<body>
<video
  id="myvideo"
  controls="controls"
  width="640" height="480"
  poster="http://media.w3.org/2010/05/sintel/poster.png"
>
<!--
  These are the three sources. This should cover most of our
  deployed player base
-->
<source src="//media.w3.org/2010/05/sintel/trailer.mp4" type="video/mp4" />
<source src="//media.w3.org/2010/05/sintel/trailer.webm" type="video/webm" />
<source src="//media.w3.org/2010/05/sintel/trailer.ogv" type="video/ogg" />
<!--
  These are the captions tracks
-->
<track kind="captions" lang="en" srclang="en" label="English" src="sintel-en.vtt" default />
<track kind="captions" lang="es" srclang="es" label="Spanish" src="sintel-es.vtt" />
<track kind="captions" lang="de" srclang="de" label="German" src="sintel-de.vtt" />
<!--
  This is the subtitles track
-->
<track kind="subtitles" lang="es" srclang="es" label="Subtitulos en Español" src="sintel-es-subtitles.vtt" />
<!--
  This is the descriptions track (the file name is a placeholder)
-->
<track kind="descriptions" lang="en" srclang="en" label="English descriptions" src="sintel-en-descriptions.vtt" />
</video>
</body>
</html>

Text tracks and audio

Text tracks are not limited to working with just video. They work just the same with audio. The example below (taken from http://mattcrouch.net/experiments/music-sync/) provides synchronized captions to an audio track.

Using jQuery, an extract of the audio for the Sintel video and the same captions that we used for the video examples, we change the cues programmatically using the text track API to display the cues at the matching time.

As you can see, description tracks would be particularly useful in this case as they would provide a more complete context to the audio.

jQuery(document).ready(function() {
    // Step below is optional. I don't like taking
    // the option from the user and autoplay the audio
    $('audio').trigger("play");
    var audio = document.querySelector("audio");
    // log the track we're working with
    console.log(audio.textTracks[0]);

    audio.textTracks[0].oncuechange = function () {
      $("#output").html(""); // Clear the content of our output region
      if (this.activeCues !== null) {
        // Append the text of each active cue to the output region
        for (var i = 0; i < this.activeCues.length; i++) {
          $("#output").append(this.activeCues[i].text + "<br>");
        }
      }
    };
});

Additional Tutorials And Tools

Modernizr and Prefix Free

As designers we all hit the point where we want to use a specific feature of HTML5 and we don’t know which browsers support which version of the specification in question. This is particularly important when we’re asked to support older browsers, where the same script we use to trigger a feature may be ignored or have unexpected results.

On the CSS front we have to deal with “prefix hell”. In their efforts to be on the bleeding edge of CSS feature development, and to show how the features may work in standard implementations, browser vendors (all of them) have released features under a prefix. While cutting-edge development is important, it is not so important as to drive developers crazy trying to remember whether a given browser needs a prefix for a given property (sometimes different versions of the same browser will support different syntaxes for the same property). And if we don’t code our CSS defensively, we may find that as vendors fully support properties, our code may stop working.

That’s where Modernizr and Prefix Free come in.

What are these tools?

These tools are here to make your life and your code easier to manage. They provide features that will, in the end, future proof your code.

Modernizr is a set of three complementary utilities:

  • A set of CSS classes added to your page’s html tag which will let you style your content based on whether a feature is supported by a given browser.
  • A set of JavaScript boolean values for each tested property that allow you to branch your code depending on whether the property is supported in the browser you’re using.
  • A conditional resource loader that loads resources based on results from tests for the features that you need for your page or app.

Prefix Free is a JavaScript library that eliminates the hassle of remembering which vendor supports which CSS property or syntax. It does this by adding the needed prefixes to a CSS stylesheet automatically (using JavaScript behind the scenes). This will also help with inconsistencies between browsers (I hope) and with future proofing your scripts for when all vendors drop prefixes.

Why should we use these tools?

From my perspective the main reason to use these tools is to make your workflow more manageable. It makes sense to code JavaScript only if the feature you’re testing for is present and to provide a workaround if it is not.

There is also the ability to future proof your JavaScript and CSS. We don’t need to do browser detection in our code but rather test and detect the features we are interested in working with. We can style our CSS according to whether a browser supports a given feature rather than do a one-size-fits-all layout. As browser support increases we will eventually be able to eliminate the non-supporting CSS code.

There is also the freedom of using a single form of the prefixed content. Prefix Free will intercept the stylesheet and only add prefixes where they are needed.

Examples

Modernizr

In CSS we can use Modernizr to code for both the browsers that support the features we are trying to work with and for those that don’t. Let’s take, for example, CSS columns: how to handle them for browsers that support them and how to provide a “graceful degradation” solution for browsers that can’t.

.csscolumns {
  column-width: 15em;
  column-gap: 2em;          
  column-rule: 4px solid green;
  padding: 5px;
  column-fill: balance;
  height: 400px;
}

.no-csscolumns {
  width: 60%;
  border: 1px solid #000;
}

This applies the column styles in browsers that support CSS columns (indicated by the .csscolumns class) and provides different styles for browsers that don’t support the feature (indicated by the .no-csscolumns class).

Modernizr also creates a JavaScript object, Modernizr, and populates it with booleans that reflect the features the browser supports or doesn’t support. We can test these features with if statements (which we do in the JavaScript example below to test for Canvas support).

<script>
  if (Modernizr.canvas) {
    alert("This browser supports HTML5 canvas!");
    // The rest of the code that relies on canvas
    // goes here
  }
  else {
    // Polyfill or code for older browsers goes here
  }
</script>

When using JavaScript we can also chain tests to make sure the browser supports all the features that we need without having to write a separate check for each one.

<script>
  // We are interested in canvas and WebGL 
  if (Modernizr.canvas && Modernizr.webgl) {
    alert('Your browser supports canvas and WebGL');
    // The code that relies on both features goes here
  }
  else {
    // Polyfill or code for older browsers goes here
  }
</script>

The final example uses Modernizr.load, also known as yepnope.js, to conditionally load resources (JavaScript polyfills and associated resources) based on features supported by a given browser. In its simplest form, we use the loader to tell the browser what resources to load:

// We call Modernizr.load
Modernizr.load({
  // and ask it to test if geolocation is supported
  test: Modernizr.geolocation,
  // if it is load the geo.js file (full of geolocation goodness)
  yep: 'geo.js',
  // otherwise tell it to load the polyfill (not so good but
  // better than nothing)
  nope: 'geo-polyfill.js'
});

In a more complex example we can tell Modernizr to test for multiple CSS3 and HTML5 properties, load resources conditionally based on the feature tests and then load other resources regardless.

// Call Modernizr.load
Modernizr.load({
  // Ask it for the features that we need on the script
  test: Modernizr.canvas && Modernizr.webgl,
  // These are the scripts we load if the features are supported
  yep: ['main.js', 'support.js', 'style.css'],
  // These are the polyfills and scripts we use when features
  // are not supported.
  nope: ['webgl-polyfill.js', 'canvas-polyfill.js'],
  // These resources will be loaded regardless.
  // This option is also aliased as load:
  both: 'analytics.js'
});

It may not be the fastest way to load your resources but it is the most flexible and the most user friendly I’ve found so far. Require.js is definitely more powerful but it imposes requirements on the way you structure your applications/sites.

One last thing. If you’ve used HTML5 Boilerplate you’ve probably seen code like this where we load jQuery from a CDN and immediately test whether jQuery is available and if it’s not then we load a local copy:

<script src="//ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.js"></script>
<script>window.jQuery || 
document.write('<script src="js/libs/jquery-1.7.1.min.js">\x3C/script>')</script>

You can use Modernizr to duplicate the functionality like so:

Modernizr.load([
  {
    // We first load jQuery using CDN
    load: '//ajax.googleapis.com/ajax/libs/jquery/1/jquery.js',
    complete: function(){
      //Once that completes we test if jQuery is available
      if ( !window.jQuery){
        // If it's not available then we load jQuery locally
        Modernizr.load('path/to/local/jquery.min.js');
      }
    }
  },
  {
    // This will wait for jQuery, either remote or 
    // local fallback, to load before executing

    load: 'My-local-jquery-script.js'
  }
]);

Prefix Free

Now that we know what CSS we’ll be loading on to our pages, let’s make it lighter, shall we?

We use Prefixfree by loading the following JavaScript in the <head> of our page, before any CSS is actually loaded. The head of our document looks like this:

<script src='//cdnjs.cloudflare.com/ajax/libs/prefixfree/1.0.7/prefixfree.min.js'></script>
<link rel="stylesheet" href="styles.css">

and in our CSS rules, instead of doing:

div .background {
  background: -webkit-gradient(linear, left top, right top, from(#2F2727), color-stop(0.05, #1a82f7), color-stop(0.5, #2F2727), color-stop(0.95, #1a82f7), to(#2F2727));
  background: -webkit-linear-gradient(left, #2F2727, #1a82f7 5%, #2F2727, #1a82f7 95%, #2F2727);
  background: -moz-linear-gradient(left, #2F2727, #1a82f7 5%, #2F2727, #1a82f7 95%, #2F2727);
  background: -ms-linear-gradient(left, #2F2727, #1a82f7 5%, #2F2727, #1a82f7 95%, #2F2727);
  background: -o-linear-gradient(left, #2F2727, #1a82f7 5%, #2F2727, #1a82f7 95%, #2F2727);
  background: linear-gradient(left, #2F2727, #1a82f7 5%, #2F2727, #1a82f7 95%, #2F2727);
}

we can just do:

div .background {  
  background: linear-gradient(left, #2F2727, #1a82f7 5%, #2F2727, #1a82f7 95%, #2F2727);
}

And Prefixfree will take care of adding the correct prefixes to the code so we don’t have to. Is it perfect? No, of course not.

Lea Verou, Prefix Free’s creator, lists several shortcomings on the project website:

  • Prefixing code in @import-ed files is not supported
  • Prefixing cross-origin linked stylesheets is not supported, unless they are CORS-enabled (More information on CORS)
  • Unprefixed linked stylesheets won’t work locally in Chrome and Opera.
  • Unprefixed values in inline styles (in the style attribute) won’t work in IE and Firefox < 3.6.
  • Unprefixed properties will not work in Firefox < 3.6.

Also please note that Prefixfree will not overwrite any properties that are already prefixed. If you know you need to work with a given prefixed property you can write it in your style sheet and Prefixfree will ignore the property and associated rule.

Conclusion

Modernizr and Prefixfree provide a reasonable workflow for front end development. They simplify your scripts (now it’s fine to have multiple scripts because we can load only those we need for a given page in an application) and they make your CSS cleaner and easier to read.

Video, how has it changed and what you can do with it now

It’s been a few years since I last looked at video. New and amazing things have happened in the field since then, and I’m impressed by the kind of stuff that you can do.

This essay is the beginning of my exploration into the updated world of web video. In a separate essay I will explore web video’s cousin, WebRTC, and describe the differences there.

Video is a first class web citizen

The humble video tag has come a long way since I last looked at it. It still requires a lot of extra work to make it work everywhere. Look at the video below: it should work just as well whether your browser supports any of the HTML5 video formats or you must fall back to the Flash plugin.

Download video: MP4 format | Ogg format | WebM format

Note in the example below how we code defensively and provide 3 different versions of the video to work around the lack of a common format supported by all browsers with fallback only necessary for browsers that do not support the <video> element.

There are people who still claim that a single format (H264) is all we need to work across all browsers. They say that we only need to provide the MP4 video and let all other browsers fall through to the Flash fallback.

Vendors’ decisions not to support one format or another because of perceptions of freedom are ridiculous (I have ranted before about Mozilla’s position on H264), but the opposite decision to use only H264 is just as ridiculous. We lose the performance we gain with native video formats and we lose the ability to do some of the things we’ll discuss below.

<!-- "Video For Everybody" http://camendesign.com/code/video_for_everybody -->
<video controls="controls" poster="http://media.w3.org/2010/05/sintel/poster.png" width="640" height="480">
    <source src="http://media.w3.org/2010/05/sintel/trailer.mp4" type="video/mp4" />
    <source src="http://media.w3.org/2010/05/sintel/trailer.webm" type="video/webm" />
    <source src="http://media.w3.org/2010/05/sintel/trailer.ogv" type="video/ogg" />
    <object type="application/x-shockwave-flash" data="http://releases.flowplayer.org/swf/flowplayer-3.2.1.swf" width="640" height="480">
        <param name="movie" value="http://releases.flowplayer.org/swf/flowplayer-3.2.1.swf" />
        <param name="allowFullScreen" value="true" />
        <param name="wmode" value="transparent" />
        <param name="flashVars" value="config={'playlist':['http%3A%2F%2Fmedia.w3.org%2F2010%2F05%2Fsintel%2Fposter.png',{'url':'http%3A%2F%2Fmedia.w3.org%2F2010%2F05%2Fsintel%2Ftrailer.mp4','autoPlay':false}]}" />
        <img alt="Sintel" src="http://media.w3.org/2010/05/sintel/poster.png" width="640" height="480" title="No video playback capabilities, please download the video below" />
    </object>
<p>
    <strong>Download video:</strong> <a href="http://media.w3.org/2010/05/sintel/trailer.mp4">MP4 format</a> | <a href="http://media.w3.org/2010/05/sintel/trailer.ogv">Ogg format</a> | <a href="http://media.w3.org/2010/05/sintel/trailer.webm">WebM format</a>
</p>
</video>

See the Pen HTML5 Video by Carlos Araya (@caraya) on CodePen
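
The browser walks the source elements in order and uses the first one it can play. The same check can be made from script with the media element’s canPlayType method, which returns an empty string, 'maybe' or 'probably'. The sketch below is illustrative only: pickPlayableSource and trailerSources are hypothetical names, and the source list simply mirrors the markup above.

```javascript
// Hypothetical helper: return the first source the media element reports
// it may be able to play, or null when none qualifies.
function pickPlayableSource(media, sources) {
  for (const source of sources) {
    // canPlayType returns '' when the type cannot be played.
    if (media.canPlayType(source.type) !== '') {
      return source;
    }
  }
  return null;
}

// The same three files used in the markup above, in the same order.
const trailerSources = [
  { src: 'http://media.w3.org/2010/05/sintel/trailer.mp4', type: 'video/mp4' },
  { src: 'http://media.w3.org/2010/05/sintel/trailer.webm', type: 'video/webm' },
  { src: 'http://media.w3.org/2010/05/sintel/trailer.ogv', type: 'video/ogg' }
];

// Browser-only part: probe a detached video element.
if (typeof document !== 'undefined') {
  const probe = document.createElement('video');
  // Browsers without <video> support have no canPlayType method at all.
  if (typeof probe.canPlayType === 'function') {
    const choice = pickPlayableSource(probe, trailerSources);
    console.log(choice ? 'Would play ' + choice.type : 'No listed format is supported');
  } else {
    console.log('No <video> support; the Flash fallback would be used');
  }
}
```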


Also note that the video will remain the same. In the sections below we will not re-add the video markup, but it will be included in the corresponding CodePen example.

Video Captioning

Moved to a separate article: HTML5 video captioning using VTT

CSS Video Manipulation

One of the first things we will do to our (captioned) video is to play with it using CSS. Because we no longer have to worry whether the effects are going to work or not, and we don’t have to use JavaScript, CSS gives you a lot more flexibility in terms of what you can do with your video.

For this example we will use jQuery and CSS transforms to move the video to the right, rotate the video 180 degrees and skew the image 45 degrees by pressing buttons located below the video.

See the Pen HTML5 Video: CSS Manipulation by Carlos Araya (@caraya) on CodePen
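
The button logic can be sketched roughly as follows. This is not the code from the pen: it uses plain DOM calls instead of jQuery, and the effect names, the 200px distance, the data-effect attribute and the buildTransform helper are all assumptions made for illustration. It also assumes the script runs after the video and buttons are in the page.

```javascript
// Hypothetical effect names mapped to CSS transform functions.
const EFFECTS = {
  move: 'translateX(200px)', // assumed distance
  rotate: 'rotate(180deg)',
  skew: 'skewX(45deg)'
};

// Build the value for the CSS `transform` property from a list of
// active effect names. Unknown names are ignored.
function buildTransform(activeEffects) {
  const parts = activeEffects
    .filter((name) => Object.prototype.hasOwnProperty.call(EFFECTS, name))
    .map((name) => EFFECTS[name]);
  return parts.length > 0 ? parts.join(' ') : 'none';
}

// Browser-only wiring: each button is assumed to carry a data-effect
// attribute such as <button data-effect="rotate">Rotate</button>.
if (typeof document !== 'undefined') {
  const video = document.querySelector('video');
  const active = [];
  document.querySelectorAll('button[data-effect]').forEach((button) => {
    button.addEventListener('click', () => {
      const name = button.dataset.effect;
      const index = active.indexOf(name);
      // Toggle the effect on or off.
      if (index === -1) {
        active.push(name);
      } else {
        active.splice(index, 1);
      }
      if (video) {
        video.style.transform = buildTransform(active);
      }
    });
  });
}
```

Keeping the string building in its own function means the effects combine in one transform value instead of overwriting each other.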

Creating a video player programmatically

Having the controls for the video built in to the video tag itself is optional. While built-in controls work for most video types, there are times when it’s better to create your own video controllers, either because you’re bundling the video with a skinned website or because you want a different user experience.

With all the tools we’ve used so far it is perfectly possible to create video interaction buttons to affect the video outside the video itself.
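
As a rough sketch of what such external controls might look like, the code below wraps a media element in a small controller object. The createPlayerControls name and the button ids are hypothetical; the controller only relies on standard media element members (play, pause, paused, muted, currentTime and duration).

```javascript
// Hypothetical controller around any object that exposes the media
// element members used here.
function createPlayerControls(media) {
  return {
    // Play when paused, pause when playing.
    togglePlay() {
      if (media.paused) {
        media.play();
      } else {
        media.pause();
      }
    },
    // Flip the muted flag.
    toggleMute() {
      media.muted = !media.muted;
    },
    // Move the playhead by a number of seconds, clamped to the video length.
    seekBy(seconds) {
      // duration is NaN until the metadata has loaded.
      const max = Number.isFinite(media.duration) ? media.duration : Infinity;
      media.currentTime = Math.min(max, Math.max(0, media.currentTime + seconds));
    }
  };
}

// Browser-only wiring; the ids below are assumptions for this sketch.
if (typeof document !== 'undefined') {
  const video = document.querySelector('video');
  if (video) {
    const controls = createPlayerControls(video);
    const wire = (id, handler) => {
      const button = document.getElementById(id);
      if (button) {
        button.addEventListener('click', handler);
      }
    };
    wire('play-toggle', () => controls.togglePlay());
    wire('mute-toggle', () => controls.toggleMute());
    wire('back-10', () => controls.seekBy(-10));
  }
}
```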

Playing Video in the Canvas

Canvas is another new web technology that allows for painting text, images and other elements into a canvas element. An intriguing option would be to take periodic frames from a video and paint them into the canvas.

I took the code for the pen below from HTML5 Doctor. I’m working on translating it to jQuery so I can better understand how it works and whether jQuery can do Canvas manipulation (I’m pretty sure it can, and any trouble is most likely operator error).

See the Pen HTML5 Video: Canvas Mirror by Carlos Araya (@caraya) on CodePen
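
The idea of painting periodic video frames into a canvas can be sketched as below. This is not the HTML5 Doctor code used in the pen, only a minimal illustration: paintFrame and startMirroring are hypothetical names and the 33 millisecond interval is an assumption. It relies on the canvas 2D context’s drawImage method, which accepts a video element as its image source.

```javascript
// Copy the current video frame into the canvas context at the given size.
function paintFrame(video, context, width, height) {
  context.drawImage(video, 0, 0, width, height);
}

// Repaint on a timer while the video is playing.
// Returns a function that stops the timer.
function startMirroring(video, canvas, intervalMs) {
  const context = canvas.getContext('2d');
  const timer = setInterval(() => {
    if (!video.paused && !video.ended) {
      paintFrame(video, context, canvas.width, canvas.height);
    }
  }, intervalMs);
  return () => clearInterval(timer);
}

// Browser-only wiring: assumes one video and one canvas on the page.
if (typeof document !== 'undefined') {
  const video = document.querySelector('video');
  const canvas = document.querySelector('canvas');
  if (video && canvas) {
    // Roughly 30 repaints per second (assumed rate).
    startMirroring(video, canvas, 33);
  }
}
```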


In order to further differentiate the videos I applied a CSS filter to the canvas element to turn the canvas video into black and white. At the time of writing, CSS filters are only supported in Chrome Canary; hopefully that will change soon 🙂