Revisiting Video Encoding: MP4 and WebM

Video in the modern world


When HTML first introduced the video tag I was pumping my fist in joy. No more plugins to play video content. It was as simple as writing markup like the example below to play a video in an MP4/AAC file with English and Swedish subtitles that can be switched as needed.

<!-- Video with subtitles -->
<video src="foo.mp4" poster="foo-poster.png"
       width="640" height="480" controls>
  <track kind="subtitles" src="foo.en.vtt" srclang="en" label="English">
  <track kind="subtitles" src="" srclang="sv" label="Svenska">
</video>

But it was never as simple as it looked. Because there was no standard video format for HTML5 video, different browsers supported different container formats and different audio and video codecs. So the video markup turned into something like this:

<video height="480" width="640" controls
  poster="" >
  Your browser doesn't support HTML5 video tag.
</video>

which produces the following video player:

Each source element loads a different version of the video encoded with a different set of audio and video codecs. These files must be encoded separately and hosted separately.
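
To make the idea of one source element per encoding concrete, here is a minimal Python sketch that prints a set of source elements. The file names (foo.mp4, foo.webm, foo.ogv) are hypothetical, and the MIME/codec strings are commonly used examples rather than values from the original markup.

```python
from html import escape

# Hypothetical file names; the MIME/codec strings are common examples,
# not values recovered from the original page.
SOURCES = [
    ("foo.mp4", 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"'),
    ("foo.webm", 'video/webm; codecs="vp8, vorbis"'),
    ("foo.ogv", 'video/ogg; codecs="theora, vorbis"'),
]


def source_tags(sources):
    """Return one <source> element per (url, mime_type) pair."""
    return [
        f'<source src="{escape(url)}" type="{escape(mime)}">'
        for url, mime in sources
    ]


if __name__ == "__main__":
    # The browser walks the list in order and plays the first type it supports.
    print("\n".join(source_tags(SOURCES)))
```

The generated lines would go inside the video element, before the fallback text.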

There are also patent issues around the MP4/H.264 and AAC codecs. MPEG-LA created a “patent pool” of technologies essential for MP4 encoding and decoding.

I had hoped that the new HEVC/H.265 technology would not be encumbered by MPEG-LA style patent trolls, but apparently that was too much to ask: MPEG-LA already has an HEVC patent pool.

So the fight has remained a stalemate. On one side are Mozilla and Opera, who refuse to pay the MP4 licensing fee. On the other are Microsoft, Google and Apple, who have caved in and support MPEG4 playback as part of their HTML5 video implementations.

So, if it’s not MPEG4 or HEVC/H.265, then what alternatives do we have available?

While Google implements MPEG4 in Chrome, it has not stood still on the video codec front. In 2009 Google purchased On2 Technologies and has since worked hard on VP8, its successor VP9, and the WebM format built around them.

MPEG-LA must have seen the benefit of VP8 because it began forming a patent pool for the technology. Google didn’t like that, and the conflict ended with an agreement that removes MPEG-LA as a factor in VP8 licensing, so that Google can continue to offer the code free and unencumbered for personal and commercial use, for now.

Why is this important?

[B]ecause this means that VP8 is a hell of lot safer and more free from possible legal repercussions than H.264 itself. What many H.264 proponents do not understand, either wilfully or out of sheer ignorance, is that those H.264 licenses embedded in Windows, OS X, iOS, your ‘professional’ camera, and so on, do not cover commercial use. If you shoot a video with your camera in H.264, upload it to YouTube, and get some income from advertisements, you’re in violation of the H.264 license (and the MPEG-LA made it clear they had no qualms about going after individual users). The extension the MPEG-LA announced (under pressure from VP8 and WebM) changed nothing about that serious legal limitation.

Google called the MPEG-LA’s bluff, and won

Why Our Civilization’s Video Art and Culture is Threatened by the MPEG-LA

The other codec worth looking at (mostly because it’s supported by Firefox) is Ogg Theora from the Xiph.Org Foundation. Like VP8 and WebM, Theora is free and unencumbered by patents.

MP4 containers can be optimized for a kind of pseudo streaming by re-arranging the “atoms” of the movie (atoms, in this context, are the chunks of data that make up the movie). The video player is looking for the moov atom and will not play the movie until it finds it.

If your server is configured for HTTP Range Requests, the player will request smaller chunks of the file until it finds the atom it needs.
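
As a rough illustration of what those requests look like on the wire, the sketch below builds Range header values and parses the Content-Range header a server sends back with a 206 Partial Content response. The function names are made up for this example, and no network call is made.

```python
import re


def byte_range(first, last=None):
    """Range value for bytes first..last (inclusive), or first..end of file."""
    return f"bytes={first}-" if last is None else f"bytes={first}-{last}"


def suffix_range(length):
    """Range value for the last `length` bytes, e.g. to probe the file's tail."""
    return f"bytes=-{length}"


def parse_content_range(value):
    """Parse 'bytes 0-1023/146515' into (first, last, total).

    total is None when the server reports '*' (unknown length).
    """
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last, total = match.groups()
    return int(first), int(last), None if total == "*" else int(total)


if __name__ == "__main__":
    print("Range:", byte_range(0, 1023))   # first kilobyte of the file
    print("Range:", suffix_range(4096))    # last 4 KB, where moov may live
    print(parse_content_range("bytes 0-1023/146515"))
```

A player that does not find moov in the first chunk can use a suffix range to look at the end of the file instead of downloading everything in between.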

Different requests for the same video

Unfortunately, for on-demand movies the moov atom is usually at the end of the file. So if the server is not configured to handle range requests, the player will have to download the complete file before it can start playing it.
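
If you want to check where the moov atom sits in a given file, a small script is enough. The sketch below walks the top-level atoms (each starts with a 4-byte big-endian size followed by a 4-byte type) and reports whether moov comes before mdat, the atom holding the media data. The function names are my own, and the code assumes a well-formed file.

```python
import io
import struct


def top_level_atoms(stream):
    """Yield (type, offset, size) for each top-level atom in an MP4 stream."""
    offset = 0
    while True:
        stream.seek(offset)
        header = stream.read(8)
        if len(header) < 8:
            return  # end of file
        size, kind = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            extended = stream.read(8)
            if len(extended) < 8:
                return
            size = struct.unpack(">Q", extended)[0]
            header_len = 16
        elif size == 0:
            # A size of 0 means the atom runs to the end of the file.
            size = stream.seek(0, io.SEEK_END) - offset
        if size < header_len:
            return  # malformed atom; stop rather than loop forever
        yield kind.decode("latin-1"), offset, size
        offset += size


def is_fast_start(stream):
    """Return True when moov appears before mdat at the top level."""
    for kind, _offset, _size in top_level_atoms(stream):
        if kind == "moov":
            return True
        if kind == "mdat":
            return False
    return False


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as movie:
        for kind, offset, size in top_level_atoms(movie):
            print(f"{kind} at byte {offset}, {size} bytes")
        print("fast start:", is_fast_start(movie))
```

Run it with the path to an MP4 file as the only argument. A file that lists mdat before moov is a candidate for the optimization described next.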

If you’re using existing content or don’t want to re-encode the video, you can use tools like Handbrake to optimize the video file for streaming across the web by moving the moov atom to the beginning of the file.

Using Handbrake to re-arrange the video atoms

If you’re working with multiple files or are more comfortable on the command line, you can use ffmpeg to encode the file or to add the flag that enables fast-start playback. In the example below we add the faststart flag and copy the audio and video streams from the original file unchanged.

ffmpeg -i input.mp4 -movflags faststart \
-acodec copy \
-vcodec copy \
output.mp4
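
For a whole folder of files, the same command can be wrapped in a short script. This is only a sketch: it assumes ffmpeg is on the PATH, and the function names and the idea of writing to a separate output folder are my own choices, not something ffmpeg requires.

```python
import subprocess
from pathlib import Path


def faststart_command(src, dst):
    """Return the ffmpeg argument list that copies src to dst with moov first."""
    return [
        "ffmpeg", "-i", str(src),
        "-movflags", "faststart",
        "-acodec", "copy",   # copy the audio stream, no re-encoding
        "-vcodec", "copy",   # copy the video stream, no re-encoding
        str(dst),
    ]


def remux_folder(folder, out_dir):
    """Run the command for every .mp4 in folder.

    out_dir should differ from folder so ffmpeg never writes over its input.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(folder).glob("*.mp4")):
        subprocess.run(faststart_command(src, out / src.name), check=True)


if __name__ == "__main__":
    # Hypothetical folder names, used only for illustration.
    remux_folder("videos", "videos-faststart")
```

Because both streams are copied rather than re-encoded, each file should take seconds rather than minutes.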

You can do something similar with WebM videos. The format is based around the Matroska container, either VP8 or VP9 video codecs and either Opus or Vorbis audio codecs. Matroska files, usually just called MKV files, use a kind of binary XML called EBML to store different things like video tracks, audio tracks, subtitles, and other data. These data chunks are called elements and they are similar in concept to the atoms in an MP4 file.
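
To give a feel for what EBML looks like at the byte level, here is a minimal sketch that reads one element header: a variable-length element ID followed by a variable-length data size. In both, the number of leading zero bits in the first byte tells you how many bytes the field occupies. The function names are made up for this example, and error handling is kept to the bare minimum.

```python
import io


def read_vint(stream, keep_marker):
    """Read one EBML variable-length integer; return (value, length) or None at EOF.

    Element IDs keep the length-marker bit; data sizes drop it.
    """
    first = stream.read(1)
    if not first:
        return None
    byte = first[0]
    length, mask = 1, 0x80
    while length <= 8 and not byte & mask:
        length += 1
        mask >>= 1
    if length > 8:
        raise ValueError("invalid EBML variable-length integer")
    rest = stream.read(length - 1)
    if len(rest) != length - 1:
        raise ValueError("truncated EBML variable-length integer")
    value = byte if keep_marker else byte & (mask - 1)
    for extra in rest:
        value = (value << 8) | extra
    return value, length


def read_element_header(stream):
    """Return (element_id, data_size), or None at EOF.

    data_size is None when the size is marked unknown (all value bits set).
    """
    element_id = read_vint(stream, keep_marker=True)
    if element_id is None:
        return None
    size = read_vint(stream, keep_marker=False)
    if size is None:
        raise ValueError("element header is missing its size")
    value, length = size
    unknown = value == (1 << (7 * length)) - 1
    return element_id[0], None if unknown else value


if __name__ == "__main__":
    # EBML header (ID 0x1A45DFA3) holding a DocType element (ID 0x4282) = "webm".
    sample = bytes.fromhex("1A45DFA3" "87" "4282" "84") + b"webm"
    stream = io.BytesIO(sample)
    print(read_element_header(stream))
    print(read_element_header(stream), stream.read(4))
```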

As with all video formats, before it can start playing a WebM video a browser has to know where the audio and video data are stored. The element we’re looking for is SeekHead. By default most video creation tools put a SeekHead element at the start of the video. The problem is that each video can have an unlimited number of SeekHead elements. In that case, the first SeekHead will contain a pointer to a second SeekHead located at the end of the file.

Even if the first SeekHead contains pointers to the video and audio tracks, the browser still must go fetch the second SeekHead element, to see if there are additional video or audio tracks in the file, and determine which one has preference. Even if the second SeekHead is completely empty the browser must download and parse all SeekHead elements in the WebM file before it can play video content.

When playing a WebM video locally we don’t need to worry about the file structure since we have all the content available for playback. When streaming a video over HTTP the order of elements does matter because the browser doesn’t have the complete file yet. If the browser doesn’t get certain elements at the beginning of the file, it has to send HTTP range requests until it finds the data it needs. This can have an impact on how quickly the file starts playing and on overall page performance. The discussion below is all about rearranging the elements in the container.

Another aspect of WebM streaming performance is optimizing for seeking inside a video. This depends on another element, Cues. For the same reasons we are optimizing for fast start, we want the Cues element downloaded as early as possible so that, if a user fast forwards the video, the browser makes as few HTTP requests as possible.
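
One way to see how a particular file is laid out is to list the elements directly inside the Segment, which is the element that wraps everything after the EBML header. The sketch below does that and checks whether Cues shows up before the first Cluster (the elements holding the actual audio and video frames). Treat it as a rough diagnostic: the function names are my own, "Cues before the first Cluster" is just a simple proxy for "downloaded early", and it repeats a compact header reader so it can run on its own.

```python
import io

# Element IDs as defined by the Matroska specification.
SEGMENT = 0x18538067
SEEK_HEAD = 0x114D9B74
CUES = 0x1C53BB6B
CLUSTER = 0x1F43B675


def _vint(stream, keep_marker):
    """Read one EBML variable-length integer; None on EOF or bad data."""
    first = stream.read(1)
    if not first:
        return None
    length = 9 - first[0].bit_length()  # leading zero bits + 1
    if length > 8:
        return None
    rest = stream.read(length - 1)
    if len(rest) != length - 1:
        return None
    value = first[0] if keep_marker else first[0] & ((1 << (8 - length)) - 1)
    for byte in rest:
        value = (value << 8) | byte
    return value, length


def _header(stream):
    """Return (element_id, size) with size None when unknown; None on EOF."""
    element_id = _vint(stream, keep_marker=True)
    if element_id is None:
        return None
    size = _vint(stream, keep_marker=False)
    if size is None:
        return None
    value, length = size
    unknown = value == (1 << (7 * length)) - 1
    return element_id[0], None if unknown else value


def segment_children(stream):
    """Yield the IDs of the elements directly inside the first Segment."""
    while True:  # skip the EBML header and anything else before the Segment
        header = _header(stream)
        if header is None:
            return
        element_id, size = header
        if element_id == SEGMENT:
            break
        if size is None:
            return
        stream.seek(size, io.SEEK_CUR)
    while True:  # now positioned at the Segment's first child
        header = _header(stream)
        if header is None:
            return
        element_id, size = header
        yield element_id
        if size is None:
            return  # unknown-size element (live stream); cannot skip it
        stream.seek(size, io.SEEK_CUR)


def cues_before_clusters(stream):
    """Return True when Cues appears before the first Cluster."""
    for element_id in segment_children(stream):
        if element_id == CUES:
            return True
        if element_id == CLUSTER:
            return False
    return False
```

Calling cues_before_clusters on a file opened in binary mode before and after optimizing it is a quick way to confirm that the layout actually changed.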

To accomplish both goals, fast playback and fast seeking, we’ll use a single tool, mkclean, which is specifically designed to address both the fast start and the fast seek problems. Using original.webm as input, we run the following command to create optimized.webm, ready for the web:

mkclean --doctype 4 \
--keep-cues \
--optimize \
original.webm optimized.webm