FFMPEG Notes and Tips

Once support for AV1 comes to FFMPEG I will recompile it to include AV1 support. It will, supposedly, provide better compression than HEVC/H265 and not be encumbered by patents to the level of H265 and H264.

For an idea of the licensing nightmare HEVC has created, see this article in The Register.

There are times when I dearly wish I had a GUI to do some of the work, particularly with feature-rich applications like FFMPEG, but we don’t always have the chance or the choice. While there are tools like IFFMPEG, they are not as comprehensive as I’d like them to be, so we need to learn at least the basics of the command line.

I’ve chosen to install FFMPEG via Homebrew rather than compile it directly via XCode. The command will install FFMPEG with the optional Open H264 support enabled.

brew install ffmpeg --with-openh264

Once it is installed you get the ffmpeg command available. It’ll be the basis for what follows.

Basic use

These are some of the basic commands that I use.

The first command gathers information about the video:

ffmpeg -i tears_of_steel_1080p.mov

And the result will be something like this. Note that the stream descriptions have hard returns added for readability; your result will look different.

  libavutil      55. 78.100 / 55. 78.100
  libavcodec     57.107.100 / 57.107.100
  libavformat    57. 83.100 / 57. 83.100
  libavdevice    57. 10.100 / 57. 10.100
  libavfilter     6.107.100 /  6.107.100
  libavresample   3.  7.  0 /  3.  7.  0
  libswscale      4.  8.100 /  4.  8.100
  libswresample   2.  9.100 /  2.  9.100
  libpostproc    54.  7.100 / 54.  7.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'tears_of_steel_1080p.mov':
    major_brand     : qt
    minor_version   : 512
    compatible_brands: qt
    encoder         : Lavf53.32.100
  Duration: 00:12:14.17, start: 0.000000, bitrate: 6361 kb/s
    Stream #0:0(eng): Video: h264 (Main) (avc1 / 0x31637661), yuv420p,
    1920x800 [SAR 1:1 DAR 12:5], 6162 kb/s, 24 fps, 24 tbr, 24 tbn, 48 tbc (default)
      handler_name    : DataHandler
      encoder         : libx264
    Stream #0:1(eng): Audio: mp3 (.mp3 / 0x33706D2E), 44100 Hz, stereo, s16p, 191 kb/s (default)
      handler_name    : DataHandler
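
If you need these values in a script, the summary can be parsed as text. The sketch below is a minimal example, not a robust parser: parse_summary is a hypothetical helper name of mine, and the regular expressions assume the layout shown above, which may differ between FFMPEG versions.

```python
import re

# Lines copied from the sample output above.
SAMPLE = """\
  Duration: 00:12:14.17, start: 0.000000, bitrate: 6361 kb/s
    Stream #0:0(eng): Video: h264 (Main) (avc1 / 0x31637661), yuv420p,
    1920x800 [SAR 1:1 DAR 12:5], 6162 kb/s, 24 fps, 24 tbr, 24 tbn, 48 tbc (default)
"""

def parse_summary(text):
    """Return duration (seconds), overall bitrate (kb/s) and frame size."""
    dur = re.search(r"Duration: (\d+):(\d+):(\d+\.\d+)", text)
    rate = re.search(r"bitrate: (\d+) kb/s", text)
    # At least two digits on each side of the "x" skips hex tags like 0x31637661.
    size = re.search(r"\b(\d{2,5})x(\d{2,5})\b", text)
    hours, minutes, seconds = dur.groups()
    return {
        "seconds": int(hours) * 3600 + int(minutes) * 60 + float(seconds),
        "bitrate_kbps": int(rate.group(1)),
        "width": int(size.group(1)),
        "height": int(size.group(2)),
    }

info = parse_summary(SAMPLE)
print(info)
```

For anything beyond a quick check, a structured output format is likely a better source than scraping the human-readable summary.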

Resizing the video

For this first example, we’ll resize the video to (WxH) 640×480.

ffmpeg -i Agent_327_Operation_Barbershop.mkv -s 640x480 agent_327_480p.mp4

Changing the video data rate

FFMPEG allows me to change the bit rate for audio and video separately. In this example, I will only change the video bitrate to 2 megabits per second without changing the audio bitrate at all. The output name (output.mp4) is a placeholder.

ffmpeg -i Agent_327_Operation_Barbershop.mkv \
  -b:v 2m \
  output.mp4

Changing the audio data rate

The two attributes that we want to work with for audio are -ar and -b:a. There is an older argument, -ab, that accomplishes the same thing as -b:a, but the recommendation is to use -b:a to be consistent with the video attribute (-b:v).

The -ar attribute controls the audio sampling frequency. In this example, I’ve set the value to 48000.

With -b:a we get control of the audio bit rate (expressed in bits per second). The output name (output.mp4) is a placeholder.

ffmpeg -i Agent_327_Operation_Barbershop.mkv \
  -ar 48000 -b:a 120k \
  output.mp4

Changing the aspect ratio of a video

The aspect ratio of an image describes the proportional relationship between its width and its height. It is commonly expressed as two numbers separated by a colon, as in 16:9.

In FFMPEG, this is an easy one to change; use the -aspect flag and the aspect ratio that you want to use. The output name (output.mp4) is a placeholder.

ffmpeg -i Agent_327_Operation_Barbershop.mkv \
  -aspect 16:9 \
  output.mp4
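
To see where a ratio like 16:9 comes from, you can reduce a frame size by its greatest common divisor. This is a small sketch of that arithmetic (aspect_ratio is a hypothetical helper, and it assumes square pixels, i.e. a SAR of 1:1):

```python
from math import gcd

def aspect_ratio(width, height):
    """Reduce a frame size to its simplest width:height ratio."""
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"

print(aspect_ratio(1920, 1080))  # 16:9
print(aspect_ratio(1920, 800))   # 12:5, the DAR reported for tears_of_steel_1080p.mov
print(aspect_ratio(640, 480))    # 4:3
```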

Changing the frame rate of a video

Frame rate is the frequency (rate) at which consecutive images called frames appear on a display. Rates above roughly 12 frames per second are perceived by humans as motion.

To change it use the -r option. The output name (output.mp4) is a placeholder.

ffmpeg -i Agent_327_Operation_Barbershop.mkv \
  -r 25 \
  output.mp4

Converting from one format to another

To make sure that all the codecs I want are supported, I reinstalled FFMPEG with support for libvpx and opus (the default audio format for VP9) using the following command:

brew reinstall ffmpeg --with-openh264 --with-x265 --with-tools --with-libvpx --with-opus

This makes sure that all the codecs are installed and available for the following sections. This is a one-time operation; once the codecs are installed you don’t need to install them again.

Converting to VP9

Let’s say that your client delivers a video in MP4 format and you need to deliver it as a high-quality VP9 video. We’ll look at:

  • 1-pass average bitrate
  • 2-pass average bitrate
  • Constant quality
  • Constant bitrate

1-pass average bitrate

The simplest way to encode VP9 is the simple variable bitrate (VBR) mode VP9 offers by default. This is also sometimes called “Average Bitrate” or “Target Bitrate”. In this mode, it will simply try to reach the specified bit rate on average, e.g. 2 MBit/s.

ffmpeg -i input.mp4 \
-c:v libvpx-vp9 \
-b:v 2M \
output.webm

2-pass average bitrate

To create more efficient encodes when a particular target bitrate should be reached, choose two-pass encoding.

For two-pass, you need to run ffmpeg twice as shown below. The differences between the two passes are:

  • In pass 1 and 2, use the -pass 1 and -pass 2 options, respectively
  • In pass 1, output to a null file descriptor, not an actual file. (This will generate a logfile that ffmpeg needs for the second pass)
  • In pass 1, you need to specify an output format (with -f) that matches the output format you will use in pass 2
  • In pass 1, specify the audio codec used in pass 2; in many cases, -an in pass 1 will not work
  • In pass 1, specify a fast encode (speed 4) and in pass 2, a slower encode (speed 1). This should speed up the overall encode process

ffmpeg -i input.mp4 \
-c:v libvpx-vp9 -b:v 2M -pass 1 \
-c:a libopus -speed 4 -f webm /dev/null && \
ffmpeg -i input.mp4 \
-c:v libvpx-vp9 -b:v 2M -pass 2 \
-c:a libopus -speed 1 output.webm

Converting to H264 (AVI to H264)

When working with AVC/H264 video, for the most part, we need to work with Constant Rate Factor (CRF).

CRF is the default quality (and rate control) setting for the x264 and x265 encoders. Values range between 0 and 51, where lower values result in better quality and larger file sizes. Higher values mean more compression, but eventually the video quality will noticeably degrade.

Pick the CRF values

The range of the CRF scale is 0–51, where 0 is lossless, 23 is the default, and 51 is worst quality possible (see the note below for the difference between 8 and 10-bit encoding and the CRF values).
A subjectively sane CRF range is 17–28 with a default of 23. Consider 17 or 18 to be visually lossless or nearly so; it should look the same or nearly the same as the input but it isn’t technically lossless.

The range is exponential, so increasing the CRF value +6 results in roughly half the bitrate/file size, while -6 leads to roughly twice the bitrate.
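
The rule of thumb above can be written as a formula: every 6 CRF steps halve or double the bitrate. The sketch below only illustrates that approximation (relative_bitrate is a hypothetical helper); real results depend on the content.

```python
def relative_bitrate(crf, reference_crf=23):
    """Approximate bitrate at `crf` as a multiple of the bitrate at `reference_crf`."""
    return 2 ** ((reference_crf - crf) / 6)

for crf in (17, 23, 29):
    print(crf, relative_bitrate(crf))
```

With the default of 23 as the reference, CRF 17 comes out at roughly twice the bitrate and CRF 29 at roughly half.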

Choose the highest CRF value that still provides an acceptable quality. If the output looks good, then try a higher value. If it looks bad, choose a lower value.

The 0–51 CRF quantizer scale mentioned in this section only applies to 8-bit x264. When compiled with 10-bit support, x264’s quantizer scale is 0–63. You can see what you are using by referring to the ffmpeg console output during encoding (yuv420p or similar for 8-bit, and yuv420p10le or similar for 10-bit). 8-bit is more common among distributors.


A preset is a set of predefined options that provides a certain trade-off between encoding speed and compression ratio. A slower preset will provide better compression. If you target a certain file size or constant bit rate, you will achieve better quality with a slower preset, at the expense of a longer encoding time.

Use the slowest preset that you have the patience to wait for. As with many things, you’ll have to test the presets to see which one works best for your project.

The available presets in descending order of speed are:

  • ultrafast
  • superfast
  • veryfast
  • faster
  • fast
  • medium – default preset
  • slow
  • slower
  • veryslow

You can see a list of current presets with -preset help.

CRF Example

This command encodes a video with good quality, using the slow preset to achieve better compression:

ffmpeg -i input.avi -c:v libx264 -preset slow -crf 22 -c:a copy output.mkv

Note that in this example the audio stream of the input file is not re-encoded; we just copy the stream over to the output.

If you are encoding a set of videos that are similar, apply the same settings to all the videos: this will ensure that they will all have similar quality.


Two-Pass encoding is more complicated but it works better when trying to target a specific file size with frame output quality being a secondary concern. For this example we’ll use the following formula to calculate bitrate:

(200 MiB * 8192 [converts MiB to kBit]) / 600 seconds = ~2730 kBit/s total bitrate
2730 - 128 kBit/s (desired audio bitrate) = 2602 kBit/s video bitrate

You can also forgo the bitrate calculation if you already know what final (average) bitrate you need.
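
The calculation above is easy to script. This is a minimal sketch of the same arithmetic (video_bitrate_kbit is a hypothetical helper name), using the 8192 conversion factor from the formula and rounding the total down as the example does:

```python
def video_bitrate_kbit(target_mib, duration_s, audio_kbit):
    """Video bitrate (kBit/s) that fits target_mib once audio is accounted for."""
    total_kbit = int(target_mib * 8192 / duration_s)  # 8192 converts MiB to kBit
    return total_kbit - audio_kbit

print(video_bitrate_kbit(200, 600, 128))  # 2602
```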

For two-pass, you need to run ffmpeg twice, with almost the same settings, except for:

  • In pass 1 and 2, use the -pass 1 and -pass 2 options, respectively.
  • In pass 1, output to a null file descriptor, not an actual file. (This will generate a logfile that ffmpeg needs for the second pass.)
  • In pass 1, you need to specify an output format (with -f) that matches the output format you will use in pass 2.
  • In pass 1, specify the audio codec used in pass 2; in many cases, -an in pass 1 will not work.

ffmpeg -y -i input -c:v libx264 -b:v 2600k -pass 1 -c:a aac -b:a 128k -f mp4 /dev/null && \
ffmpeg -i input -c:v libx264 -b:v 2600k -pass 2 -c:a aac -b:a 128k output.mp4

faststart for web video

You can add -movflags +faststart as an output option if your videos are going to be viewed in a browser. This will move some information to the beginning of your file and allow the video to begin playing before it is completely downloaded by the viewer. It is not required if you are going to use a video service such as YouTube.

Compatibility: Profiles and Levels

H264 has multiple profiles. Each profile uses a subset of the coding tools defined by the H.264 standard. The tools are algorithms or processes used for video coding and decoding. An encoder will compress video based on a specific profile, and this will define which tools the decoder must use in order to decompress the video. A decoder may support some profiles, while it does not support others. Each profile is intended to be useful to a class of applications.

If you want your videos to have the highest compatibility with ancient devices (e.g., old Android and iOS phones) use a level 3 baseline profile like so:

-profile:v baseline -level 3.0

This disables some advanced features but provides for better compatibility as encoders and decoders must support the baseline profile. This setting may increase the bit rate compared to what is needed to achieve the same quality in higher profiles.

For iOS devices look at the table below for the combination of profile and level you need to support your target devices. Again, the lower the level/profile combination the wider the support you get, but there may be times (especially when doing adaptive streaming with HLS or DASH) that you’ll want to go for the higher profile/level combinations.

iOS Compatibility (source)

Profile | Level | Devices | Options
Baseline | 3.0 | All devices | -profile:v baseline -level 3.0
Baseline | 3.1 | iPhone 3G and later, iPod touch 2nd generation and later | -profile:v baseline -level 3.1
Main | 3.1 | iPad (all versions), Apple TV 2 and later, iPhone 4 and later | -profile:v main -level 3.1
Main | 4.0 | Apple TV 3 and later, iPad 2 and later, iPhone 4s and later | -profile:v main -level 4.0
High | 4.0 | Apple TV 3 and later, iPad 2 and later, iPhone 4s and later | -profile:v high -level 4.0
High | 4.1 | iPad 2 and later, iPhone 4s and later, iPhone 5c and later | -profile:v high -level 4.1
High | 4.2 | iPad Air and later, iPhone 5s and later | -profile:v high -level 4.2

Converting to H265 (AVI to H265)

Constant Rate Factor (CRF)

When working with HEVC/H265 video, for the most part, we need to work with Constant Rate Factor (CRF), just as we did with AVC/H264.

CRF is the default quality (and rate control) setting for the x264 and x265 encoders. Values range between 0 and 51, where lower values would result in better quality and larger file sizes. Higher values mean more compression, but eventually, the video quality will noticeably degrade.

Pick the CRF values

Use this mode if you want to retain good visual quality and don’t care about the exact bitrate or filesize of the encoded file. The mode works exactly the same as in x264, so see the H264 section above for more info.

As with x264, you need to make a few choices:

  • Choose a CRF. The default is 28, and it should visually correspond to a libx264 video at CRF 23, but result in about half the file size. Other than that, CRF works just like in x264
  • Choose a preset. The default is medium. The preset determines how fast the encoding process will be, at the expense of compression efficiency. Put differently, if you choose ultrafast, the encoding process is going to run fast, but the file size will be larger when compared to medium. The visual quality will be the same. Valid presets are ultrafast, superfast, veryfast, faster, fast, medium, slow, slower, veryslow and placebo
  • Choose a tune. By default, this is disabled, and it is generally not required to set a tune option. x265 supports the following -tune options: psnr, ssim, grain, zerolatency, fastdecode

For example:

ffmpeg -i input -c:v libx265 -crf 28 -c:a aac -b:a 128k output.mp4

Two-Pass Encoding

This process is very similar to the H264 two-pass encoding with some different parameters.

For two-pass, you need to run ffmpeg twice, with almost the same settings, except for:

  • In pass 1 and 2, use the -x265-params pass=1 and -x265-params pass=2 options, respectively. For libx265, the -pass option (that you would use for libx264) is not applicable
  • In pass 1, output to a null file descriptor, not an actual file. (This will generate a logfile that ffmpeg needs for the second pass)
  • In pass 1, you need to specify an output format (with -f) that matches the output format you will use in pass 2
  • In pass 1, specify the audio codec used in pass 2; in many cases, -an in pass 1 will not work

The full H265 example looks like this:

ffmpeg -y -i input -c:v libx265 -b:v 2600k -x265-params pass=1 -c:a aac -b:a 128k -f mp4 /dev/null && \
ffmpeg -i input -c:v libx265 -b:v 2600k -x265-params pass=2 -c:a aac -b:a 128k output.mp4
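
Since the H264 and H265 two-pass commands differ only in how the pass number is given, they can be generated from one helper. The sketch below is an illustration, not a tested encoding pipeline: two_pass_commands is a hypothetical name, it only builds the argument lists (it does not run ffmpeg), and unlike the listings above it passes -y in both passes so an existing output file is overwritten without a prompt.

```python
import os
import shlex

def two_pass_commands(source, output, codec="libx265",
                      video_rate="2600k", audio_rate="128k"):
    """Return the ffmpeg argument lists for pass 1 and pass 2."""
    commands = []
    for pass_number in (1, 2):
        # libx265 takes the pass number through -x265-params; libx264 uses -pass.
        if codec == "libx265":
            pass_flags = ["-x265-params", f"pass={pass_number}"]
        else:
            pass_flags = ["-pass", str(pass_number)]
        command = ["ffmpeg", "-y", "-i", source, "-c:v", codec,
                   "-b:v", video_rate, *pass_flags,
                   "-c:a", "aac", "-b:a", audio_rate]
        if pass_number == 1:
            # Pass 1 only needs the log file, so the video goes to the null device.
            command += ["-f", "mp4", os.devnull]
        else:
            command.append(output)
        commands.append(command)
    return commands

for command in two_pass_commands("input.avi", "output.mp4"):
    print(shlex.join(command))
```

Each list can be handed to subprocess.run(command, check=True) in order; the second pass should only run if the first one succeeds.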

As with CRF, choose the slowest -preset you can tolerate, and optionally apply a -tune setting. Note that when using faster presets with the same target bitrate, the resulting quality will be lower and vice-versa.

Passing Options

Generally, options are passed to x265 with the -x265-params argument. For fine-tuning the encoding process, you can pass any option that is listed in the x265 documentation. This is only good if you know exactly what you need to change.

Setting Profiles

Currently, ffmpeg does not support setting profiles for libx265 with the -profile:v option, as it does for libx264. However, the profile options can be set manually, as shown in this Super User post.

Creating an Encoder Ladder

I’ve always thought about the concept of an encoding ladder without really knowing what it was and why it was important. I came across the concept again a few weeks ago when researching the AV1 codec.

Before jumping into details, let’s define what an encoding ladder is:

Your encoding ladder is the set of encoding parameters that you use to create the various files that you deliver adaptively to your web viewers. These encoding parameters can live in your on-premise encoder, in your cloud encoder, or in your online video platform (OVP).

Five Signs Your Encoding Ladder May Be Obsolete

So what ladder should we use?

It depends on the budget and what streams your audience is using.

Apple recommended bitrates from TN 2224

We don’t have to use all or any of the encodings.

In my research, I came across Five Views of your Encoding Ladder by Jan Ozer where he discusses encoding ladders and how to set them up.

We will discuss both an ideal ladder and a more practical ladder for web content that we want to publish.

The Ideal Ladder

For this ideal ladder, we’ll skip pricing concerns. In an ideal world, we’d take that into consideration as well.

The table below, taken from Ozer’s LinkedIn article, uses Apple’s Technical Note 2224 as the basis for the encoding ladder.

An ideal encoding ladder based on Apple’s Technical Note 2224

I’ve always wondered how much we can cut off the bottom end for our American users. I’m thinking about removing either 234p, 270p or both depending on our target audience.

I’m also questioning whether we need data rates over 4500. Unless we can derive actual differences between the 4500 and higher data rates I don’t think we need to encode to the higher rates.

This is a good starting point but you’ll have to do a lot of testing with your content and with potential delivery channels before you can pronounce it good and use it to deploy content.

An Optimized Ladder for Web Delivery

Working with video for web delivery can be done in either of two ways: we can use a video delivery network or we can upload our content to different providers for delivery.

I’ve chosen to work with these providers:

  • YouTube
  • Vimeo
  • Twitch

Although Twitch was initially geared towards gamers’ broadcasts, my experience with the platform has been vastly different; it wasn’t games that first attracted me to it.

The World According to Twitch: Winning With Experimentation covers the gaming aspect of the platform along with some explorations of what other content can be used in the platform.

GVC Usage Content

The important part of that article, to me, is what it doesn’t cover. For some people, it wasn’t gaming broadcasts that attracted them to the platform. Channels like Geek and Sundry were the first Twitch streams that I became interested in. Later, it was through Sean Larkin and Jim Lee that I really saw the potential in the platform. Larkin and Lee both provide some gaming content, but their primary content is not gaming related: Larkin produces Webpack-related software content and Lee creates sketches and drawings of Image and DC characters.

So coming back to the encoding ladder for these platforms we can look at what the requirement for each platform is.

Below I’m providing a condensed version of the Encoding Spreadsheet that I compiled to try and figure out what the encoding ladder should look like.

Provider | Version | Size | Aspect Ratio | Container | Rate (FPS)
YouTube | 4k | 3840×2160 | 16:9 | MP4 | 25
YouTube | 2k | 2560×1440 | 16:9 | MP4 | 25
YouTube | 1080p | 1920×1080 | 16:9 | MP4 | 25
YouTube | 720p | 1280×720 | 16:9 | MP4 | 25
YouTube | 480p | 854×480 | 16:9 | MP4 | 25
YouTube | 360p | 640×360 | 16:9 | MP4 | 25
Vimeo | 4k | 3840×2160 | 16:9 | MP4 | 25
Vimeo | 2k | 2560×1440 | 16:9 | MP4 | 25
Vimeo | 1080p | 1920×1080 | 16:9 | MP4 | 25
Vimeo | 720p | 1280×720 | 16:9 | MP4 | 25
Vimeo | SD | 640×360 | 16:9 | MP4 | 25
Twitch | 1080p | 1920×1080 | 16:9 | MP4 | 25
Twitch | 720p | 1280×720 | 16:9 | MP4 | 25
Twitch | 480p | 854×480 | 16:9 | MP4 | 25
Compiled data about encoding requirements for video on demand clips for YouTube, Vimeo and Twitch.

Looking at the table and the providers’ resources on compression, we can see the commonalities. For the most part, we should be able to create one file to push to YouTube and Vimeo and one additional stream, if needed or wanted, for Twitch.

Creating files with FFMPEG

Using the FFMPEG command line utility we’ll do several passes at converting the video to the format we want.

In the first pass, we’ll worry about bitrate and framerate for a 360p version of the video. For file inputs FFMPEG keeps the frame rate of the source unless told otherwise, so we set -r 25 explicitly to match the 25 fps listed in the table.

The initial command to create a 360p version of the video at 25fps looks like this:

ffmpeg -i Agent_327_Operation_Barbershop.mkv -b:v 2m -r 25 agent_327_360p.mp4

Next, we’ll make sure that the video size is correct. For this we’ll use FFMPEG’s -s attribute along with the size that we want to use; the table lists 640×360 for the 360p version, so the full attribute will be -s 640x360. The modified command looks like this:

ffmpeg -i Agent_327_Operation_Barbershop.mkv -b:v 2m -r 25 -s 640x360 agent_327_360p_sized.mp4

The following step is to make sure the resulting video clip keeps a 16:9 aspect ratio. For this we use the -aspect parameter with the value we want the resulting video to have. The full parameter is -aspect 16:9.

ffmpeg -i Agent_327_Operation_Barbershop.mkv -b:v 2m -r 25 -s 640x360 -aspect 16:9 \
agent_327_360p.mp4

YouTube doesn’t like interlaced video, so we’ll deinterlace it ourselves rather than leave it in the vendor’s hands with unpredictable results.

ffmpeg -i Agent_327_Operation_Barbershop.mkv -b:v 2m -r 25 -s 640x360 -aspect 16:9 -deinterlace \
agent_327_360p.mp4

I think that covers the basic work with video; now let’s look at the audio portion of the clip. The two attributes that we want to work with for audio are -ar and -ab.

The -ar attribute controls the audio sampling frequency. Both YouTube and Vimeo want you to leave it at 48 kHz, so we set the value to 48000. Note that for any audio uploaded with a sampling rate over 48 kHz, Vimeo will resample your audio to 48 kHz or below.

With -ab we get control of the audio bit rate (expressed in bits per second). Here, again, we let Vimeo drive the process as they require a constant rate of 320 kbits per second, so we set it to 320k. However, depending on the type of source material you’re working with, it may be possible to lower this value without losing quality.

ffmpeg -i Agent_327_Operation_Barbershop.mkv -b:v 2m -r 25 -s 640x360 -aspect 16:9 -deinterlace \
-ar 48000 -ab 320k \
agent_327_360p.mp4

Using the 360p command as a reference, we can create the other versions (720p and 1080p). The parameters we need to change for each version are:

  • -b:v 2m to change the video data rate
  • -s 640x360 to change the video dimensions

We can go even further and create a script in either Python or Ruby that will automate the changes. We only need one command for each version that we want to create.
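
As a starting point, here is a minimal Python sketch of such a script. It only builds and prints one command per version; it does not run them. The sizes come from the table above, but the 720p and 1080p bitrates are placeholders I made up for illustration, and ladder_commands and RUNGS are hypothetical names. The -deinterlace flag is left out to keep the sketch short.

```python
import shlex

SOURCE = "Agent_327_Operation_Barbershop.mkv"

# (label, frame size, video bitrate). The 2m rate for 360p is the one used
# above; the 720p and 1080p rates are placeholders, not recommendations.
RUNGS = [
    ("360p", "640x360", "2m"),
    ("720p", "1280x720", "4m"),
    ("1080p", "1920x1080", "8m"),
]

def ladder_commands(source=SOURCE, rungs=RUNGS):
    """Build one ffmpeg argument list per rung of the ladder."""
    commands = []
    for label, size, video_rate in rungs:
        commands.append([
            "ffmpeg", "-i", source,
            "-b:v", video_rate, "-r", "25", "-s", size, "-aspect", "16:9",
            "-ar", "48000", "-ab", "320k",
            f"agent_327_{label}.mp4",
        ])
    return commands

for command in ladder_commands():
    print(shlex.join(command))
```

Keeping the rungs in a list means that adding or dropping a version is a one-line change, and each list can be passed to subprocess.run once the values have been tested against your own content.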


If you have the storage and the bandwidth you may also want to play with converting your videos into DASH content and serving them directly through your web server. In Revisiting Video Encoding: DASH we did this using Google’s Shaka Packager for creating the DASH streams and the companion Shaka Player to play the content.

Creating the manifest

For this portion, we’ll assume that we’ve created 3 files: one for each of 360p, 720p, and 1080p resolutions. For each of the resolutions, we’ll create both an audio and a video stream and write all those streams to a manifest file, in this case, agent327.mpd.

Note that I’m working on macOS and the packager is installed on my path. That’s why I can get away with just running packager for the command; otherwise I would have to specify the full path to the application.

The full command looks like this:

packager \
input=agent_327_360p.mp4,stream=audio,output=360p_audio.mp4 \
input=agent_327_360p.mp4,stream=video,output=360p_video.mp4 \
input=agent_327_720p.mp4,stream=audio,output=720p_audio.mp4 \
input=agent_327_720p.mp4,stream=video,output=720p_video.mp4 \
input=agent_327_1080p.mp4,stream=audio,output=1080p_audio.mp4 \
input=agent_327_1080p.mp4,stream=video,output=1080p_video.mp4 \
--mpd_output agent327.mpd
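
The stream descriptors follow a regular pattern, so the command can also be generated. This sketch (packager_command is a hypothetical helper) rebuilds the argument list shown above from the list of versions, assuming the agent_327_<version>.mp4 naming used here; it prints the command rather than running it.

```python
import shlex

def packager_command(labels, manifest="agent327.mpd"):
    """Build the packager arguments: one audio and one video stream per version."""
    args = ["packager"]
    for label in labels:
        source = f"agent_327_{label}.mp4"
        for stream in ("audio", "video"):
            args.append(f"input={source},stream={stream},output={label}_{stream}.mp4")
    args += ["--mpd_output", manifest]
    return args

command = packager_command(["360p", "720p", "1080p"])
print(shlex.join(command))
```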

We will use the manifest as the source of the video we want to play. We will use the Shaka Player to play the video on the web page.

Playing DASH content

Shaka Player is the playback component of the Shaka ecosystem. It is also developed by Google and open sourced on GitHub.

If you’re used to HTML5 the way you add DASH video is a little more complicated than you’re used to.

First, we create a simple HTML page with a video element. In this page we make sure that we add the scripts we need:

  • The shaka-player script
  • The script for our application

The video element is incomplete on purpose. We will add the rest of the video in the script later on.

<!DOCTYPE html>
<html>
  <head>
    <!-- Shaka Player compiled library: -->
    <script src="path/to/shaka-player.compiled.js"></script>
    <!-- Your application source: -->
    <script src="video.js"></script>
  </head>
  <body>
    <video id="video"
           controls autoplay></video>
  </body>
</html>

We’ll break the script into three parts:

  • Application init
  • Player init
  • Error handler and event listener

We initialize the application by installing the polyfills built into the Shaka player to make sure that all the supported players behave the same way and that there won’t be any unexpected surprises later on.

The next step is to check if the browser is supported using the built-in isBrowserSupported check. If the browser supports DASH then we initialize the player by calling initPlayer() otherwise we log the error to console.

var manifestUri = 'media/example.mpd';

function initApp() {
  // Install built-in polyfills to patch browser incompatibilities.
  shaka.polyfill.installAll();

  // Check to see if the browser supports the basic APIs Shaka needs.
  if (shaka.Player.isBrowserSupported()) {
    // Everything looks good!
    initPlayer();
  } else {
    // This browser does not have the minimum set of APIs we need.
    console.error('Browser not supported!');
  }
}

Initializing the player is the meat of the process and will take several different steps.

We create variables to capture the video element using getElementById and the player, a new instance of shaka.Player that we attach to the video element.

We then attach the player to the window object to make it easier to access from the console.

Next, we attach the error event handler to the onErrorEvent function defined later in the script. Positioning doesn’t matter as far as Javascript is concerned.

The last step in this function is to try and load a manifest using a promise. If the promise succeeds then we log it to console; otherwise the catch branch of the promise chain is executed and runs the onError function (which is different than the onErrorEvent function mentioned earlier).

function initPlayer() {
  // Create a Player instance.
  var video = document.getElementById('video');
  var player = new shaka.Player(video);

  // Attach player to the window to make it easy to access in the JS console.
  window.player = player;

  // Listen for error events.
  player.addEventListener('error', onErrorEvent);

  // Try to load a manifest.
  // This is an asynchronous process.
  player.load(manifestUri).then(function() {
    // This runs if the asynchronous load is successful.
    console.log('The video has now been loaded!');
  }).catch(onError);  // onError is executed if the asynchronous load fails.
}

The last part of the script is to create the functions for errors (onErrorEvent and onError).

Finally, we attach the initApp() function to the DOMContentLoaded event.

function onErrorEvent(event) {
  // Extract the shaka.util.Error object from the event.
  onError(event.detail);
}

function onError(error) {
  // Log the error.
  console.error('Error code', error.code, 'object', error);
}

document.addEventListener('DOMContentLoaded', initApp);

If everything works out OK we should have a video playing on the screen and the video will switch to the most appropriate version for your hardware and network conditions.

Quick Note: AOM and AV1

Something to keep an eye out for regarding royalty-free video codecs is AV1 from the Alliance for Open Media. While it’s still under development, with a final code freeze expected by 12/31/17, it shows promise of doing better than HEVC without any of the royalty nightmares of the HEVC patent pools.

Before diving into AV1, let’s take a look at the organization building it, the Alliance for Open Media. In What is AV1? Jan Ozer writes:

The Alliance for Open Media was announced on September 1, 2015, with charter members Amazon, Cisco, Google, Intel Corporation, Microsoft, Mozilla, and Netflix. At the time, the formation consolidated the development of three potentially competitive open source codecs; Cisco’s Thor, Google’s VP10, and Mozilla’s Daala. According to the initial press release, the goal was to produce a “next-generation video format” that is:

  • Interoperable and open
  • Optimized for the web
  • Scalable to any modern device at any bandwidth
  • Designed with a low computational footprint and optimized for hardware
  • Capable of consistent, highest-quality, real-time video delivery
  • Flexible for both commercial and non-commercial content, including user-generated content.

According to the AOM license terms, all licensors receive a “perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as expressly stated in this License) patent license to its Necessary Claims to make, use, sell, offer for sale, import or distribute any Implementation.” There is no requirement for licensees to disclose any of their own code. The Alliance will pursue all codec developments via an open source repository. [emphasis mine]

On April 5, 2016, the Alliance announced that IP provider ARM and semiconductor companies AMD and NVIDIA joined the Alliance to help ensure that the codec was hardware-friendly, and to facilitate and accelerate AV1 hardware support.

In terms of makeup, the Alliance members enjoy leading positions in the following markets:

  • Codec development – Cisco (Thor) Google (VPX), Mozilla (Daala)
  • Desktop and mobile browsers – Google (Chrome), Mozilla (Firefox), Microsoft (Edge)
  • Content – Amazon (Prime), Google (YouTube), Netflix
  • Hardware co-processing – AMD (CPUs, graphics), ARM (SoCs, other chips), Intel (CPUs), NVIDIA (SoC, GPUs)
  • Mobile – Google (Android), Microsoft (Windows Phone)
  • OTT – Amazon (Amazon Fire TV), Google (Chromecast, Android TV)

So the members of the alliance are positioned to make AV1 adoption much easier than it normally is. The members are clearly leaders in their respective areas and can make things happen faster than they would otherwise.

Last September Ozer provided a status update on AV1 that covered several points:

  • It confirmed the December 31 bitstream freeze date, so hardware and software development on top of the spec can begin after the freeze if it hasn’t already
  • Participants expect at least a 20% improvement over HEVC before they consider the code ready for production
  • Encoding complexity should be no greater than 5x HEVC’s encoding complexity
  • There hasn’t been much information on the legal front. The Alliance for Open Media: The Latest Challenge to Patent Pools presents an interesting analysis of the legal ramifications and possibilities for other industries to fight against patent pools in their areas

Around this time Bitmovin announced that they were joining the Alliance and produced the first AV1-based live encoding. They have also produced, in collaboration with Mozilla, a demo of DASH-delivered, AV1-encoded video via Firefox. The demo requires Firefox Nightly and will not work with any other browser.

In THEOplayer’s What is AV1? Past, Present & Future the authors write that:

Whether AV1 will deliver the promised performance, we will have to see when the bitstream is finalized and experts start talking about their experience with the codec. As for the legal matters surrounding the codec, it is something to keep in mind, but it most likely will not create any significant obstacles. The real question is how quickly the codec gets adopted, both by device manufacturers but also by video services who need to upgrade or replace their existing streaming infrastructure.

In short, surely keep your eyes and ears open for news and performance rapports about AV1, but for now, depending on your viewer market and their devices, it might be wise to stick to H.264, go with VP9 or H.265 or even look at alternatives such as V-Nova’s Perseus codec.

Quick Note: HEVC and Apple. What does it mean?

Apple announced support for the HEIF and HEVC high-performance image and video formats in macOS High Sierra at WWDC in 2017. While you can create content and share it free of charge, all other users require content producers to license content for distribution.

Unlike h264 there are multiple patent pools for HEVC, each with its own royalty schemes and cost for developers and, eventually, end users.

The HEVC patent pools I’m aware of are:

Other patent holders are not joining pools and want to license their technologies individually.

I know that Apple has licensed the technology, but I have no idea what the terms of Apple’s license are, whether any other OS vendor has licensed the technology, or whether the open source h265 codec available for tools like FFMPEG is completely free of patent encumbrance.

Unlike the AVC/h264 codec, if MPEG-LA promises not to charge a royalty for projects that don’t charge users, it will not eliminate the risk to users and content creators, only reduce it.

As much as I like HEVC as a technology, the licensing conundrum makes it hard for me to adopt it for work-related projects. How expensive is the license? How many people do I have to pay for each? What exceptions are there? If one licensing pool decides not to charge royalties for some part of their pool, do we still have to pay the others?

As usual, I’m not a lawyer and this should not be considered legal advice, just a heads up of what minefields may lie ahead.

Quick Note: Video Containers

I was working on reencoding videos to HEVC from DVD content I had ripped before I lost access to a DVD player on my laptop. Yay, Apple. I couldn’t understand why the captions (in .ass format) would not render with the formatting they were created with.

A quick jump on the FFMPEG web chat made it clear that it was a container issue, not a caption or re-encoding one.

Not all container formats support the same feature sets.

Wikipedia’s Comparison of Video Containers gives you an idea of the breadth and depth of the container fields out there.

This is different than the codecs that you use for audio and video. Both of my favorite containers, Matroska (with an mkv extension) and MPEG4 (with an mp4 extension), support HEVC for video and AAC for audio. The big difference, at least according to people in the FFMPEG chat, is that Matroska can handle captions where MPEG4 cannot.

Once again I’m reminded that whenever we work with video we need to be careful with formats, encoding ladders, containers and a lot of other items before our users see any of the content we are working with.