Revisiting images formats for the web

July 15, 2020
10 min. read

Every so often I see comparisons between image formats that say one format is better than others or that one format is better for a given task but I’ve always wondered where the numbers came from and what testing criteria were used. Rather than take things at face value, I want to make sure that whatever decision I make it is backed up with data. I’ve put all the files and scripts on a Github repository for you to run the same tests and see if the results match mine. Beware that the TIFF images are very large and may take a large chunk of your data plan if you download them on mobile.

This post does not cover HEIC/HEIF and AVIF image formats. To cover those two formats well, I need more time to compile and test the tools and want to make sure that I don’t mix tool research with image quality. I will post the results of any research on those two formats in a separate post.

Before we jump into looking at the quality of compressed images, let's take a look at what's available and what's coming down the pipeline as far as image formats for the web.

Format	Initial Release	Open?	Type	Available Encoder	Encoder to use	Notes
GIF	1987	No	Lossless	Yes	ImageMagick
JPG	1992	Yes	Lossy	Yes	ImageMagick	According to Wikipedia patents for JPEG technologies expired
PNG	1996	Yes	Lossless	Yes	ImageMagick	Also Provides animation support. Check caniuse.com for supported browsers
WebP	2010	Yes	Lossless and Lossy	Yes	ImageMagick or directly with cwebp	Based of WebM video compression

The process #

As Kornel Lesinski writes in How to compare images fairly:

Absolutely the worst way to compare images is to convert one lossy format to another and conclude you “can't see the difference”. Why is it bad? Save a photo as a couple JPEGs at quality=98 and quality=92. It will be hard to tell them apart, but their file sizes will differ by nearly 40%! Does it prove that JPEG is 40% better than… itself? No, it shows that “quality appears the same, but the file much is smaller!” can easily be nonsense that proves nothing. To make a fair comparison you really have to pay meticulous attention to encoder settings, normalizing quality, and ensuring that compared images are in fact comparable. It's really hard to make a fair comparison.

Before we even start encoding the images we have to do a few things:

Find a lossless, high-quality image to use as the source for the exercises
Decide what tools we will use to encode the images. Sometimes this may be decided for you as there may not be many tools for the newer formats
Decide what criteria you will use for your testing and how will you measure it
Will you measure objective quality using tools like DSSIM?
Will you compare file size for a given quality?
How will you decide which metric is more important?
Make sure that the tools produce similar output. For example, all formats should use chroma subsampling or none should
Figure out what are the equivalent settings for the formats that you're testing. Q=80 for a JPEG image may not be the same as Q=80 for other formats
Test all formats at the same or equivalent quality
Make sure that all format encoders can work from the same source
If the format offers lossless and lossy compression use lossy to match what JPEG does

I chose to work with different images in TIFF format. Information about the specific images is listed below:

Images from Los Lajones Estate downloaded from their website
Image of the USS California from Wikimedia Commons is in the public domain
Images from Hubblesite taken from bat shadow and NGC 6302: The "Butterfly Nebula" are in the public domain as stated in the hubblesite copyright page

We'll use different encoders for different formats, below is the list of formats with their associated image encoders. All the binaries were reinstalled to make sure I have the latest versions available via Homebrew as of this writing:

PNG: ImageMagick
JPG: ImageMagick
WebP: cwebp

First Test: Equal Quality to measure file size #

The first test I wanted to run is what happens if we encode a TIFF source image to all image formats that we can test with the same quality value, in this case, 80. It is important to note that a JPEG image encoded at 80 quality is not the same as a lossy WebP image encoded at the same quality. We do it this way because it’s the easiest way to test and it’s what I would do in Photoshop of when running image compression with tools like imagemin

The questions that I want to answer with this test:

Keeping quality constant, what lossless format provides the smaller file size?

Rather than type the command every time that I run the test, and to make the results reproducible, I created the Bash script below

#! /usr/bin/env bash

# Variable holding name of source image.
SOURCE_IMAGE='STSCI-H-p2022a-f-4398x3982'

# Variables holding names of
# encoders' binaries
IMAGE_MAGICK='convert'
WEBP_ENCODER='cwebp'
HEIC_ENCODER='heif-enc'

echo Starting First Encoding Test

if hash ${IMAGE_MAGICK} 2>/dev/null; then
  echo encoding to PNG
  ${IMAGE_MAGICK} ${SOURCE_IMAGE}.tif \
  -quality 80 ${SOURCE_IMAGE}.png
  echo encoding to JPG
  ${IMAGE_MAGICK} ${SOURCE_IMAGE}.tif \
  -quality 80 ${SOURCE_IMAGE}.jpg
else
  echo cannot convert to PNG or JPG
fi

if hash ${WEBP_ENCODER} 2>/dev/null; then
  echo encoding to lossy WebP
  ${WEBP_ENCODER} -q 80 \
  ${SOURCE_IMAGE}.tif \
  -o ${SOURCE_IMAGE}.webp
else
  echo cannot convert to WEBP
fi

if hash ${HEIC_ENCODER} 2>/dev/null; then
  echo encoding to lossy HEIC
  ${HEIC_ENCODER} --quality 80 \
  ${SOURCE_IMAGE}.png
else
  echo could not encode to HEIC
fi

My Results with images encoded from TIF high-quality sources and JPG where TIF was not an option:

Format	File Size
TIFF (base)	15MB
PNG	13.9MB
JPG	855KB
WebP	266KB

There is a lot of research and tweaking to obtain optimal results.

So in the naïve, all quality is the same test, WebP wins by a lot. remember that image quality is not a straight equivalency across formats as explained earlier.

Finding the optimal quality #

I know that optimal quality depends on the type of image and the screens we're working with, but an initial step on determining our optimal quality may be to establish what are the best compression settings for each format. We're likely to be serving at least two with our sources or srcset images. To answer this question we'll do a two-step process:

We create a set of WebP and a set of JPG images with quality ranging from 50 to 100
We'll use SSIM to provide an objective metric to use in comparing the images.

We then analyze the SSIM results to decide which of the compressed images gives us the best combination of quality measured by SSIM against a 100 quality compressed PNG image (for some reason ImageMagick's compare command will not work against a TIFF source image) and file size.

The Scripts #

Each step of the process uses its own script.

The first scripts generate multiple images. It is essentially the same script that we used for the previous evaluation, except that we have to pass the name of the image, without extension, as a parameter when we invoke the script.

if hash ${WEBP_ENCODER} 2>/dev/null; then
  for i in {50..100..10}
    do
      echo encoding to lossy WebP at ${i} quality
      ${WEBP_ENCODER} -q ${i} \
      ${SOURCE_IMAGE}.tif \
      -o ${SOURCE_IMAGE}-${i}.webp
  done
else
  echo cannot convert to WEBP
fi

The second script does the comparison. I would much rather use a direct SSIM encoder rather than using ImageMagick's compare command but none of the available encoders works as well as I would like.

So we use magick compare as our comparison tool and we run it against every WebP and JPG image we created using the 100 quality PNG as our standard to compare against.

I'd rather compare against the TIFF image but IM errors out with that comparison so, in my opinion, PNG is the next best available testing option.

This code will only work on Bash shells version 4 and higher.

SOURCE_IMAGE=$1
IMAGE_MAGICK_COMPARE='magick compare'

echo starting to work with SSIM comparison

if [ -f ${SOURCE_IMAGE}-100.png ]; then
  echo ${SOURCE_IMAGE}-100.png exits
  for i in {50..100..10}
    do
      echo "running comparisons for webp at ${i} quality"
      ${IMAGE_MAGICK_COMPARE} -metric ssim \
      ${SOURCE_IMAGE}-100.png \
      ${SOURCE_IMAGE}-${i}.webp \
      null:
  done

  for i in {50..100..10}
    do
      echo "running comparisons for jpg at ${i} quality"
      ${IMAGE_MAGICK_COMPARE} -metric ssim \
      ${SOURCE_IMAGE}-100.png \
      ${SOURCE_IMAGE}-${i}.jpg \
      null:
  done
else
  echo can\'t run the WebP comparison
fi

The Results #

The first table uses STSCI-H-p2022a-f-4398x3982 from the Nasa image library. The image is 14MB and may use a significant portion of your data plan on mobile.

The file sizes are generally what I expected and the differences between the SSIM values are similar enough to make file size becomes the primary consideration.

Quality	WebP File Size	WebP SSIM Value	JPG File Size	JPG SSIM
100	2.1MB	0.986584	10.9M	0.992733
90	639KB	0.981029	3.6MB	0.985442
80	266KB	0.975904	2.1MB	0.982087
70	183KB	0.973957	1.5MB	0.978859
60	153KB	0.973044	1.1MB	0.974777
50	128KB	0.972133	864KB	0.972219

The second table uses geisha-high-resas the image to test. This is a much brighter and deep contrast color image. The image is 11.9MB and may use a significant portion of your data plan on mobile.

Quality	WebP File Size	WebP SSIM Value	JPG File Size	JPG SSIM
100	2.6MB	0.980442	5.2MB	0.992432
90	757KB	0.961136	1.4MB	0.966931
80	300KB	0.947507	788KB	0.956424
70	217KB	0.943466	535KB	0.949292
60	188KB	0.941836	407KB	0.942214
50	166KB	0.939598	332KB	0.937974

The final example, the USS California image presents some interesting variance for analysis. The image is 12.8MB and may use a significant portion of your data plan on mobile.

I ran the compression and the SSIM comparison in separate steps, so I made the incorrect assumption that grayscale WebP images would exhibit the same behavior as color ones where the WebP files scored better in the SSIM metric across the board instead of fluctuating as they did. Need to do more research, particularly if it has to do with encoding settings on the WebP size and whether the -jpeg_like and -shap_yuv flags would change the results in any way.

Quality	WebP File Size	WebP SSIM Value	JPG File Size	JPG SSIM
100	10.1MB	1	11.2MB	0.998955
90	3MB	0.956338	3.6MB	0.951193
80	1.1MB	0.909765	2.1MB	0.930837
70	669KB	0.891505	1.5MB	0.91801
60	530KB	0.88458	1.1MB	0.907007
50	426KB	0.877016	864MB	0.898054

What's missing #

Two additional image formats should be in the encoding tests but are not.

HEIC is an image format based on the HEVC video code. The only encoder I found for the format produced significantly larger sizes than any other formats. I need to run additional tests to make sure I'm encoding it as it should be and that the larger sizes are not a result of my doing it wrong.

AVIF is the image format based on the open-source AV1 video format. The same encoder that creates HEIC files (lib-heif) will, supposedly, create AVIF files when running with a flag. I have yet to get it to work.

I was able to install the libavif reference implementation but I'll take a little time to document the process and then set up another test to see how heic and avif compare to WebP, JPG, and PNG.

Conclusions #

There is additional work that needs to happen regarding format-specific configuration and encoding flags and test them to see if the formats are affected by the flags you use when doing the compression.

Depending on the type of images you use on your site or project, evaluating the file sizes of a representative number of images is always a good idea.

Thanks to Jeremy Wagner for proofreading and giving technical feedback on the post.

Edit on Github

Image formats for the web: HEIC and AVIF Customizing WordPress