Balancing content with Flexbox

One of the cool things we can do with Flexbox is to balance text and images as if they were in a two-cell table. It should be possible to do this with img elements, but instead we’ll simulate the two cells with the markup below. I normally don’t do this, but in this case I will use CSS to populate the first div element with the appropriate image.

<div class="column">
  <figure class="flex">
    <div></div>
    <div>
      <h3>Chrome Canary</h3>

      <p>I install both Canary and Release versions to 
      make sure that the code I'm working on works in 
      my target browsers</p>
      <p>I install both Canary and Release versions to 
      make sure that the code I'm working on works in 
      my target browsers</p>
      <p>I install both Canary and Release versions to 
      make sure that the code I'm working on works in 
      my target browsers</p>

    </div>
  </figure>
</div>

The CSS is where all the magic happens. I’ve broken it into different sections. In the first section we define our layout. In particular:

  • We define the element with class flex to have display flex
  • In odd children we change the default mode for flex to display elements in reverse order. This will display the image on the right side and the text on the left
  • The first div child uses background attributes to manipulate the image. This is not really doable with images inserted using the img tag
  • The last div child will take twice the space of the first one
.flex {
    margin: 0;
    display: flex;
    border: 5px solid #333;
    margin-bottom: 2rem;
}

.flex:nth-child(odd) {
    flex-direction: row-reverse;
}

.flex div:first-child {
    flex: 1;
    background-size: cover;
    background-position: center;
}

.flex div:last-child {
    margin: 2rem;
    flex: 2;
}

In the second block of CSS we do some formatting for the text content of each section. The last paragraph, .flex p:last-of-type, has an additional rule to eliminate the bottom margin; this makes sure the empty bottom margin of that element doesn’t add to the total height of the text.

.flex h3 {
    font-size: 1.5rem;
    margin-top: 0;
    font-weight: 400;
}
.flex p {
    font-size: 1rem;
    line-height: 1.4;
    font-weight: 400;
}

.flex p:last-of-type {
    margin-bottom: 0;
}

This section adds the images as background images to the empty first div of each figure. I don’t particularly like using background images because they make it harder to share and to work with outside of CSS.

For this kind of project, placing the images with the img tag doesn’t produce the same effect. Using background-size: cover is different from making the image fluid with a percentage width.

For each of the children of .flex we add a background image to the first div child. It can be the same image for all of them or a different one for each, as we’ve done in this case.

.flex:nth-child(1) div:first-child {
    background-image: url("images/chrome-canary_128x128.png");
}

.flex:nth-child(2) div:first-child {
    background-image: url("images/chrome_128x128.png");
}

.flex:nth-child(3) div:first-child {
    background-image: url("images/firefox-developer-edition_128x128.png");
}

.flex:nth-child(4) div:first-child {
    background-image: url("images/firefox_128x128.png");
}

The final section of our CSS is a media query to accommodate smaller form factors and avoid the image looking ugly on iPhones and other small devices. We accomplish this by changing the layout from horizontal to vertical (flex-direction changes to column).

@media screen and (max-width: 600px) {
    .flex { flex-direction: column; }
    .flex div:first-child { min-height: 200px; }

    .flex:nth-child(odd) {
        flex-direction: column;
    }
}

The idea is to create a consistent layout for images and text. We can use this as the index page for a magazine or the starting point of additional experiments using Flexbox beyond gallery displays.

Hat tip to Dudley Storey for the original idea.

Asm.js and Web Assembly

I’ve been hearing about Web Assembly and its predecessor, asm.js, for a while. The idea is that we can bring C and C++ code to the web and run it directly in the browser without plugins getting in the way. This would also make it easier to port high-end games and other C/C++ code to JavaScript and leverage existing APIs and features.

Asm.js

Asm.js is a subset of JavaScript that is heavily restricted in what it can do and how it can operate. This is done so that the compiled Asm.js code can run as fast as possible making as few assumptions as it can, converting the Asm.js code directly into assembly. It’s important to note that Asm.js is just JavaScript – there is no special browser plugin or feature needed in order to make it work (although a browser that is able to detect and optimize Asm.js code will certainly run faster). It’s a specialized subset of JavaScript that’s optimized for performance, especially for this use case of applications compiled to JavaScript.

Asm.js: The JavaScript Compile Target

The first attempt at using JavaScript as a target language for cross-compilation is asm.js. Using Emscripten, developers could compile massive C/C++ code bases to JavaScript that ran natively in the browser and leveraged web technologies and APIs like WebGL, making it possible to port games created with the Unity and Unreal engines directly to the web, like the Unreal demo below, circa 2013.

The process can be illustrated with the diagram below (taken from ejohn.org):

ASM.js compilation and execution pipeline from ejohn.org

The code is not meant to be written, or read, by humans. The example below was created by John Resig to demonstrate the differences between asm.js and the regular JavaScript code developers normally work with. The code has been formatted for clarity and sanity preservation; standard asm.js is heavily minified into one continuous blob of text.

function Vb(d) {
    d = d | 0;
    var e = 0, f = 0, h = 0, j = 0, k = 0, l = 0, m = 0, n = 0,
        o = 0, p = 0, q = 0, r = 0, s = 0;
    e = i;
    i = i + 12 | 0;
    f = e | 0;
    h = d + 12 | 0;
    j = c[h >> 2] | 0;
    if ((j | 0) > 0) {
        c[h >> 2] = 0;
        k = 0
    } else {
        k = j
    }
    j = d + 24 | 0;
    if ((c[j >> 2] | 0) > 0) {
        c[j >> 2] = 0
    }
    l = d + 28 | 0;
    c[l >> 2] = 0;
    c[l + 4 >> 2] = 0;
    l = (c[1384465] | 0) + 3 | 0;
    do {
        if (l >>> 0 < 26) {
            if ((4980736 >>> (l >>> 0) & 1 | 0) == 0) {
                break
            }
            if ((c[1356579] | 0) > 0) {
                m = d + 4 | 0;
                n = 0;
                while (1) {
                    o = c[(c[1356577] | 0) + (n << 2) >> 2] | 0;
                    do {
                        if (a[o + 22 | 0] << 24 >> 24 == 24) {
                            if (!(Vp(d, o | 0) | 0)) {
                                break
                            }
                            p = (c[m >> 2] | 0) + (((c[h >> 2] | 0) - 1 | 0) * 40 & -1) + 12 | 0;
                            q = o + 28 | 0;
                            c[p >> 2] = c[q >> 2] | 0;
                            c[p + 4 >> 2] = c[q + 4 >> 2] | 0;
                            c[p + 8 >> 2] = c[q + 8 >> 2] | 0;
                            c[p + 12 >> 2] = c[q + 12 >> 2] | 0;
                            c[p + 16 >> 2] = c[q + 16 >> 2] | 0;
                            c[p + 20 >> 2] = c[q + 20 >> 2] | 0;
                            c[p + 24 >> 2] = c[q + 24 >> 2] | 0
                        }
                    } while (0);
                    o = n + 1 | 0;
                    if ((o | 0) < (c[1356579] | 0)) {
                        n = o
                    } else {
                        break
                    }
                }
                r = c[h >> 2] | 0
            } else {
                r = k
            } if ((r | 0) == 0) {
                i = e;
                return
            }
            n = c[j >> 2] | 0;
            if ((n | 0) >= 1) {
                i = e;
                return
            }
            m = f | 0;
            o = f + 4 | 0;
            q = f + 8 | 0;
            p = n;
            while (1) {
                g[m >> 2] = 0.0;
                g[o >> 2] = 0.0;
                g[q >> 2] = 0.0;
                Vq(d, p, f, 0, -1e3);
                n = c[j >> 2] | 0;
                if ((n | 0) < 1) {
                    p = n
                } else {
                    break
                }
            }
            i = e;
            return
        }
    } while (0);
    if ((c[1356579] | 0) <= 0) {
        i = e;
        return
    }
    f = d + 16 | 0;
    r = 0;
    while (1) {
        k = c[(c[1356577] | 0) + (r << 2) >> 2] | 0;
        do {
            if (a[k + 22 | 0] << 24 >> 24 == 30) {
                h = b[k + 14 >> 1] | 0;
                if ((h - 1 & 65535) > 1) {
                    break
                }
                l = c[j >> 2] | 0;
                p = (c[1384465] | 0) + 3 | 0;
                if (p >>> 0 < 26) {
                    s = (2293760 >>> (p >>> 0) & 1 | 0) != 0 ? 0 : -1e3
                } else {
                    s = -1e3
                } if (!(Vq(d, l, k | 0, h << 16 >> 16, s) | 0)) {
                    break
                }
                g[(c[f >> 2] | 0) + (l * 112 & -1) + 56 >> 2] = +(b[k + 12 >> 1] << 16 >> 16 | 0);
                h = (c[f >> 2] | 0) + (l * 112 & -1) + 60 | 0;
                l = k + 28 | 0;
                c[h >> 2] = c[l >> 2] | 0;
                c[h + 4 >> 2] = c[l + 4 >> 2] | 0;
                c[h + 8 >> 2] = c[l + 8 >> 2] | 0;
                c[h + 12 >> 2] = c[l + 12 >> 2] | 0;
                c[h + 16 >> 2] = c[l + 16 >> 2] | 0;
                c[h + 20 >> 2] = c[l + 20 >> 2] | 0;
                c[h + 24 >> 2] = c[l + 24 >> 2] | 0
            }
        } while (0);
        k = r + 1 | 0;
        if ((k | 0) < (c[1356579] | 0)) {
            r = k
        } else {
            break
        }
    }
    i = e;
    return
}

Handwritten asm.js code is marginally easier to understand. The example below, taken from the asm.js specification, shows what it looks like when we write asm.js code by hand.

function DiagModule(stdlib, foreign, heap) {
  "use asm";

  // Variable Declarations
  var sqrt = stdlib.Math.sqrt;

  // Function Declarations
  function square(x) {
      x = +x;
      return +(x*x);
  }

  function diag(x, y) {
      x = +x;
      y = +y;
      return +sqrt(square(x) + square(y));
  }

  return { diag: diag };
}

An asm.js module is contained within a function and starts with the "use asm" directive at the top. This tells the interpreter that everything inside the function should be handled as asm.js and compiled to assembly directly, without going through the regular JavaScript interpreter / optimization cycles.

Note the three arguments for the asm.js function: stdlib, foreign, and heap.

  • The stdlib object contains references to a number of built-in math functions
  • foreign provides access to custom user-defined functionality, such as drawing a shape in WebGL
  • heap gives you an ArrayBuffer which can be viewed through a number of different lenses, such as Int32Array and Float32Array.

The rest of the module is broken up into three parts: variable declarations, function declarations, and finally an object exporting the functions to expose to the user.
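
The DiagModule example doesn’t touch the heap parameter, so here is a rough sketch of what a module that does might look like. The SumModule name and the 64KB buffer are purely illustrative, and I haven’t run this through an asm.js validator, but it runs as plain JavaScript either way:

function SumModule(stdlib, foreign, heap) {
  "use asm";

  // View the heap as 32-bit integers
  var HEAP32 = new stdlib.Int32Array(heap);

  // Sum `len` integers starting at byte offset `ptr`
  function sum(ptr, len) {
    ptr = ptr | 0;
    len = len | 0;
    var total = 0, i = 0;
    for (i = 0; (i | 0) < (len | 0); i = (i + 1) | 0) {
      total = (total + (HEAP32[(ptr + (i << 2)) >> 2] | 0)) | 0;
    }
    return total | 0;
  }

  return { sum: sum };
}

// Calling it from regular JavaScript; the heap size must be a power of two
var heap = new ArrayBuffer(0x10000);
var view = new Int32Array(heap);
view[0] = 2; view[1] = 3; view[2] = 5;

var sum = SumModule(window, null, heap).sum;
console.log(sum(0, 3)); // 10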

The export is an essential point to understand. It allows all of the code within the module to be handled as asm.js but still be available to other, normal, JavaScript code. You could, theoretically, have some code that looks like the following, using the above DiagModule code:

document.body.onclick = function() {
  function DiagModule(stdlib){"use asm"; ... return { ... };}

  var diag = DiagModule({ Math: Math }).diag;
  alert(diag(10, 100));
};

This would result in an asm.js DiagModule that’s handled specially by the JavaScript interpreter but still made available to other JavaScript code, which can access it and use it within a click handler.

The result is that, within limits, we can bring C and C++ content directly into the web platform. Games and other large codebases written in C and C++ can be ported to work on the web.
Mozilla Hacks’ Porting to Emscripten tells of one project’s migration to asm.js.

Web Assembly

Web Assembly is an evolution of asm.js that incorporates many of the lessons browser vendors and implementers learned from working with asm.js. Web Assembly code is delivered in a binary format rather than as text, but it can still provide two-way interaction with native JavaScript code.
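
I haven’t compiled anything to Web Assembly for this post, but the loading side looks roughly like the sketch below. The add.wasm file name and its exported add function are placeholders for whatever your toolchain actually produces:

// Fetch and instantiate a compiled module, then call one of its exports
// from regular JavaScript.
fetch('add.wasm')
  .then((response) => response.arrayBuffer())
  .then((bytes) => WebAssembly.instantiate(bytes))
  .then((result) => {
    // result.instance.exports holds the functions the module exposes
    console.log(result.instance.exports.add(2, 3)); // 5
  });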

Installing Emscripten SDK

To compile C/C++ code to Web Assembly you need to have the Emscripten SDK installed on your system. I will only cover installing the Portable SDK as it works across platforms and doesn’t require administrator privileges on any of them (Windows, Mac, or Linux). For other installation methods check the Emscripten documentation.

The Portable Emscripten SDK is a no-installer version of the SDK package. It is identical to the NSIS installer, except that it does not interact with the Windows registry. This allows Emscripten to be used on a computer without administrative privileges, and means that the installation can be migrated from one location (directory or computer) to another by simply copying the directory contents to the new location.

First check the Platform-specific notes below (or online) and install any prerequisites.

Install or update the SDK using the following steps:

Download and unzip the portable SDK package to a directory of your choice. This directory will contain the Emscripten SDK.

Open a command prompt inside the SDK directory and run the following emsdk commands to get the latest tools from Github and set them as active:

# Fetch the latest registry of available tools.
./emsdk update

# Download and install the latest SDK tools.
./emsdk install latest

# Make the "latest" SDK "active"
./emsdk activate latest

Notes:

On Windows, invoke the tool with emsdk instead of ./emsdk.

Linux and Mac OS X only: Call source ./emsdk_env.sh after activate to set the system path to the active version of Emscripten.

Platform-specific notes

Mac OS X

These instructions explain how to install all the required tools. You can test whether some of these are already installed on the platform and skip those steps.

  1. Install the XCode Command Line Tools. These are a precondition for git.
    • Install XCode from the Mac OS X App Store.
    • In XCode | Preferences | Downloads, install Command Line Tools.
  2. Install git:
    • Allow installation of unsigned packages, or installing the git package won’t succeed.
    • Install XCode and the XCode Command Line Tools (should already have been done). This will provide git to the system PATH (see this stackoverflow post)
    • Download and install git directly from http://git-scm.com/
  3. Install cmake if you do not have it yet.

Linux

Pre-built binaries of tools are not available on Linux. Installing a tool will automatically clone and build that tool from the sources inside the emsdk directory.

Emsdk does not install any tools to the system, or otherwise interact with Linux package managers. All file changes are done inside the emsdk/ directory.

Set the current Emscripten path on Linux/Mac OS X

source ./emsdk_env.sh

This step is not required on Windows because calling the activate command also sets the correct system path (this is not possible on Linux due to security restrictions).

Whenever you change the location of the Portable SDK (e.g. take it to another computer), re-run the ./emsdk activate latest command (and source ./emsdk_env.sh for Linux).

Compiling an application

Now that we have the portable SDK installed we can begin working on compiling code.

We’ll run two examples. The first one will print hello, world. The second example is more complex and will produce a gradient square in WebGL, using the SDL library on the C side.

For each example we’ll compile the code and generate a webpage to make sure the code works.

#include <stdio.h>

int main() {
  printf("hello, world!\n");
  return 0;
}

To compile the code and generate the web page associated with it the command to run is:

./emcc tests/hello_world.c -o hello_world.html

I’m running the command from within the Emscripten directory. Adjust your path as needed.

#include <stdio.h>
#include <SDL/SDL.h>

#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#endif

extern "C" int main(int argc, char** argv) {
  printf("hello, world!\n");

  SDL_Init(SDL_INIT_VIDEO);
  SDL_Surface *screen = SDL_SetVideoMode(256, 256, 32, SDL_SWSURFACE);

#ifdef TEST_SDL_LOCK_OPTS
  EM_ASM("SDL.defaults.copyOnLock = false; SDL.defaults.discardOnLock = true; SDL.defaults.opaqueFrontBuffer = false;");
#endif

  if (SDL_MUSTLOCK(screen)) SDL_LockSurface(screen);
  for (int i = 0; i < 256; i++) {
    for (int j = 0; j < 256; j++) {
#ifdef TEST_SDL_LOCK_OPTS
      // Alpha behaves like in the browser, so write proper opaque pixels.
      int alpha = 255;
#else
      // To emulate native behavior with blitting to screen, alpha component 
      // is ignored. Test that it is so by outputting data (and testing 
      // that it does get discarded)
      int alpha = (i+j) % 255;
#endif
      *((Uint32*)screen->pixels + i * 256 + j) = SDL_MapRGBA(screen->format, i, j, 255-i, alpha);
    }
  }
  if (SDL_MUSTLOCK(screen)) SDL_UnlockSurface(screen);
  SDL_Flip(screen); 

  printf("you should see a smoothly-colored square - no sharp lines but the square borders!\n");
  printf("and here is some text that should be HTML-friendly: amp: |&| double-quote: |\"| quote: |'| less-than, greater-than, html-like tags: |<cheez></cheez>|\nanother line.\n");

  SDL_Quit();

  return 0;
}

The second demo works the same way. The code is more complex than the hello_world example and serves as an example of what you can do with the technology, incorporating additional libraries and outputting to WebGL.

To run the compiler run the following command from the root of your Emscripten SDK:

./emcc tests/hello_world_sdl.cpp -o hello2.html

So what do we use asm.js and Web Assembly for?

Let me start by stating this very clearly: asm.js and Web Assembly are not replacements for Javascript. They provide direct access to the underlying C libraries and functionality and are usually faster than equivalent code in Javascript.

They also bring large codebases to the browser without needing plugins or runtime environments. Unity used to have a plugin that users had to install before running any Unity content; that plugin is no longer supported by browsers, and Unity games are moving to browser-based experiences. This becomes essential on mobile, where installing apps is less attractive than just downloading content.

As Web Assembly matures I expect to see mixed libraries where most of the code is written in JavaScript and the computationally expensive code (video compression and decompression, cryptography, and others) is written in C, C++, or any other language supported by Web Assembly tools.
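
As a sketch of what that mix might look like: Emscripten already exposes a Module object with helpers like cwrap that wrap a compiled C function in a plain JavaScript function. The fib function and the exact compiler flags below are hypothetical and will vary with your Emscripten version:

// Assuming fib.c defines a plain C function, int fib(int n), and was
// compiled with something like:
//   ./emcc fib.c -o fib.js -s EXPORTED_FUNCTIONS="['_fib']"
// the generated fib.js sets up a global Module object we can wrap from
// regular JavaScript:
var fib = Module.cwrap('fib', 'number', ['number']);

console.log(fib(10)); // runs the compiled C code, returns a JavaScript number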

Exciting times indeed!

Intersection Observers: Making it easier to lazy load content

What I love about Paul Lewis’ Developer Diaries is that he points me to new technologies and better ways to work with web content. In this case (video below) he clued me into a new API: Intersection Observers.

The idea behind Intersection Observers is that we don’t really need to load content until it comes into the viewport (it’s visible in the browser’s window). We can configure the action that happens when the selected object comes into view.
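
As a minimal sketch of that configuration (the lazy-loading script later in this section sticks to the defaults), the constructor takes a callback and an optional options object; the rootMargin and threshold values here are purely illustrative:

// A configured observer: the callback receives the entries that changed,
// and the options object controls when the callback fires.
let lazyObserver = new IntersectionObserver((entries) => {
  entries.forEach((entry) => {
    if (entry.isIntersecting) {
      console.log(entry.target, 'came into view');
    }
  });
}, {
  root: null,           // null means "intersect with the viewport"
  rootMargin: '100px',  // start reacting 100px before the element is visible
  threshold: 0          // fire as soon as any part of the element shows
});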

The best example of Intersection Observers I can think of is to lazy load images only when the image in question appears in the viewport, not before. This will make our page load faster because only the two top-most images will load when the page loads and, as we all know, images are the biggest hogs when it comes to web page payload.

The script performs the following actions

  1. Create the IntersectionObserver and bind it to the function we want it to work with
  2. For each image that we want to change
    • Add the src attribute using the value from the data-src attribute in the same element
    • Stop observing the current target
  3. Convert the node list of all images with a data-src attribute to an array
  4. Observe each image belonging to the array defined in step 3
// Script derived from: 
// Quick introduction to the Intersection Observer API 
// by Jeremias Menichelli 
// 1. Create the IntersectionObserver and bind it to the 
// function we want it to work with
let observer = new IntersectionObserver(onChange);

function onChange(changes) {
  // 2. For each image that we want to change
  changes.forEach(change => {
    // * Add the src attribute using the value
    // from the data-src attribute in the same element
    change.target.src = change.target.dataset.src;

    // * Stop observing the current target
    observer.unobserve(change.target);
  })
}

// 3. Convert node list of all images with data-src attribute to an array
const imgs = [ ...document.querySelectorAll('img[data-src]') ];

// 4. Observe each image belonging to the array above
imgs.forEach(img => observer.observe(img));

In the demo page I set the first two images to always load by giving them a src attribute instead of a data-src attribute for the script to manipulate. This ensures that the content above the fold, or partially above the fold, will display regardless of whether the browser supports IntersectionObserver or not.

Browser support is spotty at best. According to caniuse.com only Chrome and Opera support the API out of the box, Firefox supports it behind a flag (dom.IntersectionObserver.enabled in about:config), and Edge has it under development. To load the images in browsers that don’t support IntersectionObserver we have to jump through a few more hoops.

The idea is that if the browser doesn’t support Intersection Observers we load the images right away, using this API as a progressive enhancement.

We modify the script to do as follows:

  1. Convert the node list of all images with a data-src attribute to an array
  2. Wrap the code in a feature test for IntersectionObserver
  3. Create the IntersectionObserver and bind it to the function we want it to work with
  4. For each image that we want to change
    • Add the src attribute using the value from the data-src attribute in the same element
    • Stop observing the current target
  5. Observe each image belonging to the array defined in step 1
  6. If the browser doesn’t support Intersection Observer, load all the images right away
// 1. Convert node list of all images with 
// data-src attribute to an array
const imgs = [ ...document.querySelectorAll('img[data-src]') ];

// 2. Wrap the code in a feature test for IntersectionObserver
if ('IntersectionObserver' in window) {
  // 3. Create the IntersectionObserver and bind it to the function 
  // we want it to work with
  let observer = new IntersectionObserver(onChange);

  function onChange(changes) {
    // 4. For each image that we want to change
    changes.forEach((change) => {
      // * take image url from `data-src` attribute
      change.target.src = change.target.dataset.src;
      // * Stop observing the current target
      observer.unobserve(change.target);
    })
  }

  // 5. Observe each image derived from the array above
  imgs.forEach((img) => observer.observe(img));
} else {
// 6. if the browser doesn't support Intersection Observer 
// we log to console and load images manually
  console.log('Intersection Observers not supported');
  function loadImages(imgs) {
    imgs.forEach((image) => {
      image.src = image.dataset.src;
    })
  }
  loadImages(imgs);
}

Once this API is deployed in all browsers we’ll be able to lazy load content without having to worry about the positioning or the threshold at which the images appear in the viewport. Is this the only way to do it? No, it isn’t. Brian Rinaldi’s Lazy Loading Images on the Web covers how to lazy load images without using Intersection Observers.

Speech Synthesis API: you talk to the computer

Speech Recognition only works in Chrome and Opera. Firefox says it supports it but it doesn’t work and returns a very cryptic error message.

The second part of the Web Speech API is recognition. Where in the prior post we used the Speech Synthesis API to have the browser talk to us when there was a problem with the form, in this post we’ll use a demo from Mozilla to illustrate a potential use of speech recognition: making the browser change the background color of the page based on what the user speaks into a microphone.

Because we’re using the user’s microphone the page/application must explicitly ask for permission and it must be granted before any of this code will work. This permission can be revoked at any time.
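
In browsers that can query the microphone permission through the Permissions API we can also check its state up front. This is just a sketch, and support for the 'microphone' permission name varies by browser:

// Query the microphone permission where the Permissions API supports it,
// so we can warn the user before recognition.start() silently fails.
if (navigator.permissions && navigator.permissions.query) {
  navigator.permissions.query({ name: 'microphone' })
    .then(function(status) {
      console.log('Microphone permission is: ' + status.state); // granted, denied or prompt
      status.onchange = function() {
        console.log('Microphone permission changed to: ' + status.state);
      };
    })
    .catch(function() {
      console.log('This browser cannot query the microphone permission');
    });
}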

The HTML is simple: a placeholder paragraph to hold any hints we provide the user and another paragraph to hold the output of the process.

<h1>Speech color changer</h1>

<p class="hints"></p>
<div>
    <p class="output"><em>...diagnostic messages</em></p>
</div>

The first portion of the script creates a JSGF grammar for the elements we want to recognize. JSGF is an old Sun Microsystems W3C submission that was used as the basis for work in the W3C Voice Browser Working Group (closed in October 2015).

This will create the vocabulary that the rest of the script will recognize.

var colors = [ 'aqua' , 'azure' , 'beige', 'bisque', 'black', 'blue', 'brown', 
'chocolate', 'coral', 'crimson', 'cyan', 'fuchsia', 'ghostwhite', 'gold', 
'goldenrod', 'gray', 'green', 'indigo', 'ivory', 'khaki', 'lavender', 'lime', 
'linen', 'magenta', 'maroon', 'moccasin', 'navy', 'olive', 'orange', 'orchid',
'peru', 'pink', 'plum', 'purple', 'red', 'salmon', 'sienna', 'silver', 'snow',
'tan', 'teal', 'thistle', 'tomato', 'turquoise', 'violet', 'white', 'yellow'];
var grammar = '#JSGF V1.0; grammar colors; public <color> = ' + colors.join(' | ') + ' ;'
// You can optionally log the grammar to console to see what it looks like
console.log(grammar);

We next set up the speech recognition engine. The three variables, SpeechRecognition, SpeechGrammarList and SpeechRecognitionEvent, have two possible values: either an unprefixed version (not supported anywhere yet) or the webkit-prefixed version (supported by Chrome and Opera). It’s always a good idea to future-proof your code; I suspect that when Firefox finally supports the API it will be unprefixed.

var SpeechRecognition = SpeechRecognition || webkitSpeechRecognition;
var SpeechGrammarList = SpeechGrammarList || webkitSpeechGrammarList;
var SpeechRecognitionEvent = SpeechRecognitionEvent || webkitSpeechRecognitionEvent;

The script then does assignments. First it associates variables with the speech recognition engine we set up above. Next the script configures the engine with the grammar created earlier and the attributes the recognition engine needs to work. It also creates placeholders for the HTML elements that will hold messages to the user and the element whose background color will actually change.

var recognition = new SpeechRecognition();
var speechRecognitionList = new SpeechGrammarList();
speechRecognitionList.addFromString(grammar, 1);
recognition.grammars = speechRecognitionList;
//recognition.continuous = false;
recognition.lang = 'en-US';
recognition.interimResults = false;
recognition.maxAlternatives = 1;

var diagnostic = document.querySelector('.output');
var bg = document.querySelector('html');
var hints = document.querySelector('.hints');

For each color that we make available to the user, we use the color as the background-color of its label to give the user an additional cue about the colors they can choose from.

var colorHTML= '';
colors.forEach(function(v, i){
    console.log(v, i);
    colorHTML += '<span style="background-color:' + v + ';"> ' + v + ' </span>';
});

The script is almost ready to start. It adds a message to the .hints container and, when the user clicks anywhere on the page, begins the recognition process.

hints.innerHTML = 'Tap/click then say a color to change the background color of the app. Try '+ colorHTML + '.';

document.body.onclick = function() {
    recognition.start();
    console.log('Ready to receive a color command.');
};

When the user speaks and the script gets a result the SpeechRecognitionResultList object contains SpeechRecognitionResult objects. It has a getter so it can be accessed like an array.

The last variable holds the SpeechRecognitionResult object at the last position.

Each SpeechRecognitionResult object contains SpeechRecognitionAlternative objects that contain individual results. The last element represents the latest result the script obtained and, using array notation, we get the transcript for the first result ([0]) of the latest (last) response the client has stored.

The script will also display the result it received (sometimes it’s funny to see what the recognition engine thinks you said), change the background color to the specified color, and provide a confidence level percentage; the higher the number, the more likely the user got a correct match and the color will change.

recognition.onresult = function(event) {
    var last = event.results.length - 1;
    var color = event.results[last][0].transcript;

    diagnostic.textContent = 'Result received: ' + color + '.';
    bg.style.backgroundColor = color;
    console.log('Confidence: ' + event.results[0][0].confidence);
};

The final part of the script handles additional events. When the script detects the end of speech it stops the recognition. When there is no match it notifies the user in the diagnostic element. Finally, if there is an error, we also report it to the user.

recognition.onspeechend = function() {
    recognition.stop();
};

recognition.onnomatch = function(event) {
    diagnostic.textContent = "I didn't recognise that color.";
};

recognition.onerror = function(event) {
    diagnostic.textContent = 'Error occurred in recognition: ' + event.error;
}

Full recognition example

Speech Recognition only works in Chrome and Opera. Firefox says it supports it but it doesn’t work and returns a very cryptic error message.

There is another way to work with speech recognition: dictation. Rather than work through the code here, I’ll point to an example that illustrates how we can use the recognition portion of the API to create a dictation application whose output you can then copy or email.
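
A rough sketch of what the core of such a dictation loop might look like is below. It reuses the prefixed constructor from earlier and assumes a #transcript element exists to collect the text:

// Dictation sketch: keep recognition running and append each final
// result to an element with id="transcript" (assumed to exist).
var DictationRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
var dictation = new DictationRecognition();
dictation.continuous = true;      // keep listening across pauses
dictation.interimResults = false; // only report final results
dictation.lang = 'en-US';

dictation.onresult = function(event) {
  var last = event.results.length - 1;
  var transcript = event.results[last][0].transcript;
  document.getElementById('transcript').textContent += transcript + ' ';
};

// As with the color demo, start listening on a user gesture
document.body.onclick = function() {
  dictation.start();
};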

Code demo

All the code for this post is available on Github

Speech Synthesis API: computer talks

I accidentally discovered a new API that makes it easier to interact with your site/app using your voice. The Web Speech API provides both ends of the computer conversation: the recognition to listen and the synthesis to speak.

Right now I’m more interested in the synthesis part and how we can include it as additional feedback on our sites and applications as an additional cue for user interaction.

Synthesis

The Speech Synthesis API gives us a way to “speak” strings of text without having to record them. These ‘utterances’ (in API speak) can be further customized.

At the most basic the utterance is made of the following:

  • An instance of SpeechSynthesisUtterance
  • The text and language we want the voice spoken in
  • The instruction to actually speak the command using speechSynthesis.speak
var msg1 = new SpeechSynthesisUtterance();
msg1.text = "I'm sorry, Dave, I can't do that";
msg1.lang = 'en-US';

speechSynthesis.speak(msg1);

The example below changes the content and the language to es-cl (Spanish as spoken in Chile). The structure of the code is the same.

var msg2 = new SpeechSynthesisUtterance();
msg2.text = "Dicen que el tiempo guarda en las bastillas";
msg2.lang = 'es-cl';

speechSynthesis.speak(msg2);

Copy and paste each example into your Dev Tools console (I use Chrome’s and have tested in Chrome and Firefox) and notice how different the default voices are for each message and for each browser you test with.

We can further customize the utterance with additional parameters. The parameters are:

  • msg contains a new instance of SpeechSynthesisUtterance
  • voices contains an array of all the voices available to the user agent (browser in this case)
  • voice assigns a voice from the voices array to the instance of utterance we are working with
  • voiceURI specifies speech synthesis voice and the location of the speech synthesis service that the web application wishes to use
  • rate indicates how fast the text is spoken. 1 is the default rate supported by the speech synthesis engine or specific voice (which should correspond to a normal speaking rate). 2 is twice as fast, and 0.5 is half as fast
  • pitch specifies the speaking pitch for the utterance. It ranges between 0 and 2 inclusive, with 0 being the lowest pitch and 2 the highest pitch. 1 corresponds to the default pitch of the speech synthesis engine or specific voice

As before text holds the message we want the browser to speak, lang holds the language we want the browser to speak in and the speechSynthesis.speak command will actually make the browser speak our phrase.

var msg = new SpeechSynthesisUtterance();
var voices = window.speechSynthesis.getVoices();
// Note: some voices don't support altering params
msg.voice = voices[0]; 
msg.voiceURI = 'native';
msg.volume = 1; // 0 to 1
msg.rate = 1; // 0.1 to 10
msg.pitch = 2; //0 to 2
msg.text = 'Hello World';
msg.lang = 'en-US';

speechSynthesis.speak(msg);
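
One caveat I ran into: in some browsers getVoices() returns an empty array until the voice list has finished loading, so if voices[0] comes back undefined you may need to wait for the voiceschanged event before picking a voice:

// The voice list can load asynchronously; wait for voiceschanged
// before reading it if getVoices() is empty at first.
window.speechSynthesis.addEventListener('voiceschanged', function() {
  var voices = window.speechSynthesis.getVoices();
  console.log(voices.length + ' voices available');
});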

Putting speech synthesis into action: why and where would we use this?

The most obvious place for me to use speech synthesis is as additional cues and messages to end users when there is an error or problem. We’ll define functions to encapsulate the error messages we want to “talk” to the user about.

For this example we’ll use the following HTML code:

<form id="form">
  <fieldset>
    <legend>Basic User Information</legend>
    <label for="username">Username</label>
    <input id="username" type="text" placeholder="User Name">
    <label for="password">Password</label>
    <input id="password" type="password" placeholder="password">
  </fieldset>
</form>

For the sake of the demo I’m only interested in the input fields and not in having a fully functional working form.

I will break the JavaScript portion of the demo into two parts. The first part defines the speech synthesis functions for the error messages, which are very similar to the examples we’ve already discussed.

// Setup the Username Empty Error function
function speakUsernameEmptyError() {
  let msg1 = new SpeechSynthesisUtterance();

  msg1.text = "The Username field can not be empty";
  msg1.lang = 'en-US';

  speechSynthesis.speak(msg1);
}

// Setup the Password Empty Error function
function speakPasswordEmptyError() {
  let msg2 = new SpeechSynthesisUtterance();
  msg2.text = "The Password field can not be empty";
  msg2.lang = 'en-US';

  speechSynthesis.speak(msg2);
}

The second part of the script assigns blur event listeners to the input elements. Inside each event handler the code checks whether the field is empty. If it is, the code adds a 1-pixel red border around the field and plays the appropriate utterance we crafted earlier. If the field is not empty, either because the user entered a value before moving out of the field or at a later time, we set the border to a 1-pixel solid black color.

// Assign variables to hold the elements
let username = document.getElementById('username');
let password = document.getElementById('password');

// Add blur event listener to the username field
username.addEventListener('blur', function() {
  // If the field is empty
  if (username.value.length <= 0) {
    // Put a 1 pixel red border on the input field
    username.style.border = '1px solid red';
    // Speak the error as specified in the
    // speakUsernameEmptyError function
    speakUsernameEmptyError();
  } else {
    username.style.border = '1px solid black';
  }
});

// Add blur event listener to the password field
password.addEventListener('blur', function() {
  // If the field is empty
  if (password.value.length <= 0) {
    // Put a 1 pixel red border on the input field
    password.style.border = '1px solid red';
    // Speak the error as specified in the
    // speakPasswordEmptyError function
    speakPasswordEmptyError();
  } else {
    password.style.border = '1px solid black';
  }
})

The functions and event listeners are very basic and could stand some additional work, particularly in the validation area.
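
One possible direction, sketched below for the username field only, is to lean on the built-in Constraint Validation API instead of checking value.length by hand: mark the field as required and let validity.valueMissing drive the same spoken feedback:

// Mark the field as required so the browser tracks its validity for us
username.setAttribute('required', '');

username.addEventListener('blur', function() {
  // validity.valueMissing is true when a required field is empty
  if (username.validity.valueMissing) {
    username.style.border = '1px solid red';
    speakUsernameEmptyError();
  } else {
    username.style.border = '1px solid black';
  }
});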

Sources