Creating a Github publishing workflow

Lord of the Files: How GitHub Tamed Free Software (And More) was an article first published in Wired in February of 2012. It was an interesting article about a company whose work I value enough to pay for its service.

The article itself wasn’t what caught my attention. It was the Github repository that Wired created to go with the article. Their experience highlights a big potential of technology for the publishing process: It makes collaboration at all stages of the process easier. While researching this idea I came across this blog post

All this made me think about how Github develops its documentation and whether we can apply a similar process to a normal publishing workflow.

The idea starts with public repositories but it should work the same for private repositories and Github Enterprise repositories.

This workflow expects a basic level of familiarity with Github and the following:

  • All participants should understand how to handle Github repositories, how to commit files (both initial commits and updates) to their repositories and how to create pull requests
  • Project editors should also know how to handle pull requests and how to resolve merge conflicts
  • Project leaders should be Github power users or, at the very least, know where to get help

All the skill requirements listed above can be handled through training and support either in-house or from Github.


The outline below shows the workflow from both the Github and content perspectives.


Setting up the project

In the first stages we create the repository in Github and add the content to it. Ideally this initial process would be done by the same person.

Instructions for creating Github repositories are located in the Github help article:

Individual contributors fork and clone the project

For the project to be useful each individual contributor has to have their own copy of the project in which to make changes without disrupting the master copy. Different contributors may make different changes to the same file and that’s where pull requests (discussed in more detail below) come in handy.

Forking and cloning are explained in more detail in this Github Help article

Edit your local copy

One of the advantages of this model is that you’re making changes to your local copy, not the master repository. If you are not happy with the changes and they are too extensive to undo manually you can always delete your working copy and clone your repository again. In more extreme cases you can delete your fork of the project in Github and fork it again. This will get you a brand new copy of the project at the cost of losing any changes you made since the fork was created.

Create a pull request for discussion and edit the content as needed

Pull requests allow users to tell each other about changes in their copy of the repository relative to the repository it was forked from.

Creating pull requests is described in this Github Help article

If you want specific people’s attention you can mention them by adding @ before their username. If the username is gandalf, you can include them in the conversation using @gandalf

Discuss the content

In addition to the initial pull request notification, the discussions in a Pull request provide notification of:

  • Comments left on the pull request itself.
  • Additional commits/edits pushed to the pull request’s branch.

The request can be left open until everyone agrees it’s completed or until the project lead closes it as complete.

Continue working on the project

Depending on the existing workflow, the Github-based authoring process can be added as an additional step, or hooks can be added to Github to further process the content before final publication.

Github followed this technique by adding hooks that transfer the edited content to an external Ruby on Rails application that served the content.

If you are serving your content from gh-pages (Github’s provided project websites) you can serve the content from the master repository

Things to research further

Do we need to create branches for the proposed edits? Github and most Git good practices strongly suggest the use of topic branches but I don’t see them being absolutely necessary for publishing.

Athena: What an offline web reading experience may look like

With the latest set of web technologies coming down the W3C/WHATWG pipeline it is now possible to create top-of-the-line responsive experiences that can also work as offline applications.

HTML5 web is more than capable of competing with native applications. Chrome and Windows apps have shown as much capability as native apps, if we let them. What needs to happen now is the developer shift to thinking about the web in terms of application logic rather than the rules we want the web to play by.

Athena is a proof of concept for such an application. It uses ServiceWorkers for caching application resources, it uses Polymer and a suite of custom web components to handle layout and application structures.

This article discusses the rationale for Athena and how it has been implemented. It represents my ideas and opinions but it is not prescriptive; rather it embraces the Perl motto: There’s more than one way to do it (TMTOWTDI). The only required part of an Athena publication is the service worker… the UI and content display is up to you.

Browser support considerations

Whatever way we choose to create and show the content must:

  • Support current versions of IE, Opera, Firefox and Chrome
  • Provide keyboard and touch alternatives for mouse navigation
  • Show an icon, or another indicator, when the content scrolls beyond the visible area on screen (maybe something like what Adobe InDesign does with overset text frames)

The work done at W3C, WHATWG and ECMA TC-39, coupled with browser vendors’ adoption, has made the browser a better development environment. Libraries like jQuery were initially created to speed up CSS interactions and to smooth over the differences in CSS rendering and Javascript support among browsers. Because of this standardization and the requirements we set out above we can drop older versions of browsers and concentrate on our application, not the workarounds.

This flies in the face of people telling you that we should go back as far as possible in supporting users. Not all computers can upgrade browsers or even operating systems. In most instances I would agree but if we are trying to push the envelope then we should use the best available technology without consideration for older versions that limit functionality (I’m looking at you Internet Explorer 8)

Athena’s technology stack also makes it hard to polyfill for older browsers. ShadowDOM and ServiceWorkers have limited (or non existent) polyfill support and that makes them work only with modern, evergreen browsers.
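Since these features cannot be polyfilled reliably, a page could check for them up front and fall back to plain online content when they are missing. The following is only a minimal sketch: the function name missingFeatures and the shape of its env argument are my own assumptions and are not part of the demo. Passing the environment in as an argument keeps the check testable outside a browser.

```javascript
// Returns the names of the required features that `env` lacks.
// `env` is expected to look like `window`: { navigator, Element }.
function missingFeatures(env) {
  var missing = [];
  var nav = env.navigator || {};
  var proto = (env.Element && env.Element.prototype) || {};

  // ServiceWorker support is exposed as navigator.serviceWorker.
  if (!('serviceWorker' in nav)) {
    missing.push('ServiceWorker');
  }

  // attachShadow is the standardized Shadow DOM entry point;
  // createShadowRoot was the earlier (v0) name.
  if (!('attachShadow' in proto) && !('createShadowRoot' in proto)) {
    missing.push('ShadowDOM');
  }

  return missing;
}

// In a browser: var missing = missingFeatures(window);
```

An empty array means the browser has everything the reference implementation relies on; anything else can be used to show a notice or skip the offline features.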

The reference implementation uses the following technologies, listed with the browser support information for each

Technology       Support information
Polymer          Polymer browser support
ServiceWorker    Is ServiceWorker Ready?

Other technologies have different support requirements that are outside the scope of this article.

Remember: You get what you paid for

Because the specifications used in this project (web component specifications and ServiceWorkers) are not finalized, developers can (and should) expect changes… that’s the price we pay for working with the newest stuff. But it also allows us to tighten the feedback loop to the spec writers, tell them what works, what doesn’t work and what we’d like to see going forward.

The extensible web manifesto says more about the way spec writers and application developers should interact with each other.


Bibliotype and a related article with code available in Github

Hi combines elements of Twitter and the open web. When you first start you are required to enter a 20 word snippet of text and to allow the site to capture your location (it adds weather data to the location for some random reason.) This is called a moment.

You are then allowed to create longer form content related to the moment you initially created. Other users in the application can ask you to expand on the moment; whether you do so or not is your decision.

Flipboard is a Windows and mobile application that collects, curates and delivers long(er) form content.

In A next-generation digital book Mike Matas presents ideas and concepts for a digital book or book-like application. These are fully interactive books that take advantage of multimedia and advanced mobile device features to make reading a more engaging experience. None of the things shown in the video is impossible using web technologies, why haven’t we done so already?

Sarah Groff-Palermo cares a lot about putting data art on the web. Books should be as much art as technological endeavors. Her ForwardJS presentation mixes art and code in one interesting product.

Craig Mod’s essays:

John Allsopp’s A Dao of Web Design

Hosting, technology and components

What are the parts of Athena? What are they used for?

Hosted on Github

Athena publications are initially hosted on Github Pages for the following reasons:

  • Whenever you create a Github-based website you automatically get SSL.
    • ServiceWorkers will only install and work on SSL enabled websites
  • Because the website is just another branch of the repository we can set it up so that edits are pushed directly to the production publication
  • You can still assign your own domain name to the website or choose to keep the default Github-provided domain name
  • The basic Github functionality is free for public repositories. If you want private repositories then there’s a cost for the private repo hosting.


The core of an Athena publication is a scoped service worker that will initially handle the caching of the publication’s content. We take advantage of the multiple cache capability available with service workers to create caches for individual units of content (like magazine issues) and to expire them within a certain time period (by deleting the cache).

For publications needing to pull data from specific URLs we can special-case the requests based on different pieces of the URL, allowing us to create different caches based on edition (assuming each edition is stored in its own directory), resource type or even the URL we are requesting.
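The URL-based special casing could be sketched as a small function that maps a request URL to a cache name. This is not code from the demo: the athena-demo prefix, the /editions/ directory layout and the cacheNameFor name are assumptions made for illustration.

```javascript
var CACHE_PREFIX = 'athena-demo'; // assumed prefix

// Picks a cache name from the request URL.
// Assumes each edition lives under /editions/<edition-id>/.
function cacheNameFor(url) {
  var pathname = new URL(url).pathname;

  // One cache per edition, so a whole issue can be expired at once.
  var edition = pathname.match(/^\/editions\/([^\/]+)\//);
  if (edition) {
    return CACHE_PREFIX + '-edition-' + edition[1];
  }

  // Images outside an edition share a cache based on resource type.
  if (/\.(png|jpe?g|gif|svg|webp)$/i.test(pathname)) {
    return CACHE_PREFIX + '-images';
  }

  // Everything else is treated as part of the application shell.
  return CACHE_PREFIX + '-shell';
}
```

Inside a fetch handler the result would be passed to caches.open(), for example caches.open(cacheNameFor(event.request.url)). Expiring an edition then comes down to a single caches.delete() call with that edition’s cache name.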

Service workers have another benefit not directly related to offline connections. They give all access to our content a speed boost by eliminating the network round trip after the content is installed. If the content is in the cache, the resource’s time to load is limited only by the hard drive’s speed.

This is what the ServiceWorker code looks like in the demo application:

// @author Carlos Araya
// @email [email protected]
// Based on Paul Lewis' Chrome Dev Summit serviceworker.

var CACHE_NAME = 'athena-demo';
// Placeholder value: the version number did not survive in this listing.
var CACHE_VERSION = 1;

self.oninstall = function(event) {

  event.waitUntil(
    caches.open(CACHE_NAME + '-v' + CACHE_VERSION).then(function(cache) {

      return cache.addAll([
        // URLs of the application resources to precache
        // (the file list is not shown in this listing).
      ]);
    })
  );
};

self.onactivate = function(event) {

  var currentCacheName = CACHE_NAME + '-v' + CACHE_VERSION;
  caches.keys().then(function(cacheNames) {
    return Promise.all(
      cacheNames.map(function(cacheName) {
        // Leave caches that don't belong to this application alone.
        if (cacheName.indexOf(CACHE_NAME) == -1) {
          return;
        }

        // Delete older versions of our own cache.
        if (cacheName != currentCacheName) {
          return caches.delete(cacheName);
        }
      })
    );
  });
};

self.onfetch = function(event) {
  var request = event.request;
  var requestURL = new URL(event.request.url);

  event.respondWith(

    // Check the cache for a hit.
    caches.match(request).then(function(response) {

      // If we have a response return it.
      if (response)
        return response;

      // Otherwise fetch it, store and respond.
      return fetch(request).then(function(response) {

        var responseToCache = response.clone();

        caches.open(CACHE_NAME + '-v' + CACHE_VERSION).then(
          function(cache) {
            cache.put(request, responseToCache).catch(function(err) {
              // Likely we got an opaque response which the polyfill
              // can't deal with, so log out a warning.
              console.warn(requestURL + ': ' + err.message);
            });
          });

        return response;
      });
    })
  );
};

As powerful as service workers are they also have some drawbacks. They can only be served through HTTPS (you cannot install a service worker in a non secure server) to prevent man-in-the-middle attacks.

There is limited support for the API (only Chrome Canary and Firefox Nightly builds behind a flag will work.) This will change as the API matures and becomes finalized in the WHATWG and/or a recommendation with the W3C.

Even in browsers that support the API the support is not complete. Chrome uses a polyfill for elements of the cache API that it does not support natively. This should be fixed in upcoming versions of Chrome and Chromium (the open source project Chrome is based on.)

We need to be careful with how much data we choose to store in the caches. From what I understand the amount of storage given to offline applications is divided between all offline storage types: IndexedDB, Session Storage, Web Workers and ServiceWorkers and this amount is not consistent across all browsers.

Furthermore I am not aware of any way to increase this total amount or to specifically increase the storage assigned to ServiceWorkers; Jake Archibald mentions this in the offline cookbook section on cache persistence
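While the quota cannot be raised from script, browsers that implement the Storage API expose navigator.storage.estimate(), which reports how much of the shared quota an origin is using. The sketch below only reads those numbers; the describeStorage name and the message format are my own, and the storage object is passed in so the function also works where the API is missing.

```javascript
// Resolves to a short, human-readable summary of storage usage.
// `storage` should be navigator.storage in a browser that supports it.
function describeStorage(storage) {
  if (!storage || typeof storage.estimate !== 'function') {
    return Promise.resolve('Storage estimate not available');
  }

  return storage.estimate().then(function (estimate) {
    var megabyte = 1024 * 1024;
    var usedMb = (estimate.usage / megabyte).toFixed(1);
    var quotaMb = (estimate.quota / megabyte).toFixed(1);
    return usedMb + ' MB used of ' + quotaMb + ' MB';
  });
}

// In a browser:
// describeStorage(navigator.storage).then(function (summary) {
//   console.log(summary);
// });
```

The figures are estimates shared by every storage type for the origin, so they are best used as a guide for deciding how many editions to keep cached.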

The future

In his offline cookbook Jake Archibald presents multiple ways in which we can use ServiceWorkers. Now that we have a working prototype we can explore further uses of ServiceWorkers to enhance the offline experience. We may special-case the requests to the content directory to use a special caching strategy (maybe do cache with network fallback as described in the cookbook) to make sure that our content is as fresh as we can make it without being online all the time.
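A cache with network fallback strategy for the content directory might look something like the sketch below. It is an illustration and not part of the prototype: the /content/ path and the function names are assumptions, and the cache lookup and network fetch are passed in as arguments so the logic can be tried outside a service worker.

```javascript
// Returns true for requests that target the content directory.
// '/content/' is an assumed location.
function isContentRequest(url) {
  return new URL(url).pathname.indexOf('/content/') === 0;
}

// Cache, falling back to network.
// In a service worker, `cacheLookup` would wrap caches.match and
// `networkFetch` would be fetch.
function cacheThenNetwork(request, cacheLookup, networkFetch) {
  return cacheLookup(request).then(function (cached) {
    // A cache miss resolves to undefined, so fall through to the network.
    return cached || networkFetch(request);
  });
}

// Possible wiring inside the worker:
// self.onfetch = function (event) {
//   if (isContentRequest(event.request.url)) {
//     event.respondWith(cacheThenNetwork(
//       event.request,
//       function (req) { return caches.match(req); },
//       fetch
//     ));
//   }
// };
```

Requests outside the content directory are left untouched here, so they would keep whatever handling the rest of the worker gives them.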

As part of the ServiceWorker family of specifications we will be able to match native applications with push notification and background content synchronization using open web APIs.

Jeff Posnick created an offline reader and posted the code and a working example that provides a much better example of using Polymer and data binding. Using it as a model we can get an experience that is much closer to existing readers with the additional capabilities provided by ServiceWorker.

JSON package file

Taking a cue from the package.opf epub package specification I’ve come up with a basic JSON definition for a publication package. We picked JSON as our package format because it is easier to write, easier to validate (using tools like jsonlint) and can easily be parsed by all existing browsers (according to

The other advantage is that we can easily customize our package file to match the needs of our specific publications.

The basic publication.json may look something like this:

{
  "publication": {
    "metadata": {
      "pub-type": "book",
      "title": "New Adventures of Old Sherlock Holmes",
      "pub-info": [
        {
          "pub-date": "20141130",
          "pub-location": "London",
          "publisher": "That Press, Ltd"
        }
      ],
      "authors": [
        {
          "firstName": "Sherlock",
          "lastName": "Holmes"
        }
      ],
      "editors": [
        {
          "role": "Production Editor",
          "firstName": "John",
          "lastName": "Watson"
        }
      ]
    },
    "structure": {
      "content": [
        {
          "title": "Introduction",
          "type": "Introduction",
          "location": "content/introduction.html"
        },
        {
          "title": "Chapter 1",
          "type": "chapter",
          "location": "content/chapter1.html"
        },
        {
          "title": "Chapter 2",
          "type": "chapter",
          "location": "content/chapter2.html"
        }
      ]
    }
  }
}

I’ve left the format deliberately vague because I believe this needs many iterations to become the strong format that it needs to be.
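One way a service worker or build script might use the package file is to turn its structure section into a list of URLs to cache. The sketch below assumes the top-level key is spelled publication and that content entries carry a location; contentLocations and samplePackage are illustrative names, not part of Athena.

```javascript
// A trimmed-down package object following the structure section.
var samplePackage = {
  publication: {
    structure: {
      content: [
        { title: 'Introduction', type: 'Introduction', location: 'content/introduction.html' },
        { title: 'Chapter 1', type: 'chapter', location: 'content/chapter1.html' }
      ]
    }
  }
};

// Collects the location of every content entry,
// skipping entries that do not define one.
function contentLocations(pkg) {
  var structure = (pkg.publication && pkg.publication.structure) || {};
  return (structure.content || [])
    .map(function (entry) { return entry.location; })
    .filter(Boolean);
}

// contentLocations(samplePackage) could then be handed to cache.addAll().
```

Because the function tolerates missing sections, it keeps working while the package format goes through the iterations it still needs.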


The UI is one of the points where I’m struggling. Athena herself doesn’t (meaning I don’t) really care about what front end platform/Library/flavor of the week you choose for the User Interface. I’ve chosen three experimental interfaces for introducing Athena: Polymer, Angular and a plain HTML interface using Bootstrap (or maybe Foundation)

One big problem that I need to research is the routing portion of a web application: whether I can route external pages through the framework and control where the content is displayed, or even if I need routing at all.
Another option would be to use some aspects of Polymer and mix them with a plain Bootstrap or Foundation site and eschew the web application side.
It’s too early in the process to decide.

The Polymer version provides a glimpse of what a Polymer-based application may look like. It also uses athena-document, a custom element that wraps a markdown transformation engine for display on the web. There shouldn’t be major problems doing the same thing with LaTeX and other document formats and there’s nothing that says we can’t use these web components in non Polymer applications.

NOTE: Right now athena-document is not embedding properly in the sidebar-layout component. Researching whether the issue is with juicy-markdown (the markdown parser element), sidebar-layout (the layout component) or with Polymer itself.
Currently all the content is displayed as HTML. I will spare you how I got the Markdown-rendered HTML to the page it’s on.

Content: format and metaphors

In my blog I’ve written about Paged Media and Generated Content for paged media and about creating a print @media style sheet. They both refer to printed content, either by creating PDF directly (using Paged media) or adapting the web content for printing (using @media rules tailored for print).

Athena doesn’t want to be a print platform but a starting point to test whether offline web apps can compete with native platforms and existing digital content standards. That said it should be possible to create paged media style sheets to at least create a good PDF for print and a high quality version for archival storage.

See Book Metaphors Online for a more thorough discussion on this subject.


I’ve discussed the role I see HTML playing in the publication process. I will only summarize the article I just linked.

HTML is a powerful language full of capabilities and, alongside CSS3 and Javascript, provides the foundation of modern sites and applications.

HTML is not an easy language to author. Depending on the variant of HTML you’re writing (XHTML or regular HTML) you have to follow different rules.

The default HTML5 is too permissive; it allows the worst tag soup markup; the same markup that has been allowed by browser vendors in an effort to be competitive. It is nice to authors but makes parsing the content much harder than it needs to be.

XHTML5 syntax (best explained in this HTML5 Doctor article by Bruce Lawson) provides stricter guidelines for authors that may turn some people off from HTML altogether. Sure, attribute values must be quoted, all tags must be lowercase and all elements must be closed, including <img> and <br> tags. The benefit is that the stricter rules make parsing content and developing new technologies around it easier.

Because of these difficulties I present 4 solutions to create content that easily transforms to XHTML5 content. I don’t go into too much detail of each solution, just enough to give you an idea of what it is.

  • Markdown is a text to (X)HTML conversion tool designed for writers. It refers both to the syntax used in the Markdown text files and the applications used to perform the conversion
  • AsciiDoc is a text document format for writing notes, documentation, articles, books, ebooks, slideshows, web pages, man pages and blogs. AsciiDoc files can be translated to many formats including HTML, PDF, EPUB, man page
  • HTMLBook is an open, XHTML5-based standard for the authoring and production of both print and digital books. It is currently under development
  • Docbook, DITA and TEI are some examples of XML vocabularies that can be converted to HTML.

Athena doesn’t really care what you use to create your content as long as you provide well formed HTML5 created with XHTML5 syntax.

Book metaphors online

Does it make sense for Athena to use book metaphors?

Most of these metaphors use jQuery and jQuery plugins

For the simplest of book interfaces we can just use one of the scripts below to build a pagination setup that requires the reader to click on either a page number or an arrow.

If the script doesn’t incorporate it already, we can then build a keyboard navigation interface by creating a small script that matches the keys pressed to the arrow keys and navigates forward or backward accordingly.
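Such a keyboard script can be very small. The sketch below keeps the page arithmetic in a plain function and leaves the actual page change to the pagination script; nextPage is an illustrative name and goToPage is a hypothetical hook that the chosen script would have to provide.

```javascript
// Maps a KeyboardEvent.key value to the page that should be shown next,
// clamped to the range 1..totalPages. Other keys leave the page unchanged.
function nextPage(key, currentPage, totalPages) {
  var step = 0;
  if (key === 'ArrowRight') {
    step = 1;
  } else if (key === 'ArrowLeft') {
    step = -1;
  }
  return Math.min(totalPages, Math.max(1, currentPage + step));
}

// Possible wiring in the page (goToPage is hypothetical):
// document.addEventListener('keydown', function (event) {
//   currentPage = nextPage(event.key, currentPage, totalPages);
//   goToPage(currentPage);
// });
```

Clamping at both ends means the reader cannot page past the first or last page, whatever the underlying script does with out of range numbers.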

Full examples

Turn.js and Bookblock present complete book-like interfaces. They use jQuery and, in the case of Bookblock, additional libraries that have to be cached and may present issues when working with Polymer and other web component libraries

Use cases for Athena publications

These are the three main use cases I see for Athena publications. The first two are based on short publication loops. The third use case is based on what media and resources will serve the story best. Enhancing existing content lets us choose which parts of the Athena toolkit we’ll use with the content we’re working on… at the very least we can convert the project into an offline-capable application.

Early access content

The Early Access Publications idea is based on existing programs like Manning’s MEAP and O’Reilly’s Early Release programs where the book content is published as soon as it’s ready (and sometimes as soon as the author is done writing it.)

We can do multimedia books (see below for more information about how I envision interactive books) and the multimedia work can be done in parallel to the writing or it can all be done in a collaborative fashion (Github private repo or similar version control system.)

The advantage of this kind of publication is that it tightens the feedback loop between readers, reviewers, editors and authors. It also allows for collaborative editing: whoever has access to the git repository can make changes and accept changes coming from the community (whether this repository is public or private.)

O’Reilly Media uses Ilya Grigorik’s book High Performance Browser Networking as a case study on the benefits of this tighter loop.

Serial Publications (magazines and the like)

Serials are periodical publications. Magazines are the ones that come to mind first but they are not the only ones. Shorter content like Atavist books and stories, or the longer content available from O’Reilly Atlas, would also fit, with the added advantage of offline access.

This way a book is never really done. We can continue to work on stories and tell new stories as long as we want to and the stories can get that continual polish that makes for a good reading experience. If we need/want to, we can also provide CSS Paged Media stylesheets that will allow us to create a PDF version of the text/images we make available.

Interactive books

When I was thinking about interactive books two came to mind: the Defiance companion iBook and Al Gore’s Our Choice, as presented at TED in 2011.

Before all the new CSS, HTML5 and Javascript technologies became mainstream it was very difficult (if not outright impossible) to create experiences like the ones above.

Now the almost impossible is merely difficult. The technologies in those books are available as open web APIs at different levels of standardization and you can create experiences equivalent to those of the applications that you run on your mobile devices.

Enhancing existing content

The easiest way to start using Athena is to add the offline ServiceWorker to an existing application. This process is fairly simple:

  • Create a ServiceWorker script that caches the required files
  • Link the service worker to the main page in your application
  • Test the offline experience and overall functionality of your project
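Linking the service worker to the main page comes down to a call to navigator.serviceWorker.register(). The sketch below wraps that call in a support check; registerOfflineWorker is an illustrative name and /sw.js is an assumed location for the worker script.

```javascript
// Registers the worker when the browser supports service workers.
// `nav` is navigator in a browser; it is a parameter so the function
// can be exercised elsewhere.
function registerOfflineWorker(nav, scriptUrl) {
  if (!nav || !('serviceWorker' in nav)) {
    // Unsupported browser: the page simply keeps working online.
    return Promise.resolve(null);
  }
  return nav.serviceWorker.register(scriptUrl);
}

// In the main page:
// registerOfflineWorker(navigator, '/sw.js').then(function (registration) {
//   if (registration) {
//     console.log('Offline support ready for', registration.scope);
//   }
// });
```

The location of the script matters because a worker can only control pages at or below its own path, so placing it at the root of the publication covers the whole project.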

Copyright Considerations and caching stale content

When working with Athena content we have a fairly open hand as to what resources we fetch and the sources we fetch resources from. What copyright restrictions do we face when accessing and then caching content?

There is nothing that would stop me from doing this when defining the cache content:

var urlsToPrefetch = [
    // We can also fetch remote content for our cache(s)
];

In the links above the content originates from O’Reilly’s Interactive Data Visualization for the Web. In this case the content is already available free of charge (and I own both the printed and ebook versions) but it illustrates a point: unless you’re serving your content behind authentication, a ServiceWorker can do whatever it wants with it.

But what happens if the external content changes? The cache will not expire until you install a newer version of the ServiceWorker, and the content will remain in the cache as long as the cache lives.

It follows the “with ServiceWorker comes great responsibility” theme regarding ServiceWorkers or, as Jake Archibald puts it, “Serviceworker treats you like an adult”. The Serviceworker will allow you to do a lot of things but you’re responsible for what you do with it.

Video in ePub: Captioning, Storage and Other Thoughts

Note: While I talk primarily about ePub e-books, the same process, markup and scripts apply to a standard web page.

After finishing a draft of my fixed layout ePub I went back and researched the accessibility requirements for video on the web and how well supported they are in ePub e-books. I will present both the rationale and coding based on my ePub-based research and the article I wrote for the Web Platform Documentation project

Working with video in your ePub book presumes that you’re familiar, if not comfortable, with the process of manually creating an e-book. If you’re not then it’s better if you begin with a basic tutorial.

Reviewing video on the web

Ever since Mark Pilgrim wrote the video chapter of Dive into HTML5 the landscape for HTML5 video has changed drastically. After a long fight Mozilla capitulated and now supports MP4 video, along with Safari, IE and Chrome. Opera is still the holdout, supporting only WebM and OGG video.

Most e-book rendering engines are WebKit based so, in theory, we should only need one version of the video but in the interest of working for multiple platforms we’ll keep at least two out of the three formats and work with them throughout the rest of the post.

What does the video look like

We define the size of the video with CSS (Optional, can also be defined in the element itself)

video {
  width: 320px;
  height: 240px;
}

We then define the video with the standard HTML element, listing the video formats in this order to make sure that the video will play in older versions of iOS and to take into account the other idiosyncrasies outlined in Pilgrim’s page.

<video controls="controls" poster="video/Sintel.png">
  <source src="video/Sintel.mp4" type="video/mp4">
  <source src="video/Sintel.webm" type="video/webm">
</video>

In HTML we can write the controls attribute as just controls but ePub requires you to use the, somewhat sillier, controls="controls" instead.

We also add a type attribute to each source video as a hint for user agents (browsers and e-book readers) to use when deciding if they can play a given format.
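The same decision can be made from script with HTMLMediaElement.canPlayType(), which returns an empty string when a format is not supported and 'maybe' or 'probably' otherwise. The sketch below mimics the way a user agent walks the source list; pickSource is an illustrative name and the check function is passed in so the logic does not depend on a real video element.

```javascript
// Returns the first source whose MIME type the player says it can handle,
// or null when none of them can be played.
// `canPlayType` mirrors HTMLMediaElement.canPlayType.
function pickSource(sources, canPlayType) {
  for (var i = 0; i < sources.length; i++) {
    if (canPlayType(sources[i].type) !== '') {
      return sources[i];
    }
  }
  return null;
}

// In a browser:
// var video = document.createElement('video');
// var chosen = pickSource(
//   [
//     { src: 'video/Sintel.mp4', type: 'video/mp4' },
//     { src: 'video/Sintel.webm', type: 'video/webm' }
//   ],
//   function (type) { return video.canPlayType(type); }
// );
```

As with the markup, order matters: the first playable entry wins, which is why the MP4 source is listed first.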

Moving into ePub

Adding the video

The basic video in ePub is the same as the one we’d use on the open web. We’ll use the same video tag as our starting point with only MPEG-4 and WebM formats.

<video controls="controls" poster="video/Sintel.png"
      width="320" height="240">
  <source src="video/Sintel.mp4" type="video/mp4" />
  <source src="video/Sintel.webm" type="video/webm" />
</video>

This will work as written in most reading systems. I’m still questioning whether we need WebM and, if so, how pervasive it is in the e-book reader world.

Declaring it in the package

Because we are making the video part of the ePub package we need to make sure that we add the components of the video to the package.opf. We do not add them to the spine content because, at the basic level, they are part of a page and not independent content. We are not covering uses of video in the spine of a document.

The items that we added to the package file are listed below:

<item id="video1-mp4"         href="video/Sintel.mp4"    media-type="video/mp4"/>
<item id="video1-webm"        href="video/Sintel.webm"   media-type="video/webm"/>
<item id="video1-cover"       href="video/Sintel.png"    media-type="image/png"/>

As we will discuss later in the essay, one of the first questions that you need to consider is whether to package the video with the book or host it externally. For the purpose of this essay we’ll package the video with the book and discuss some alternatives in the section Video considerations for e-books.

Scripting user interaction

The first part of the JavaScript code is a generic set of functions that do three things:

  • checkReadingSystemSupport tests whether a reader supports the features we’ll need for the video to work. The script does this by looping through the values in the neededFeatures variable; if the reader supports a feature the loop continues, otherwise the function returns false.
  • togglePlay checks if the video ended or if it has paused. If we meet either of these conditions we play or resume video playback; otherwise we pause the video
  • toggleControls checks if the default controls are visible. If they are then the script hides them, otherwise the scripts shows them
/*
 * Shared functions used in all pages that use video
 */
function checkReadingSystemSupport() {
  var neededFeatures = ["mouse-events", "dom-manipulation"];
  // The ePub 3 reading system object is exposed as navigator.epubReadingSystem
  var support = typeof navigator.epubReadingSystem !== 'undefined';
  if (support) {
    for (var i = 0; i < neededFeatures.length; i++) {
      if (!navigator.epubReadingSystem.hasFeature(neededFeatures[i])) {
        return false;
      }
    }
  }
  return support;
}

function togglePlay() {
  var video = document.getElementsByTagName('video')[0];
  // Play or resume if the video ended or is paused; otherwise pause it
  if (video.ended || video.paused) {
    video.play();
  } else {
    video.pause();
  }
}

function toggleControls() {
  var video = document.getElementsByTagName('video')[0];
  // Hide the default controls if they are visible; otherwise show them
  if (video.controls) {
    video.removeAttribute('controls');
  } else {
    video.controls = true;
  }
}

By themselves the functions are good but don’t do much for playing the video. This is where the second script comes in. Using the generic functions in the first script, the functions in the second script will take user input, click/tap or double click/double tap, and perform an action based on the input.

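/*
 * touch and keyboard based functions
 */
window.onload = function() { // similar in spirit to jQuery's $(document).ready
  var video = document.getElementsByTagName('video')[0];

  // Hide the default controls only when the reader supports our scripts
  if (checkReadingSystemSupport()) {
    video.removeAttribute('controls');
  }

  // Click/tap plays or pauses the video
  video.addEventListener('click', function(e) {
    togglePlay();
  }, false);

  // Double click/double tap shows or hides the default controls
  video.addEventListener('dblclick', function(e) {
    toggleControls();
  }, false);

  // The space bar (key code 32) also plays or pauses the video
  video.addEventListener('keyup', function(e) {
    var k = e ? e.which : window.event.keyCode;
    if (k == 32) {
      togglePlay();
    }
  }, false);
};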

Before moving forward make sure that the video(s) and the poster image are in the directory specified in the page and that it matches the directory you used in the package (this issue tricked me the first time I added video to an e-book)

Package the files as you normally would, test that the video works and validate the book with epubcheck. This is the first stage.

Video Considerations for e-books

As mentioned earlier, one of the first things to consider is whether to package the video with the e-book or host it remotely. They both have advantages and disadvantages.

Hosting videos remotely means that your users have to be online to play the video. As far as I know there is no way to cache the video and then play it back when the reader is offline.

Adding the video to the book increases the size of the book file and, the more videos there are in the book, the closer it can come to (or the further it can go over) the size limit of an ePub e-book. Some vendors (like Amazon) charge for the download based on the size of the file being downloaded.

Example book

The book cc-shared-culture presents multiple ways to add video to your e-book. I’ve chosen to allow both the default controls as well as a click/tap interface. My concern is always that the user knows how to play the video.

I’ve also created a shorter book with a video from the Durian Blender open movie project. It’s hosted in Github along with the book that uses the code and techniques discussed here.

Creating captions

Defining captions.

  1. a title or explanation for a picture or illustration, especially in a magazine.
  2. a heading or title, as of a chapter, article, or page.
  3. Movies, Television. the title of a scene, the text of a speech, etc., superimposed on the film and projected on the screen.


What’s the difference between captions and subtitles

Although captions and subtitles are similar in the way we create them and add them to videos, they are different in purpose.

Captions serve primarily as an accessibility device that allows people who are deaf or hard of hearing to fully access the video. Captions also help in situations where the video has no audio, the owner of the video muted it, or the environment is too loud for people to listen to the audio.

Subtitles provide a translation of the audio and, sometimes, other audio cues into other languages. One kind of subtitles, SDH (Subtitles for the Deaf and Hard of hearing), provides context for the subtitled audio.

Types of captions

As far as Web video captions are concerned there are two types of captions.

TTML (Timed Text Markup Language) is an XML-based captioning system. It is a World Wide Web Consortium recommendation. Internet Explorer is the only browser that supports the technology.

WebVTT (Web Video Text Tracks) is a community-led caption format (it is not a W3C draft or recommendation). It’s similar in structure to SRT captions and there was an earlier proposal called WebSRT. All modern browsers support this type of captions.

VTT captions in detail

We will concentrate on the captioning aspect of the VTT “spec” and will not address other aspects of VTT such as metadata, karaoke styling and others. If you want more detail, read the specification or this HTML5 Doctor article on video subtitling.

At its simplest a VTT file is a text file that starts with the WEBVTT header and is formatted as shown below (and used with the Sintel video in the book):

WEBVTT

00:00:12.000 --> 00:00:15.000 A:middle T:10%
<v.gatekeeper>What brings you to the land
of the gatekeepers?

00:00:18.500 --> 00:00:20.500 A:middle T:80%
<v.sintel>I'm searching for someone.

00:00:36.500 --> 00:00:39.000 A:middle T:10%
<v.gatekeeper>A dangerous quest for a lone hunter.

00:00:41.500 --> 00:00:44.000 A:middle T:80%
<v.sintel>I've been alone for as long as I can remember.  

The hardest part of creating the captions is the timing. It requires thousandth-of-a-second (millisecond) timing, and we need to write all three fractional digits (even if they are 0) for the VTT cue to display and validate.
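
Writing the timestamps by hand is error-prone, so a small helper can do the zero padding. The following is a minimal Python 3 sketch, not part of the book's code; the names vtt_timestamp and vtt_cue are hypothetical, and the settings string is passed through untouched in the syntax used in this post.

```python
def vtt_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3600000)
    minutes, rest = divmod(rest, 60000)
    secs, millis = divmod(rest, 1000)
    # Every field is zero padded so all digits are always written
    return "%02d:%02d:%02d.%03d" % (hours, minutes, secs, millis)


def vtt_cue(start, end, text, settings=""):
    """Build one cue: a timing line (plus optional settings) and the cue text."""
    timing = "%s --> %s" % (vtt_timestamp(start), vtt_timestamp(end))
    if settings:
        timing += " " + settings
    return timing + "\n" + text + "\n"


print(vtt_cue(18.5, 20.5, "<v.sintel>I'm searching for someone.", "A:middle T:80%"))
```

Working in seconds and converting at the end keeps the arithmetic simple; the helper only formats the values and does not check that the end time comes after the start time.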

Controlling the positioning of the cue

In addition to adding the timed text we can control the placement of the caption inside the video element.

According to HTML5 Doctor, we can use the following positioning attributes:

D:vertical / D:vertical-lr

Display the text vertically rather than horizontally. This also specifies whether the text grows to the left (vertical) or to the right (vertical-lr).

L:X / L:X%

Either a number or a percentage. If a percentage, it is the position from the top of the frame. If a number, it is the line number the cue is placed on.

T:X%

The position of the text horizontally on the video. T:100% would place the text on the right side of the video.

A:start / A:middle / A:end

The alignment of the text within its box: start is left-aligned, middle is centre-aligned, and end is right-aligned. This syntax is similar to how SVG handles alignment.

S:X%

The width of the text box as a percentage of the video width.

Some examples of the styles above:

00:00:01.000 --> 00:00:10.000 A:middle T:50%
00:00:01.000 --> 00:00:10.000 A:end D:vertical
00:00:01.000 --> 00:00:10.000 A:start T:100% L:0%

Built-in styles

Bold text: <b>Lorem ipsum</b>

Italic text: <i>dolor sit amet</i>

Underlined text: <u>consectetuer adipiscing</u>

Ruby text: <ruby>見<rt>み</rt></ruby>

Additional styles

You can apply a CSS class to a section of text using <c.myClass>Lorem ipsum</c>, giving us many more styling options.

You can also add a voice indicator to your cue using something like <v Tom>Hello world</v>. This declaration accomplishes three things:

  • The caption will display the voice (Tom) in addition to the caption text.
  • A screen reader can read the name of the voice, possibly even using a different voice for male or female names.
  • It offers a hook for styling so that all captions for Tom could be in blue.
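
If the cues are generated by a script, the voice and class spans described above can be produced with two tiny helpers. This is a minimal Python 3 sketch; voice_span and class_span are hypothetical names, and the helpers do no escaping of the cue text.

```python
def voice_span(speaker, text):
    """Wrap cue text in a WebVTT voice span for the given speaker."""
    return "<v %s>%s</v>" % (speaker, text)


def class_span(css_class, text):
    """Wrap cue text in a WebVTT class span so CSS can target it."""
    return "<c.%s>%s</c>" % (css_class, text)


print(voice_span("Tom", "Hello world"))
print(class_span("myClass", "Lorem ipsum"))
```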

Putting it all together

Now that we’ve built the video tag and we’ve taken a look at how to build the VTT caption track we’re ready to put them together. If we’re working with a single language caption file the result will look like this:

<video controls="controls"
      width="320" height="240">
  <source src="video/Sintel.mp4" type="video/mp4">
  <source src="video/Sintel.webm" type="video/webm">
  <track src="sampleCaptions.vtt" kind="captions" srclang="en">
</video>

The code above is enough to add English captions to the video and have them play using the user agent (browser or reader) native ability.

Furthermore, we can specify multiple caption and subtitle tracks that allow the user to select which language to view the captions in. The code allowing the user to choose between English captions and German and French subtitles looks like this:

<video controls="controls"
      width="320" height="240">
  <source src="video/Sintel.mp4" type="video/mp4">
  <source src="video/Sintel.webm" type="video/webm">
  <track src="Sintel-en.vtt" kind="captions" srclang="en">
  <track src="Sintel-de.vtt" kind="subtitles" srclang="de">
  <track src="Sintel-fr.vtt" kind="subtitles" srclang="fr">
</video>


Prepare your book as you normally would. Testing now also has to cover the captions: whether you can show them and whether you can switch between the available tracks.

Additional links and resources

The Trap of CDNs

The problem

There is a tricky issue when working with a CDN during development. A CDN requires an active Internet connection to load the script referenced as the source. If you are not online, jQuery will not load the first time you access the page or application, and all other scripts will fail because they depend on jQuery (which couldn’t load from the CDN and had no local backup).

I first came across this issue when building a site that used carousels and jQuery based animations. I started working on the project while on the train, using the standard Google CDN load mechanism, and none of the scripts in the page worked. It wasn’t until I saw the following snippet in the HTML5 Boilerplate that things became easier to work with.

A solution

The trick below uses jQuery but it also applies to any JavaScript library loaded through a CDN.

<script src="//"></script>
<script>window.jQuery ||
document.write('<script src="js/vendor/jquery-1.11.1.min.js"><\/script>')</script>

We first load jQuery 1.11.1 from the Google CDN as we normally would.

Right after we load it from the CDN we test whether the jQuery object exists, using a logical OR (||). If it exists we use it; if it doesn’t, the right-hand side of the || runs and loads a local version of jQuery, using document.write to inject the script tag into the document.

Pros and Cons

If you’re not careful this system defeats the idea of having CDNs. You end up with multiple copies of jQuery or other libraries spread throughout your projects that your browser, most likely, will not share from its cache the way it would a single CDN copy. This will adversely affect performance.

I don’t expect this situation to happen very often. If Google’s CDN goes down there are more serious issues to worry about than my app not working; still, this is a good workaround to keep my content from breaking just because a CDN is unreachable.

ePub package.opf generator

This is the first pass at a script to generate a basic package.opf file for epub3 ebooks using Python 2.X.

What it does

When you run it from the root of your ebook, the script creates a package.opf file and populates it with the basic metadata required by the epub3 specification. It also creates the manifest and spine sections based on the content of the OEBPS directory.

How does the script do it

The script only uses modules from the standard library. I want a portable script and I don’t want to worry about whether a module is compatible with 2.X, with 3.X, with both, or whether it uses a different syntax on each version.

At the top of the script we use the environment ‘shebang’ to declare the location of the Python executable without hard coding it.

We import the following modules, each for a specific purpose:

  • mimetypes to find the mime type of our files automatically
  • glob to create the list of files under OEBPS
  • os and os.path to create the items we populate our package file with

After loading the modules the first thing we do is initialize our mime-type database. This will make sure, as much as possible, that we match the file with the correct mime-type.

We then open our package.opf file in write mode.

The last step in this stage is to create the glob expression that will tell the rest of the script what files to work with.

We are now ready to create the content we’ll write to the file.

#!/usr/bin/env python 

import mimetypes
import glob
import os
import os.path

# Initialize the mimetypes database
mimetypes.init()

# Create the package.opf file
package = open('package.opf', 'w')

# WARNING: This glob will add all files and directories 
# to the variable. You will have to edit the file and remove
# empty directories and the package.opf file reference from
# both the manifest and the spine
package_content = glob.glob('OEBPS/**/*')
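
To see what the mime-type lookup returns for different entries, here is a minimal Python 3 sketch (the original script targets Python 2.X). The helper name media_type and the sample paths are assumptions for illustration; a path without an extension, such as a directory, has no guessable type and comes back as None.

```python
import mimetypes

# Load the mime-type database before guessing
mimetypes.init()


def media_type(path):
    """Return the guessed mime type for a path, or None when it can't be guessed."""
    guessed, _encoding = mimetypes.guess_type(path, strict=True)
    return guessed


for name in ("OEBPS/styles.css", "OEBPS/book_cover.jpg", "OEBPS/type"):
    print(name, media_type(name))
```

Checking for None before writing an entry is one way to spot directories and unknown files that should not end up in the manifest.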

The second stage is to create the templates for the XML portions of the package. There are two things to notice with this part.

  • The XML elements are empty. I only create attributes as necessary
  • I create static templates and don’t use dynamic content because all the modules I found had issues when working with namespaces.

The three templates will be used when building the file.

template_top = '''<package xmlns=""
  version="3.0" xml:lang="en">
  <metadata >
    <!-- TITLE -->
    <meta property="dcterms:modified"></meta>
    <dc:identifier id="book-id"></dc:identifier>
    <meta name="cover" content="img-cov" />
  </metadata>
  <manifest>'''

template_transition = '''</manifest>
  <spine toc="ncx">'''

template_bottom = '''</spine>
</package>'''

The enumeration builds the dynamic section of the file. We first create two variables to hold the content of the manifest and of the spine.

For each element of our package_content (the content of the OEBPS directory) we do the following:

  • Set the basename variable to the file name part of the current item
  • Get the mime type for the item
  • Add the item XML tag to the manifest, assigning it an ID, the base name and the mime type
  • Add the item to the spine by creating an itemref element whose idref matches the ID we used for the item tag above

When we complete this section, we have a list of all the files under OEBPS and are now ready to, finally, build the package file.

manifest = ""
spine = ""

for i, item in enumerate(package_content):
  basename = os.path.basename(item)
  mime = mimetypes.guess_type(item, strict=True)
  manifest += '\t<item id="file_%s" href="%s" media-type="%s"/>\n' % (i+1, basename, mime[0])
  spine += '\n\t<itemref idref="file_%s" />' % (i+1)

After all the work, actually creating the file is almost anticlimactic. We print each section in the following order:

  • template_top
  • manifest
  • template_transition
  • spine
  • template_bottom
# I don't remember my python all that well to remember 
# how to print the interpolated content. 
# This should do for now.
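
The write step itself is not shown here, so what follows is only a minimal Python 3 sketch of one way to join the five sections in the order listed. The function names build_package and write_package are hypothetical, and the newline handling assumes that each manifest entry ends with a newline and each spine entry starts with one, as in the loop above.

```python
def build_package(top, manifest, transition, spine, bottom):
    """Join the five sections in the order they appear in package.opf."""
    # Manifest entries end with "\n" and spine entries start with "\n",
    # so extra newlines are only needed after the top and before the bottom.
    return top + "\n" + manifest + transition + spine + "\n" + bottom + "\n"


def write_package(path, content):
    """Write the assembled content, closing the file when done."""
    with open(path, "w") as package:
        package.write(content)


# Placeholder sections, only to show the assembled order
print(build_package("<package>\n<manifest>",
                    '\t<item id="file_1"/>\n',
                    "</manifest>\n<spine>",
                    '\n\t<itemref idref="file_1" />',
                    "</spine>\n</package>"))
```

Building the whole string first and writing it once keeps the ordering logic separate from the file handling, which makes the output easy to inspect before anything touches the disk.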

An example of the complete file looks like this:

<package xmlns=""
  version="3.0" xml:lang="en">
  <metadata >
    <!-- TITLE -->
    <meta property="dcterms:modified"></meta>
    <dc:identifier id="book-id"></dc:identifier>
    <meta name="cover" content="img-cov" />
  </metadata>
  <manifest>
    <item id="file_1" href="styles.css" media-type="text/css"/>
    <item id="file_2" href="type" media-type="None"/>
    <item id="file_3" href="book_cover.jpg" media-type="image/jpeg"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="file_1" />
    <itemref idref="file_2" />
    <itemref idref="file_3" />
  </spine>
</package>

Things to remember

This is not a complete solution. It is a starting point and it will require manual edits before it passes validation. It is still better than starting from scratch, at least in my opinion.

Things to work on

The first thing I need to figure out is how to skip or remove empty folders. In the example above the folder entry (the item with media-type="None") needs to be removed manually before the package file will pass epubcheck validation.

Another thing I’ll have to research is whether the glob expression takes all the files we need. For geeks, how many levels deep does the glob expression go?
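
On the depth question: without recursive matching (which Python 3.5+ enables through glob's recursive=True argument), ** in a glob pattern behaves like a single *, so 'OEBPS/**/*' only matches entries exactly one directory below OEBPS. One possible way around both open items is to walk the tree instead. This is a minimal Python 3 sketch, not the script's current behaviour; list_package_files and its skip parameter are hypothetical names.

```python
import os


def list_package_files(root="OEBPS", skip=("package.opf",)):
    """Return every file under root, at any depth, as sorted forward-slash paths."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Only file names are collected, so directories (empty or not)
        # never become manifest entries.
        for name in filenames:
            if name in skip:
                continue
            rel = os.path.join(dirpath, name)
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)


for path in list_package_files("OEBPS"):
    print(path)
```

Sorting the result gives a stable order for the generated IDs, although the spine order would still need a manual check because reading order is not the same as alphabetical order.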