Digital storytelling

Building the holodeck

I’ve been in and out of virtual worlds since I was an undergraduate in the mid ’90s, and things have changed enormously between then and now. This series of essays outlines some of the historical context for ideas about augmented additions to the real world and fully immersive virtual environments.

In 1987 we were introduced to Star Trek: The Next Generation and the many new technologies it brought with it, the holodeck chief among them.

A Holographic Environment Simulator, or holodeck for short, was a form of holotechnology designed and used by Starfleet. They were installed aboard starships, space stations, and at Starfleet institutions for entertainment, training, and investigative purposes. A typical holodeck consisted of a room equipped with a hologrid containing omnidirectional holographic diodes, enabling holographic projections and creation of holodeck matter through the manipulation of photons contained within force fields.

The most obvious function of a holodeck is to provide entertainment and diversion for the crew. (TNG: “11001001”, “The Big Goodbye”, “Elementary, Dear Data”, “A Fistful of Datas”; VOY: “The Cloud”, “Homestead”)

A holodeck can be used to create training simulations and exercise environments not otherwise available or safe, including star ship battle simulations and the Bridge Officer’s Test. (TNG: “Code of Honor”, “Where Silence Has Lease”, “The Emissary”, “New Ground”, “Firstborn”; VOY: “Learning Curve”, “Extreme Risk”, “The Fight”)

The holodeck can be used as a laboratory to aid in analysis, such as recreating the scene of a crime or accident to aid in forensic investigations. (TNG: “A Matter of Perspective”) They can be used to visualize a 3D scene from alternate data sources for analysis (TNG: “Identity Crisis”, “Phantasms”; VOY: “Distant Origin”) or used as a brainstorming tool. (TNG: “Schisms”, “Booby Trap”)

Memory Alpha

Holodecks can host both specific scenarios (like the Dixon Hill series of pulp-novel programs that Captain Picard enjoys in TNG’s “The Big Goodbye”, “Manhunt”, and “Clues”, or Captain Janeway’s Victorian novels in Voyager) and more open-ended worlds (like the one Jake Sisko and his father share at the opening of Deep Space Nine’s “Emissary”).

We don’t have a holodeck quite like the one in the series, but in the last four years or so virtual technologies have matured to the point where an early version of the holodeck may be possible… nothing like Star Trek or the Full Dive Gear from Sword Art Online, but immersive enough to trick our brains into thinking we’re already there.

In these essays we’ll explore Beacons as storytelling devices, Augmented Reality gear for telling stories, and fully immersive environments and the stories we can tell in and with these 3D virtual worlds.

Digital worlds then and now

The progress to a fully 3D immersive holodeck is not a new endeavor or something that has appeared in a single discipline. In order to better understand what we want to do in the VR world it pays to get an idea of where we’ve already been. I’ll break the old virtual worlds into three categories:

  • Text-based virtual environments
  • SIMNET and military simulations
  • Early attempts at virtual reality: Lucasfilm’s Habitat

Text-based virtual environments

The text-based environments I refer to in this section start with TinyMUD, created by James Aspnes in 1989 as the first in a line of successors to the original MUD created by Roy Trubshaw and Richard Bartle in 1978.

What differentiated TinyMUD and its children was a move away from the combat emphasized in the original MUD toward a more social, communication-oriented environment. It also allowed all players to create objects and spaces in the world and then link them to existing content, whether player-created or not.

I still remember my time as an anthropology undergrad (1994 – 98) and how I became involved in the early forms of immersive storytelling environments. MUDs (Multi User Dungeons) and their derivatives provided free-form interactions in a fully immersive textual experience.

For me it was Lord of the Rings and Battlestar Galactica. This style of world, known as a MUSH (Multi User Shared Hallucination), provides for a fluid style of writing that describes both the users’ speech and their interactions.

The example below, taken from Elendor MUSH, illustrates the type of interaction players write.

Below is a snippet from an RP scene in Gondor recently played.
The two players involved are Tathar and Turlach Nimothan, aunt and nephew.

There is not much to be seen here, for the weather is dark and rainy and
the bookshelves unlit. But at a table is set a candle, its wick nearly
drowned in melted wax, and beside it, Turlach’s dark head rests upon an
open book. His eyes are closed.

The door swings open quietly, and the wavering light of another candle
shedding its soft golden glow into the room. Tathar comes in, stopping
as she sees Turlach, then moving to the bookshelves, where she lights
a lamp, and pulls out one of the books, taking it and her candle to a chair nearby.

[fast-forward to a bit of their conversation.]

“It means …” Turlach looks up cautiously, eyeing the slightly-ajar
door. “She may have been taken by the Black Company. But why, I wonder?”

The door is pushed further open as an elderly butler comes in. “Yes,
m’lady?”

“Something hot, please,” Tathar instructs him. “Turlach and I are growing
chilled, doing nothing but sitting and reading. A hot drink would be very
welcome.” Beldin nods and vanishes again, and Tathar turns her attention
back to her nephew. She is frowning.

“Taken…? But of what avail would that be to them? The Ephalkhir have nothing
to do with Mormegil or its estates.”

The idea that attracted a large number of people (myself included) to MUDs was the chance to recreate the worlds of our favorite novels or TV series and push them beyond their conclusion. It wasn’t enough to know how the last episode of the single season of Battlestar Galactica ended, we wanted to build the story of what happened afterwards. Did the Galactica find Earth? How did it happen?

MUD creators provided us with very detailed worlds built on text and the possibility of interacting with the environment and with objects created by the system operators or by other players. We recorded the stories we played to keep a record of our collective performances for posterity.

The experiences were not always positive. Some of the first conversations about online communities, punishment and enforcement originated from less than pleasant, some may say terrifying, events that happened in these early text-based communities.

Wherever you see an MMORPG you’re seeing the children of MUD and when looking at AR and VR Stories it is no different. We are all children of MUD.

Distributed Training Simulations: SIMNET

Simnet Simulators

SIMNET was the first successful implementation of large-scale, real-time, man-in-the-loop simulator networking for team training and mission rehearsal in military operations.

— SIMNET: The Advent of Simulator Networking

SIMNET stands for SIMulator NETworking. Initiated in 1983, it was the first “shared virtual
reality” distributed simulation system, which continues to have significant influences. It was
sponsored by DARPA, the Defense Advanced Research Projects Agency, the Department of
Defense’s principal high-risk, high-payoff research and development organization, established in 1958. […]

— SIMNET and Beyond: A History of the Development of Distributed Simulation

SIMNET was developed during the 1980s to provide training simulations for large-scale forces without the cost associated with moving those forces to training sites like the National Training Center at Fort Irwin, California.

SIMNET Simulators

There were two revolutionary aspects of SIMNET: The real time long distance networking infrastructure and the simulator pods built to participate in the simulations.

SIMNET Networking

SIMNET provided a fully decentralized architecture for each of the simulator pods, representing anything from an M1 Abrams tank to an AH-64 Apache attack helicopter.

The biggest improvement was the lack of a central coordinating unit. Each simulator pod was responsible for tracking its own status, the status of any unit it interacted with directly, and the result of any interaction (did it hit another object such as a mine, did a round it fired hit something, or was the unit hit by something fired by another unit), and for reporting its state to the units it interacted with on the network. This was defined as each unit being an “autonomous node”.

This made SIMNET scalable to several hundred units spread around military bases in the United States (Fort Benning, GA; Fort Rucker, AL; Fort Knox, KY; and Fort Leavenworth, KS) and Grafenwoehr, Germany. To accommodate the increased bandwidth requirements and potential latency, SIMNET adopted two additional principles:

First, a simulation node would send updates only when its state changed. For example, a stationary entity would not send state updates (except for minimum “heart-beat” updates that assured other nodes that network communications had not been lost).

Second, an entity moving in a straight line at a steady speed would send state updates only when it changed its speed or direction.

SIMNET Simulator Pods

The simulator pods each represented a fully crewed vehicle (a 4-man tank or a 2-man helicopter). Because it would have been very difficult to fully reproduce a combat vehicle in the simulation environment, SIMNET adopted a “selective fidelity” approach (also known as “the 60% solution”) where only the parts relevant to the functioning of the vehicle in the simulation were kept. Commanders and other mission supervisors could decide which systems and aspects made up that 60%. An example of this was the decision to induce “jiggle” in the vision block views and a “rumble” in the crew’s seats to give them clues as to how fast they were moving and what kind of terrain they were traversing.

Some of the vehicle simulator pods (taken from SIMNET and Beyond):

M1 Abrams Main Battle Tank. The M1 has been the Army’s primary heavy combat vehicle since entering service in 1980. It carries a crew of four: the driver, who reclines in a very tight space inside the forward hull compartment, plus the tank commander, the loader, and the gunner, who sit inside the turret compartment. The driver’s orientation is fixed, but the turret can rotate 360 degrees (quite rapidly), so the turret’s orientation can be independent of which direction the tank is moving. The tank commander’s role is to control and coordinate the tank’s overall operation, as well as to coordinate tactically with other commanders. He tells the driver where to go, designates targets for the gunner, and tells the loader what type of ammunition to pull from the rack behind the blastproof doors and load into the breech of the 105 mm or 120 mm cannon. The crew communicates with each other over a local intercom, and the commander communicates over a multi-channel radio.

Similar to the real vehicle, the M1 simulator has two separate compartments: one for the driver and one for the other three crew members, each with the most essential controls and displays (chosen in accordance with the “selective fidelity” principles previously noted). The commander sits above the gunner, with the loader located to their left between the ammunition rack and the gun breech. A full description of the simulator crew compartments and controls from the SIMNET era can be found in (Chung, 1988). The driver has three “vision block” displays, the gunner has a stabilized gunsight with high/low magnification, the commander has three vision blocks plus an auxiliary sight that duplicates the gunner’s sight, and the loader has a periscope display. Both the commander’s and loader’s cupolas can rotate to simulate additional vision blocks in a closed-hatch M1.

Rotary-Wing Aircraft (RWA). Generic attack/scout helicopter, approximating the AH-64 or OH-58. Includes pilot and copilot/gunner stations with flight controls and displays. Includes a slewable electrical-optical sensor for use in target acquisition and engagement. It is armed with HELLFIRE laser-guided anti-tank missiles and air-to-air STINGER missiles, in addition to a 30-mm chain gun. The pilot and co-pilot share visual access to eight out-the-window displays, arranged in a “three over five” configuration, plus a single sensor channel, switchable between a thermal image and a daylight TV image. The RWA uses typical helicopter cyclic and collective controls, along with a throttle, for flight maneuvers. (Harrison, 1992)

SIMNET Impact and Legacy

The simulators were not a marvel of high technology but they did their job well enough that SIMNET evolved from a training system into a mission planning and, eventually, technology development simulator. The technologies pioneered in SIMNET are now part of a family of Distributed Interactive Simulation standards that have grown past their origins in wargaming and warfighting simulations; space exploration and medical fields have also embraced simulation as a lower cost, lower risk alternative to live activities.

The events are reproducible. The Battle of 73 Easting was fully recreated in collaboration with members of the unit involved, along with a very thorough analysis. They captured GPS coordinates from the American vehicles involved in the battle and reconstructed the event as accurately as possible. People can now review the simulated battle and see the action as it happened more than 20 years ago. SIMNET became a study (and marketing) tool for the Army and the basis for a new family of training environments, the latest of which is the Synthetic Training Environment.

Several members of the original teams that worked on SIMNET moved on to gaming and distributed networking endeavors after SIMNET ended. Some of these startups include:

  • RTIME, Inc. (Rolland Waters) game network engines
  • MetaVR, Inc (W. Garth Smith), simulation and training, GIS systems
  • MaK Technologies (Warren Katz and John Morrison), simulation software
  • Reality by Design, Inc (Joanne West Metzger and Paul Metzger), simulation and training systems
  • Zipper Interactive (Brian Soderberg), game developer (SOCOM game series)
  • Wiz!Bang (Drew Johnston), game developer

Early attempts at graphic VR: Habitat

Lucasfilm’s Habitat

One of the first attempts at creating a graphical online virtual world was Habitat, created by F. Randall Farmer and Chip Morningstar for Quantum Link, an early online service targeted at Commodore 64 and Commodore 128 systems using dial-up connections.

A Habitat scene.

The lead creators of Habitat, Chip Morningstar and F. Randall Farmer, have written extensively about what Habitat is and what lessons they learned in the process.

The essential lesson that we have abstracted from our experiences with Habitat is that a cyberspace is defined more by the interactions among the actors within it than by the technology with which it is implemented. While we find much of the work presently being done on elaborate interface technologies — DataGloves, head-mounted displays, special-purpose rendering engines, and so on — both exciting and promising, the almost mystical euphoria that currently seems to surround all this hardware is, in our opinion, both excessive and somewhat misplaced. We can’t help having a nagging sense that it’s all a bit of a distraction from the really pressing issues. At the core of our vision is the idea that cyberspace is necessarily a multiple-participant environment. It seems to us that the things that are important to the inhabitants of such an environment are the capabilities available to them, the characteristics of the other people they encounter there, and the ways these various participants can affect one another. Beyond a foundation set of communications capabilities, the details of the technology used to present this environment to its participants, while sexy and interesting, are of relatively peripheral concern.

The Lessons of Lucasfilm’s Habitat

It is interesting to see the parallels between Habitat and its modern descendants like Destiny and World of Warcraft. The characters, the backgrounds and the interactions are different, but how you move and how you interact with people hasn’t really changed significantly since the early days of online multiplayer games.

One thing the authors warn us about is that central planning is impossible. A fully interactive multiplayer environment makes it impossible to plan for every little contingency. We’ll discuss this when we talk about storytelling and planning versus spontaneity.

The emphasis has always been on community building. Habitat was immersive not because of its technology (the graphics don’t hold a candle to today’s massive 2D and 3D game environments, textures and animations) but because of the community and the interactions between people who were in different geographical locations yet were able to live and interact together in a community that was free of the trolls and harassment that permeate our modern online communities.

They also argue for a world that is consistent with itself. If something happens, handle it in the world; don’t step out of the world to handle it. The two examples they give illustrate the point very strongly.

In the first instance, it would have been easy to just take the money away from the ‘cheaters’ and restore the economy to what it should have been. They were lucky that the avatars were good citizens and used their resources for the betterment of the community.

In order to make this automated economy a little more interesting, each Vendroid had its own prices for the items in it. This was so that we could have local price variation (i.e., a widget would cost a little less if you bought it at Jack’s Place instead of The Emporium). It turned out that in two Vendroids across town from each other were two items for sale whose prices we had inadvertently set lower than what a Pawn Machine would buy them back for: Dolls (for sale at 75T, hock for 100T) and Crystal Balls (for sale at 18,000T, hock at 30,000T!). Naturally, a couple of people discovered this. One night they took all their money, walked to the Doll Vendroid, bought as many Dolls as they could, then took them across town and pawned them. By shuttling back and forth between the Doll Vendroid and the Pawn Shop for hours, they amassed sufficient funds to buy a Crystal Ball , whereupon they continued the process with Crystal Balls and a couple orders of magnitude higher cash flow. The final result was at least three Avatars with hundreds of thousands of Tokens each. We only discovered this the next morning when our daily database status report said that the money supply had quintupled overnight.

We assumed that the precipitous increase in “T1” was due to some sort of bug in the software. We were puzzled that no bug report had been submitted. By poking around a bit we discovered that a few people had suddenly acquired enormous bank balances. We sent Habitat mail to the two richest, inquiring as to where they had gotten all that money overnight. Their reply was, “We got it fair and square! And we’re not going to tell you how!” After much abject pleading on our part they eventually did tell us, and we fixed the erroneous pricing. Fortunately, the whole scam turned out well, as the nouveau riche Avatars used their bulging bankrolls to underwrite a series of treasure hunt games which they conducted on their own initiative, much to the enjoyment of many other players on the system.

The second example shows how to handle incidents “in world” as opposed to handling them as a technical exercise, which frustrates users because it violates the contract between the users and the fantasy of their world.

One of the more popular events in Habitat took place late in the test, the brainchild of one of the more active players who had recently become a QuantumLink employee. It was called the “Dungeon of Death”. For weeks, ads appeared in Habitat’s newspaper, The Rant, announcing that that Duo of Dread, DEATH and THE SHADOW, were challenging all comers to enter their lair. Soon, on the outskirts of town, the entrance to a dungeon appeared. Out front was a sign reading, “Danger! Enter at your own risk!” Two system operators were logged in as DEATH and THE SHADOW, armed with specially concocted guns that could kill in one shot, rather than the usual twelve. These two characters roamed the dungeon blasting away at anyone they encountered. They were also equipped with special magic wands that cured any damage done to them by other Avatars, so that they wouldn’t themselves be killed. To make things worse, the place was littered with cul-de-sacs, pathological connections between regions, and various other nasty and usually fatal features. It was clear that any explorer had better be prepared to “die” several times before mastering the dungeon. The rewards were pretty good: 1000 Tokens minimum and access to a special Vendroid that sold magic teleportation wands. Furthermore, given clear notice, players took the precaution of emptying their pockets before entering, so that the actual cost of getting “killed” was minimal.

One evening, one of us was given the chance to play the role of DEATH. When we logged in, we found him in one of the dead ends with four other Avatars who were trapped there. We started shooting, as did they. However, the last operator to run DEATH had not bothered to use his special wand to heal any accumulated damage, so the character of DEATH was suddenly and unexpectedly “killed” in the encounter. As we mentioned earlier, when an Avatar is killed, any object in his hands is dropped on the ground. In this case, said object was the special kill-in-one- shot gun, which was immediately picked up by one of the regular players who then made off with it. This gun was not something that regular players were supposed to have. What should we do?

It turned out that this was not the first time this had happened. During the previous night’s mayhem the special gun was similarly absconded with. In this case, the person playing DEATH was one of the regular system operators, who, accustomed to operating the regular Q-Link service, had simply ordered the player to give the gun back. The player considered that he had obtained the weapon as part of the normal course of the game and balked at this, whereupon the operator threatened to cancel the player’s account and kick him off the system if he did not comply. The player gave the gun back, but was quite upset about the whole affair, as were many of his friends and associates on the system. Their world model had been painfully violated.

When it happened to us, we played the whole incident within the role of DEATH. We sent a message to the Avatar who had the gun, threatening to come and kill her if she didn’t give it back. She replied that all she had to do was stay in town and DEATH couldn’t touch her (which was true, if we stayed within the system). OK, we figured, she’s smart. We negotiated a deal whereby DEATH would ransom the gun for 10,000 Tokens. An elaborate arrangement was made to meet in the center of town to make the exchange, with a neutral third Avatar acting as an intermediary to ensure that neither party cheated. Of course, word got around and by the time of the exchange there were numerous spectators. We played the role of DEATH to the hilt, with lots of hokey melodramatic shtick. The event was a sensation. It was written up in the newspaper the next morning and was the talk of the town for days. The Avatar involved was left with a wonderful story about having cheated DEATH, we got the gun back, and everybody went away happy.

These two very different responses to an ordinary operational problem illustrate our point. Operating within the participants’ world model produced a very satisfactory result. On the other hand, taking what seemed like the expedient course, which involved violating the world-model, provoked upset and dismay. Working within the system was clearly the preferred course in this case.

Some of these experiences continue to shape our technological communities: how to balance freedom and creativity with keeping people safe. Others have to do with community rules, user types, and managing behavior and expectations.

Single-user stories, like those I’m most interested in crafting, don’t have these types of problems, but as soon as you bring two or more people together online, and one or more of them are women, you have to deal with the potential for attacks, trolling and sexist remarks.

Pandoc, Multiformat Publishing

Pandoc is a converter to and from different text formats. What attracted me to the tool is that it allows me to work with Microsoft Word documents and convert them to Markdown or any other format.

Figure 1 shows the formats that Pandoc converts to and from.

Pandoc Input and Output Format Visualization

Since I already work with Markdown this is a value-added tool, as it allows me to convert Markdown to formats that would be very difficult or impossible to produce without such a tool.

We’ll explore converting Markdown to epub3, an ebook standard and the starting point for Kindle conversion using Kindlegen; convert the same Markdown document to LaTeX; and then explore an alternative to my CSS Paged Media and PrinceXML way of creating PDF documents.

Are these solutions perfect? No, definitely not. They are good starting points for future work.

  • Using Pandoc to create epub books saves me from having to do the grunt work of manually creating the XML files required for epub.
  • The LaTeX conversion gives me a working LaTeX file that I can then further customize by adding packages and additional environments.
  • The PDF conversion is a backup in case PrinceXML changes its current practice of not charging for development, only for production work.

Markdown to epub

Epub, and more specifically epub 3, is an ebook format created by the IDPF and now being submitted to the W3C as part of the merger of the two institutions.

The format itself is a zipped file with an application/epub+zip mimetype. The contents of an example ebook are shown in the following listing. We’ll dissect it below.

peter-pan-sourcecode-sans
├── META-INF
│   ├── com.apple.ibooks.display-options.xml
│   └── container.xml
├── OEBPS
│   ├── ch01.xhtml
│   ├── ch02.xhtml
│   ├── cover.xhtml
│   ├── css
│   ├── images
│   ├── notes.xhtml
│   ├── package.opf
│   ├── toc.ncx
│   └── toc.xhtml
└── mimetype

The META-INF directory

The META-INF directory contains information about the book.

The iBooks-proprietary com.apple.ibooks.display-options.xml tells iBooks about special characteristics for one or more versions of the application (macOS, iPad or iPhone).

In this case we tell it that for all platforms we want to use custom fonts and we don’t want to make this an interactive book. The code to do this is:

<display_options>
  <!-- all devices -->
  <platform name="*">
    <!-- set to "true" when embedding fonts -->
    <option name="specified-fonts">true</option>
    <!-- set to "true" when using javascript or canvas -->
    <option name="interactive">false</option>
  </platform>
</display_options>

In container.xml we tell the epub reader where to find the root of the book. This is the package.opf file, not an index.html or similar file. In our example content, the file looks like this and it points to the package.opf file inside the OEBPS directory (discussed in the next section):

<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/package.opf"
      media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>

If you’re not targeting iBooks with your file you can remove com.apple.ibooks.display-options.xml from the META-INF directory, but then iBooks will always use system fonts, even if you’ve packaged fonts in your book.

The OEBPS directory

The OEBPS directory contains the actual book content plus a few XML files used to describe and define the structure to the book.

It’s important to note that the content is written in XHTML, either 1.1 or the XHTML version of HTML5. This poses additional restrictions and requirements, illustrated in the example after the list below.

  1. All XHTML documents must have specific elements
    • DOCTYPE declaration
    • HTML element
    • HEAD element
    • TITLE element
    • BODY element
  2. All XHTML tag names & attribute names must be in lowercase
  3. All XHTML elements must close. If it doesn’t have a closing element then it must be closed within the opening tag like this: <br />
  4. All XHTML elements must be properly nested
  5. All XHTML attribute values must be quoted
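
A minimal sketch of a chapter file that satisfies these requirements; the title and content are placeholders:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Chapter 1</title>
  </head>
  <body>
    <h1>Chapter 1</h1>
    <p>Tags are lowercase, attributes are quoted and every element is closed.<br /></p>
  </body>
</html>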

The images and css directories contain the associated resources for the content.

The package.opf file is the core of the book. It provides the ebook reader with metadata for the publication as well as the navigation and table of contents structure.

The final file in this section, toc.ncx, acts as a backwards-compatible bridge to epub 2, the previous version of the specification, still used by many publishers around the world.

The mimetype file

At the root of the book directory we must place a mimetype file. It has no extension and the only content in the file is the string application/epub+zip without a carriage return.
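
One way to create it from the command line; printf is used because the string must have no trailing newline, and keep in mind that the epub container format also requires this file to be the first, uncompressed entry in the zip archive:

printf 'application/epub+zip' > mimetype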

Why Pandoc? How to create an epub

I’ve worked on creating epub and Kindle ebooks from scratch. Pandoc doesn’t produce the cleanest ebook on the market, but it creates all the XML files for you, and it’s just a matter of deciding how much cleanup you want to do.

The basic command is simple. Using a Markdown file as the source we use the following command:

pandoc sample.md -o sample.epub
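
If you specifically want an epub 3 package with a generated table of contents, something along these lines should work; treat it as a sketch, since writer names and defaults vary across Pandoc versions:

pandoc sample.md -t epub3 --toc -o sample.epub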

We can add metadata using a syntax similar to YAML:

---
title:
- type: main
  text: My Book
- type: subtitle
  text: An investigation of metadata
creator:
- role: author
  text: John Smith
- role: editor
  text: Sarah Jones
identifier:
- scheme: DOI
  text: doi:10.234234.234/33
publisher: My Press
rights: © 2007 John Smith, CC BY-NC
---

Pandoc supports the following data types:

  • identifier Either a string value or an object with fields text and scheme. Valid values for scheme are ISBN-10, GTIN-13, UPC, ISMN-10, DOI, LCCN, GTIN-14, ISBN-13, Legal deposit number, URN, OCLC, ISMN-13, ISBN-A, JP, OLCC
  • title Either a string value, or an object with fields file-as and type, or a list of such objects. Valid values for type are main, subtitle, short, collection, edition, extended
  • creator Either a string value, or an object with fields role, file-as, and text, or a list of such objects. Valid values for role are MARC relators, but pandoc will attempt to translate the human-readable versions (like “author” and “editor”) to the appropriate marc relators
  • contributor Same format as creator
  • date A string value in YYYY-MM-DD format. (Only the year is necessary.) Pandoc will attempt to convert other common date formats
  • lang (or legacy: language) A string value in BCP 47 format. Pandoc will default to the local language if nothing is specified
  • subject A string value or a list of such values
  • description A string value
  • type A string value
  • format A string value
  • relation A string value
  • coverage A string value
  • rights A string value
  • cover-image The path to the cover image
  • stylesheet The path to the CSS stylesheet
  • page-progression-direction Either ltr or rtl. Specifies the page-progression-direction attribute for the spine element

By default, pandoc will download linked media (including audio and video) and include it in the EPUB container, providing a complete epub package that will work regardless of network connectivity and other external factors.

If you want to link to external media resources instead, use raw HTML in your source and add data-external="1" to the tag with the src attribute.

For example:

<audio controls="1">
<source src="http://example.com/music/toccata.mp3"
  data-external="1" type="audio/mpeg">
</source>
</audio>

I recommend against linking to external resources unless you provide a fallback, as this will make your book dependent on network connectivity, and that is far from reliable.

Markdown to LaTeX

LaTeX is a document preparation system. When writing, the writer uses plain text as opposed to the formatted text found in WYSIWYG word processors like Microsoft Word, LibreOffice Writer and Apple Pages. The writer uses markup tagging conventions to define the general structure of a document (such as article, book, and letter), to stylise text throughout a document (such as bold and italics), and to add citations and cross-references. A TeX distribution such as TeX Live or MikTeX is used to produce an output file (such as PDF or DVI) suitable for printing or digital distribution. Within the typesetting system, its name is stylised as LATEX.

I’ve always been interested in ways to move away from word processors into formats more convenient and easier to use than the bloated binary files produced by Word, Pages and other word processors. My current favorite is Markdown because it’s easy to read, and I’ve worked on toolchains to convert Markdown to HTML and PDF.

LaTeX is a good backup option that allows me to create PDF (and it will be the intermediary step when we convert Markdown to PDF) and HTML from LaTeX sources.

The command to convert Markdown to LaTeX is simple:

pandoc -s sample.md -o sample.tex

The -s flag will make sure that we generate a complete document rather than a fragment. Otherwise the LaTeX content will not work with other items in the toolchain.

An alternative: Markdown to PDF

The final task I want to discuss is converting Markdown to PDF with a toolchain other than what I currently use (Markdown to HTML, and HTML through CSS Paged Media to PDF using PrinceXML). This process provides an alternative toolchain going from Markdown to LaTeX and from LaTeX to PDF.

The format of the PDF looks too much like a LaTeX document and I’ve never been a fan. But the toolchain is open source (even though it’s my least favorite license, GPL) so I don’t have to worry about the vendor changing its mind about the licensing for the tool.

pandoc -s sample.md -o sample.pdf
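
Behind the scenes Pandoc builds the PDF through LaTeX, so we can also choose which engine it uses. A sketch, assuming an older Pandoc 1.x where the flag is called --latex-engine (newer releases renamed it --pdf-engine):

pandoc -s sample.md --latex-engine=xelatex -o sample.pdf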

Further thoughts

We’ve just scratched the surface of what Pandoc can do. One interesting idea is to convert Markdown to ICML (InCopy Markup Language) that we can then import into an InDesign template that we’ve set up in advance.
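
A sketch of what that conversion might look like; the file names are placeholders and we ask for Pandoc’s icml writer explicitly:

pandoc -s sample.md -t icml -o sample.icml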

The possibilities look promising 🙂

PostCSS and crazy things you can do with it

PostCSS is an interesting project. In a nutshell, it takes CSS and turns it into an abstract syntax tree, a form of data that JavaScript can manipulate. JavaScript-based plugins for PostCSS then perform different code manipulations. PostCSS itself doesn’t change your CSS; it allows plugins to perform the transformations they’ve been designed to make.

There are essentially no limitations on the kind of manipulation PostCSS plugins can apply to CSS. If you can think of it, you can probably write a PostCSS plugin to make it happen.

It’s important to also know what PostCSS is not. This material is adapted from the Envato Tuts+ PostCSS Deep Dive: What You Need to Know.

PostCSS is Not a Pre-processor

Yes, you can absolutely use it as a preprocessor, but you can also use PostCSS without any preprocessor functionality. I only use Autoprefixer and, sometimes, CSSNano. Neither of these tools is a pre-processor.

PostCSS is Not a Post-processor

Post-processing is typically seen as taking a finished stylesheet comprising valid/standard CSS syntax and processing it to do things like adding vendor prefixes. However, PostCSS can do more than just post-process the file; it’s only limited by the plugins you use and create.

PostCSS is Not “Future Syntax”

There are some excellent and very well known PostCSS plugins which allow you to write in future syntax, i.e. using CSS that will be available in the future but is not yet widely supported. However PostCSS is not inherently about supporting future syntax.

Using future syntax is your choice and not a requirement. Because I come from SCSS I do all my weird development there and use PostCSS in a much more limited capacity. If I so choose, I can turn to PostCSS and use future-looking features without being afraid that my target browsers will not support them.

PostCSS is Not a Clean Up / Optimization Tool

The success of the Autoprefixer plugin has led to the common perception of PostCSS as something you run on your completed CSS to clean it up and optimize it for speed and cross-browser compatibility.

Yes, there are many fantastic plugins that offer great clean up and optimization processes, but these are just a few of the available plugins.

Why I picked PostCSS and what we’ll do with it

I initially decided not to use PostCSS until I discovered that Autoprefixer and CSSNano, some of my favorite tools, are actually PostCSS plugins. That made me research PostCSS itself to see what it’s all about. What I found is a basic tool and a rich plugin ecosystem that can do a lot of what you may want to do with your CSS, from adding vendor prefixes based on what browsers you expect your users to have, to analyzing your code for compliance with a given methodology like BEM.

I also like how PostCSS advocates for the single responsibility principle as outlined by Robert Martin:

The single responsibility principle is a computer programming principle that states that every module or class should have responsibility over a single part of the functionality provided by the software, and that responsibility should be entirely encapsulated by the class.

Wikipedia Entry: Single responsibility principle

Basically, each PostCSS plugin should handle only one task and do it well. We should not create plugins that do more than one thing and we shouldn’t duplicate functionality that is already available through another PostCSS plugin.

In this post we’ll explore how to build a PostCSS workflow using Gulp, how to build a plugin, and how you would add plugins to the PostCSS workflow we created.

Running PostCSS

I work primarily in a Gulp environment so I built this task to work with PostCSS plugins, Autoprefixer in particular. Assuming you haven’t done so before, install Gulp globally:

npm i -g gulp

And then install the plugins we need, gulp, gulp-postcss and autoprefixer, into the project you’re working in. The -D flag will save the plugins as development dependencies:

npm i -D gulp gulp-postcss autoprefixer

The task itself is currently made of two parts:

  • The list of processors to use
  • The task itself

The task pipes the input through sourcemaps, then runs PostCSS and Autoprefixer, which has already been configured with the browser versions to prefix for. It then writes the sourcemap and the output to the destination directory.

gulp.task("processCSS", () => {
  // What processors/plugins to use with PostCSS
  const PROCESSORS = [
    autoprefixer({browsers: ['last 3 versions']})
  ];
  return gulp
    .src("src/css/**/*.css")
    .pipe($.sourcemaps.init())
    .pipe(postcss(PROCESSORS))
    .pipe($.sourcemaps.write("."))
    .pipe(gulp.dest("src/css"))
    .pipe($.size({
      pretty: true,
      title:
        "processCSS"
    }));
});

If we run this task last in our CSS handling process we can change the destination from src/css to dest/css. In the project where this task was first used there was an additional compression step beyond what SASS gave me; I wasn’t using CSSNano, so I had to keep the files in the source directory for further processing. We’ll revisit this when we discuss other plugins we can use.

Adding a second plugin

Even though the CSS for this task is compressed using SASS’s compressed output format, I want more compression, so we’ll use CSSNano to compress it further.

To use it we first need to install the plugin

npm i -D cssnano

Next we need to modify our build script to require CSS Nano:

const cssnano = require('cssnano');

And, finally, we need to modify the task to incorporate CSS Nano. We do this by adding CSS Nano to our PROCESSORS array. The modified task now looks like this:

gulp.task("processCSS", () => {
  // What processors/plugins to use with PostCSS
  const PROCESSORS = [
    autoprefixer({browsers: ['last 3 versions']}),
    cssnano()
  ];
  return gulp
    .src("src/css/**/*.css")
    .pipe($.sourcemaps.init())
    .pipe(postcss(PROCESSORS))
    .pipe($.sourcemaps.write("."))
    .pipe(gulp.dest("src/css"))
    .pipe($.size({
      pretty: true,
      title:
        "processCSS"
    }));
});

We can add further processors following the same formula: install and require the plugin, add the plugin (and any configuration) to the PROCESSORS array and test to make sure that it does what you want it to.

Building a plugin

The code for this section originally appeared in Tuts+ PostCSS Deep Dive: Create Your Own Plugin

What I find most intriguing about PostCSS is the API and how easy it makes it for developers to create plugins to address specific needs.

What the CSS code will look like

Let’s assume, for example, that we have a set of fonts that Marketing has decided we should use in our content. Rather than type the full string of all the fonts in the stack you can do something like this instead:

html {
  font-family: fontstack("Arial");
  weight: normal;
  style: normal;
}

And the resulting CSS will appear like this:

html {
  font-family: Arial, "Helvetica Neue", Helvetica, sans-serif;
  weight: normal;
  style: normal;
}

Configuring the project

To initialize the plugin project we have to create a folder and initialize the package with NPM and accept the defaults automatically. We do this with the following commands:

mkdir local-stacks # Creates the directory
cd local-stacks # Changes to the directory we just created
npm init --yes # Inits NPM accepting all defaults automatically

Now we must create the file we’ll use as our plugin’s entry point, index.js. We can create this with:

touch index.js

Or create the file in your text editor. I normally use Visual Studio Code.

Writing the code

To get the plugin going we need to install and require two packages: the PostCSS core engine (postcss) and Underscore (underscore), which we will use to merge local and plugin configurations. I am not using ES6 module import syntax (although it would make the code simpler) because I want to use the module with older versions of Node.
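
A minimal install step might look like this; note that these are runtime dependencies of the plugin, not the -D development dependencies we used earlier:

npm i --save postcss underscore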

We then define an object with the font stacks that we want to use. The name we want to use for the stack is the key and the stack itself is the value for that key.

const postcss = require('postcss');
const _ = require('underscore');

// Font stacks from http://www.cssfontstack.com/
const fontstacks_config = {
  'Arial': 'Arial, "Helvetica Neue", Helvetica, sans-serif',
  'Times New Roman': 'TimesNewRoman, "Times New Roman", Times, Baskerville, Georgia, serif',
  'Lucida Grande': '"Lucida Grande", "Lucida Sans Unicode", "Lucida Sans", Geneva, Verdana, sans-serif'
};

toTitleCase will convert the string passed to it so that the first letter of each word is capitalized. The regular expression that we use to capture the string to title case is a little complicated (it was for me when I first saw it) so I’ve unpacked it below:

  • \w matches any word character (equal to [a-zA-Z0-9_])
  • \S* matches any non-whitespace character (equal to [^\r\n\t\f ])
  • * Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
  • g modifier – Return all matches (don’t return after first match)

// Credit for this function to http://bit.ly/1hJj9jb in SO
function toTitleCase(str) {
  return str.replace(/\w\S*/g, function(txt){return txt.charAt(0).toUpperCase() +
    txt.substr(1).toLowerCase();});
}

The module we’re exporting is the actual plugin. We give it a name, local-stacks, and we define it as a function. In the function:

  • We walk through all the rules in the stylesheet using walkRules, part of the PostCSS API
  • For each rule we walk through all the declarations using walkDecls, also part of the PostCSS API
  • We test if there is a fontstack call in the declaration. If there is one we:
    1. Get the name of the fontstack requested by matching the value inside the parentheses and then removing any quotation marks
    2. Title case the resulting string in case the user didn’t
    3. Look up the name of the font stack in the fontstacks_config object
    4. Capture any value that was in the string before the fontstack call
    5. Create a new string with both the first font and the value of our font stack
    6. Return the new value as the value of our declaration

module.exports = postcss.plugin('local-stacks', function (options) {
  return function (css) {
    options = options || {};

    // Merge any font stacks passed in as options with our defaults
    fontstacks_config = _.extend(fontstacks_config, options.fontstacks);

    css.walkRules(function (rule) {
      rule.walkDecls(function (decl, i) {
        var value = decl.value;

        if (value.indexOf('fontstack(') !== -1) {
          // Name of the requested stack, without quotes
          var fontstack_requested = value.match(/\(([^)]+)\)/)[1]
            .replace(/["']/g, "");

          fontstack_requested = toTitleCase(fontstack_requested);

          var fontstack = fontstacks_config[fontstack_requested];

          // Anything that came before the fontstack() call
          var first_font = value.substr(0, value.indexOf('fontstack('));

          var new_value = first_font + fontstack;

          decl.value = new_value;
        }
      });
    });
  };
});
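
To use the plugin in the Gulp workflow we built earlier, we would require it and add it to the PROCESSORS array. This is only a sketch; the relative path assumes the plugin lives in a local-stacks folder next to the gulpfile:

// Hypothetical wiring of our plugin into the earlier processCSS task
const localStacks = require('./local-stacks');

const PROCESSORS = [
  // Pass custom stacks if needed: localStacks({fontstacks: {'Georgia': 'Georgia, serif'}})
  localStacks(),
  autoprefixer({browsers: ['last 3 versions']}),
  cssnano()
];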

Next Steps and Closing

This is a very simple plugin as far as plugins go. You can look at Autoprefixer and CSSNano for more complex examples and ideas of what you can do with PostCSS. If you’re interested in exploring the API, it is fully documented at http://api.postcss.org/.

An important reminder: you don’t have to reinvent the wheel. There is a large plugin ecosystem available at www.postcss.parts. We can use these plugins to get the results we want.
This makes writing your own plugin a fun but not always necessary exercise.

Once you’ve scoped your CSS project you can decide how much of PostCSS you need and how your needs can be translated into existing plugins and custom code.

Links, Resources and Ideas

Page Visibility

There is nothing more annoying than having audio or video playing in a tab when it’s in the background or when the browser is minimized.

The Page Visibility API gives us control of what to do with media or any other element of a page when the tab is hidden or not visible. Some of the things we can do with this API:

  • Pausing a video when the page has lost user focus.
  • Stop an HTML5 canvas animation from running when a user is not on the page.
  • Show a notification to the user only when the page is active.
  • Stop movement of a slider carousel when the page has lost focus.

The API introduces two new properties on the document object: document.visibilityState and document.hidden.

document.visibilityState can hold one of four values:

  • hidden: Page is not visible on any screen
  • prerender: Page is loaded off-screen and not visible to user
  • visible: Page is visible
  • unloaded: Page is about to unload (user is navigating away from current page)

document.hidden is a boolean property that is set to false if the page is visible and true if the page is hidden.

The first example pauses a video if the container tab is hidden or not visible and plays it otherwise.

We start by adding a visibilitychange event listener to the document. Inside the listener we check if the document is hidden and pause the video if it is; otherwise we play it.

const video = document.getElementById('myVideo');

document.addEventListener('visibilitychange', function () {
    if (document.hidden) {
        video.pause();
    } else {
        video.play();
    }
});

The most obvious use is to control video playback when the tab is not visible. When I wrote about creating custom controls for HTML5 video I used a click event handler like this one to control video play/pause status:

play.addEventListener('click', e => {
  // Prevent Default Click Action
  e.preventDefault();
  if (video.paused) {
    video.play();
    playIcon.src = 'images/icons/pause.svg';
  } else {
    video.pause();
    playIcon.src = 'images/icons/play-button.svg';
  }
});

We can further enhance it so that it will only play the video if the document is visible.

We wrap the if block that controls playback in another if block that tests the page’s visibility state and only moves forward if we can see the page. If the page is not visible then we pause the video, regardless of whether it’s currently playing.

The code now looks like this:

play.addEventListener('click', e => {
  // Prevent Default Click Action
  e.preventDefault();
  if (document.visibilityState === 'visible') {
    if (video.paused) {
      video.play();
      playIcon.src = 'images/icons/pause.svg';
    } else {
      video.pause();
      playIcon.src = 'images/icons/play-button.svg';
    }
  } else {
    video.pause();
  }
});

With those simple changes we’ve ensured that the video will not play in the background and that there will be no other distractions when we work in other tabs. We should then do something similar for our keyboard video controls, as sketched below.
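
A minimal sketch of that guard applied to a hypothetical keyboard handler; the spacebar binding and the bare play/pause toggle (without the icon swap) are assumptions, not part of the original controls:

document.addEventListener('keydown', e => {
  // Only handle the spacebar in this sketch
  if (e.key !== ' ') return;
  e.preventDefault();
  if (document.visibilityState === 'visible') {
    if (video.paused) {
      video.play();
    } else {
      video.pause();
    }
  } else {
    video.pause();
  }
});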

HTTP/2 Server Push, Link Preload And Resource Hints

We’ve become performance obsessed; it’s important and the obsession shows. Fortunately we’re also given the tools to accommodate the need for speed. We’ll look at three ways of helping our servers give us the resources we need before we actually need them.

We’re not covering service workers in this post. Even though they are very much a performance feature, they are client-side, and we want to discuss server-side performance improvements or improvements that are used directly from the HTML code, not JavaScript.

What is Server Push

Accessing websites has always followed a request and response pattern. The user sends a request to a remote server, and with some delay, the server responds with the requested content.

The initial request to a web server is usually for an HTML document. The server returns the requested HTML resource. The browser parses the HTML and discovers references to other assets (style sheets, scripts, fonts and images). The browser requests these new assets, which runs the process again (a stylesheet may have links to fonts or to images being used as backgrounds).

The problem with this mechanism is that users must wait for the browser to discover and retrieve critical assets, which can’t happen until after the HTML document has been downloaded. This delays rendering and increases load times.

What problem does server push solve?

With server push we now have a way to preemptively “push” assets to the browser before the browser explicitly requests them. If we do this carefully we can increase perceived performance by sending things we know users are going to need.

Let’s say that our site uses the same fonts throughout and it uses one common stylesheet named main.css. When the user requests our site’s main page, index.html, we can push these files we know we’ll need right after we send the response for index.html.

This push will increase perceived speed because it enables the browser to render the page faster than waiting for the server to respond with the HTML file and then parsing it to discover additional resources that it must request and parse.

Server push also acts as a suitable alternative for a number of HTTP/1-specific optimizations, such as inlining CSS and JavaScript directly into HTML, as well as using the data URI scheme to embed binary data into CSS and HTML.

If you inline CSS into an HTML document within <style> tags, the browser can begin applying styles immediately to the HTML without waiting to fetch them from an external source. This concept holds true with inlining scripts and inlining binary data with the data URI scheme.

However the big pitfall of these techniques is that they can’t be cached outside of the page they are embedded in and, if the same code is used in more than one page, we end up with duplicated code in our pages.

Say, for example, that we want to push a CSS style sheet and a Javascript file for all requests that are HTML pages. In an Apache HTTP Server (version 2.4.17 and later) you can configure the server to push with something like this:

<if "%{DOCUMENT_URI} =~ /\.html$/">
  H2PushResource add css/site.css
  H2PushResource add js/site.js
</if>

We can then configure push resources that are specific to each section of our site, for example:

<if "%{DOCUMENT_URI} == '/portfolio/index.html'">
  H2PushResource add /css/dist/critical-portfolio.css?01042017
</if>

<if "%{DOCUMENT_URI} == '/code/index.html'">
  H2PushResource add /css/dist/critical-code.css?01042017
</if>

Yes, this means we have to play with server configurations, either the default configuration, a virtual host or on a directory basis using .htaccess. I still consider this an effort worth making.

It’s not all positive. Some things to be aware of:

YOU CAN PUSH TOO MUCH STUFF

Just like when building an application shell with a service worker, we need to be extremely careful about the size of the assets we choose to push. Too many files, or files that are too large, will defeat the purpose of pushing assets to improve performance, as they’ll delay rendering.

YOU CAN PUSH SOMETHING THAT’S NOT ON THE PAGE

This is not necessarily a bad thing if you have visitor analytics to back up this strategy or you know that the asset will be used elsewhere on your site. When in doubt, don’t push. Remember that you may be costing people real money when pushing unnecessary resources to people who are on restricted mobile data plans.

CONFIGURE YOUR HTTP/2 SERVER PROPERLY

Some servers give you a lot of server push-related configuration options. Apache’s mod_http2 has some options for configuring how assets are pushed. Check your server’s configuration for details about what options can be configured and how to do it.

PUSHES MAY NOT BE CACHED

There have been some questions about whether server push could cause assets to be unnecessarily pushed to users when they return to our site. One way to control this is to only push assets when a cookie indicating that the assets were pushed is not present.

An example of this technique is in Jeremy Wagner’s article Creating a Cache-aware HTTP/2 Server Push Mechanism at CSS-Tricks. It provides an example of a way to create cookies to check against when pushing files from the server; a rough sketch of the idea follows.
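
A minimal sketch of the idea using Apache’s mod_http2 and mod_headers; the cookie name and asset path are placeholders, and Wagner’s article walks through a production-ready version:

<if "%{HTTP_COOKIE} !~ /h2pushed/">
  H2PushResource add /css/site.css
  Header add Set-Cookie "h2pushed=1; Path=/; Max-Age=2592000"
</if>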

Link Preload

The Preload specification aims at improving performance and providing more granular loading control to web developers. We can customize loading behavior in a way that doesn’t incur the penalties of loader scripts.

With preload the browser can do things that are just not possible with HTTP/2 push:

  • The browser can set a resource’s priority, so that it will not delay more important resources, or lag behind less important resources
  • The browser can enforce the right Content-Security-Policy directives, and not go out to the server if it shouldn’t
  • The browser can send the appropriate Accept headers based on the resource type
  • The browser knows the resource type so it can determine if the resource could be reused

Preload has a functional onload event that we can leverage for additional functionality. It will not block the window.onload event unless the resource is blocked by a resource that blocks the event elsewhere in your content.

Loading late-loading resources

The basic way you could use preload is to load late-discovered resources early. Not all resources that make a web page are visible in the initial markup. For example, an image or font can be hidden inside a style sheet or a script. The browser can’t know about these resources until it parses the containing style sheet or script and that may end up delaying rendering or loading entire sections of your page.

Preload is basically telling the browser “hey, you’re going to need this later so please start loading it now”.

Preload works as a new rel attribute of the link element. It has three attributes:

  • rel indicates the type of link; for preload links we use the preload value
  • href is the relative URL for the asset we’re preloading
  • as indicates the kind of resource we’re preloading. It can be one of the following:
    • “script”
    • “style”
    • “image”
    • “font”
    • “media”
    • “document”
Knowing the attributes, we can look at how to use them responsibly.

<link rel="preload" href="late_discovered_thing.js" as="script">

Early loading of fonts and the crossorigin attribute

Loading fonts is the same as preloading other types of resources, with some additional constraints:

<link rel="preload"
      href="font.woff2"
      as="font"
      type="font/woff2"
      crossorigin>

You must add a crossorigin attribute when fetching fonts, since they are fetched using anonymous mode CORS. Yes, even if your fonts are on the same origin as the page.

The type attribute makes sure the resource will only be preloaded in browsers that support that file type. At the time of writing Chrome is the only browser that supports preload, and it also supports WOFF2, but other browsers that add preload support in the future may not support this particular font format. The same applies to any preloaded resource type whose browser support isn’t ubiquitous.

Markup-based async loader

Another thing you can do is use the onload handler to create a markup-based async loader. Scott Jehl was the first to experiment with this, as part of his loadCSS library. In short, you can do something like:

<link rel="preload"
      as="style"
      href="async_style.css"
      onload="this.rel='stylesheet'">

The same can also work for async scripts.

We already have <script async>, you say? Well, <script async> is great, but it blocks the window’s onload event. In some cases that’s exactly what you want it to do; in other cases it isn’t.
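
A minimal sketch of a preload-based async script loader, using a placeholder file name:

<link rel="preload"
      as="script"
      href="async_script.js"
      onload="var script = document.createElement('script'); script.src = this.href; document.body.appendChild(script);">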

Responsive Loading Links

Preload links have a media attribute that we can use to conditionally load resources based on a media query condition.

What’s the use case? Let’s say your site’s large viewport uses an interactive map, but you only show a static map for the smaller viewports.

You want to load only one of those resources. Without preload, the only way to do that would be to load them dynamically with JavaScript, but a script-based loader hides those resources from the preloader, so they may be loaded later than necessary, hurting your users’ visual experience and your SpeedIndex score.

Fortunately you can use preload to load them ahead of time, and use its media attribute so that only the required script will be preloaded:

<link rel="preload"
      as="image"
      href="map.png"
      media="(max-width: 600px)">

<link rel="preload"
      as="script"
      href="map.js"
      media="(min-width: 601px)">

Resource Hints

In addition to preload and server push we can also ask the browser to help by providing hints and instructions on how to interact with resources.

In this section we’ll discuss:

  • DNS Prefetching
  • Preconnect
  • Prefetch
  • Prerender

DNS prefetch

This hint tells the browser that we’ll need assets from a domain so it should resolve the DNS for that domain as quickly as possible. If we know we’ll need assets from example.com we can write the following in the head of the document:

<link rel="dns-prefetch" href="//example.com">

Then, when we request a file from it, we’ll no longer have to wait for the DNS lookup. This is particularly useful for third-party code or resources from social networks, where we might be loading a widget from a <script> tag.

Preconnect

Preconnect is a more complete version of DNS prefetch. In addition to resolving the DNS it will also perform the TCP handshake and, if necessary, the TLS negotiation. It looks like this:

<link rel="preconnect" href="//example.net">

For more information, Ilya Grigorik wrote a great post about this handy resource hint.

Prefetching

Prefetch is an older cousin of preload that works in much the same way: if you know you’ll need a given resource, you can request it ahead of time using the prefetch hint. For example an image, a script, or anything cacheable by the browser:

<link rel="prefetch" href="image.png">

Unlike DNS prefetching, we’re actually requesting and downloading the asset and storing it in the cache. However, this is subject to a number of conditions, since the browser can ignore a prefetch. For example, a client might abandon the request for a large font file on a slow network, and Firefox will only prefetch resources when “the browser is idle”.

Since we now have preload, I would recommend using that API (discussed earlier) instead.

Prerender

Prerender is the nuclear option, since it will load all of the assets for a given document like so:

<link rel="prerender" href="http://css-tricks.com">

Steve Souders wrote a great explanation of this technique:

This is like opening the URL in a hidden tab – all the resources are downloaded, the DOM is created, the page is laid out, the CSS is applied, the JavaScript is executed, etc. If the user navigates to the specified href, then the hidden page is swapped into view making it appear to load instantly. Google Search has had this feature for years under the name Instant Pages. Microsoft recently announced they’re going to similarly use prerender in Bing on IE11.

But beware! You should be reasonably certain the user will click that link; otherwise the client downloads all of the assets needed to render the page for no reason at all. It’s hard to guess what will be loaded next, but we can make some fairly educated guesses:

  • If the user has done a search with an obvious result, that result page is likely to be loaded next.
  • If the user navigated to a login page, the logged-in page is probably coming next.
  • If the user is reading a multi-page article or paginated set of results, the page after the current page is likely to be next.

Combining HTTP/2 push and client-side technologies

Please make sure you test the code in the sections below in your own setup. It may improve your site’s performance, or it may degrade it beyond acceptable levels.

You’ve been warned

We can combine server- and client-side technologies to further increase performance. Some of the things we can do include:

Gzip the content you serve

One way to further reduce the size of our payloads is to compress them in transit. The files travel smaller over the network, and the browser expands them when they arrive.

How we compress data depends on the server we’re using. The first example works with Apache’s mod_gzip module; the configuration goes in the global Apache server configuration, inside a virtual host directive, or in an .htaccess file.

We’re not compressing images because I’m not 100% certain they’ll survive the trip the way other resources do, and I already compress them before uploading them to the server.

We also skip files that already have a Content-Encoding header. We don’t need to compress them if they are already compressed 🙂

<ifModule mod_gzip.c>
  mod_gzip_on Yes
  mod_gzip_dechunk Yes
  mod_gzip_item_include file \.(html?|txt|css|js|php|pl)$
  mod_gzip_item_include handler ^cgi-script$
  mod_gzip_item_include mime ^text/.*
  mod_gzip_item_include mime ^application/x-javascript.*
  mod_gzip_item_exclude mime ^image/.*
  mod_gzip_item_exclude rspheader ^Content-Encoding:.*gzip.*
</ifModule>
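
If your Apache build ships the newer mod_deflate module instead of mod_gzip, a roughly equivalent configuration (a minimal sketch; adjust the MIME type list to match your site) would be:

<IfModule mod_deflate.c>
  # Compress text-based responses; images are left alone
  # because they are not listed here
  AddOutputFilterByType DEFLATE text/html text/plain text/css
  AddOutputFilterByType DEFLATE text/javascript application/javascript
  AddOutputFilterByType DEFLATE application/json application/xml
</IfModule>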

The following configuration uses NGINX’s HTTP gzip module. NGINX compression will not work with versions of IE before 6. But, honestly, if you’re still serving browsers that old you have more serious issues 🙂

We also add a Vary header to stop proxy servers from sending gzipped files to IE6 and older.

gzip on;
gzip_comp_level 2;
gzip_http_version 1.1;
gzip_proxied any;
gzip_min_length 1100;
gzip_buffers 16 8k;
gzip_types text/plain text/html text/css
  application/x-javascript text/xml application/xml
  application/xml+rss text/javascript;

# Disable for IE < 6 because there are some known problems
gzip_disable "MSIE [1-6].(?!.*SV1)";

# Add a vary header for downstream proxies
# to avoid sending cached gzipped files to IE6
gzip_vary on;

Preload resources and cache them with a service worker cache

The code below is written in PHP. I’m working on converting it to JavaScript/Node. If you have such an example, please share it 🙂

As discussed earlier, one way to keep server push from re-sending assets to returning visitors is to push only when a cookie indicating the assets were already pushed is not present; we then store those assets in the service worker cache. Jeremy Wagner’s article Creating a Cache-aware HTTP/2 Server Push Mechanism at CSS-Tricks provides an example of creating cookies to check against when pushing files from the server.

function pushAssets() {
  // Fingerprint each asset so we can detect when it changes
  $pushes = array(
    "/css/styles.css" => substr(md5_file("/var/www/css/styles.css"), 0, 8),
    "/js/scripts.js" => substr(md5_file("/var/www/js/scripts.js"), 0, 8)
  );

  if (!isset($_COOKIE["h2pushes"])) {
    // First visit: push everything and record what was pushed
    // in a cookie that expires in 30 days (2592000 seconds)
    $pushString = buildPushString($pushes);
    header($pushString);
    setcookie("h2pushes", json_encode($pushes), time() + 2592000, "/", ".myradwebsite.com", true);
  } else {
    $serializedPushes = json_encode($pushes);

    if ($serializedPushes !== $_COOKIE["h2pushes"]) {
      // Something changed: push only the assets whose fingerprints differ
      $oldPushes = json_decode($_COOKIE["h2pushes"], true);
      $diff = array_diff_assoc($pushes, $oldPushes);
      $pushString = buildPushString($diff);
      header($pushString);
      setcookie("h2pushes", json_encode($pushes), time() + 2592000, "/", ".myradwebsite.com", true);
    }
  }
}

function buildPushString($pushes) {
  // Build a single Link header with a preload entry for each asset
  $links = array();

  foreach ($pushes as $asset => $version) {
    $links[] = "<" . $asset . ">; rel=preload";
  }
  return "Link: " . implode(",", $links);
}
// Push those assets!
pushAssets();

This function (adapted from Jeremy’s article) checks whether an h2pushes cookie is stored in the user’s browser. If there isn’t one, it uses the buildPushString helper to generate preload links for the resources in the $pushes array, sends them as headers with the page, and sets the h2pushes cookie in the browser with a representation of the paths that were pushed.

If the user has been here before, we need to decide whether there’s anything to push. Because we want to re-push assets when they change, we fingerprint them so we can compare versions later. For example, if you serve a styles.css file and then change it, you’ll use a cache-busting strategy, such as adding a random string to the file name at build time or appending a value to the query string, to ensure the browser won’t serve a stale version of the resource.

The function decodes the values stored in the cookie and compares them with what you want to push. If they are the same it does nothing and moves on; if they differ it builds preload links for the new values and updates the cookie.

If you push too many files this function can still hurt performance. As we discussed earlier, be mindful of what you push and how large those files are. But with this method we can at least be confident we’re not pushing duplicate assets to the page.
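
On the client side, a small service worker can then store those same assets in a cache so that repeat visits don’t need the push at all. A minimal sketch in JavaScript, assuming the worker is served from /sw.js and the asset paths match the ones in the $pushes array above:

// sw.js — cache name and asset list are examples; match them to what you push
const CACHE_NAME = 'pushed-assets-v1';
const PUSHED_ASSETS = ['/css/styles.css', '/js/scripts.js'];

// On install, add the pushed assets to the cache
self.addEventListener('install', (event) => {
  event.waitUntil(
    caches.open(CACHE_NAME).then((cache) => cache.addAll(PUSHED_ASSETS))
  );
});

// Answer requests from the cache first, falling back to the network
self.addEventListener('fetch', (event) => {
  event.respondWith(
    caches.match(event.request).then((cached) => cached || fetch(event.request))
  );
});

Register the worker from the page with navigator.serviceWorker.register('/sw.js'); once it is installed, returning visitors get these assets straight from the cache whether or not the server pushes them again.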

Links and resources