XML Workflows: CSS Styles for Paged Media

This is the generated CSS from the SCSS style sheets (see the scss/ directory for the source material.) I’ve chosen to document the resulting stylesheet here and document the SCSS source in another document to make life simpler for people who don’t want to deal with SASS or who want to see what the style sheets look like.

Typography derived from work done at this URL: http://bit.ly/16N6Y2Q

The following scale (also using minor third progression) may also help: http://bit.ly/1DdVbqK

Feel free to play with these and use them as starting point for your own work 🙂

The project currently uses these fonts:

  • Roboto Slab for headings
  • Roboto for body copy
  • Source Code Pro for code blocks and preformatted text

Font Imports

Even though SCSS Lint throws a fit when I put font imports in a stylesheet because they stop asynchronous operations, I’m doing it to keep the HTML files clean and because we are not loading the CSS on the page, we’re just using it to process the PDF file.

Eventually I’ll switch to locally hosted fonts using bulletproof font syntax (discussed here and available for use at Font Squirrel).

At this point we are not dealing with font subsetting, but we may do so later if we need to.

@import url(http://fonts.googleapis.com/css?family=Roboto:100italic,100,400italic,700italic,300,700,300italic,400);
@import url(http://fonts.googleapis.com/css?family=Roboto+Slab:400,700);
@import url(http://fonts.googleapis.com/css?family=Source+Code+Pro:300,400);
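As a rough sketch of what the locally hosted alternative could look like, here are two @font-face rules. The file paths and file names are hypothetical, and which font formats work depends on the PDF processor, so treat this as a starting point rather than the project's actual setup.

```css
/* Hypothetical file paths; adjust to wherever the font files actually live. */
@font-face {
  font-family: 'Roboto';
  font-style: normal;
  font-weight: 400;
  src: url('fonts/roboto-regular.woff2') format('woff2'),
       url('fonts/roboto-regular.woff') format('woff');
}

/* One rule is needed per weight and style that the stylesheet uses. */
@font-face {
  font-family: 'Roboto Slab';
  font-style: normal;
  font-weight: 700;
  src: url('fonts/robotoslab-bold.woff2') format('woff2'),
       url('fonts/robotoslab-bold.woff') format('woff');
}
```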

Defaults

Now that we’ve loaded the fonts we can create our defaults for the document. The html element defines vertical overflow and text size adjustment for Safari and Windows browsers.

html {
overflow-y: scroll;
-ms-text-size-adjust: 100%;
-webkit-text-size-adjust: 100%;
}

The body selector will handle most of the base formatting for the document.

The selector sets up the following aspects of the page:

  • background and font color
  • font family, size and weight
  • line height
  • left and right padding (overrides the base document’s padding)
  • orphans and widows
body {
background-color: #fff;
color: #554c4d;
font-family: 'Roboto', 'Helvetica Neue', Helvetica, sans-serif;
font-size: 1em;
font-weight: 100;
line-height: 1.1;
orphans: 4;
padding-left: 0;
padding-right: 0;
widows: 2;
}

Blockquotes, Pullquotes and Marginalia

It’s fairly easy to create sidebars in HTML so I’ve played a lot with pull quotes, blockquotes and asides as a way to move the content around with basic CSS. We can do further work by tuning the CSS.

aside {
border-bottom: 3px double #ddd;
border-top: 3px double #ddd;
color: #666;
line-height: 1.4em;
padding-bottom: .5em;
padding-top: .5em;
width: 100%;
}

aside .pull {
margin-bottom: .5em;
margin-left: -20%;
margin-top: .2em;
}

The margin-notes* and content* classes move the content to the corresponding side of the page without having to create specific CSS to do so. The downside is that, as with many things in CSS, you are stuck with the provided values and will have to modify them to suit your needs.

.margin-notes,
.content-left {
font-size: .75em;
margin-left: -230px;
margin-right: 20px;
text-align: right;
width: 230px;
}

.margin-notes-right,
.content-right {
font-size: .75em;
margin-left: 760px;
margin-right: -20px;
position: absolute;
text-align: left;
width: 230px;
}

.content-right ul,
.content-left ul {
list-style: none;
}

The opening class style creates a large distinguishing block container for opening text. This is useful when you have a summary paragraph or some other opening piece of text that should go at the top of your document.

.opening {
border-bottom: 3px double #ddd;
border-top: 3px double #ddd;
font-size: 2em;
margin-bottom: 10em;
padding-bottom: 2em;
padding-top: 2em;
text-align: center;
}

Blockquotes present the enclosed text in a larger italic font with a solid bar to the left of the content. Because the font is larger, the rule also sets generous margins around the quote (2em above and below, 4em on either side).

blockquote {
border-left: 5px solid #ccc;
color: #222023;
font-size: 1.5em;
font-style: italic;
font-weight: 100;
margin-bottom: 2em;
margin-left: 4em;
margin-right: 4em;
margin-top: 2em;
}
blockquote p {
padding-left: .5em;
}

The pullquote classes were modeled after an ESPN article and look something like this:

[Image: example pullquote]

The original was hardcoded to pixels. Where possible I’ve changed the values to ems so they scale with the font size.

.pullquote {
border-bottom: 18px solid #000;
border-top: 18px solid #000;
font-size: 2.25em;
font-weight: 700;
letter-spacing: -.02em;
line-height: 2.125em;
margin-right: 2.5em;
padding: 1.25em 0;
position: relative;
width: 200px;
}
.pullquote p {
color: #00298a;
font-weight: 700;
text-transform: uppercase;
z-index: 1;
}
.pullquote p:last-child {
line-height: 1.25em;
padding-top: 2px;
}
.pullquote cite {
color: #333;
font-size: 1.125em;
font-weight: 400;
}

Paragraphs

The paragraph selector creates the default paragraph formatting with a size of 1em (equivalent to 16 pixels at the default font size) and a bottom margin of 1.3em (20.8 pixels).

p {
font-size: 1em;
margin-bottom: 1.3em;
}

To indent all paragraphs but the first we use the adjacent sibling selector (+): we indent every paragraph that immediately follows another paragraph element with the same parent.

The first paragraph doesn’t have a preceding paragraph sibling, so it isn’t indented, but all paragraphs that directly follow another paragraph are.

p + p {
text-indent: 2em;
}

Rather than use pseudo elements (:first-line and :first-letter) we use classes to give authors the option to use these elements.

.first-line {
font-size: 1.1em;
text-indent: 0;
text-transform: uppercase;
}

.first-letter {
float: left;
font-size: 7em;
line-height: .8em;
margin-bottom: -.1em;
padding-right: .1em;
}
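For comparison, here is a minimal sketch of the pseudo-element approach that the classes above replace. The selector is an assumption: it targets the paragraph that directly follows the chapter's h1, which may not match every document structure.

```css
/* Alternative to the .first-line class: style the first line of the
   paragraph right after the chapter title (assumed structure). */
section[data-type='chapter'] h1 + p::first-line {
  font-size: 1.1em;
  text-transform: uppercase;
}

/* Alternative to the .first-letter class: a drop cap on the same paragraph. */
section[data-type='chapter'] h1 + p::first-letter {
  float: left;
  font-size: 7em;
  line-height: .8em;
  margin-bottom: -.1em;
  padding-right: .1em;
}
```

The trade-off is control: the pseudo-elements apply automatically wherever the selector matches, while the classes leave the decision to the author.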

Lists

The only thing we do for lists and list items is to indicate what type of list we’ll use as our default: squares for unordered lists and Arabic decimals for numbered lists.

ul li {
list-style: square;
}

ol li {
list-style: decimal;
}

Figures and captions

The only interesting aspect of the CSS we use for figures is the counter. The figure figcaption::before selector creates automatic text that is inserted before each caption. This text is the string “Figure”, the value of our figure counter and the string “: “.

This makes it easier to insert figures without having to change the captions for all figures after the one we inserted. The figure counter is reset for every chapter. I’m researching ways to get the figure numbering across chapters.

figure {
counter-increment: figure_count;
margin-bottom: 1em;
margin-top: 1em;
}
figure figcaption {
font-weight: 700;
padding-bottom: 1em;
padding-top: .2em;
}

figure figcaption::before {
content: "Figure " counter(figure_count) ": ";
}
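One possible way to number figures continuously across chapters is to create the counter once, on the book's body element, so that no chapter starts a new one. This is a sketch only; it assumes the body carries data-type='book' as in the paged-media version of the book, and it has not been tested against the project's files.

```css
/* Create the figure counter once for the whole book (assumes the
   body element has data-type='book'). */
body[data-type='book'] {
  counter-reset: figure_count;
}

/* The existing rules keep working unchanged: each figure increments the
   single book-wide counter, and the caption prints its current value. */
figure {
  counter-increment: figure_count;
}

figure figcaption::before {
  content: "Figure " counter(figure_count) ": ";
}
```

Moving the counter-reset declaration to section[data-type='chapter'] instead would make the per-chapter restart explicit.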

Headings

Headings are configured in two parts. The first one sets attributes common to all headings: font-family, font-weight, hyphens, line-height, margins and text-transform.

The last of these needs a little more discussion. Using text-transform we make all headings uppercase without having to write them that way.

h1,
h2,
h3,
h4,
h5,
h6 {
font-family: 'Roboto Slab', sans-serif;
font-weight: 400;
hyphens: none;
line-height: 1.2;
margin: 1.414em 0 .5em;
text-transform: uppercase;
}

In the second part of our heading styles we work on rules that only apply to one heading at a time. Things such as size and specific attributes (like removing the top margin on the h1 elements) need to be handled individually.

h1 {
font-size: 3.157em;
margin-top: 0;
}

h2 {
font-size: 2.369em;
}

h3 {
font-size: 1.777em;
}

h4 {
font-size: 1.333em;
}

h4,
h5,
h6 {
text-align: inherit;
}

Different parts of the book

There are certain aspects of the book that need different formatting from our defaults.

We use the element[attribute=name] syntax to identify which section we want to work with and then tell it the element within the section that we want to change.

For example, in the bibliography (a section with the data-type='bibliography' attribute) we want all paragraphs to be left aligned and to have no indentation (basically we are undoing the indentation for paragraphs with sibling paragraphs within the bibliography section).

section[data-type='bibliography'] p {
text-align: left;
}
section[data-type='bibliography'] p + p {
text-indent: 0 !important;
}

The same logic applies to the other sections that we’re customizing. We tell it what type of section we are working with and what element inside that section we want to change.

section[data-type='titlepage'] h1,
section[data-type='titlepage'] h2,
section[data-type='titlepage'] p {
text-align: center;
}

section[data-type='dedication'] h1,
section[data-type='dedication'] h2 {
text-align: center;
}
section[data-type='dedication'] p {
text-align: left;
}
section[data-type='dedication'] p + p {
text-indent: 0 !important;
}
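A note on the !important flags in the bibliography and dedication rules: going by selector specificity alone they should not be required, because the section-scoped selector already outranks the general p + p rule. The sketch below shows the same overrides without the flag. It assumes no other rule in the cascade sets text-indent with higher specificity or with !important.

```css
/* Specificity (0,0,2): the general indent rule from the Paragraphs section. */
p + p {
  text-indent: 2em;
}

/* Specificity (0,1,3): one attribute selector plus three type selectors.
   It outranks the rule above, so !important is not needed. */
section[data-type='bibliography'] p + p,
section[data-type='dedication'] p + p {
  text-indent: 0;
}
```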

Preformatted code blocks

A lot of what I write is technical and requires code examples. We take a two-pronged approach to the fenced code blocks.

We format some aspects of our content (wrap, font-family, size, line height and whether to do page breaks inside the content) locally and hand off syntax highlighting to highlight.js with a style to mark the content differently.

pre {
overflow-wrap: break-word;
white-space: pre-line !important;
word-wrap: break-word;
}
pre code {
font-family: 'Source Code Pro', monospace;
font-size: 1em;
line-height: 1.2em;
page-break-inside: avoid;
}
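One thing to watch with the pre rule above: white-space: pre-line preserves line breaks but collapses runs of spaces and tabs, so indentation inside code samples can be flattened. If that turns out to be a problem, a variant using pre-wrap keeps the indentation and still wraps long lines. This is a suggested alternative, not what the generated stylesheet currently contains.

```css
/* Variant of the pre rule: keep indentation, still wrap long lines. */
pre {
  overflow-wrap: break-word;
  white-space: pre-wrap;
  word-wrap: break-word;
}
```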

Miscellaneous classes

Rather than force justified text on everyone, we provide a class that authors can apply where they want it. I normally justify at the div or section level but it’s not always necessary or desirable.

The .code class will be used in a future iteration of the stylesheet to highlight inline snippets (think of it as an inline version of the <pre><code> tag combination).

.justified {
text-align: justify;
}

.code {
background-color: #e6e6e7;
opacity: .75;
}

Columns

The last portion of the stylesheet deals with columns. I’ve set up two sets of rules, for two- and three-column layouts, with similar attributes. In the SCSS source these are created with a column mixin.

.columns2 {
column-count: 2;
column-gap: 3em;
column-fill: balance;
column-span: none;
line-height: 1.25em;
width: 100%;
}
.columns2 p:first-of-type {
margin-top: 0;
}
.columns2 p + p {
text-indent: 2em;
}
.columns2 p:last-of-type {
margin-bottom: 1.25em;
}

.columns3 {
column-count: 3;
column-gap: 10px;
column-fill: balance;
column-span: none;
width: 100%;
}
.columns3 p:first-of-type {
margin-top: 0;
}
.columns3 p:not(:first-of-type) {
text-indent: 2em;
}
.columns3 p:last-of-type {
margin-bottom: 1.25em;
}

XML Workflows: From XML to PDF: Part 2: CSS

With the HTML ready, we can now look at the CSS stylesheet to process it into PDF.

The extensions, pseudo elements and attributes we use are all part of the CSS Paged Media or Generated Content for Paged Media specifications. Where appropriate I’ve translated them to work on both PDF and HTML.

Book defaults

The first step is to create the default structure for the book using the @page at-rule.

Our base definition does the following:

  1. Size the page to letter (8.5 by 11 inches), width first
  2. Use CSS notation for margins. In this case the top and bottom margin are 0.5 inches and left and right are 1 inch
  3. Reset the footnote counter.
  4. Using the @footnote rule, do the following:
    1. Increment the footnote counter
    2. Place footnote at the bottom using another value for the float attribute
    3. Span all columns
    4. Make the height as tall as necessary
/* STEP 1: DEFINE THE DEFAULT PAGE */
@page {
  size: 8.5in 11in; /* (1) */
  margin: 0.5in 1in; /* (2) */
  /* Footnote related attributes */
  counter-reset: footnote; /* (3) */
  @footnote {
    counter-increment: footnote; /* (4.1) */
    float: bottom; /* (4.2) */
    column-span: all; /* (4.3) */
    height: auto; /* (4.4) */
  }
}
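If the book is going to be bound, the default page can be refined with the :left and :right page selectors so the inner margin is wider than the outer one. The sketch below is an illustration only; the 1.25in binding allowance is an assumed value and is not part of the project's stylesheet.

```css
/* Hypothetical binding allowance: the inner margin is wider than the outer one. */
@page :left {
  margin-left: 1in;     /* outer edge of a left-hand page */
  margin-right: 1.25in; /* inner edge, next to the spine */
}

@page :right {
  margin-left: 1.25in;  /* inner edge, next to the spine */
  margin-right: 1in;    /* outer edge of a right-hand page */
}
```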

In later sections we’ll create named page templates and associate them to different portions of our written content.

Page counters

We define two conditions under which we reset the page counter: when we have a book followed by a part and when we have a book followed by a first chapter.

We do not reset the counter when the path is from book to chapter to part.

body[data-type='book'] > div[data-type='part']:first-of-type,
body[data-type='book'] > section[data-type='chapter']:first-of-type { counter-reset: page; }
body[data-type='book'] > section[data-type='chapter']+div[data-type='part'] { counter-reset: none }

Matching content sections to page types

The next section of the style sheet matches the content of our book to page types in our style sheet.

The book is broken into sections with data-type attributes to indicate the type of content; we match the section[data-type] element to a page type along with some basic style definitions.

We will further define the types of pages later in the style sheet.

/* Title Page*/
section[data-type='titlepage'] { page: titlepage }

/* Copyright page */
section[data-type='copyright'] { page: copyright }

/* Dedication */
section[data-type='dedication'] {
  page: dedication;
  page-break-before: always;
}

/* TOC */
section[data-type='toc'] {
  page: toc;
  page-break-before: always;
}
/* Leader for toc page */
section[data-type='toc'] nav ol li a:after {
  content: leader(dotted) ' ' target-counter(attr(href, url), page);
}

/* Foreword  */
section[data-type='foreword'] { page: foreword }

/* Preface*/
section[data-type='preface'] { page: preface }

/* Part */
div[data-type='part'] { page: part }

/* Chapter */
section[data-type='chapter'] {
  page: chapter;
  page-break-before: always;
}

/* Appendix */
section[data-type='appendix'] {
  page: appendix;
  page-break-before: always;
}

/* Glossary*/
section[data-type='glossary'] { page: glossary }

/* Bibliography */
section[data-type='bibliography'] { page: bibliography }

/* Index */
section[data-type='index'] { page: index }

/* Colophon */
section[data-type='colophon'] { page: colophon }

Front matter formatting

For each page of front matter content (toc, foreword and preface) we define two pages: left and right. We do it this way to accommodate facing pages with numbers on opposite sides (for two-sided printouts).

For the front matter we chose to use Roman numerals on the bottom of the page.

/* Common Front Matter Page Numbering in lowercase ROMAN numerals */
@page toc:right {
  @bottom-right-corner { content: counter(page, lower-roman) }
  @bottom-left-corner { content: normal }
}

@page toc:left  {
  @bottom-left-corner { content: counter(page, lower-roman) }
  @bottom-right-corner { content: normal }
}


@page foreword:right {
  @bottom-center { content: counter(page, lower-roman) }
  @bottom-left-corner { content: normal }
}

@page foreword:left  {
  @bottom-left-corner { content: counter(page, lower-roman) }
  @bottom-right-corner { content: normal }
}


@page preface:right {
  @bottom-center {content: counter(page, lower-roman)}
  @bottom-right-corner { content: normal }
  @bottom-left-corner { content: normal }
}

@page preface:left  {
  @bottom-center {content: counter(page, lower-roman)}
  @bottom-right-corner { content: normal }
  @bottom-left-corner { content: normal }
}
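The three front matter page types repeat nearly the same rules. As a sketch of how they could be consolidated, the CSS Paged Media Level 3 grammar allows a comma-separated list of page selectors in one @page rule. Two assumptions apply: that every front matter page should follow the toc pattern (numbers in the outside corners), and that the PDF processor accepts selector lists, which should be checked before relying on it.

```css
/* Right-hand front matter pages: number in the bottom right corner. */
@page toc:right, foreword:right, preface:right {
  @bottom-right-corner { content: counter(page, lower-roman) }
  @bottom-left-corner { content: normal }
}

/* Left-hand front matter pages: number in the bottom left corner. */
@page toc:left, foreword:left, preface:left {
  @bottom-left-corner { content: counter(page, lower-roman) }
  @bottom-right-corner { content: normal }
}
```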

Pages formatting

We use the same system we used in the front matter to do a few things with our content.

We first remove page numbering from the title page and dedication by setting the numbering on both bottom corners to normal.

/* Common Content Page Numbering  in Arabic numerals 1... 199 */
@page titlepage{ /* Need this to clean up page numbers in titlepage in Prince*/
  margin-top: 18em;
  @bottom-right-corner { content: normal }
  @bottom-left-corner { content: normal }
}

@page dedication { /* Need this to clean up page numbers in dedication in Prince */
  margin-top: 18em;
  @bottom-right-corner { content: normal }
  @bottom-left-corner { content: normal }
}

Now we start working on our chapter pages. The first thing we do is to place our running header content in the bottom middle of the page, regardless of whether it’s left or right.

@page chapter {
  @bottom-center {
    vertical-align: middle;
    text-align: center;
    content: element(heading);
  }
}

We next set up a blank page for our chapters and tell the reader that the page was intentionally left blank to prevent confusion.

@page chapter:blank { /* Need this to clean up page numbers on blank pages in Prince */
  @top-center { content: "This page is intentionally left blank" }
  @bottom-left-corner { content: normal }
  @bottom-right-corner { content: normal }
}

Then we number the pages the same way that we did for our front matter, except that we use Arabic numerals instead of Roman.

```css
@page chapter:right  {
  @bottom-right-corner { content: counter(page) }
  @bottom-left-corner { content: normal }
}

@page chapter:left {
  @bottom-left-corner { content: counter(page) }
  @bottom-right-corner { content: normal }
}

@page appendix:right  {
  @bottom-right-corner { content: counter(page) }
  @bottom-left-corner { content: normal }
}

@page appendix:left {
  @bottom-left-corner { content: counter(page) }
  @bottom-right-corner { content: normal }
}

@page glossary:right {
  @bottom-right-corner { content: counter(page) }
  @bottom-left-corner { content: normal }
}

@page glossary:left {
  @bottom-left-corner { content: counter(page) }
  @bottom-right-corner { content: normal }
}

@page bibliography:right  {
  @bottom-right-corner { content: counter(page) }
  @bottom-left-corner { content: normal }
}

@page bibliography:left {
  @bottom-left-corner { content: counter(page) }
  @bottom-right-corner { content: normal }
}

@page index:right  {
  @bottom-right-corner { content: counter(page) }
  @bottom-left-corner { content: normal }
}

@page index:left {
  @bottom-left-corner { content: counter(page) }
  @bottom-right-corner { content: normal }
}
```

Running footer

We now style the running footer.

p.rh {
  position: running(heading);
  text-align: center;
  font-style: italic;
}
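An alternative worth knowing about is named strings, which avoid the extra p.rh element by copying the heading text straight into the page margin. The sketch below uses string-set and string() from the Generated Content for Paged Media specification. The string name chaptertitle is an assumption, and this would replace the element(heading) approach rather than sit alongside it.

```css
/* Capture each chapter title in a named string
   (the name "chaptertitle" is an assumption). */
section[data-type='chapter'] h1 {
  string-set: chaptertitle content();
}

/* Print the captured title in the bottom center of chapter pages. */
@page chapter {
  @bottom-center {
    content: string(chaptertitle);
    font-style: italic;
    text-align: center;
    vertical-align: middle;
  }
}
```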

Footnotes and cross references

Footnotes are tricky: they consist of two parts, the footnote call and the footnote content itself. I’m still trying to figure out what the correct markup should be for marking up footnotes.

We’ve also defined a special class of links that appends a string and the destination’s page number.

/* Footnotes */
span.footnote {
  float: footnote;
}

::footnote-marker {
  content: counter(footnote);
  list-style-position: inside;
}

::footnote-marker::after {
  content: '. ';
}

::footnote-call {
  content: counter(footnote);
  vertical-align: super;
  font-size: 65%;
}

/* XReferences */
a.xref[href]::after {
    content: ' [See page ' target-counter(attr(href), page) ']'
}

PDF Bookmarks

PDF bookmarks allow you to navigate your content from the left side bookmark menu as shown in the image below.

[Image: PDF Bookmarks]

For each heading level we do the following things for both Antenna House and PrinceXML:

  • Set up the bookmark level
  • Set up whether it’s open or closed
  • Set up the label for the bookmark

Only headings 1, 2 and 3 get the full setup (level, state and label); levels 4, 5 and 6 only get a bookmark level.

section[data-type='chapter'] h1 {
  -ah-bookmark-level: 1;
  -ah-bookmark-state: open;
  -ah-bookmark-label: content();
  prince-bookmark-level: 1;
  prince-bookmark-state: closed;
  prince-bookmark-label: content();
}

section[data-type='chapter'] h2 {
  -ah-bookmark-level: 2;
  -ah-bookmark-state: closed;
  -ah-bookmark-label: content();
  prince-bookmark-level: 2;
  prince-bookmark-state: closed;
  prince-bookmark-label: content();
}

section[data-type='chapter'] h3 {
  -ah-bookmark-level: 3;
  -ah-bookmark-state: closed;
  -ah-bookmark-label: content();
  prince-bookmark-level: 3;
  prince-bookmark-state: closed;
  prince-bookmark-label: content();
}

section[data-type='chapter'] h4 {
  -ah-bookmark-level: 4;
  prince-bookmark-level: 4;
}

section[data-type='chapter'] h5 {
  -ah-bookmark-level: 5;
  prince-bookmark-level: 5;
}

section[data-type='chapter'] h6 {
  -ah-bookmark-level: 6;
  prince-bookmark-level: 6;
}
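The Generated Content for Paged Media specification also defines unprefixed versions of these properties. As a forward-looking sketch, they could be added next to the vendor-specific ones. Whether a given processor honours the unprefixed names is an assumption to verify against its documentation.

```css
/* Unprefixed bookmark properties from the GCPM specification,
   mirroring the vendor-specific rules for the first three levels. */
section[data-type='chapter'] h1 {
  bookmark-level: 1;
  bookmark-state: open;
  bookmark-label: content(text);
}

section[data-type='chapter'] h2 {
  bookmark-level: 2;
  bookmark-state: closed;
  bookmark-label: content(text);
}

section[data-type='chapter'] h3 {
  bookmark-level: 3;
  bookmark-state: closed;
  bookmark-label: content(text);
}
```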

Running PrinceXML

Once we have the HTML file ready we can run it through PrinceXML to get our PDF using the CSS stylesheet for Paged Media we discussed above. The command to run the conversion for a book.html file is:

$ prince --verbose book.html test-book.pdf

Because we added the stylesheet link directly to the HTML document we can skip declaring it in the conversion itself. This is always a cause of errors and frustrations for me so I thought I’d save everyone else the hassle.

XML Workflows: From XML to PDF: Part 1: Special Transformation

Rather than having to deal with XSL-FO, another XML based vocabulary to create PDF content, we’ll use XSLT to create another HTML file and process it with CSS Paged Media and the companion Generated Content for Paged Media specifications to create PDF content.

I’m not against XSL-FO but the structure of the document is not the easiest or most intuitive. An example of XSL-FO looks like this:

<?xml version="1.0" encoding="iso-8859-1"?> <!-- (1) -->

<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"> <!-- (2) -->
  <fo:layout-master-set> <!-- (3) -->
    <fo:simple-page-master master-name="my-page">
      <fo:region-body margin="1in"/>
    </fo:simple-page-master>
  </fo:layout-master-set>

  <fo:page-sequence master-reference="my-page"> <!-- (4) -->
    <fo:flow flow-name="xsl-region-body"> <!-- (5) -->
      <fo:block>Hello, world!</fo:block> <!-- (6) -->
    </fo:flow>
  </fo:page-sequence>
</fo:root>
  1. This is an XML declaration. XSL FO (XSLFO) belongs to XML family, so this is obligatory.
  2. Root element. The obligatory namespace attribute declares the XSL Formatting Objects namespace.
  3. Layout master set. This element contains one or more declarations of page masters and page sequence masters — elements that define layouts of single pages and page sequences. In the example, I have defined a rudimentary page master, with only one area in it. The area should have a 1 inch margin from all sides of the page.
  4. Page sequence. Pages in the document are grouped into sequences; each sequence starts from a new page. Master-reference attribute selects an appropriate layout scheme from masters listed inside <fo:layout-master-set>. Setting master-reference to a page master name means that all pages in this sequence will be formatted using this page master.
  5. Flow. This is the container object for all user text in the document. Everything contained in the flow will be formatted into regions on pages generated inside the page sequence. Flow name links the flow to a specific region on the page (defined in the page master); in our example, it is the body region.
  6. Block. This object roughly corresponds to <div> in HTML, and normally includes a paragraph of text. I need it here, because text cannot be placed directly into a flow.

Rather than define a flow of content and then the content, CSS Paged Media uses a combination of new and existing CSS elements to format the content. For example, defining the default page size and then adding elements to chapter pages looks like this:

@page {
  size: 8.5in 11in;
  margin: 0.5in 1in;
  /* Footnote related attributes */
  counter-reset: footnote;
  @footnote {
    counter-increment: footnote;
    float: bottom;
    column-span: all;
    height: auto;
    }
  }

@page chapter {
  @bottom-center {
    vertical-align: middle;
    text-align: center;
    content: element(heading);
  }
}

The only problem with the code above is that there is no native browser support. For our demonstration we’ll use Prince XML to translate our HTML/CSS file to PDF. In the not so distant future we will be able to do this transformation in the browser and print the PDF directly. Until then it’s a two-step process: modifying the HTML we get from the XML file and running the HTML through Prince to get the PDF.

Modifying the HTML results

We’ll use this opportunity to create an XSLT customization layer to make changes only to the templates where we need to.

We create a customization layer by importing the original stylesheet and making any necessary changes in the new stylesheet. Imported stylesheets have a lower precedence order than the local version so the local version will win if there is a conflict.

Only the templates defined in this stylesheet are overridden. If the template we use is not in this customization layer, the transformation engine will use the template in the base style sheet (book.xsl in this case).

The style sheet is broken by templates and explained below.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs"
  version="2.0">
  <!-- First import the base stylesheet -->
  <xsl:import href="book.xsl"/>

  <!-- Define the output for this and all document children -->
  <xsl:output name="xhtml-out" method="xhtml"
    indent="yes" encoding="UTF-8" omit-xml-declaration="yes" />

The first difference in the customization layer is that it imports another style sheet (book.xsl). We do this to avoid having to copy the entire style sheet and, if we make changes, having to make the changes in multiple places.

We will then override the templates we need in order to get a single file to pass on to Prince or any other CSS Print Processor.

  <!-- Root template matching book -->
  <xsl:template match="book">
    <html>
      <head>
        <xsl:element name="title">
          <xsl:value-of select="metadata/title"/>
        </xsl:element>
        <!-- Load Typekit Font -->
        <script src="https://use.typekit.net/qcp8nid.js"></script>
        <script>try{Typekit.load();}catch(e){}</script>
        <!-- Paged Media Styles -->
        <link rel="stylesheet" href="css/pm-style.css" />
        <!--
          Load Paged Media definitions just so I won't forget it again
        -->
        <link rel="stylesheet" href="css/paged-media.css"/>
        <!--
              Use highlight.js and style
        -->
        <xsl:if test="(code)">
          <link rel="stylesheet" href="css/styles/railscasts.css" />
          <!-- Load highlight.js -->
          <script src="lib/highlight.pack.js"></script>
          <script>
            hljs.initHighlightingOnLoad();
          </script>
        </xsl:if>
        <!-- <script src="js/script.js"></script> -->
      </head>
      <body>
        <xsl:attribute name="data-type">book</xsl:attribute>
          <xsl:element name="meta">
            <xsl:attribute name="generator">
              <xsl:value-of select="system-property('xsl:product-name')"/>
              <xsl:value-of select="system-property('xsl:product-version')"/>
            </xsl:attribute>
          </xsl:element>
        <xsl:apply-templates select="/" mode="toc"/>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

Most of the root template deals with undoing some of the changes we made to create multiple pages.

We’ve changed the CSS we use to process the content. We use paged-media.css to create the content for our media files, mostly setting up the different pages based on the data-type attribute.

We use pm-style.css to control the style of our documents specifically for our printed page application. We have to take into account the fact that Highlight.js is not working properly with Prince’s JavaScript implementation and that there are places where we don’t want our paragraphs to be indented at all.

We moved elements from the original section templates. We test whether we need to add the Highlight.JS since we dropped the multipage output.

Overriding the section template

Sections are the element type that got the biggest makeover. What we’ve done:

  • Remove the filename variable. It’s not needed
  • Remove the result-document element since we are building a single file with all our content
  • Change the way we check for the type attribute in sections. It will now terminate with an error if the attribute is not found
  • Add the element that will build our running footer (p class="rh") and assign the value of the section’s title to it
  <!-- Override of the section template.-->
  <xsl:template match="section">
    <section>
      <xsl:choose>
        <xsl:when test="string(@type)">
          <xsl:attribute name="data-type">
            <xsl:value-of select="@type"/>
          </xsl:attribute>
        </xsl:when>
        <xsl:otherwise>
          <xsl:message terminate="yes">
            Type attribute is required for paged media. 
            Check your section tags for missing type attributes
          </xsl:message>
        </xsl:otherwise>
      </xsl:choose>
      <xsl:if test="string(@class)">
        <xsl:attribute name="class">
          <xsl:value-of select="@class"/>
        </xsl:attribute>
      </xsl:if>
      <xsl:if test="string(@id)">
        <xsl:attribute name="id">
          <xsl:value-of select="@id"/>
        </xsl:attribute>
      </xsl:if>
      <!-- 
        Running header paragraph.  

        This will be taken out of the regular flow of text so
        it doesn't matter if we add it or not
      -->
      <xsl:element name="p">
        <xsl:attribute name="class">rh</xsl:attribute>
        <xsl:value-of select="title"/>
      </xsl:element> <!-- closes rh class -->
      <xsl:apply-templates/>
    </section>
  </xsl:template>

Metadata

The metadata section has been reworked into a new section with the titlepage data-type. We set up the container section and assign titlepage to the data-type attribute. We then apply all children templates.

<!-- Metadata -->
<xsl:template match="metadata">
  <xsl:element name="section">
    <xsl:attribute name="data-type">titlepage</xsl:attribute>
    <xsl:apply-templates/>
  </xsl:element>
</xsl:template>

Table of contents

The table of contents creates anchor links (a href='#id') to the title h1 tags we create in the step below. We can do it this way because XSLT guarantees that all calls to generate-id for a given element (in this case the section elements) will return the same value for a given execution.

<!-- Create Table of Contents ... work in progress -->
<xsl:template match="toc">
  <section data-type="toc">
    <h1>Table of Contents</h1>
    <nav>
      <ol>
        <xsl:for-each select="//section">
          <xsl:element name="li">
            <xsl:element name="a">
              <xsl:attribute name="href">
                <xsl:value-of select="concat('#', generate-id(.))"/>
              </xsl:attribute>
              <xsl:value-of select="title"/>
            </xsl:element>
          </xsl:element>
        </xsl:for-each>
      </ol>
    </nav>
  </section>
</xsl:template>

Titles

The table of contents is commented out for now as I work on improving the content and placement of the table of contents in the final document.

The title element has only one addition. We add an ID attribute created using XSLT’s generate-id function on the parent section element.

  <xsl:template match="title">
    <xsl:element name="h1">
      <xsl:attribute name="id">
        <xsl:value-of select="generate-id(..)"/>
      </xsl:attribute>
      <xsl:if test="string(@align)">
        <xsl:attribute name="align">
          <xsl:value-of select="@align"/>
        </xsl:attribute>
      </xsl:if>
      <xsl:if test="string(@class)">
        <xsl:attribute name="class">
          <xsl:value-of select="@class"/>
        </xsl:attribute>
      </xsl:if>
      <xsl:value-of select="."/>
    </xsl:element> <!-- closes h1 -->
  </xsl:template>
</xsl:stylesheet>

With all this in place we can now look to the CSS Paged Media file.

XML workflows: Converting our content to HTML

Converting our content to HTML

One of the biggest advantages of working with XML is that we can convert the abstract tags into other markups. For the purposes of this project we’ll convert XML that matches the schema we just created to HTML and then use tools like PrinceXML or Antenna House to convert the HTML/CSS files to PDF.

Why HTML

HTML is the default format for the web and for most web/html based content such as ePub and Kindle. As such it makes a perfect candidate to explore how to generate it programmatically from a single source file.

HTML will also act as our source for using CSS paged media to create PDF content.

Why PDF

Rather than having to deal with XSL-FO, another XML based vocabulary to create PDF content, we’ll use XSLT to create another HTML file and process it with CSS Paged Media and the companion Generated Content for Paged Media specifications to create PDF content.

Where there is a direct equivalent between our model and the HTML5.1 nightly specification I’ve quoted the relevant section of the HTML5 spec as a reference and as a rationale of why I’ve done things the way I did.

In this document we’ll concentrate on the XSLT to HTML conversion and will defer converting HTML to PDF to a later article.

Creating our conversion style sheets

To convert our XML into other formats we will use XSL Transformations (also known as XSLT) version 2 (a W3C standard) and version 3 (a W3C last call draft recommendation) where appropriate.

XSLT is a functional language designed to transform XML into other markup vocabularies. It defines template rules that match elements in your source document and process them to convert them to the target vocabulary.

In the XSLT template below, we do the following:

  1. Declare the file as an XML document
  2. Define the root element of the style sheet (xsl:stylesheet)
  3. Indicate the namespaces that we’ll use in the document and, in this case, tell the processor to exclude the given namespace
  4. Strip whitespace from all elements and keep it in the code elements
  5. Create the default output we’ll use for the main document and all generated pages (discussed later)
  6. Create a default template to warn us if we missed anything

[code lang=xml]
<?xml version="1.0" ?>
<!-- Define stylesheet root and namespaces we'll work with -->
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:epub="http://www.idpf.org/2007/opf"
exclude-result-prefixes="dc epub"
xml:lang="en-US"
version="2.0">
<!-- Strip whitespace from the listed elements -->
<xsl:strip-space elements="*"/>
<!-- And preserve it in the elements below -->
<xsl:preserve-space elements="code"/>
<!-- Define the output for this and all document children -->
<xsl:output name="xhtml-out" method="xhtml" indent="yes" encoding="UTF-8" omit-xml-declaration="yes" />

<!--
Default template taken from http://bit.ly/1sXqIL8

This will tell us of any unmatched elements rather than
failing silently
-->
<xsl:template match="*">
<xsl:message terminate="no">
WARNING: Unmatched element: <xsl:value-of select="name()"/>
</xsl:message>

<xsl:apply-templates/>
</xsl:template>

<!-- More content to be added -->
</xsl:stylesheet>
[/code]

This is a lot of work before we start creating our XSLT content, but it’s worth doing up front. We’ll see the advantages of doing it this way as we move down the style sheet.

Now to our root templates. The first one is the entry point to our document. It performs the following tasks:

  1. Match the root element to create the skeleton for our HTML content
  2. In the title we insert the value of the metadata/title element
  3. In the body we ‘apply’ the templates that match the content inside our document (more on this later)

[code lang=xml]
<!-- Root template, matching the book element -->
<xsl:template match="book">
<html>
<head>
<xsl:element name="title">
<xsl:value-of select="metadata/title"/>
</xsl:element>
<!-- Record the XSLT processor that generated the page as name/content pairs -->
<xsl:element name="meta">
<xsl:attribute name="name">generator</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="system-property('xsl:product-name')"/>
<xsl:text> </xsl:text>
<xsl:value-of select="system-property('xsl:product-version')"/>
</xsl:attribute>
</xsl:element>
<xsl:element name="meta">
<xsl:attribute name="name">vendor</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="system-property('xsl:vendor')" />
</xsl:attribute>
</xsl:element>
<xsl:element name="meta">
<xsl:attribute name="name">vendor-URL</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="system-property('xsl:vendor-url')" />
</xsl:attribute>
</xsl:element>
<link rel="stylesheet" href="css/style.css" />
<!-- Look for code elements at any depth below book, not only direct children -->
<xsl:if test=".//code">
<!--
Use highlight.js and docco style
-->
<link rel="stylesheet" href="css/styles/docco.css" />
<!-- Load highlight.js -->
<script src="lib/highlight.pack.js"></script>
<script>
hljs.initHighlightingOnLoad();
</script>
</xsl:if>
<!--
Comment this out for now. It'll become relevant when we add video
<script src="js/script.js"></script>
-->
</head>
<body>
<xsl:apply-templates/>
<xsl:apply-templates select="/" mode="toc"/>
</body>
</html>
</xsl:template>
[/code]

We could build the CSS style sheet and JavaScript files as part of our root template but we chose not to.

Embedding the CSS in the XSLT style sheet would let the XSLT designer parametrize the styles, making them customizable from the command line.

For all its advantages, this method ties the project’s styles to the XSLT style sheet and requires the XSLT designer to remain involved in all CSS and JavaScript updates.

By linking to external CSS and JavaScript files we can leverage expertise independent of the schema and XSLT style sheets. Book designers can work on the CSS and the Paged Media style sheets, UX designers can work on the JavaScript and other CSS areas, and authors can just write.

Furthermore we can reuse our CSS and JavaScript on multiple documents.
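
For comparison, one small piece of the parametrized approach might look like the sketch below. The parameter name css-path and its default value are assumptions made for illustration; they are not part of the project’s style sheets.

[code lang=xml]
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical global parameter; a caller can override the default -->
  <xsl:param name="css-path" select="'css/style.css'"/>

  <xsl:template match="book">
    <html>
      <head>
        <title><xsl:value-of select="metadata/title"/></title>
        <!-- The attribute value template inserts the parameter value -->
        <link rel="stylesheet" href="{$css-path}"/>
      </head>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
[/code]

XSLT processors generally let the caller supply values for global parameters when the transformation is started, which is what makes this style customizable from the command line. The trade-off described above still applies: every new parameter is one more thing the XSLT designer has to maintain.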

Table of contents

The table of contents template is under active development and will differ depending on the desired output. I document it here as it is right now, but it will definitely change as it’s developed further.

There is a second template matching the root of our document to create a table of contents. At first glance this looks like the wrong approach.

We leverage XSLT modes that allow us to create templates for the same element to perform different tasks. In toc mode we want the root template to do the following:

  1. Create the section and nav and ol elements
  2. Add the title for the table of contents
  3. For each section element that is a child of root create these elements
    1. The li element
    2. The a element with the corresponding href attribute
    3. The value of the href attribute (a concatenation of the section’s type attribute, the position within the document and the .html string)
    4. The title of the section as the ‘clickable’ portion of the link

[code lang=xml]
<xsl:template match="/" mode="toc">
<section data-type="toc"> <!-- (1) -->
<nav class="toc"> <!-- (1) -->
<h2>Table of Contents</h2> <!-- (2) -->
<ol> <!-- (1) -->
<xsl:for-each select="book/section"> <!-- (3) -->
<xsl:element name="li"> <!-- (3.1) -->
<xsl:element name="a"> <!-- (3.2) -->
<xsl:attribute name="href"> <!-- (3.2) -->
<xsl:value-of select="concat((@type), position(),'.html')"/> <!-- (3.3) -->
</xsl:attribute>
<xsl:value-of select="title"/> <!-- (3.4) -->
</xsl:element>
</xsl:element>
</xsl:for-each>
</ol>
</nav>
</section>
</xsl:template>
[/code]

Metadata and Section

With these templates in place we can now start writing the major areas of the document, metadata and section.

Metadata

The metadata element is a container for all the elements inside it. We just create the div that will hold the content and call xsl:apply-templates to process the children of the metadata element. The template looks like this:

[code lang=xml]
<xsl:template match="metadata">
<xsl:element name="div">
<xsl:attribute name="class">metadata</xsl:attribute>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
[/code]

Section

The section template, on the other hand, is more complex because it has a lot of work to do. It is our primary unit for generating content files: it builds most of the same page skeleton as the root template and then processes the rest of the content.

Inside the template we first create a variable to hold the name of the file we’ll generate. The file name is a concatenation of the following elements:

  • The type attribute
  • The position in the document
  • the string “.html”

The result-document element takes two attributes: href, set to the value of the file name variable we just defined, and format, set to the xhtml-out output we defined at the top of the document. The XHTML format may look like overkill right now but it makes sense when we consider moving the generated content to ePub or other formats where strict XHTML conformance is a requirement.

We start by generating the skeleton of the page, add the default style sheet and do the first conditional test of the document. We don’t want to add style sheets to the page unless they are needed, so we test whether there is a code element on the page and only then add the highlight.js style sheets and scripts.

In the body element we add a section element, the main wrapper for our content.

For the section we conditionally add attributes to the element. We only add a data-type attribute to the section if there is a type attribute in the source document. We do the same thing for id and class.

[code lang=xml]
<xsl:template match="section">
<!-- Variable to create section file names -->
<xsl:variable name="fileName" select="concat((@type), (position()-1),'.html')"/>
<!-- An example result of the variable above would be introduction1.html -->
<xsl:result-document href='{$fileName}' format="xhtml-out">
<html>
<head>
<link rel="stylesheet" href="css/style.css" />
<xsl:if test="(code)">
<!--
Use highlight.js and docco style
-->
<link rel="stylesheet" href="css/styles/docco.css" />
<!-- Load highlight.js -->
<script src="lib/highlight.pack.js"></script>
<script>
hljs.initHighlightingOnLoad();
</script>
</xsl:if>
</head>
<body>
<section>
<xsl:if test="string(@type)">
<xsl:attribute name="data-type">
<xsl:value-of select="@type"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="string(@class)">
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="string(@id)">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:if>
<xsl:apply-templates/>
</section>
</body>
</html>
</xsl:result-document>
</xsl:template>
[/code]
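
The toc template and the section template each build the file name with their own position() call, and the two calls count different things: position() in the for-each counts sections only, while position() in the section template counts every child node of book selected by the apply-templates call that invoked it. A minimal sketch of one way to keep the two in step follows. It assumes a hypothetical helper function, my:section-file, in a made-up namespace; neither is part of the original style sheet.

[code lang=xml]
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:my="urn:example:book-helpers"
  exclude-result-prefixes="xs my">

  <xsl:output name="xhtml-out" method="xhtml" indent="yes" encoding="UTF-8"/>

  <!-- Hypothetical helper: type attribute, then the 1-based index of the
       section among its sibling sections, then '.html' -->
  <xsl:function name="my:section-file" as="xs:string">
    <xsl:param name="section" as="element(section)"/>
    <xsl:sequence select="concat($section/@type,
      count($section/preceding-sibling::section) + 1, '.html')"/>
  </xsl:function>

  <!-- Table of contents links use the helper -->
  <xsl:template match="/" mode="toc">
    <ol>
      <xsl:for-each select="book/section">
        <li>
          <a href="{my:section-file(.)}">
            <xsl:value-of select="title"/>
          </a>
        </li>
      </xsl:for-each>
    </ol>
  </xsl:template>

  <!-- Section files use the same helper, so names and links always agree -->
  <xsl:template match="section">
    <xsl:result-document href="{my:section-file(.)}" format="xhtml-out">
      <html>
        <head>
          <title><xsl:value-of select="title"/></title>
        </head>
        <body>
          <section>
            <xsl:apply-templates/>
          </section>
        </body>
      </html>
    </xsl:result-document>
  </xsl:template>
</xsl:stylesheet>
[/code]

Because the index is derived from the section’s siblings rather than from the context position, the result no longer depends on how many other elements (metadata, for example) come before the first section.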

Metadata content

We process the content of the metadata separately from its structure. We take our primary metadata elements, ISBN and edition, and wrap a paragraph tag around each of them. We can later reuse the element or change its appearance using CSS.

[code lang=xml]
<xsl:template match="isbn">
<p>ISBN: <xsl:value-of select="."/></p>
</xsl:template>

<xsl:template match="edition">
<p>Edition: <xsl:value-of select="."/></p>
</xsl:template>
[/code]

For authors we do the following:

  1. For each individual in the group we take the first name and the surname
  2. Wrap the name in an li element to build an unordered list. We can style the list with CSS later

For editors and other roles we do the same thing

  1. For each individual in the group we take the first name and the surname
  2. We concatenate the type/role to create a full title (production editor for example)
  3. Wrap the name and the title with an li element that we can style with CSS later

[code lang=xml]
<xsl:template match="metadata/authors">
<h2>Authors</h2>
<ul>
<xsl:for-each select="author">
<li>
<xsl:value-of select="first-name"/>
<xsl:text> </xsl:text>
<xsl:value-of select="surname"/>
</li>
</xsl:for-each>
</ul>
</xsl:template>

<xsl:template match="metadata/editors">
<h2>Editorial Team</h2>
<ul class="no-bullet">
<xsl:for-each select="editor">
<li>
<xsl:value-of select="first-name"/>
<xsl:text> </xsl:text>
<xsl:value-of select="surname"/>
<xsl:value-of select="concat(' – ', type, ' ', 'editor')"></xsl:value-of>
</li>
</xsl:for-each>
</ul>
</xsl:template>

<xsl:template match="metadata/otherRoles">
<h2>Production team</h2>
<ul class="no-bullet">
<xsl:for-each select="otherRole">
<li>
<xsl:value-of select="first-name" />
<xsl:text> </xsl:text>
<xsl:value-of select="surname" />
<xsl:text> – </xsl:text>
<xsl:value-of select="role" />
</li>
</xsl:for-each>
</ul>
</xsl:template>
[/code]

Titles and headings

Headings are primarily used to create sections of content. We use the same heading levels as HTML with the addition of a title tag that also maps to a level 1 heading. We’ve put title and h1 in separate templates to make it possible and easier to generate different code for each heading.

Working with XSLT is not the same as using CSS, where you can declare rules for the same selector multiple times (with the last one winning). When writing transformations you can only have one template per element; otherwise you will get an error (there are exceptions to the rule but let’s not worry about that just yet).

According to the spec:

These elements [h1 to h6] represent headings for their sections.

The semantics and meaning of these elements are defined in the section on headings and sections.

These elements have a rank given by the number in their name. The h1 element is said to have the highest rank, the h6 element has the lowest rank, and two elements with the same name have equal rank.

h1–h6 elements must not be used to markup subheadings, subtitles, alternative titles and taglines unless intended to be the heading for a new section or subsection. Instead use the markup patterns in the Common idioms without dedicated elements section of the specification.

– 4.3.6 The h1, h2, h3, h4, h5, and h6 elements, Berjon et al. 2013

All elements have the same attribute set: align, class and id.

The remaining headings, h2 through h6, all have the same attributes and the templates are structured the same way.

[code lang=xml]
<xsl:template match="title">
<xsl:element name="h1">
<xsl:if test="@align">
<xsl:attribute name="align">
<xsl:value-of select="@align"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@class)">
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@id)">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:if>
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>

<xsl:template match="h1">
<xsl:element name="h1">
<xsl:if test="@align">
<xsl:attribute name="align">
<xsl:value-of select="@align"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@class)">
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@id)">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:if>
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
[/code]
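
Since h2 through h6 share one structure, a single template could cover all five. The following is a sketch of that idea, not the approach the project’s style sheet takes; it assumes the source elements are named h2 to h6 and that copying align, class and id through unchanged is acceptable.

[code lang=xml]
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- One rule for the five remaining heading levels -->
  <xsl:template match="h2 | h3 | h4 | h5 | h6">
    <!-- local-name() returns the name of the matched element,
         so an h3 in the source becomes an h3 in the output -->
    <xsl:element name="{local-name()}">
      <!-- Copies only the attributes that are present on the source element -->
      <xsl:copy-of select="@align | @class | @id"/>
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
[/code]

Separate templates, as used for title and h1 above, remain the better choice when a heading level needs output that differs from the others.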

Blockquotes, quotes and asides

Blockquotes, asides and quotes provide sidebar-like content on our document. According to the W3C:

The blockquote element represents content that is quoted from another source, optionally with a citation which must be within a footer or cite element, and optionally with in-line changes such as annotations and abbreviations.
Content inside a blockquote other than citations and in-line changes must be quoted from another source, whose address, if it has one, may be cited in the cite attribute. [emphasis mine]
– 4.51 the Blockquote element , Berjon et al. 2013

The HTML cite element provides attribution for the blockquote it is used in. To prevent confusion and to make its meaning clear, the document model uses the attribution tag instead. Their purpose is identical, and during the transformation attribution becomes a cite element. According to the spec:

The cite element represents a reference to a creative work. It must include the title of the work or the name of the author (person, people or organization) or an URL reference, which may be in an abbreviated form as per the conventions used for the addition of citation metadata. [emphasis mine]

– 4.51 the Cite element , Berjon et al. 2013

[code lang=xml]
<xsl:template match="blockquote">
<xsl:element name="blockquote">
<xsl:if test="(@class)">
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@id)">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:if>
<xsl:apply-templates />
</xsl:element>
</xsl:template>

<!-- BLOCKQUOTE ATTRIBUTION -->
<xsl:template match="attribution">
<xsl:element name="cite">
<xsl:if test="(@class)">
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@id)">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:if>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
[/code]

The q element is the inline equivalent to blockquote and has been replaced in our markup by the quote element. As stated in the HTML5 specification:

The q element represents some phrasing content quoted from another source.

Quotation punctuation (such as quotation marks) that is quoting the contents of the element must not appear immediately before, after, or inside q elements; they will be inserted into the rendering by the user agent.

Content inside a q element must be quoted from another source, whose address, if it has one, may be cited in the cite attribute. The source may be fictional, as when quoting characters in a novel or screenplay.

If the cite attribute is present, it must be a valid URL potentially surrounded by spaces. To obtain the corresponding citation link, the value of the attribute must be resolved relative to the element. User agents may allow users to follow such citation links, but they are primarily intended for private use (e.g. by server-side scripts collecting statistics about a site’s use of quotations), not for readers.

The q element must not be used in place of quotation marks that do not represent quotes; for example, it is inappropriate to use the q element for marking up sarcastic statements.

The use of q elements to mark up quotations is entirely optional; using explicit quotation punctuation without q elements is just as correct.

– 4.5.7 The q element, Berjon et al. 2013

[code lang=xml]
<xs:element name="quote">
<xs:complexType mixed="true">
<xs:attribute name="cite" type="xs:anyURI" use="optional"/>
<xs:attributeGroup ref="genericPropertiesGroup"/>
</xs:complexType>
</xs:element>
[/code]
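
The block above is the schema definition for quote rather than its transformation. A template that turns quote into q could follow the pattern of the other templates. The sketch below assumes that genericPropertiesGroup supplies class and id, which this section does not show.

[code lang=xml]
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="quote">
    <xsl:element name="q">
      <!-- cite is optional in the schema, so only emit it when it has a value -->
      <xsl:if test="string(@cite)">
        <xsl:attribute name="cite">
          <xsl:value-of select="@cite"/>
        </xsl:attribute>
      </xsl:if>
      <!-- class and id are assumed to come from genericPropertiesGroup -->
      <xsl:if test="string(@class)">
        <xsl:attribute name="class">
          <xsl:value-of select="@class"/>
        </xsl:attribute>
      </xsl:if>
      <xsl:if test="string(@id)">
        <xsl:attribute name="id">
          <xsl:value-of select="@id"/>
        </xsl:attribute>
      </xsl:if>
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
[/code]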

Asides are primarily used for content related to the main flow of the document. I use them mostly for notes indirectly related to the main content, for example, to explain that there are other ways to generate Schemas apart from W3C’s Schema. It is good to know this but it won’t change the information in the main content flow.

Per Spec:

The aside element represents a section of a page that consists of content that is tangentially related to the content around the aside element, and which could be considered separate from that content. Such sections are often represented as sidebars in printed typography.

The element can be used for typographical effects like pull quotes or sidebars, for advertising, for groups of nav elements, and for other content that is considered separate from the main content of the page.

– 4.3.5 The aside element, Berjon et al. 2013

[code lang=xml]
<xsl:template match="aside">
<aside>
<xsl:if test="@type">
<xsl:attribute name="data-type">
<xsl:value-of select="@type"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@class)">
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@id)">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:if>
<xsl:apply-templates/>
</aside>
</xsl:template>
[/code]

Div and Span

div and span elements are neutral: they don’t have meaning on their own but they can get meaning from attributes such as class, data-*, and id. Divs are block-level elements, used as siblings or children of sections, whereas span is used inline, for example as a child of our para elements.

According to the specification:

The div element has no special meaning at all. It represents its children. It can be used with the class, lang, and title attributes to mark up semantics common to a group of consecutive elements.

Authors are strongly encouraged to view the div element as an element of last resort, for when no other element is suitable. Use of more appropriate elements instead of the div element leads to better accessibility for readers and easier maintainability for authors.

– 4.4.14 The div element, Berjon et al. 2013

[code lang=xml]
<xsl:template match="div">
<xsl:element name="div">
<xsl:if test="@align">
<xsl:attribute name="align">
<xsl:value-of select="@align"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@class)">
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@id)">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:if>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
[/code]

The span element on its own is meaningless. We can give it meaning with the attributes we pass to it. We can give it a class or id for CSS styling or a type to generate semantic meaning for the contained text.

When we start working on an ePub implementation we can also add the epub:type attribute to create an even more detailed semantic map of our content.

The span element doesn’t mean anything on its own, but can be useful when used together with the global attributes, e.g. class, lang, or dir. It represents its children.

– 4.5.28 The span element, Berjon et al. 2013

[code lang=xml]
<xsl:template match="span">
<xsl:element name="span">
<xsl:if test="@type">
<xsl:attribute name="data-type">
<xsl:value-of select="@type"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@class)">
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@id)">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:if>
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
[/code]

Paragraphs

The paragraph is our basic unit of content. Paragraphs are usually represented as blocks of text but they can be styled any way we choose with the proper CSS.

[code lang=xml]
<xsl:template match="para">
<xsl:element name="p">
<xsl:if test="(@class)">
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@id)">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:if>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
[/code]

Styles

Styles are used to indicate typographical styles such as strong, emphasis, strikethrough and underline.

[code lang=xml]
<xsl:template match="strong">
<strong><xsl:apply-templates /></strong>
</xsl:template>
[/code]

Since we’re working with print and visual media only, we use only strong to indicate bold elements. I’ve never understood how strong and b differ on printed pages or in screen displays.

I may have to implement b when developing the accessibility component for the document schema.

[code lang=xml]
<xsl:template match="emphasis">
<em><xsl:apply-templates/></em>
</xsl:template>
[/code]

As with strong, I’ve decided to only use emphasis to indicate italics and save i for a future revision when, and if, it becomes necessary.

[code lang=xml]
<xsl:template match="strike">
<strike><xsl:apply-templates/></strike>
</xsl:template>
[/code]

Although the strike element is obsolete in the HTML5 standard, it’s still worth having as it can also be the target for CSS that accomplishes the same goal.

The CSS way is to assign a text-decoration: line-through instruction to the strike selector.

[code lang=xml]
<xsl:template match="underline">
<u><xsl:apply-templates/></u>
</xsl:template>
[/code]

While there is a u element, it has different semantics than underline. As with strike, the correct way to do it is with CSS; in this case using text-decoration: underline for the chosen element.

Links and anchors

Links are the essence of the web. They allow you to navigate within the document you’re in or move to external documents. I’ve taken shortcuts and made the label attribute (used for accessibility) and the content of the link the same text. This reduces the amount of typing we have to do but runs the risk of becoming too inflexible.

[code lang=xml]
<xsl:template match="link">
<xsl:element name="a">
<xsl:if test="(@class)">
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@id)">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:if>
<xsl:attribute name="href">
<xsl:value-of select="@href"/>
</xsl:attribute>
<xsl:attribute name="label">
<xsl:value-of select="@label"/>
</xsl:attribute>
<xsl:value-of select="@label"/>
</xsl:element>
</xsl:template>
[/code]

When working with links there are times when we want to link to a section within the same document, to a specific section in another document, to a specific place inside a paragraph or to a figure. To do this we need anchors that will resolve to the following HTML:

[code lang=html]
<a id="target"></a>
[/code]

The transformation element looks like this:

[code lang=xml]
<xsl:template match="anchor">
<xsl:element name="a">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:element>
</xsl:template>
[/code]

I’m not sure if I want to make this an empty element or not.

Most if not all the elements in our document model can use the id attribute so there is no real need for the anchor element.

However, the empty element <anchor id="home"/> appeals to my ease of use paradigm, but it may not be as easy to understand for people who are not familiar with XML empty elements.

Code blocks

Code elements create fenced code blocks like the ones from Github Flavored Markdown.

We use Adobe Source Code Pro font. It’s a clean and readable font designed specifically for source code display.

We highlight our code with Highlight.js. This makes the language attribute mandatory, as the template copies it into the class attribute that tells highlight.js which syntax library to use.

[code lang=xml]
<xsl:template match="code">
<xsl:element name="pre">
<xsl:element name="code">
<xsl:attribute name="class">
<xsl:value-of select="@language"/>
</xsl:attribute>
<xsl:value-of select="."/>
</xsl:element>
</xsl:element>
</xsl:template>
[/code]

Lists and list items

When I first conceptualized this project I had designed a single list element and attributes to produce bulleted and numbered lists. This proved too difficult to implement so I went back to two separate elements: ulist for bulleted lists and olist for numbered lists.

Both elements share the item element to indicate the items inside the list. A list requires at least one item.

[code lang=xml]
<xsl:template match="ulist">
<xsl:element name="ul">
<xsl:if test="string(@class)">
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="string(@id)">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:if>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>

<xsl:template match="olist">
<xsl:element name="ol">
<xsl:if test="string(@class)">
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="string(@id)">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:if>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>

<xsl:template match="item">
<xsl:element name="li">
<xsl:if test="string(@class)">
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="string(@id)">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:if>
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
[/code]
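
The item template above uses xsl:value-of, which outputs the text of the item and drops any child markup. If the schema allows inline elements such as strong or link inside item (an assumption; the schema is not shown in this section), a variant along these lines would keep them. This is a sketch of an alternative, not the project’s template.

[code lang=xml]
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="item">
    <xsl:element name="li">
      <xsl:if test="string(@class)">
        <xsl:attribute name="class">
          <xsl:value-of select="@class"/>
        </xsl:attribute>
      </xsl:if>
      <xsl:if test="string(@id)">
        <xsl:attribute name="id">
          <xsl:value-of select="@id"/>
        </xsl:attribute>
      </xsl:if>
      <!-- apply-templates hands child elements to their own templates;
           plain text is still copied by the built-in text rule -->
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
[/code]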

Figures and Images

The figure element represents some flow content, optionally with a caption, that is self-contained (like a complete sentence) and is typically referenced as a single unit from the main flow of the document.

The element can thus be used to annotate illustrations, diagrams, photos, code listings, etc.

A figure element’s contents are part of the surrounding flow. If the purpose of the page is to display the figure, for example a photograph on an image sharing site, the figure and figcaption elements can be used to explicitly provide a caption for that figure. For content that is only tangentially related, or that serves a separate purpose than the surrounding flow, the aside element should be used (and can itself wrap a figure). For example, a pull quote that repeats content from an article would be more appropriate in an aside than in a figure, because it isn’t part of the content, it’s a repetition of the content for the purposes of enticing readers or highlighting key topics.

– 4.4.11 The figure element, Berjon et al. 2013

Figures, captions and the images inside present a few challenges. Because we allow authors to set height and width on both figure and the image inside we may find situations where the figure container is narrower than the image inside.

To avoid this issue we test whether the figure width value is smaller than the width of the image inside. If it is, we use the width of the image as the width of the figure; otherwise we use the width of the figure.

We do the same thing for height in order to avoid squished images or captions that draw over the image because the figure is too small. If the height of the figure is smaller than the height of the image we use the height of the image, otherwise we use the height of the figure element.

For both height and width we concatenate the attribute value with the string ‘px’ to make sure that it works in both straight CSS and with Prince and other CSS PDF generators.

Alignments can differ: it is possible for a right-aligned image to live inside a centered container.

The data model for our content allows both figures and images to be used in the document. This is so we don’t have to insert empty captions to figures just so we can add an image. If we don’t want a caption we can insert the image directly on our document.

Contrary to the HTML specification we use figure only to display images. We have a specialized template to address code blocks for program listings and can create additional elements as needed.

[code lang=xml]
<xsl:template match="figure">
<xsl:element name="figure">
<xsl:if test="string(@class)">
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="string(@id)">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:if>
<xsl:choose>
<!-- number() forces a numeric comparison; untyped attribute values would otherwise compare as strings -->
<!-- Keep the figure width unless it is smaller than the image width -->
<xsl:when test="string(@width) and not(number(@width) lt number(image/@width))">
<xsl:attribute name="width">
<xsl:value-of select="@width"/>
</xsl:attribute>
</xsl:when>
<xsl:otherwise>
<xsl:attribute name="width">
<xsl:value-of select="image/@width"/>
</xsl:attribute>
</xsl:otherwise>
</xsl:choose>
<xsl:choose>
<!-- Same rule for height: the figure is never shorter than the image -->
<xsl:when test="string(@height) and not(number(@height) lt number(image/@height))">
<xsl:attribute name="height">
<xsl:value-of select="@height"/>
</xsl:attribute>
</xsl:when>
<xsl:otherwise>
<xsl:attribute name="height">
<xsl:value-of select="image/@height"/>
</xsl:attribute>
</xsl:otherwise>
</xsl:choose>
<xsl:if test="(@align)">
<xsl:attribute name="align">
<xsl:value-of select="@align"/>
</xsl:attribute>
</xsl:if>
<xsl:apply-templates select="image"/>
<xsl:apply-templates select="figcaption"/>
</xsl:element>
</xsl:template>

<xsl:template match="figcaption">
<figcaption><xsl:apply-templates/></figcaption>
</xsl:template>

<xsl:template match="image">
<xsl:element name="img">
<xsl:attribute name="src">
<xsl:value-of select="@src"/>
</xsl:attribute>
<xsl:attribute name="alt">
<xsl:value-of select="@alt"/>
</xsl:attribute>
<xsl:if test="(@width)">
<xsl:attribute name="width">
<xsl:value-of select="@width"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@height)">
<xsl:attribute name="height">
<xsl:value-of select="@height"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@align)">
<xsl:attribute name="align">
<xsl:value-of select="@align"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@class)">
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@id)">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:if>
</xsl:element>
</xsl:template>
[/code]
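
To make the templates above easier to follow, here is a small sketch of the kind of source markup they expect. The attribute and element names come from the templates themselves; the id, class, file name, dimensions and caption text are made-up values for illustration only.

[code lang=xml]
<!-- Hypothetical source fragment; all values are placeholders -->
<figure id="fig-example" class="full-width" width="600" height="400" align="center">
  <!-- The image child becomes an img element; src and alt are always copied -->
  <image src="images/example.png" alt="Description of the example image" width="800" height="533"/>
  <!-- The figcaption child is copied through as an HTML figcaption -->
  <figcaption>An example caption</figcaption>
</figure>
[/code]

The figure template copies class and id when they are present, picks a width and height by comparing the figure's values against those of the image child, and then processes the image and figcaption children in that order.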

XML workflows: Introduction

One of the biggest limitations of markup languages, in my opinion, is how confining they are. Even large vocabularies like DocBook have limited functionality out of the box. HTML4 is non-extensible and HTML5 limits how you can extend it (web components are the only way I’m aware of to extend HTML5 without an update to the HTML specification).

By creating our own markup vocabulary we can be as expressive as we need without adding complexity for writers and users and without adding unnecessary complexity for the developers building the tools to interact with the markup.

Why create our own markup

I have a few answers to that question:

In creating your own XML-based markup you enforce separation of content and style. The XML document provides the basic content of the document and the hints to use elsewhere. XSLT style sheets allow you to structure the base document and associated hints into any of several formats (for the purposes of this document we’ll concentrate on XHTML, PDF created through Paged Media CSS and PDF created using XSL Formatting Objects).

Creating a domain-specific markup vocabulary allows you to think about structure and complexity for yourself as the editor/typesetter and for your authors. It makes you think about elements and attributes, which one is better for the experience you want, and what restrictions, if any, you want to impose on your markup.

By creating our own vocabulary we make it easier for authors to write clean and simple content. XML provides a host of validation tools to enforce the structure and format of the XML document.

Options for defining the markup

For the purpose of this project we’ll define a set of resources that work with a book structure like the one below:

[code lang=xml]
<book>
  <metadata>
    <title>The adventures of Sherlock Holmes</title>
    <author>
      <first-name>Arthur</first-name>
      <surname>Conan Doyle</surname>
    </author>
  </metadata>
  <section type="chapter">
    <para>Lorem Ipsum</para>
    <para>Lorem Ipsum</para>
  </section>
</book>
[/code]
[/code]

It is not a complete structure. We will continue adding elements after we reach the MVP (Minimum Viable Product) stage. As usual, feedback is always appreciated.
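
As a rough illustration of how validation could enforce the structure above, here is a minimal XML Schema sketch that accepts the sample book. It is an assumption on my part rather than the project's actual schema: the type names (metadataType, sectionType), the occurrence limits, and the choice to treat para as plain text are all placeholders.

[code lang=xml]
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: type names and cardinalities are assumptions -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- A book is one metadata block followed by one or more sections -->
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="metadata" type="metadataType"/>
        <xs:element name="section" type="sectionType" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- Metadata holds a title and at least one author -->
  <xs:complexType name="metadataType">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="author" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="first-name" type="xs:string"/>
            <xs:element name="surname" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>

  <!-- A section holds one or more paragraphs and an optional type attribute -->
  <xs:complexType name="sectionType">
    <xs:sequence>
      <xs:element name="para" type="xs:string" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="type" type="xs:string"/>
  </xs:complexType>
</xs:schema>
[/code]

With something like this in place, a validating parser would reject a book that is missing its metadata or a section that contains no paragraphs, which is the kind of enforcement that keeps author content clean.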