XML Workflows: From XML to PDF: Part 2: CSS

With the HTML ready, we can now look at the CSS stylesheet to process it into PDF.

The extensions, pseudo elements and attributes we use are all part of the CSS Paged Media or Generated Content for Paged Media specifications. Where appropriate I’ve translated them to work on both PDF and HTML.

Book defaults

The first step is to create the default structure for the book using the @page at-rule.

Our base definition does the following:

  1. Size the page to letter (8.5 by 11 inches), width first
  2. Use CSS notation for margins. In this case the top and bottom margin are 0.5 inches and left and right are 1 inch
  3. Reset the footnote counter.
  4. Using the @footnote rule, do the following:
    1. Increment the footnote counter
    2. Place the footnote at the bottom of the page using a new value for the float property
    3. Span all columns
    4. Make the height as tall as necessary

/* STEP 1: DEFINE THE DEFAULT PAGE */
@page {
  size: 8.5in 11in; /* (1) */
  margin: 0.5in 1in; /* (2) */
  /* Footnote related attributes */
  counter-reset: footnote; /* (3) */
  @footnote {
    counter-increment: footnote; /* (4.1) */
    float: bottom; /* (4.2) */
    column-span: all; /* (4.3) */
    height: auto; /* (4.4) */
    }
  }

In later sections we’ll create named page templates and associate them with different portions of our written content.

Page counters

We define two conditions under which we reset the page counter: when we have a book followed by a part, and when we have a book followed by a first chapter.

We do not reset the counter when the path is from book to chapter to part.

body[data-type='book'] > div[data-type='part']:first-of-type,
body[data-type='book'] > section[data-type='chapter']:first-of-type { counter-reset: page; }
body[data-type='book'] > section[data-type='chapter']+div[data-type='part'] { counter-reset: none }

Matching content sections to page types

The next section of the style sheet matches the content of our book to page types in our style sheet.

The book is broken into sections with data-type attributes to indicate the type of content; we match the section[data-type] element to a page type along with some basic style definitions.

We will further define the types of pages later in the style sheet.

/* Title Page*/
section[data-type='titlepage'] { page: titlepage }

/* Copyright page */
section[data-type='copyright'] { page: copyright }

/* Dedication */
section[data-type='dedication'] {
  page: dedication;
  page-break-before: always;
}

/* TOC */
section[data-type='toc'] {
  page: toc;
  page-break-before: always;
}
/* Leader for toc page */
section[data-type='toc'] nav ol li a:after {
  content: leader(dotted) ' ' target-counter(attr(href, url), page);
}

/* Foreword  */
section[data-type='foreword'] { page: foreword }

/* Preface*/
section[data-type='preface'] { page: preface }

/* Part */
div[data-type='part'] { page: part }

/* Chapter */
section[data-type='chapter'] {
  page: chapter;
  page-break-before: always;
}

/* Appendix */
section[data-type='appendix'] {
  page: appendix;
  page-break-before: always;
}

/* Glossary*/
section[data-type='glossary'] { page: glossary }

/* Bibliography */
section[data-type='bibliography'] { page: bibliography }

/* Index */
section[data-type='index'] { page: index }

/* Colophon */
section[data-type='colophon'] { page: colophon }

Front matter formatting

For each page of front matter content (toc, foreword and preface) we define two pages: left and right. We do it this way to accommodate facing pages with numbers on opposite sides (for two-sided printouts).

For the front matter we chose to use lowercase Roman numerals at the bottom of the page.

/* Common Front Matter Page Numbering in lowercase ROMAN numerals */
@page toc:right {
  @bottom-right-corner { content: counter(page, lower-roman) }
  @bottom-left-corner { content: normal }
}

@page toc:left  {
  @bottom-left-corner { content: counter(page, lower-roman) }
  @bottom-right-corner { content: normal }
}


@page foreword:right {
  @bottom-center { content: counter(page, lower-roman) }
  @bottom-left-corner { content: normal }
}

@page foreword:left  {
  @bottom-left-corner { content: counter(page, lower-roman) }
  @bottom-right-corner { content: normal }
}


@page preface:right {
  @bottom-center {content: counter(page, lower-roman)}
  @bottom-right-corner { content: normal }
  @bottom-left-corner { content: normal }
}

@page preface:left  {
  @bottom-center {content: counter(page, lower-roman)}
  @bottom-right-corner { content: normal }
  @bottom-left-corner { content: normal }
}

Pages formatting

We use the same system we used in the front matter to do a few things with our content.

We first remove page numbering from the title page and dedication by setting the numbering on both bottom corners to normal.

/* Common Content Page Numbering  in Arabic numerals 1... 199 */
@page titlepage{ /* Need this to clean up page numbers in titlepage in Prince*/
  margin-top: 18em;
  @bottom-right-corner { content: normal }
  @bottom-left-corner { content: normal }
}

@page dedication { /* Need this to clean up page numbers on the dedication page in Prince */
  page-break-before: always;
  margin-top: 18em;
  @bottom-right-corner { content: normal }
  @bottom-left-corner { content: normal }

}

Now we start working on our chapter pages. The first thing we do is to place our running header content in the bottom middle of the page, regardless of whether it’s left or right.

@page chapter {
  @bottom-center {
    vertical-align: middle;
    text-align: center;
    content: element(heading);
  }
}

We next set up a blank page for our chapters and tell the reader that the page was intentionally left blank to prevent confusion.

@page chapter:blank { /* Need this to clean up page numbers on blank chapter pages in Prince */
  @top-center { content: "This page is intentionally left blank" }
  @bottom-left-corner { content: normal;}
  @bottom-right-corner {content:normal;}
}

Then we number the pages the same way that we did for our front matter except that we use Arabic numerals instead of Roman.

@page chapter:right  {
  @bottom-right-corner { content: counter(page) }
  @bottom-left-corner { content: normal }
}

@page chapter:left {
  @bottom-left-corner { content: counter(page) }
  @bottom-right-corner { content: normal }
}

@page appendix:right  {
  @bottom-right-corner { content: counter(page) }
  @bottom-left-corner { content: normal }
}

@page appendix:left {
  @bottom-left-corner { content: counter(page) }
  @bottom-right-corner { content: normal }
}

@page glossary:right {
  @bottom-right-corner { content: counter(page) }
  @bottom-left-corner { content: normal }
}

@page glossary:left {
  @bottom-left-corner { content: counter(page) }
  @bottom-right-corner { content: normal }
}

@page bibliography:right  {
  @bottom-right-corner { content: counter(page) }
  @bottom-left-corner { content: normal }
}

@page bibliography:left {
  @bottom-left-corner { content: counter(page) }
  @bottom-right-corner { content: normal }
}

@page index:right  {
  @bottom-right-corner { content: counter(page) }
  @bottom-left-corner { content: normal }
}

@page index:left {
  @bottom-left-corner { content: counter(page) }
  @bottom-right-corner { content: normal }
}

Running footer

We now style the running footer.

p.rh {
  position: running(heading);
  text-align: center;
  font-style: italic;
}

Footnotes and cross references

Footnotes are tricky: they consist of two parts, the footnote call and the footnote content itself. I’m still trying to figure out what the correct markup should be for footnotes.

We’ve also defined a special class of links that appends a string and the destination’s page number.

/* Footnotes */
span.footnote {
  float: footnote;
}

::footnote-marker {
  content: counter(footnote);
  list-style-position: inside;
}

::footnote-marker::after {
  content: '. ';
}

::footnote-call {
  content: counter(footnote);
  vertical-align: super;
  font-size: 65%;
}

/* XReferences */
a.xref[href]::after {
    content: ' [See page ' target-counter(attr(href), page) ']'
}
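
On the XSLT side, markup that these selectors can match might be produced like this. This is only a sketch: the footnote and xref source elements and the target attribute are hypothetical names used for illustration, not part of the schema described in this series.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">

  <!-- Hypothetical source element: <footnote>text of the note</footnote> -->
  <xsl:template match="footnote">
    <!-- span.footnote is what the float: footnote rule selects -->
    <span class="footnote">
      <xsl:apply-templates/>
    </span>
  </xsl:template>

  <!-- Hypothetical source element: <xref target="some-id">link text</xref> -->
  <xsl:template match="xref">
    <!-- a.xref[href] is what the target-counter() rule selects -->
    <a class="xref" href="#{@target}">
      <xsl:apply-templates/>
    </a>
  </xsl:template>
</xsl:stylesheet>
```

Keeping the footnote text inline in a span means the paged media processor, not the transformation, decides where the note lands on the page.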

PDF Bookmarks

PDF bookmarks allow you to navigate your content from the bookmark menu on the left side, as shown in the image below.

[Image: PDF Bookmarks]

For each heading level we do the following things for both Antenna House and PrinceXML:

  • Set up the bookmark level
  • Set up whether it’s open or closed
  • Set up the label for the bookmark

All three settings are configured only for heading levels 1, 2 and 3; levels 4, 5 and 6 only get a bookmark level.

section[data-type='chapter'] h1 {
  -ah-bookmark-level: 1;
  -ah-bookmark-state: open;
  -ah-bookmark-label: content();
  prince-bookmark-level: 1;
  prince-bookmark-state: closed;
  prince-bookmark-label: content();
}

section[data-type='chapter'] h2 {
  -ah-bookmark-level: 2;
  -ah-bookmark-state: closed;
  -ah-bookmark-label: content();
  prince-bookmark-level: 2;
  prince-bookmark-state: closed;
  prince-bookmark-label: content();
}

section[data-type='chapter'] h3 {
  -ah-bookmark-level: 3;
  -ah-bookmark-state: closed;
  -ah-bookmark-label: content();
  prince-bookmark-level: 3;
  prince-bookmark-state: closed;
  prince-bookmark-label: content();
}

section[data-type='chapter'] h4 {
  -ah-bookmark-level: 4;
  prince-bookmark-level: 4;
}

section[data-type='chapter'] h5 {
  -ah-bookmark-level: 5;
  prince-bookmark-level: 5;
}

section[data-type='chapter'] h6 {
  -ah-bookmark-level: 6;
  prince-bookmark-level: 6;
}

Running PrinceXML

Once we have the HTML file ready we can run it through PrinceXML to get our PDF using the CSS stylesheet for Paged Media we discussed above. The command to run the conversion for a book.html file is:

$ prince --verbose book.html test-book.pdf

Because we added the stylesheet link directly to the HTML document we can skip declaring it in the conversion itself. This is always a cause of errors and frustrations for me so I thought I’d save everyone else the hassle.

XML Workflows: From XML to PDF: Part 1: Special Transformation

Rather than having to deal with XSL-FO, another XML based vocabulary to create PDF content, we’ll use XSLT to create another HTML file and process it with CSS Paged Media and the companion Generated Content for Paged Media specifications to create PDF content.

I’m not against XSL-FO but the structure of the document is not the easiest or most intuitive. An example of XSL-FO looks like this:

<?xml version="1.0" encoding="iso-8859-1"?> <!-- (1) -->

<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"> <!-- (2) -->
  <fo:layout-master-set> <!-- (3) -->
    <fo:simple-page-master master-name="my-page">
      <fo:region-body margin="1in"/>
    </fo:simple-page-master>
  </fo:layout-master-set>

  <fo:page-sequence master-reference="my-page"> <!-- (4) -->
    <fo:flow flow-name="xsl-region-body"> <!-- (5) -->
      <fo:block>Hello, world!</fo:block> <!-- (6) -->
    </fo:flow>
  </fo:page-sequence>
</fo:root>

  1. This is an XML declaration. XSL FO (XSLFO) belongs to the XML family, so this is obligatory.
  2. Root element. The obligatory namespace attribute declares the XSL Formatting Objects namespace.
  3. Layout master set. This element contains one or more declarations of page masters and page sequence masters — elements that define layouts of single pages and page sequences. In the example, I have defined a rudimentary page master, with only one area in it. The area should have a 1 inch margin from all sides of the page.
  4. Page sequence. Pages in the document are grouped into sequences; each sequence starts from a new page. Master-reference attribute selects an appropriate layout scheme from masters listed inside <fo:layout-master-set>. Setting master-reference to a page master name means that all pages in this sequence will be formatted using this page master.
  5. Flow. This is the container object for all user text in the document. Everything contained in the flow will be formatted into regions on pages generated inside the page sequence. Flow name links the flow to a specific region on the page (defined in the page master); in our example, it is the body region.
  6. Block. This object roughly corresponds to <div> in HTML, and normally includes a paragraph of text. I need it here, because text cannot be placed directly into a flow.

Rather than defining a flow of content and then the content itself, CSS Paged Media uses a combination of new and existing CSS rules and properties to format the content. For example, defining the default page size and then adding a running element to chapter pages looks like this:

@page {
  size: 8.5in 11in;
  margin: 0.5in 1in;
  /* Footnote related attributes */
  counter-reset: footnote;
  @footnote {
    counter-increment: footnote;
    float: bottom;
    column-span: all;
    height: auto;
    }
  }

@page chapter {
  @bottom-center {
    vertical-align: middle;
    text-align: center;
    content: element(heading);
  }
}

The only problem with the code above is that there is no native browser support. For our demonstration we’ll use Prince XML to translate our HTML/CSS file to PDF. In the not so distant future we will be able to do this transformation in the browser and print the PDF directly. Until then it’s a two-step process: modifying the HTML we get from the XML file and running the HTML through Prince to get the PDF.

Modifying the HTML results

We’ll use this opportunity to create an XSLT customization layer to make changes only to the templates where we need to.

We create a customization layer by importing the original stylesheet and making any necessary changes in the new stylesheet. Imported stylesheets have a lower precedence than the local version, so the local version will win if there is a conflict.

Only the templates defined in this stylesheet are overridden. If the template we use is not in this customization layer, the transformation engine will use the template in the base style sheet (book.xsl in this case).

The style sheet is broken down by template and explained below.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs"
  version="2.0">
  <!-- First import the base stylesheet -->
  <xsl:import href="book.xsl"/>

  <!-- Define the output for this and all document children -->
  <xsl:output name="xhtml-out" method="xhtml"
    indent="yes" encoding="UTF-8" omit-xml-declaration="yes" />

The first difference in the customization layer is that it imports another style sheet (book.xsl). We do this to avoid having to copy the entire style sheet and, if we make changes, having to make the changes in multiple places.

We will then override the templates we need in order to get a single file to pass on to Prince or any other CSS Print Processor.

  <!-- Root template matching book -->
  <xsl:template match="book">
    <html>
      <head>
        <xsl:element name="title">
          <xsl:value-of select="metadata/title"/>
        </xsl:element>
        <!-- Load Typekit Font -->
        <script src="https://use.typekit.net/qcp8nid.js"></script>
        <script>try{Typekit.load();}catch(e){}</script>
        <!-- Paged Media Styles -->
        <link rel="stylesheet" href="css/pm-style.css" />
        <!--
          Load Paged Media definitions just so I won't forget it again
        -->
        <link rel="stylesheet" href="css/paged-media.css"/>
        <!--
              Use highlight.js and style
        -->
        <xsl:if test="(code)">
          <link rel="stylesheet" href="css/styles/railscasts.css" />
          <!-- Load highlight.js -->
          <script src="lib/highlight.pack.js"></script>
          <script>
            hljs.initHighlightingOnLoad();
          </script>
        </xsl:if>
        <!-- <script src="js/script.js"></script> -->
      </head>
      <body>
        <xsl:attribute name="data-type">book</xsl:attribute>
          <xsl:element name="meta">
            <xsl:attribute name="generator">
              <xsl:value-of select="system-property('xsl:product-name')"/>
              <xsl:value-of select="system-property('xsl:product-version')"/>
            </xsl:attribute>
          </xsl:element>
        <xsl:apply-templates select="/" mode="toc"/>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

Most of the root template deals with undoing some of the changes we made to create multiple pages.

We’ve changed the CSS we use to process the content. We use paged-media.css to create the content for our media files, mostly setting up the different pages based on the data-type attribute.

We use pm-style.css to control the style of our documents specifically for our printed page application. We have to take into account the fact that Highlight.js is not working properly with Prince’s JavaScript implementation and that there are places where we don’t want our paragraphs to be indented at all.

We moved elements from the original section templates. We test whether we need to add Highlight.js in the root template since we dropped the multipage output.

Overriding the section template

Sections are the element type that got the biggest makeover. What we’ve done:

  • Remove filename variable. It’s not needed
  • Remove the result document element since we are building a single file with all our content
  • Change the way we check for the type attribute in sections. It will now terminate with an error if the attribute is not found
  • Add the element that will build our running footer (p class="rh") and assign the value of the section’s title to it

  <!-- Override of the section template.-->
  <xsl:template match="section">
    <section>
      <xsl:choose>
        <xsl:when test="string(@type)">
          <xsl:attribute name="data-type">
            <xsl:value-of select="@type"/>
          </xsl:attribute>
        </xsl:when>
        <xsl:otherwise>
          <xsl:message terminate="yes">
            Type attribute is required for paged media. 
            Check your section tags for missing type attributes
          </xsl:message>
        </xsl:otherwise>
      </xsl:choose>
      <xsl:if test="string(@class)">
        <xsl:attribute name="class">
          <xsl:value-of select="@class"/>
        </xsl:attribute>
      </xsl:if>
      <xsl:if test="string(@id)">
        <xsl:attribute name="id">
          <xsl:value-of select="@id"/>
        </xsl:attribute>
      </xsl:if>
      <!--
        Running header paragraph.

        This will be taken out of the regular flow of text so
        it doesn't matter if we add it or not
      -->
      <xsl:element name="p">
        <xsl:attribute name="class">rh</xsl:attribute>
        <xsl:value-of select="title"/>
      </xsl:element> <!-- closes rh class -->
      <xsl:apply-templates/>
    </section>
  </xsl:template>

Metadata

The metadata section has been reworked into a new section with the titlepage data-type. We set up the container section and assign titlepage to the data-type attribute. We then apply the templates for all children.

<!-- Metadata -->
<xsl:template match="metadata">
  <xsl:element name="section">
    <xsl:attribute name="data-type">titlepage</xsl:attribute>
    <xsl:apply-templates/>
  </xsl:element>
</xsl:template>

Table of contents

The table of contents creates anchor links (a href=’#id’) to the title h1 tags we create in the step below. We can do it this way because XSLT guarantees that all calls to generate-id for a given element (in this case the section elements) will return the same value for a given execution.

<!-- Create Table of Contents ... work in progress -->
<xsl:template match="toc">
  <section data-type="toc">
    <h1>Table of Contents</h1>
    <nav>
      <ol>
        <xsl:for-each select="//section">
          <xsl:element name="li">
            <xsl:element name="a">
              <xsl:attribute name="href">
                <xsl:value-of select="concat('#', generate-id(.))"/>
              </xsl:attribute>
              <xsl:value-of select="title"/>
            </xsl:element>
          </xsl:element>
        </xsl:for-each>
      </ol>
    </nav>
  </section>
</xsl:template>

Titles

The table of contents is commented out for now as I work on improving the content and placement of the table of contents in the final document.

The title element has only one addition. We add an ID attribute created using XPath’s generate-id function on the parent section element.

  <xsl:template match="title">
    <xsl:element name="h1">
      <xsl:attribute name="id">
        <xsl:value-of select="generate-id(..)"/>
      </xsl:attribute>
      <xsl:if test="string(@align)">
        <xsl:attribute name="align">
          <xsl:value-of select="@align"/>
        </xsl:attribute>
      </xsl:if>
      <xsl:if test="string(@class)">
        <xsl:attribute name="class">
          <xsl:value-of select="@class"/>
        </xsl:attribute>
      </xsl:if>
      <xsl:value-of select="."/>
    </xsl:element> <!-- closes h1 -->
  </xsl:template>
</xsl:stylesheet>

With all this in place we can now look to the CSS Paged Media file.

XML workflows: Converting our content to HTML

Converting our content to HTML

One of the biggest advantages of working with XML is that we can convert the abstract tags into other markups. For the purposes of this project we’ll convert the XML that matches the schema we just created to HTML, and then use tools like PrinceXML or Antenna House to convert the HTML/CSS files to PDF.

Why HTML

HTML is the default format for the web and for most web/html based content such as ePub and Kindle. As such it makes a perfect candidate to explore how to generate it programmatically from a single source file.

HTML will also act as our source for using CSS paged media to create PDF content.

Why PDF

Rather than having to deal with XSL-FO, another XML based vocabulary to create PDF content, we’ll use XSLT to create another HTML file and process it with CSS Paged Media and the companion Generated Content for Paged Media specifications to create PDF content.

Where there is a direct equivalent between our model and the HTML5.1 nightly specification I’ve quoted the relevant section of the HTML5 spec as a reference and as a rationale of why I’ve done things the way I did.

In this document we’ll concentrate on the XML to HTML conversion and will defer converting HTML to PDF to a later article.

Creating our conversion style sheets

To convert our XML into other formats we will use XSL Transformations (also known as XSLT) version 2 (a W3C standard) and version 3 (a W3C last call draft recommendation) where appropriate.

XSLT is a functional language designed to transform XML into other markup vocabularies. It defines template rules that match elements in your source document and process them to convert them to the target vocabulary.

In the XSLT template below, we do the following:

  1. Declare the file as an XML document
  2. Define the root element of the style sheet (xsl:stylesheet)
  3. Indicate the namespaces that we’ll use in the document and, in this case, tell the processor to exclude the given namespace
  4. Strip whitespace from all elements and keep it in the code elements
  5. Create the default output we’ll use for the main document and all generated pages (discussed later)
  6. Create a default template to warn us if we missed anything

[code lang=xml]
<?xml version="1.0" ?>
<!-- Define stylesheet root and namespaces we'll work with -->
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:epub="http://www.idpf.org/2007/opf"
  exclude-result-prefixes="dc epub"
  xml:lang="en-US"
  version="2.0">
  <!-- Strip whitespace from the listed elements -->
  <xsl:strip-space elements="*"/>
  <!-- And preserve it from the elements below -->
  <xsl:preserve-space elements="code"/>
  <!-- Define the output for this and all document children -->
  <xsl:output name="xhtml-out" method="xhtml" indent="yes" encoding="UTF-8" omit-xml-declaration="yes" />

  <!--
    Default template taken from http://bit.ly/1sXqIL8

    This will tell us of any unmatched elements rather than
    failing silently
  -->
  <xsl:template match="*">
    <xsl:message terminate="no">
      WARNING: Unmatched element: <xsl:value-of select="name()"/>
    </xsl:message>

    <xsl:apply-templates/>
  </xsl:template>

  <!-- More content to be added -->
</xsl:stylesheet>
[/code]

This is a lot of work before we start creating our XSLT content, but it’s worth doing the work up front. We’ll see the advantages of doing it this way as we move down the style sheet.

Now to our root templates. The first one is the entry point to our document. It performs the following tasks:

  1. Match the root element to create the skeleton for our HTML content
  2. In the title we insert the value of the metadata/title element
  3. In the body we ‘apply’ the templates that match the content inside our document (more on this later)

[code lang=xml]
<!-- Root template, matching book -->
<xsl:template match="book">
  <html>
    <head>
      <xsl:element name="title">
        <xsl:value-of select="metadata/title"/>
      </xsl:element>
      <xsl:element name="meta">
        <xsl:attribute name="generator">
          <xsl:value-of select="system-property('xsl:product-name')"/>
          <xsl:value-of select="system-property('xsl:product-version')"/>
        </xsl:attribute>
      </xsl:element>
      <xsl:element name="meta">
        <xsl:attribute name="vendor">
          <xsl:value-of select="system-property('xsl:vendor')" />
        </xsl:attribute>
      </xsl:element>
      <xsl:element name="meta">
        <xsl:attribute name="vendor-URL">
          <xsl:value-of select="system-property('xsl:vendor-url')" />
        </xsl:attribute>
      </xsl:element>
      <link rel="stylesheet" href="css/style.css" />
      <xsl:if test="(code)">
        <!--
          Use highlight.js and docco style
        -->
        <link rel="stylesheet" href="css/styles/docco.css" />
        <!-- Load highlight.js -->
        <script src="lib/highlight.pack.js"></script>
        <script>
          hljs.initHighlightingOnLoad();
        </script>
      </xsl:if>
      <!--
        Comment this out for now. It'll become relevant when we add video
        <script src="js/script.js"></script>
      -->
    </head>
    <body>
      <xsl:apply-templates/>
      <xsl:apply-templates select="/" mode="toc"/>
    </body>
  </html>
</xsl:template>
[/code]

We could build the CSS style sheet and JavaScript files as part of our root template but we chose not to.

Working with the style sheet as part of the XSLT style sheet allows the XSLT stylesheet designer to embed the style and parametrize the stylesheet, thus making the stylesheet customizable from the command line.

For all its advantages, this method ties the styles for the project to the XSLT stylesheet and requires the XSLT stylesheet designer to remain involved in all CSS and JavaScript updates.
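
For comparison, the embedded approach we decided against could look roughly like this. It is a sketch, not code from the project: the body-font parameter and its default value are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">

  <!-- Hypothetical top-level parameter with a default value -->
  <xsl:param name="body-font" select="'Georgia, serif'"/>

  <xsl:output method="xhtml" indent="yes" encoding="UTF-8"/>

  <xsl:template match="book">
    <html>
      <head>
        <title><xsl:value-of select="metadata/title"/></title>
        <!-- The style is generated by the transformation itself -->
        <style>
          <xsl:text>body { font-family: </xsl:text>
          <xsl:value-of select="$body-font"/>
          <xsl:text>; }</xsl:text>
        </style>
      </head>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Most XSLT processors let you override a top-level xsl:param when you run the transformation; the exact command-line syntax depends on the processor.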

By linking to external CSS and JavaScript files we can leverage expertise independent of the Schema and XSLT style sheets. Designers can work on the CSS, UX and experience designers can work on the JavaScript and other CSS areas, book designers can work on the Paged Media style sheets, and authors can just write.

Furthermore we can reuse our CSS and JavaScript on multiple documents.

Table of contents

The table of contents template is under active development and will be different depending on the desired output. I document it here as it is right now, but it will definitely change as it’s further developed.

There is a second template matching the root element of our document to create a table of contents. At first thought this looks like the wrong approach.

We leverage XSLT modes that allow us to create templates for the same element to perform different tasks. In toc mode we want the root template to do the following:

  1. Create the section and nav and ol elements
  2. Add the title for the table of contents
  3. For each section element that is a child of root create these elements
    1. The li element
    2. The a element with the corresponding href attribute
    3. The value of the href attribute (a concatenation of the section’s type attribute, the position within the document and the .html string)
    4. The title of the section as the ‘clickable’ portion of the link

[code lang=xml]
<xsl:template match="/" mode="toc">
  <section data-type="toc"> <!-- (1) -->
    <nav class="toc"> <!-- (1) -->
      <h2>Table of Contents</h2> <!-- (2) -->
      <ol> <!-- (1) -->
        <xsl:for-each select="book/section"> <!-- (3) -->
          <xsl:element name="li"> <!-- (3.1) -->
            <xsl:element name="a"> <!-- (3.2) -->
              <xsl:attribute name="href">
                <!-- (3.3) -->
                <xsl:value-of select="concat((@type), position(),'.html')"/>
              </xsl:attribute>
              <xsl:value-of select="title"/> <!-- (3.4) -->
            </xsl:element>
          </xsl:element>
        </xsl:for-each>
      </ol>
    </nav>
  </section>
</xsl:template>
[/code]

Metadata and Section

With these templates in place we can now start writing the major areas of the document, metadata and section.

Metadata

The metadata is a container for all the elements inside. As such we just create the div that will hold the content and use the xsl:apply-templates instruction to process the children of the metadata element. The template looks like this:

[code lang=xml]
<xsl:template match="metadata">
  <xsl:element name="div">
    <xsl:attribute name="class">metadata</xsl:attribute>
    <xsl:apply-templates/>
  </xsl:element>
</xsl:template>
[/code]

Section

The section template, on the other hand, is more complex because it has a lot of work to do. It is our primary unit for generating content files, takes most of the same attributes as the root template and then processes the rest of the content.

Inside the template we first create a variable to hold the name of the file we’ll generate. The file name is a concatenation of the following elements:

  • The type attribute
  • The position in the document
  • The string “.html”

The result-document element takes two attributes: the value of the file name variable we just defined and the xhtml-out format we defined at the top of the document. The XHTML format may look like overkill right now but it makes sense when we consider moving the generated content to ePub or other formats where strict XHTML conformance is a requirement.

We start generating the skeleton of the page, add the default style sheet and do the first conditional test of the document. We don’t want to add style sheets and scripts to the page unless they are needed, so we test whether there is a code element in the section and only then add the highlight.js style sheet and scripts.

In the body element we add a section element, the main wrapper for our content.

For the section we conditionally add attributes to the element. We only add a data-type attribute to the section if there is a type attribute in the source document. We do the same thing for id and class.

[code lang=xml]
<xsl:template match="section">
  <!-- Variable to create section file names -->
  <xsl:variable name="fileName" select="concat((@type), (position()-1),'.html')"/>
  <!-- An example result of the variable above would be introduction1.html -->
  <xsl:result-document href='{$fileName}' format="xhtml-out">
    <html>
      <head>
        <link rel="stylesheet" href="css/style.css" />
        <xsl:if test="(code)">
          <!--
            Use highlight.js and docco style
          -->
          <link rel="stylesheet" href="css/styles/docco.css" />
          <!-- Load highlight.js -->
          <script src="lib/highlight.pack.js"></script>
          <script>
            hljs.initHighlightingOnLoad();
          </script>
        </xsl:if>
      </head>
      <body>
        <section>
          <xsl:if test="string(@type)">
            <xsl:attribute name="data-type">
              <xsl:value-of select="@type"/>
            </xsl:attribute>
          </xsl:if>
          <xsl:if test="string(@class)">
            <xsl:attribute name="class">
              <xsl:value-of select="@class"/>
            </xsl:attribute>
          </xsl:if>
          <xsl:if test="string(@id)">
            <xsl:attribute name="id">
              <xsl:value-of select="@id"/>
            </xsl:attribute>
          </xsl:if>
          <xsl:apply-templates/>
        </section>
      </body>
    </html>
  </xsl:result-document>
</xsl:template>
[/code]

Metadata content

We process the content of the metadata separately from the structure. We take our primary metadata elements, ISBN and edition, and wrap a paragraph tag around them. We can later reuse the element or change its appearance using CSS.

[code lang=xml]
<xsl:template match="isbn">
  <p>ISBN: <xsl:value-of select="."/></p>
</xsl:template>

<xsl:template match="edition">
  <p>Edition: <xsl:value-of select="."/></p>
</xsl:template>
[/code]

For authors we do the following:

  1. For each individual in the group we take the first name and the surname
  2. Wrap the name in an li element to build an unordered list. We can style the list with CSS later

For editors and other roles we do the same thing:

  1. For each individual in the group we take the first name and the surname
  2. We concatenate the type/role to create a full title (production editor for example)
  3. Wrap the name and the title in an li element that we can style with CSS later

[code lang=xml]
<xsl:template match="metadata/authors">
<h2>Authors</h2>
<ul>
<xsl:for-each select="author">
<li>
<xsl:value-of select="first-name"/>
<xsl:text> </xsl:text>
<xsl:value-of select="surname"/>
</li>
</xsl:for-each>
</ul>
</xsl:template>

<xsl:template match="metadata/editors">
<h2>Editorial Team</h2>
<ul class="no-bullet">
<xsl:for-each select="editor">
<li>
<xsl:value-of select="first-name"/>
<xsl:text> </xsl:text>
<xsl:value-of select="surname"/>
<xsl:value-of select="concat(' – ', type, ' ', 'editor')"></xsl:value-of>
</li>
</xsl:for-each>
</ul>
</xsl:template>

<xsl:template match="metadata/otherRoles">
<h2>Production team</h2>
<ul class="no-bullet">
<xsl:for-each select="otherRole">
<li>
<xsl:value-of select="first-name" />
<xsl:text> </xsl:text>
<xsl:value-of select="surname" />
<xsl:text> – </xsl:text>
<xsl:value-of select="role" />
</li>
</xsl:for-each>
</ul>
</xsl:template>
[/code]
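
To make the result of these templates concrete, here is a small sketch of input and output. The element structure is inferred from the select expressions in the templates (for instance, type is read as a child element of editor), and the editor’s name is a made-up placeholder.

[code lang=xml]
<!-- Hypothetical input; "Jane Doe" is a placeholder name -->
<metadata>
  <authors>
    <author>
      <first-name>Arthur</first-name>
      <surname>Conan Doyle</surname>
    </author>
  </authors>
  <editors>
    <editor>
      <first-name>Jane</first-name>
      <surname>Doe</surname>
      <type>production</type>
    </editor>
  </editors>
</metadata>

<!-- Approximate output of the authors and editors templates
     (indentation added for readability) -->
<h2>Authors</h2>
<ul>
  <li>Arthur Conan Doyle</li>
</ul>
<h2>Editorial Team</h2>
<ul class="no-bullet">
  <li>Jane Doe – production editor</li>
</ul>
[/code]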

Titles and headings

Headings are primarily used to create sections of content. We use the same heading levels as HTML, with the addition of a title tag that also maps to a level 1 heading. We’ve put title and h1 in separate templates to make it easier to generate different code for each one.

Working with XSLT is not the same as using CSS, where you can declare rules for the same element multiple times (with the last one winning). When writing transformations you should have only one template rule matching each element; if two rules match the same node with the same priority, the processor may report an error (there are exceptions to the rule but let’s not worry about that just yet).

According to the spec:

These elements [h1 to h6] represent headings for their sections.

The semantics and meaning of these elements are defined in the section on headings and sections.

These elements have a rank given by the number in their name. The h1 element is said to have the highest rank, the h6 element has the lowest rank, and two elements with the same name have equal rank.

h1–h6 elements must not be used to markup subheadings, subtitles, alternative titles and taglines unless intended to be the heading for a new section or subsection. Instead use the markup patterns in the Common idioms without dedicated elements section of the specification.

– 4.3.6 The h1, h2, h3, h4, h5, and h6 elements, Berjon et al. 2013

All elements have the same attribute set: align, class and id.

The remaining headings, h2 through h6, all have the same attributes and the templates are structured the same way.

[code lang=xml]
<xsl:template match="title">
<xsl:element name="h1">
<xsl:if test="@align">
<xsl:attribute name="align">
<xsl:value-of select="@align"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@class)">
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@id)">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:if>
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>

<xsl:template match="h1">
<xsl:element name="h1">
<xsl:if test="@align">
<xsl:attribute name="align">
<xsl:value-of select="@align"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@class)">
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@id)">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:if>
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
[/code]
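
Since the h2 through h6 templates repeat this structure, one possible shortcut is a single template that covers all five levels. This is only a sketch of an alternative, not the stylesheet used in this project: it relies on an attribute value template to reuse the source element’s name and on xsl:copy-of to carry over whichever of the three attributes are present.

[code lang=xml]
<!-- Sketch: one template for h2 through h6 -->
<xsl:template match="h2 | h3 | h4 | h5 | h6">
  <!-- local-name() returns h2, h3, ... so the output element
       keeps the same level as the source element -->
  <xsl:element name="{local-name()}">
    <!-- Copies align, class and id only when they exist -->
    <xsl:copy-of select="@align | @class | @id"/>
    <xsl:value-of select="."/>
  </xsl:element>
</xsl:template>
[/code]

The trade-off is that every level now produces the same markup; the separate templates remain the better choice if a level needs its own output later.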

Blockquotes, quotes and asides

Blockquotes, asides and quotes provide sidebar-like content in our document. According to the W3C:

The blockquote element represents content that is quoted from another source, optionally with a citation which must be within a footer or cite element, and optionally with in-line changes such as annotations and abbreviations.
Content inside a blockquote other than citations and in-line changes must be quoted from another source, whose address, if it has one, may be cited in the cite attribute. [emphasis mine]
– 4.51 the Blockquote element, Berjon et al. 2013

The HTML cite element provides attribution for the blockquote it is used in. To prevent confusion and to make its meaning clear, the document model uses the attribution tag instead; their purpose is identical, and during the transformation the attribution becomes a cite element. According to the spec:

The cite element represents a reference to a creative work. It must include the title of the work or the name of the author (person, people or organization) or an URL reference, which may be in an abbreviated form as per the conventions used for the addition of citation metadata. [emphasis mine]

– 4.51 the Cite element, Berjon et al. 2013

[code lang=xml]
<xsl:template match="blockquote">
<xsl:element name="blockquote">
<xsl:if test="(@class)">
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@id)">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:if>
<xsl:apply-templates />
</xsl:element>
</xsl:template>

<!-- BLOCKQUOTE ATTRIBUTION -->
<xsl:template match="attribution">
<xsl:element name="cite">
<xsl:if test="(@class)">
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@id)">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:if>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
[/code]

The q element is the inline equivalent to blockquote and has been replaced in our markup by the quote element. As stated in the HTML5 specification:

The q element represents some phrasing content quoted from another source.

Quotation punctuation (such as quotation marks) that is quoting the contents of the element must not appear immediately before, after, or inside q elements; they will be inserted into the rendering by the user agent.

Content inside a q element must be quoted from another source, whose address, if it has one, may be cited in the cite attribute. The source may be fictional, as when quoting characters in a novel or screenplay.

If the cite attribute is present, it must be a valid URL potentially surrounded by spaces. To obtain the corresponding citation link, the value of the attribute must be resolved relative to the element. User agents may allow users to follow such citation links, but they are primarily intended for private use (e.g. by server-side scripts collecting statistics about a site’s use of quotations), not for readers.

The q element must not be used in place of quotation marks that do not represent quotes; for example, it is inappropriate to use the q element for marking up sarcastic statements.

The use of q elements to mark up quotations is entirely optional; using explicit quotation punctuation without q elements is just as correct.

– 4.5.7 The q element, Berjon et al. 2013

[code lang=xml]
<xs:element name="quote">
<xs:complexType mixed="true">
<xs:attribute name="cite" type="xs:anyURI" use="optional"/>
<xs:attributeGroup ref="genericPropertiesGroup"/>
</xs:complexType>
</xs:element>
[/code]
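
The fragment above is the schema definition for quote rather than a transformation. A template that turns quote into q could follow the same pattern as the other inline elements. The version below is only a sketch, and it assumes that genericPropertiesGroup supplies the class and id attributes.

[code lang=xml]
<!-- Sketch: convert the quote element into an HTML q element -->
<xsl:template match="quote">
  <q>
    <!-- cite comes from the schema above; class and id are assumed
         to be part of genericPropertiesGroup -->
    <xsl:copy-of select="@cite | @class | @id"/>
    <xsl:apply-templates/>
  </q>
</xsl:template>
[/code]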

Asides are primarily used for content related to the main flow of the document. I use them mostly for notes indirectly related to the main content, for example, to explain that there are other ways to generate schemas apart from W3C’s XML Schema. It is good to know this but it won’t change the information in the main content flow.

Per the spec:

The aside element represents a section of a page that consists of content that is tangentially related to the content around the aside element, and which could be considered separate from that content. Such sections are often represented as sidebars in printed typography.

The element can be used for typographical effects like pull quotes or sidebars, for advertising, for groups of nav elements, and for other content that is considered separate from the main content of the page.

– 4.3.5 The aside element, Berjon et al. 2013

[code lang=xml]
<xsl:template match="aside">
<aside>
<xsl:if test="@type">
<xsl:attribute name="data-type">
<xsl:value-of select="@type"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@class)">
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@id)">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:if>
<xsl:apply-templates/>
</aside>
</xsl:template>
[/code]

Div and Span

div and span elements are neutral: they don’t have meaning on their own, but they can get their meaning from attributes such as class, data-*, and id. Divs are meant as block-level elements, siblings or children of sections, whereas span is used inline, for example as a child of our para elements.

According to the specification:

The div element has no special meaning at all. It represents its children. It can be used with the class, lang, and title attributes to mark up semantics common to a group of consecutive elements.

Authors are strongly encouraged to view the div element as an element of last resort, for when no other element is suitable. Use of more appropriate elements instead of the div element leads to better accessibility for readers and easier maintainability for authors.

– 4.4.14 The div element, Berjon et al. 2013

[code lang=xml]
<xsl:template match="div">
<xsl:element name="div">
<xsl:if test="@align">
<xsl:attribute name="align">
<xsl:value-of select="@align"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@class)">
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@id)">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:if>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
[/code]

The span element on its own is meaningless. We can give it meaning with the attributes we pass to it. We can give it a class or id for CSS styling or a type to generate semantic meaning for the contained text.

When we start working on an ePub implementation we can also add the epub:type attribute to create an even more detailed semantic map of our content.

The span element doesn’t mean anything on its own, but can be useful when used together with the global attributes, e.g. class, lang, or dir. It represents its children.

– 4.5.28 The span element, Berjon et al. 2013

[code lang=xml]
<xsl:template match="span">
<xsl:element name="span">
<xsl:if test="@type">
<xsl:attribute name="data-type">
<xsl:value-of select="@type"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@class)">
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@id)">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:if>
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
[/code]

Paragraphs

The paragraph is our basic unit of content. Paragraphs are usually represented as blocks of text but they can be styled any way we choose with the proper CSS.

[code lang=xml]
<xsl:template match="para">
<xsl:element name="p">
<xsl:if test="(@class)">
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@id)">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:if>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
[/code]

Styles

Styles are used to indicate typographical styles such as strong, emphasis, strikethrough and underline.

[code lang=xml]
<xsl:template match="strong">
<strong><xsl:apply-templates /></strong>
</xsl:template>
[/code]

Since we’re working with print and visual media only, we use only strong to indicate bold elements. I’ve never understood how strong and b are supposed to differ on printed pages or in screen displays.

We may have to implement b when developing the accessibility component for the document schema.

[code lang=xml]
<xsl:template match="emphasis">
<em><xsl:apply-templates/></em>
</xsl:template>
[/code]

As with strong, I’ve decided to use only emphasis to indicate italics and save i for a future revision when, and if, it becomes necessary.

[code lang=xml]
<xsl:template match="strike">
<strike><xsl:apply-templates/></strike>
</xsl:template>
[/code]

Although the strike element is obsolete in the HTML5 standard, it’s still worth having as it can also be the target for CSS that accomplishes the same goal.

The CSS way is to assign a text-decoration: line-through instruction to the strike selector.

[code lang=xml]
<xsl:template match="underline">
<u><xsl:apply-templates/></u>
</xsl:template>
[/code]

While there is a u element, it has different semantics than underline. As with strike, the correct way to do it is with CSS, in this case using text-decoration: underline for the chosen element.

Links and anchors

Links are the essence of the web. They allow you to navigate within the document you’re in or move to external documents. I’ve taken a shortcut and made the label attribute (used for accessibility) and the content of the link the same text. This reduces the amount of typing we have to do but runs the risk of becoming too inflexible.

[code lang=xml]
<xsl:template match="link">
<xsl:element name="a">
<xsl:if test="(@class)">
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@id)">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:if>
<xsl:attribute name="href">
<xsl:value-of select="@href"/>
</xsl:attribute>
<xsl:attribute name="label">
<xsl:value-of select="@label"/>
</xsl:attribute>
<xsl:value-of select="@label"/>
</xsl:element>
</xsl:template>
[/code]
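
As a quick illustration of the shortcut described above, here is a hypothetical link and the markup the template would produce for it. The URL and the label text are placeholders.

[code lang=xml]
<!-- Hypothetical input: the label doubles as the link text -->
<link href="https://example.com/" label="Example site"/>

<!-- Approximate output; attribute order may vary by serializer -->
<a href="https://example.com/" label="Example site">Example site</a>
[/code]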

When working with links there are times when we want to link to a section within the same document, to a specific section in another document, to a specific place inside a paragraph, or to a figure. To do this we need anchors that will resolve to the following HTML:

[code lang=html]
<a id="target"></a>
[/code]

The template for the transformation looks like this:

[code lang=xml]
<xsl:template match="anchor">
<xsl:element name="a">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:element>
</xsl:template>
[/code]

I’m not sure whether I want to make this an empty element or not.

Most if not all the elements in our document model can use the id attribute, so there is no real need for the anchor element.

However, the empty element <anchor id="home"/> appeals to my ease-of-use paradigm, although it may not be as easy to understand for people who are not familiar with XML empty elements.

Code blocks

Code elements create fenced code blocks like the ones from Github Flavored Markdown.

We use Adobe Source Code Pro font. It’s a clean and readable font designed specifically for source code display.

We highlight our code with Highlight.js. This makes the class attribute on the generated code element mandatory, as we need it to tell highlight.js which syntax to use; the template takes its value from the language attribute in the source.

[code lang=xml]
<xsl:template match="code">
<xsl:element name="pre">
<xsl:element name="code">
<xsl:attribute name="class">
<xsl:value-of select="@language"/>
</xsl:attribute>
<xsl:value-of select="."/>
</xsl:element>
</xsl:element>
</xsl:template>
[/code]

Lists and list items

When I first conceptualized this project I had designed a single list element and attributes to produce bulleted and numbered lists. This proved too difficult to implement so I went back to two separate elements: ulist for bulleted lists and olist for numbered lists.

Both elements share the item element to indicate the items inside the list. At least one item is required in a list.

[code lang=xml]
<xsl:template match="ulist">
<xsl:element name="ul">
<xsl:if test="string(@class)">
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="string(@id)">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:if>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>

<xsl:template match="olist">
<xsl:element name="ol">
<xsl:if test="string(@class)">
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="string(@id)">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:if>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>

<xsl:template match="item">
<xsl:element name="li">
<xsl:if test="string(@class)">
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="string(@id)">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:if>
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
[/code]

Figures and Images

The figure element represents some flow content, optionally with a caption, that is self-contained (like a complete sentence) and is typically referenced as a single unit from the main flow of the document.

The element can thus be used to annotate illustrations, diagrams, photos, code listings, etc.

A figure element’s contents are part of the surrounding flow. If the purpose of the page is to display the figure, for example a photograph on an image sharing site, the figure and figcaption elements can be used to explicitly provide a caption for that figure. For content that is only tangentially related, or that serves a separate purpose than the surrounding flow, the aside element should be used (and can itself wrap a figure). For example, a pull quote that repeats content from an article would be more appropriate in an aside than in a figure, because it isn’t part of the content, it’s a repetition of the content for the purposes of enticing readers or highlighting key topics.

– 4.4.11 The figure element, Berjon et al. 2013

Figures, captions and the images inside present a few challenges. Because we allow authors to set height and width on both figure and the image inside we may find situations where the figure container is narrower than the image inside.

To avoid this issue we test whether the figure width value is smaller than the width of the image inside. If it is, we use the width of the image as the width of the figure; otherwise we use the width of the figure element.

We do the same thing for height in order to avoid squished images or captions that draw over the image because the figure is too small. If the height of the figure is smaller than the height of the image we use the height of the image; otherwise we use the height of the figure element.

For both height and width we concatenate the attribute value with the string ‘px’ to make sure that it works in both straight CSS and with Prince and other CSS PDF generators.

Alignments can be different: it is possible for a right-aligned image to live inside a centered container.

The data model for our content allows both figures and images to be used in the document. This is so we don’t have to insert empty captions into figures just so we can add an image. If we don’t want a caption we can insert the image directly in our document.

Contrary to the HTML specification we use figure only to display images. We have a specialized template to address code blocks for program listings and can create additional elements as needed.

[code lang=xml]
<xsl:template match="figure">
<xsl:element name="figure">
<xsl:if test="string(@class)">
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="string(@id)">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:if>
<!-- Use the figure width unless it is missing or smaller than the image width -->
<xsl:choose>
<xsl:when test="string(@width) and (number(@width) ge number(image/@width))">
<xsl:attribute name="width">
<xsl:value-of select="@width"/>
</xsl:attribute>
</xsl:when>
<xsl:otherwise>
<xsl:attribute name="width">
<xsl:value-of select="image/@width"/>
</xsl:attribute>
</xsl:otherwise>
</xsl:choose>
<!-- Same rule for height; number() forces a numeric comparison -->
<xsl:choose>
<xsl:when test="string(@height) and (number(@height) ge number(image/@height))">
<xsl:attribute name="height">
<xsl:value-of select="@height"/>
</xsl:attribute>
</xsl:when>
<xsl:otherwise>
<xsl:attribute name="height">
<xsl:value-of select="image/@height"/>
</xsl:attribute>
</xsl:otherwise>
</xsl:choose>
<xsl:if test="(@align)">
<xsl:attribute name="align">
<xsl:value-of select="@align"/>
</xsl:attribute>
</xsl:if>
<xsl:apply-templates select="image"/>
<xsl:apply-templates select="figcaption"/>
</xsl:element>
</xsl:template>

<xsl:template match="figcaption">
<figcaption><xsl:apply-templates/></figcaption>
</xsl:template>

<xsl:template match="image">
<xsl:element name="img">
<xsl:attribute name="src">
<xsl:value-of select="@src"/>
</xsl:attribute>
<xsl:attribute name="alt">
<xsl:value-of select="@alt"/>
</xsl:attribute>
<xsl:if test="(@width)">
<xsl:attribute name="width">
<xsl:value-of select="@width"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@height)">
<xsl:attribute name="height">
<xsl:value-of select="@height"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@align)">
<xsl:attribute name="align">
<xsl:value-of select="@align"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@class)">
<xsl:attribute name="class">
<xsl:value-of select="@class"/>
</xsl:attribute>
</xsl:if>
<xsl:if test="(@id)">
<xsl:attribute name="id">
<xsl:value-of select="@id"/>
</xsl:attribute>
</xsl:if>
</xsl:element>
</xsl:template>
[/code]
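
To illustrate the sizing rule described in the text (the figure never ends up smaller than the image it contains), here is a hypothetical figure whose declared dimensions are smaller than those of its image. The file name, dimensions and caption are placeholders, and the output shows what the rule calls for rather than a captured run.

[code lang=xml]
<!-- Hypothetical input: the figure is declared smaller than its image -->
<figure width="200" height="100">
  <image src="images/cover.png" alt="Book cover" width="400" height="300"/>
  <figcaption>The book cover</figcaption>
</figure>

<!-- Intended output: the figure takes the image's dimensions -->
<figure width="400" height="300">
  <img src="images/cover.png" alt="Book cover" width="400" height="300"/>
  <figcaption>The book cover</figcaption>
</figure>
[/code]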

XML workflows: Introduction

One of the biggest limitations of markup languages, in my opinion, is how confining they are. Even large vocabularies like DocBook have limited functionality out of the box. HTML4 is non-extensible and HTML5 limits how you can extend it (web components are the only way I’m aware of to extend HTML5 without an update to the HTML specification).

By creating our own markup vocabulary we can be as expressive as we need without adding complexity for writers and users and without adding unnecessary complexity for the developers building the tools to interact with the markup.

Why create our own markup

I have a few answers to that question:

In creating your own XML-based markup you enforce separation of content and style. The XML document provides the basic content of the document and the hints to use elsewhere. XSLT style sheets allow you to structure the base document and associated hints into any of several formats (for the purposes of this document we’ll concentrate on XHTML, PDF created through Paged Media CSS and PDF created using XSL Formatting Objects).

Creating a domain-specific markup vocabulary allows you to think about structure and complexity for yourself as the editor/typesetter and for your authors. It makes you think about elements and attributes, which one is better for the experience you want and what, if any, restrictions you want to impose on your markup.

By creating our own vocabulary we make it easier for authors to write clean and simple content. XML provides a host of validation tools to enforce the structure and format of the XML document.

Options for defining the markup

For the purpose of this project we’ll define a set of resources that work with a book structure like the one below:

[code lang=xml]
<book>
<metadata>
<title>The adventures of Sherlock Holmes</title>
<author>
<first-name>Arthur</first-name>
<surname>Conan Doyle</surname>
</author>
</metadata>
<section type="chapter">
<para>Lorem Ipsum</para>
<para>Lorem Ipsum</para>
</section>
</book>
[/code]
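
One of the options for defining this markup is W3C XML Schema. The schema below is a minimal sketch that accepts the sample above and nothing more; it is an assumption about how the definition could start, not the project’s actual schema. In particular, para is declared as plain text here, which would need to change once inline elements are allowed.

[code lang=xml]
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- A book is one metadata block followed by one or more sections -->
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="metadata"/>
        <xs:element ref="section" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- Metadata holds a title and at least one author -->
  <xs:element name="metadata">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="first-name" type="xs:string"/>
              <xs:element name="surname" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- A section is a run of paragraphs with an optional type -->
  <xs:element name="section">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="para" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="type" type="xs:string" use="optional"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
[/code]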

It is not a complete structure. We will continue adding elements after we reach the MVP (Minimum Viable Product) stage. As usual, feedback is always appreciated.

Creating a Github publishing workflow

Lord of the Files: How GitHub Tamed Free Software (And More) was an article first published in Wired in February of 2012. It was an interesting article about a company whose work I value enough to pay for its service.

The article itself wasn’t what caught my attention. It was the Github repository that Wired created to go with the article. Their experience highlights a big potential of technology for the publishing process: it makes collaboration at all stages of the process easier. While researching this idea I came across this blog post.

All this made me think about how Github develops their documentation and whether we can apply a similar process to a normal publishing workflow.

The idea starts with public repositories but it should work the same for private repositories and Github Enterprise repositories.

This workflow expects a basic level of familiarity with Github and the following:

  • All participants should understand how to handle Github repositories, how to commit files (both initial commits and updates) to their repositories and how to create pull requests
  • Project editors should also know how to handle pull requests and how to resolve merge conflicts
  • Project leaders should be Github power users or, at the very least, know where to get help

All the skill requirements listed above can be handled through training and support either in-house or from Github.

Process

The outline below shows the workflow from both the Github and the content perspectives.

[Figure: workflow]

Setting up the project

In the first stages we create the repository in Github and add the content to it. Ideally this initial process would be done by the same person.

Instructions for creating Github repositories are located in the Github help article: https://help.github.com/articles/create-a-repo/

Individual contributors fork and clone the project

For the project to be useful, each individual contributor has to have their own copy of the project in which to make changes without disrupting the master copy. Different contributors may make different changes to the same file, and that’s where pull requests (discussed in more detail below) come in handy.

Forking and cloning are explained in more detail in this Github Help article.

Edit your local copy

One of the advantages of this model is that you’re making changes to your local copy, not the master repository. If you are not happy with the changes and they are too extensive to undo manually you can always delete your working copy and clone your repository again. In more extreme cases you can delete your fork of the project in Github and fork it again. This will get you a brand new copy of the project at the cost of losing any changes you made since the fork was created.

Create a pull request for discussion and edit the content as needed

Pull requests allow users to tell each other about changes in their copy of the repository relative to the repository the local copy was forked from.

Creating pull requests is described in this Github Help article.

If you want specific people’s attention you can mention them by putting an @ before their username. If the username is gandalf, you can include them in the conversation using @gandalf.

Discuss the content

In addition to the initial pull request notification, the discussions in a pull request provide notification of:

  • Comments left on the pull request itself.
  • Additional commits/edits pushed to the pull request’s branch.

The request can be left open until everyone agrees it’s completed or until the project lead closes it as complete.

Continue working on the project

Depending on the existing workflow, the Github-based authoring process can be added as an additional step, or hooks can be added to Github to further process the content before final publication.

Github followed this technique by adding hooks that transfer the edited content to an external Ruby on Rails application that served the content.

If you are serving your content from gh-pages (Github’s provided project websites) you can serve the content from the master repository.

Things to research further

Do we need to create branches for the proposed edits? Github and most Git good practices strongly suggest the use of topic branches but I don’t see them being absolutely necessary for publishing.