Docbook, XML and ebooks: Creating eBooks the old-fashioned way
One of the most traditional ways to author content for multiple distribution channels is to roll up your sleeves, write XML and then convert it to your target format. For this exercise we will use Docbook. Without going into too much detail, Docbook was created in 1991 as a way to write computer software manuals and other technical documentation. Over the years it has evolved into a general-purpose XML authoring language. Along with the authoring standard, which defines the structures we can use to author our content, the maintainers of Docbook have also created a set of stylesheets that convert our base XML files into different formats. One of the formats you can convert your XML files to is ePub.
More information about the history of Docbook can be found on the Docbook: The Definitive Guide website.
Getting Started
Below is a skeleton XML file for a Docbook-based book.
<?xml version="1.0" encoding="utf-8"?>
<book xmlns='http://docbook.org/ns/docbook' version="5.0" xml:lang="en">
<title>The Adventures of Sherlock Holmes</title>
<chapter>
<title>Chapter title goes here</title>
<para>Content is required in chapters too.</para>
</chapter>
</book>
Once we have the XML document ready (filling out the skeleton with as many chapters as we need to complete our content), we need two things:
- A set of XSLT stylesheets to convert our XML into HTML and ePub
- A processor to actually run the transformation
The stylesheets are located at http://sourceforge.net/projects/docbook/files/docbook-xsl-ns/1.76.1/ where you can choose between .zip and .tar.gz compressed archives. In addition, download the files at http://sourceforge.net/projects/docbook/files/epub3/docbook-epub3-addon-b3.zip and http://sourceforge.net/projects/docbook/files/epub3/README.epub3. To enable ePub3 support, follow the instructions in the README.epub3 file.
The stock Docbook stylesheets produce ePub 2 compliant books. This is OK for now, as most readers that support ePub support this version. There is also experimental support for ePub 3 compliant books, which we will use for this article because it gives us access to all the multimedia features of ePub3.
As far as XSLT processors go, there are two that I recommend. One is Saxon, currently at version 9.4 and available from its publisher, Saxonica, on a trial basis. Yes, it is commercial software, but after years of using it I highly recommend the investment. It is written in Java and provides a full set of features, extensions and advanced implementations of XML-related technologies. For our purposes it's enough that it will take the XML, process it with the stylesheets and give us the output we want.
The second processor I recommend is xsltproc. It is written in C and bundled with most UNIX/Linux/OS X installations; it can also be downloaded or updated from the xmlsoft.org web site. Download both LibXML and LibXSLT and install them in that order (LibXML first, then LibXSLT), because LibXSLT depends on LibXML and will not work as you expect without it.
The command to create the ebook using xsltproc is:
xsltproc /Users/carlos/docbook/1.0/xslt/epub3/chunk.xsl ebook.xml
This produces an output that should look like this:
Writing OEBPS/bk01-toc.xhtml for book
Writing OEBPS/ch01.xhtml for chapter
Writing OEBPS/ch02.xhtml for chapter
Writing OEBPS/index.xhtml for book
Writing OEBPS/docbook-epub.css for book
Generating EPUB package files.
Generating image list ...
Writing OEBPS/package.opf for book
Writing OEBPS/../META-INF/container.xml for book
Writing OEBPS/../mimetype for book
Generating NCX file ...
Writing OEBPS/toc.ncx for book
<?xml version="1.0" encoding="UTF-8"?>
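The README.epub3 steps in the next section refer to a base.dir parameter. Here is a minimal sketch of one way to set it, assuming we pass it with xsltproc's --stringparam option and want the generated files under ebook1/OEBPS/. The function name run_docbook_epub3 is hypothetical, and the stylesheet path is whatever your local install uses.

```shell
#!/bin/sh
# Hypothetical helper: run the epub3 chunking stylesheet and send the
# generated files to <outdir>/OEBPS/ through the base.dir parameter.
run_docbook_epub3() {
    stylesheet=$1   # path to epub3/chunk.xsl in your Docbook XSL install
    input=$2        # the Docbook XML source, e.g. ebook.xml
    outdir=$3       # directory that will hold the ebook files, e.g. ebook1

    # Fail early with a readable message instead of a processor error.
    if [ ! -r "$stylesheet" ]; then
        echo "stylesheet not found: $stylesheet" >&2
        return 1
    fi
    if [ ! -r "$input" ]; then
        echo "input file not found: $input" >&2
        return 1
    fi
    if ! command -v xsltproc >/dev/null 2>&1; then
        echo "xsltproc is not installed or not on the PATH" >&2
        return 1
    fi

    # Assumption: base.dir takes a directory path with a trailing slash.
    xsltproc --stringparam base.dir "$outdir/OEBPS/" "$stylesheet" "$input"
}
```

With the paths used in this article, the call would be run_docbook_epub3 /Users/carlos/docbook/1.0/xslt/epub3/chunk.xsl ebook.xml ebook1.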
Final Details
We are done generating the content and the files we need in order to generate the eBook. To finish the process we need to do the following (taken from the README.epub3 file):
Manually copy any image files used in the document into the corresponding locations in the $base.dir directory.
For example, if your document contains:
<imagedata fileref="images/caution.png"/>
and the base.dir parameter is set to ebook1/OEBPS, you would copy the file to ebook1/OEBPS/images/caution.png. You can get a list of image files from the manifest file (ebook1/OEBPS/package.opf in our example) that is created by the stylesheets.
Currently the stylesheets will not include generated image files for callouts, header/footers, and admonitions. These files have to be added manually.
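Copying images by hand gets tedious for anything beyond a handful of files. Below is a rough sketch of how the manifest could drive the copy. It assumes the image paths in package.opf are relative to the OEBPS directory and that the original images sit at the same relative paths under a source directory. The function name copy_manifest_images is hypothetical. Images that the stylesheets do not list (callouts, admonitions and so on) still need to be added by hand.

```shell
#!/bin/sh
# Hypothetical helper: copy every image listed in the generated manifest
# from the source tree into the matching location under the OEBPS directory.
copy_manifest_images() {
    srcdir=$1   # directory that holds the original images
    oebps=$2    # directory with package.opf, e.g. ebook1/OEBPS

    # Put each tag on its own line, keep the image items, extract the href.
    tr '>' '\n' < "$oebps/package.opf" |
    grep 'media-type="image/' |
    sed -n 's/.*href="\([^"]*\)".*/\1/p' |
    while IFS= read -r href; do
        if [ -f "$srcdir/$href" ]; then
            mkdir -p "$(dirname "$oebps/$href")"
            cp "$srcdir/$href" "$oebps/$href"
        else
            # Report the gap but keep going with the remaining images.
            echo "missing image: $srcdir/$href" >&2
        fi
    done
}
```

For the example in this article, that would be copy_manifest_images . ebook1/OEBPS, run from the directory that contains the images folder.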
cd to the directory containing your mimetype file, which would be ebook1 in this example.
Run the following zip commands to create the epub file:
zip -X0 sherlock-holmes.epub mimetype
zip -r -X9 sherlock-holmes.epub META-INF OEBPS
The first command adds the mimetype file to the archive first and uncompressed (-0 means store only). The second command adds the META-INF and OEBPS directories with maximum compression (-9). The -X option excludes extra file attributes (required by epub3), and the -r option recursively includes all directories. The "sherlock-holmes.epub" in this example is the output file.
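Since the order and compression of the mimetype entry matter, it can be worth checking the archive before running a full validation. The sketch below reads the raw bytes of the file. It relies on the ZIP layout, where the first local file header is 30 bytes long and is followed by the entry name and then, when there is no extra field, the entry data. The function name check_epub_mimetype is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: confirm that an .epub starts with an uncompressed
# "mimetype" entry whose content is application/epub+zip.
check_epub_mimetype() {
    epub=$1

    # Bytes 30-37 hold the name of the first entry in the archive.
    name=$(dd if="$epub" bs=1 skip=30 count=8 2>/dev/null)
    # Bytes 38-57 hold its data when it is stored without an extra field.
    data=$(dd if="$epub" bs=1 skip=38 count=20 2>/dev/null)

    if [ "$name" = "mimetype" ] && [ "$data" = "application/epub+zip" ]; then
        echo "mimetype entry looks correct"
    else
        echo "mimetype entry is not first or not stored uncompressed" >&2
        return 1
    fi
}
```

Running check_epub_mimetype sherlock-holmes.epub after the two zip commands should print the success message. A failure usually means the mimetype file was added after other files or was compressed.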
Because we have done most of the work manually, we need to validate the result. For that we will use the epubcheck3 tool, available from its Google Code project repository.
java -jar /Users/carlos/Java/epubcheck-src-3.0b3/dist/epubcheck-3.0b3.jar sherlock-holmes.epub
Hopefully we'll see a result like this:
Epubcheck Version 3.0b3
No errors or warnings detected.