Dublin Library

The Publishing Project

HTML as a single source format

 

In this essay I will take what may be an unpopular position: HTML written with XML syntax (XHTML) is currently the best format to put your content in, because it is easier to convert from XHTML/CSS to pretty much any other format. In making this case we'll define what I mean by XHTML, look at why it works as an authoring format, and walk through examples of converting to and from it.

Definitions

When we speak about XHTML in this document we refer to an HTML document using XHTML syntax. I will not change the MIME type on the server (to application/xhtml+xml) to fully comply with XHTML restrictions.

Why XHTML

The two main reasons I advocate XHTML as an authoring format are:

XHTML enforces code clarity and authoring discipline

XHTML limits the freeform structure of HTML5. Documents conforming to XHTML specifications must have, at a minimum:

  • A DOCTYPE declaration
  • An HTML element
  • A HEAD element
  • A TITLE element
  • A BODY element
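
The list above can be checked mechanically. What follows is a minimal sketch in Python using only the standard library; it is not part of any tool mentioned in this essay, and the names `local_name`, `child`, `missing_parts` and `SAMPLE` are made up for illustration. It assumes the document is well-formed XML (the parser raises an error otherwise) and it ignores namespaces when matching element names.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def child(parent, name):
    """Return the first direct child with the given local name, or None."""
    return next((el for el in parent if local_name(el.tag) == name), None)


def missing_parts(source):
    """Return the required parts that are absent from an XHTML string.

    Raises xml.etree.ElementTree.ParseError if the markup is not
    well-formed (unclosed tags, bad nesting, unquoted attributes).
    """
    missing = []
    # ElementTree does not expose the DOCTYPE, so look for it in the raw text.
    if "<!DOCTYPE" not in source:
        missing.append("DOCTYPE")
    root = ET.fromstring(source)
    if local_name(root.tag) != "html":
        missing.append("html")
    head = child(root, "head")
    if head is None:
        missing.append("head")
    if head is None or child(head, "title") is None:
        missing.append("title")
    if child(root, "body") is None:
        missing.append("body")
    return missing


# Hypothetical sample document used only for this sketch.
SAMPLE = """\
<!DOCTYPE html>
<html>
<head><title>Title Goes Here</title></head>
<body><h1>Content Area</h1></body>
</html>
"""

if __name__ == "__main__":
    print(missing_parts(SAMPLE))  # prints []
```

Because the check relies on an XML parser, a document with an unclosed paragraph never reaches the structural checks at all: parsing fails first, which is the kind of discipline the rest of this section describes.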

The structure written as XHTML tags looks like this:

<pre><code>&lt;!DOCTYPE html&gt;
&lt;html&gt;
&lt;head&gt;
  &lt;title&gt;Title Goes Here&lt;/title&gt;
&lt;/head&gt;
&lt;body&gt;
  &lt;h1&gt;Content Area&lt;/h1&gt;
&lt;/body&gt;
&lt;/html&gt;
</code></pre>
<p>This minimal structure must comply with the requirements below.</p>
<p><strong>All XHTML tag names and attribute names must be in lowercase</strong></p>
<p>XML is case sensitive, so XHTML element and attribute names must be written in lowercase.</p>
<p>The following elements are not legal XHTML:</p>
<pre><code>&lt;DIV CLASS="chapter"&gt;Chapter 1&lt;/div&gt;
&lt;Div Class="chapter"&gt;Chapter 1&lt;/div&gt;
</code></pre>
<p><strong>All XHTML elements must close</strong></p>
<p>All elements must be closed. This ranges from standard tags such as the paragraph tag</p>
<pre><code>&lt;p&gt;This is a paragraph&lt;/p&gt;
</code></pre>
<p>to empty elements such as images and form inputs</p>
<pre><code>&lt;img src="images/test.png" height="800" width="600" alt="Test image" /&gt;
&lt;input type="submit" value="Submit" /&gt;
</code></pre>
<p><strong>All XHTML elements must be properly nested</strong></p>
<p>XHTML insists on proper nesting of the elements in our content. This is no longer legal:</p>
<pre><code>&lt;p&gt;This is the content of a paragraph
&lt;p&gt;This is our second paragraph
</code></pre>
<p>And it should be written like this:</p>
<pre><code>&lt;p&gt;This is the content of a paragraph&lt;/p&gt;
&lt;p&gt;This is our second paragraph&lt;/p&gt;
</code></pre>
<p><strong>All XHTML attribute values must be quoted</strong></p>
<p>In addition to attribute names being lowercase, attribute values must be quoted. Rather than:</p>
<pre><code>&lt;div class=chapter&gt;Chapter 1&lt;/div&gt;
</code></pre>
<p>It has to be written like this:</p>
<pre><code>&lt;div class="chapter"&gt;Chapter 1&lt;/div&gt;
</code></pre>
<h2 id="because-it-is-structured-we-can-use-transformation-tools-to-convert-to-from-xhtml" tabindex="-1">Because it is structured, we can use transformation tools to convert to/from XHTML <a class="header-anchor" href="#because-it-is-structured-we-can-use-transformation-tools-to-convert-to-from-xhtml">#</a></h2>
<p>A lot of the discussions I've had with people seem to focus on the drawbacks of the XHTML format for end users. 
One of the strengths the W3C cited when moving to XHTML as the default format for the web was how easy it is for machines to read it and convert it to other formats.</p>
<p>I'll cover two examples using <a href="http://daringfireball.net/projects/markdown/">Markdown</a>, a straight transformation and a conversion of Markdown into templated XHTML, and one example that uses <a href="http://www.w3.org/TR/xslt">XSLT 1.0</a> and xsltproc to convert one flavor of XHTML into another.</p>
<h3 id="from-markdown-to-html-straight-up" tabindex="-1">From markdown to html, straight up <a class="header-anchor" href="#from-markdown-to-html-straight-up">#</a></h3>
<p>One of the goals of Markdown is to <strong><em>allow you to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML).</em></strong> The original tool and all its ports to other languages are built to perform this conversion; where they differ is in the number of extensions to the core Markdown language and in the language the tools themselves are written in.</p>
<p>For these examples I chose Python Markdown, mostly because it's the language and the tool I'm familiar with. We will use the Markdown source of the Markdown home page at Daring Fireball: <a href="http://daringfireball.net/projects/markdown/index.text">http://daringfireball.net/projects/markdown/index.text</a></p>
<p>Below is a portion of the resulting XHTML code:</p>
<pre><code>&lt;h2&gt;Introduction&lt;/h2&gt;

&lt;p&gt;Markdown is a text-to-HTML conversion tool for web writers. Markdown allows you to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML).&lt;/p&gt;

&lt;p&gt;Thus, "Markdown" is two things: (1) a plain text formatting syntax; and (2) a software tool, written in Perl, that converts the plain text formatting to HTML. See the &lt;a href="/projects/markdown/syntax"&gt;Syntax&lt;/a&gt; page for details pertaining to Markdown's formatting syntax. You can try it out, right now, using the online &lt;a href="/projects/markdown/dingus"&gt;Dingus&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The overriding design goal for Markdown's formatting syntax is to make it as readable as possible. The idea is that a Markdown-formatted document should be publishable as-is, as plain text, without looking like it's been marked up with tags or formatting instructions. While Markdown's syntax has been influenced by several existing text-to-HTML filters, the single biggest source of inspiration for Markdown's syntax is the format of plain text email.&lt;/p&gt;

&lt;p&gt;The best way to get a feel for Markdown's formatting syntax is simply to look at a Markdown-formatted document. For example, you can view the Markdown source for the article text on this page here: &lt;a href="http://daringfireball.net/projects/markdown/index.text"&gt;http://daringfireball.net/projects/markdown/index.text&lt;/a&gt;&lt;/p&gt;
</code></pre>
<p>The conversion process itself is simple. Using the Perl version it looks like this:</p>
<pre><code>markdown content/webgl-2d-scale.md &gt; test.html
</code></pre>
<h3 id="from-markdown-to-templated-xhtml" tabindex="-1">From Markdown to templated XHTML <a class="header-anchor" href="#from-markdown-to-templated-xhtml">#</a></h3>
<p>As part of my sunshine-markdown project I've researched ways to convert Markdown to XHTML. 
Running <code>sunshine</code> with the <code>--verbose</code> flag produces output like this:</p>
<pre><code>[10:30:54] carlos@rivendell sunshine-markdown 4826$ ./sunshine --verbose
processing: content/webgl-2-textures.md
processing: content/webgl-2d-matrices.md
processing: content/webgl-2d-rotation.md
processing: content/webgl-2d-scale.md
processing: content/webgl-2d-translation.md
processing: content/webgl-2d-vs-3d-library.md
processing: content/webgl-3d-camera.md
processing: content/webgl-3d-orthographic.md
processing: content/webgl-3d-perspective.md
processing: content/webgl-3d-textures.md
processing: content/webgl-and-alpha.md
processing: content/webgl-animation.md
processing: content/webgl-boilerplate.md
processing: content/webgl-fundamentals.md
processing: content/webgl-how-it-works.md
processing: content/webgl-image-processing-continued.md
processing: content/webgl-image-processing.md
processing: index.md
</code></pre>
<p>Sunshine is hardcoded to put the content of each Markdown file into a template that looks something like this:</p>
<pre><code>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;html&gt;
&lt;head&gt;
  &lt;title&gt;%(title)s&lt;/title&gt;
&lt;/head&gt;
&lt;body&gt;
  &lt;h1&gt;%(title)s&lt;/h1&gt;
  %(content)s
&lt;/body&gt;
&lt;/html&gt;
</code></pre>
<h3 id="using-xslt-to-convert-xhtml-into-epub-ready-xhtml" tabindex="-1">Using XSLT to convert XHTML into ePub-ready XHTML <a class="header-anchor" href="#using-xslt-to-convert-xhtml-into-epub-ready-xhtml">#</a></h3>
<p>One of the things we forget is that, because XHTML is structured content, we can use XSLT and XPath to convert it to other XML-based dialects, such as the XHTML dialect required for ePub3 conformance. 
A basic template to convert a <code>div</code> into a <code>section</code> with the proper attributes for ePub work may look something like this:</p>
<pre><code>&lt;?xml version="1.0"?&gt;
</code></pre>
</article> </main> </div> </body> </html>