Static Site Generators: Nunjucks and Gulp

A next step is to build our own templating solution using Gulp. I’ve spent longer than I wanted crafting this solution and, like the idea it’s based on, it still has a lot of parts I’m working on understanding and changing.

The idea of using a templating engine is to have options when creating content. We can create partial content blocks and define different layouts for our content.

Granted, this moves away from our simple model when using templates but we gain more flexibility in what we can do with the templated content.

First, we install the NPM packages the build depends on.

npm i -D nunjucks-markdown \
marked \
gulp-rename \
nunjucks \
gulp-nunjucks

We then require the packages we installed.

// Nunjucks and Markdown
const nunjucks = require('nunjucks');
const markdown = require('nunjucks-markdown');
const marked = require('marked');
const gulpnunjucks = require('gulp-nunjucks');

Rather than copying our directory paths in multiple places, we write them down once and then reference them wherever we need them.

// Nunjucks consts for file location
const dist = 'docs';
const src = 'src';
const templates = src + '/partials';
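
A small variation on the constants above, offered as a sketch rather than the project's actual code: building the paths with Node's path.posix.join keeps the separators as forward slashes, which is what glob patterns expect. The pagesGlob name is an assumption added for illustration.

```javascript
const path = require('path');

const dist = 'docs';
const src = 'src';
// path.posix.join always joins with forward slashes, which glob patterns expect
const templates = path.posix.join(src, 'partials');
// Hypothetical extra constant: the HTML pages under the partials directory
const pagesGlob = path.posix.join(templates, '**/*.html');

console.log(dist, templates, pagesGlob);
```

String concatenation, as in the original constants, works just as well; the join only saves you from doubled or missing slashes as the list of paths grows.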

Here is also the first difference with our template copy. Rather than separate our partials (files with .njk as the extension) and our pages (.html files) we put them all in one place. The partials directory I used while developing this project looks like this.

partials
├── about.html
├── base.njk
├── css-containment.html
├── footer-scripts.njk
├── from-markdown-to-html.html
├── head.njk
├── index.html
├── javascript-dom.html
├── latex-to-web.html
└── voice-ui-agent.html
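
Because pages and partials share one directory, the file extension is the only thing that tells them apart. As a rough illustration in plain Node (the file list is copied from the listing above; the isPage helper is hypothetical):

```javascript
const path = require('path');

// File names copied from the directory listing above
const files = [
  'about.html', 'base.njk', 'css-containment.html', 'footer-scripts.njk',
  'from-markdown-to-html.html', 'head.njk', 'index.html',
  'javascript-dom.html', 'latex-to-web.html', 'voice-ui-agent.html',
];

// Pages get rendered to the output directory; partials are only pulled in by pages
const isPage = (name) => path.extname(name) === '.html';
const pages = files.filter(isPage);
const partials = files.filter((name) => path.extname(name) === '.njk');

console.log(pages.length + ' pages, ' + partials.length + ' partials');
```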

Next, we use the Environment class and the FileSystemLoader to load templates from the directory we defined in the templates constant.

// Where to pull files from?
const env = new nunjucks.Environment(new nunjucks.FileSystemLoader(templates));

The next step is optional and configures the Marked Markdown parser. Because all assets are compiled at build time there is not as much worry about sanitizing the output of Marked.

Do not change the sanitize setting if you will accept user templates or if you’re using third-party code.

Once the configuration is complete we register the marked instance to work with nunjucks-markdown.

// Markdown options
marked.setOptions({
  renderer: new marked.Renderer(),
  gfm: true,
  tables: true,
  breaks: false,
  pedantic: false,
  sanitize: false,
  smartLists: true,
  smartypants: true,
});

markdown.register(env, marked);

Now that configuration is complete we can work with Gulp to create the rendering task.

We take all the HTML files in our partials directory and then compile them. This will take care of creating the fully templated HTML and converting any Markdown inside the pages to HTML. It will place the resulting pages inside the docs directory. I chose docs because it’s one of the default directories that GitHub Pages allows you to use when documenting a repository.

gulp.task('pages', function() {
  return gulp.src([templates + '/*.html', templates + '/**/*.html'])
      // Renders template with nunjucks and marked
      .pipe(gulpnunjucks.compile('', {env: env}))
      .pipe(gulp.dest(dist));
});

That’s it. We have a far more flexible structure to build from and we can create our own design system components to make our lives easier in the long run.

As with the templating solution, we can create additional tasks to enhance the resulting HTML pages. I’ve created a proof of concept project that illustrates how this works.

Static Site Generators: Markdown and Templates

I’ve looked at static site generators like Hugo, Gatsby, and Jekyll among others. They all have their strengths and weaknesses but they are overkill if all you want to do is throw together a quick prototype with a few pages and stylesheets.

Markdown, HTML and templates: Version 1

Before we start we’ll take the following steps:

  • Create our root directory (static-gen)
  • Create two working directories (public and src)
  • Initialize package.json

mkdir -p static-gen/src
mkdir -p static-gen/public
cd static-gen
npm init --yes

The first version uses the wrap-around system that I use to generate content for my blog. I’ve described the process in detail in Generating HTML and PDF from Markdown.

Install the packages we need to run the conversion tasks:

npm i -D gulp@3.9 gulp-remarkable gulp-newer gulp-wrap

The two tasks that run the conversion are shown below. The first task converts the Markdown into an HTML fragment using the Remarkable markdown parser.

gulp.task('markdown', () => {
  return gulp.src('src/pages/*.md')
      .pipe(markdown({
        preset: 'commonmark',
        html: true,
        remarkableOptions: {
          html: true,
          typographer: true,
          linkify: true,
          breaks: false,
        },
      }))
      .pipe(gulp.dest('src/converted-md/'));
});

The second task inserts the resulting HTML into an HTML template that contains all the styles and scripts that we want to run on the pages.

gulp.task('build-template', ['markdown'], () => {
  return gulp.src('./src/converted-md/*.md')
      .pipe(wrap({
        src: './src/templates/template.html',
      }))
      .pipe(extReplace('.html'))
      .pipe(gulp.dest('docs/'));
});
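
The extReplace('.html') step above swaps each file's extension before it is written to the output directory. As a rough sketch of that single step in plain Node (the replaceExt name is hypothetical and this is not the plugin's own code):

```javascript
const path = require('path');

// Hypothetical helper: swap a file's extension, keeping its directory and base name
function replaceExt(filePath, newExt) {
  const parsed = path.posix.parse(filePath);
  return path.posix.format({ dir: parsed.dir, name: parsed.name, ext: newExt });
}

console.log(replaceExt('src/converted-md/about.md', '.html'));
```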

This version has a problem: it keeps escaping the code and presenting it as preformatted text inside pre and code tags. For the templates to work with both Markdown and HTML, we must handle template creation separately for each format. The HTML sources are still not full HTML pages, but they are written in HTML rather than Markdown, so using the .html extension is important.

The new task looks very similar to the one we’re using to handle Markdown:

gulp.task('build-html-template', () => {
  return gulp.src('./src/pages/*.html')
      .pipe(wrap({
        src: './src/templates/template.html',
      }))
      .pipe(gulp.dest('docs/'));
});

We’ve only covered the HTML generation portions of the template-based static site generator but it does more. Out of the box, it will handle SCSS to CSS transpilation, ES2015+ to ES5 transpilation and image compression using Imagemin. Since it’s Gulp-based you can integrate any other Gulp supported task into the process.

Because we’re passing the results directly to the template we can add any type of HTML that we want, whether directly as HTML tags and attributes or Markdown to be interpreted.

Future Evolutions

Right now all pages are converted using the same template. This works but it’s inflexible. We could create additional templates and associated Gulp tasks to create different HTML based on the templates but it’s not really productive. In the next post, we will look at using a templating engine to generate our content.

CSS Containment

Containment may help prevent this and make CSS even more awesome 🙂

Whenever we change the page after the document loads, by inserting new CSS rules or new elements via JavaScript, we may be slowing down the rendering of the page. Every change means the browser has to navigate all the elements in scope and re-render them as needed, because they may have moved or changed their dimensions when our target element grew smaller or larger.

Layout is almost always scoped to the entire document meaning that the browser will navigate all the way to the beginning of the document to calculate sizes and layout for the document. If you have a lot of elements, it’s going to take a long time to figure out their locations and dimensions.

The contain CSS property allows an author to indicate that an element and its contents are, as much as possible, independent of the rest of the document tree. This allows the browser to recalculate layout, style, paint, size, or any combination of them for a limited area of the DOM and not the entire page.

It can take one or more of the following values:

size
The size of the element can be computed without checking its children; the element’s dimensions are independent of its contents.
layout
The internal layout of the element is totally isolated from the rest of the page; it’s not affected by anything outside and its contents cannot have any effect on its ancestors.
style
Indicates that, for properties that can have effects on more than just an element and its descendants, those effects don’t escape the containing element.
The style value has been marked at risk and, as such, it may not make it to the final recommendation. Mozilla has already dropped it from Firefox.
paint
Descendants of the element cannot be displayed outside its bounds; nothing will overflow this element (or, if it does, it won’t be visible).

In addition, there are two grouping values that shorten what you type as the value of the property:

strict
This value turns on all forms of containment except style containment for the element. It behaves the same as contain: size layout paint;.
content
This value turns on all forms of containment except size containment and style containment for the element. It behaves the same as contain: layout paint;.

When we add the newly-added-element element to the page, it will trigger styles, layout, and paint. One thing we need to consider is that the DOM for the whole document is in scope: the browser will have to consider all the elements, whether or not they changed, when it comes to styles, layout, and paint.

The bigger the DOM, the more computation work the browser has to do, meaning that your app may become unresponsive to user input in larger documents.

In addition to what the browser already does to help with scoping of your CSS, you can use the contain property as an additional indicator of how the browser should handle layout, size and paint containment.

In the example below adding the new-element div will cause styles, layout and paint redraw of the whole document tree. For illustration, we haven’t added content to the HTML but you can imagine how large it can become, particularly in a single page application.

<section class="view">
  Home
</section>

<section class="view container">
  About
  <div class="new-element">Check me out!</div>
</section>

<section class="view">
  Contact
</section>

In CSS we can use containment to help the browser out with optimizations. It would be tempting to use strict for all items that we want to use containment for but we need to know the dimensions ahead of time and include them in our CSS otherwise the element might be rendered as a 0px by 0px box. Test everything thoroughly both in browsers that support containment and those that don’t support it.

Content containment (contain: content) offers significant scope improvements, without having to specify the dimensions of the element ahead of time.

You should look at contain: content as your default and treat contain: strict as an escape hatch when contain: content doesn’t quite cut the mustard.

To make sure that the layout and paint for our new-element div don’t affect the rest of the document, we can use a rule like this:

.new-element {
  contain: content;
  /* the rest of the rules for the class */
}
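
When testing in browsers with and without containment support, a small feature check can help. The sketch below is one possible approach, not part of the original example: pickContainment is a hypothetical helper, and it takes the support check as a parameter so it can be exercised outside a browser. In a browser, the check is the standard CSS.supports(property, value) call.

```javascript
// Hypothetical helper: choose a containment value based on browser support.
// `supports` is any function with the same signature as CSS.supports(property, value).
function pickContainment(supports) {
  if (supports('contain', 'content')) {
    return 'content';
  }
  return null; // no containment support: leave the element alone
}

// Browser usage, guarded so the snippet is harmless when run elsewhere
if (typeof document !== 'undefined' && typeof CSS !== 'undefined') {
  const value = pickContainment((prop, val) => CSS.supports(prop, val));
  const el = document.querySelector('.new-element');
  if (value && el) {
    el.style.contain = value;
  }
}
```

In practice the CSS rule alone is enough, since browsers that don't understand contain ignore the declaration; the check is mostly useful for logging or for comparing behavior during testing.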


Using LaTeX to build web content

LaTeX is an old-school language for document typesetting. It is built on top of TeX, the typesetting system Donald Knuth created to typeset his book The Art of Computer Programming. You still see LaTeX in scientific articles and papers.

If you’re only familiar with HTML, LaTeX syntax will look strange. Rather than tags and attributes we have a preamble, package declarations, and instructions.

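A basic LaTeX article, set to print in portrait mode with a body text size of 12 points, looks like this:

\documentclass[12pt]{article}
\usepackage{geometry}
\geometry{portrait}
% graphicx provides \DeclareGraphicsRule
\usepackage{graphicx}
\usepackage{amssymb}
\usepackage{epstopdf}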
% Broken into two lines for readability. In production
% the command would go in one line
\DeclareGraphicsRule{.tif}{png}{.png}
{`convert #1 `dirname #1`/`basename #1 .tif`.png}

\title{Brief Article}
\author{The Author}
%\date{}
% Activate to display a given date or no date

\begin{document}
\maketitle
\section{}
\subsection{}
\end{document}

\documentclass indicates what type of document we want to create; its parameters (the font size in this case) go in square brackets [].

\usepackage{} loads packages for use in the document.

Commands that start with a backslash, like \geometry{} and \DeclareGraphicsRule{}, are instructions that generate output of some kind for the document.

Converting LaTeX to HTML

I like LaTeX but I’m still a fan of the web and I want to make sure that whatever I create in LaTeX is also available on the web, assuming the publisher allows me to 🙂

There are two ways to create HTML content from LaTeX files. The first one is tex4ht (run through the htlatex command) and the second one is make4ht, an abstraction on top of tex4ht that simplifies adding options to the different pieces of the configuration.

The rest of the article uses the tex file from this gist as the source for the commands.

tex4ht

tex4ht converts LaTeX sources into one or more HTML documents with a very (and I mean very) basic style sheet that you can customize and expand as needed.

The most basic command will create a single page for all the content along with the corresponding image.

htlatex article.tex

For shorter articles, the single-file approach may be ok (with customized styles) but for larger files or articles with larger sections, it may prove harder to read online.

We can break the article down into multiple files based on the headings on the document.

The example below will generate multiple files and it will also generate navigation links within the pages of the document.

The styles, as with the previous document, can definitely be enhanced.

htlatex article.tex "html,index=2,3,next"

For those interested, you can also convert your LaTeX to DocBook 5.0. tex4ht can target TEI as well, but the TEI conversion of this file fails and I’m not certain why, as the document converts successfully to DocBook.

# Conversion to Docbook
htlatex article.tex "xhtml,docbook" " -cunihtf" "-cdocbk"
# Conversion to TEI
htlatex article.tex "xhtml,tei" " -cunihtf" "-cdocbk"

make4ht

As we’ve discussed, the tex4ht system supports several output formats, with multiple steps and multiple parameters possible for each step and format combination.

I just want to make sure make4ht is visible, as it’ll save a lot of time if you know it exists and where you can find its documentation.

The most basic version of the make4ht command will convert the TeX file into HTML using UTF-8 as the encoding:

make4ht -uf html5 filename.tex

When you just add new text to your TeX document, without cross-references, or new additions to the table of contents, you can use draft mode which will invoke LaTeX only once. It can save quite a lot of the compilation time:

make4ht -um draft -f html5 filename.tex
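
If you want to call make4ht from a Node-based build, one option is to assemble the argument list in a small helper. This is a sketch under assumptions: make4htArgs is a hypothetical name, and it spells the combined short flags from the commands above (-uf, -um) as separate flags, which I'm assuming make4ht treats the same way.

```javascript
// Hypothetical helper: build the argument list for the make4ht commands shown above
function make4htArgs(texFile, { draft = false } = {}) {
  const args = ['-u'];               // -u: UTF-8 output
  if (draft) {
    args.push('-m', 'draft');        // -m draft: invoke LaTeX only once
  }
  args.push('-f', 'html5', texFile); // -f html5: output format
  return args;
}

console.log(['make4ht', ...make4htArgs('filename.tex')].join(' '));
console.log(['make4ht', ...make4htArgs('filename.tex', { draft: true })].join(' '));
```

The resulting array can be handed to child_process.execFile('make4ht', args, callback), which avoids shell quoting issues with file names.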

As with many things in the TeX universe, there are a lot of configuration options. I’m deliberately not covering them both to keep the post from ballooning in size and to avoid confusion; I’ll assume that you know where to find the documentation if you need it.

Items to research and conclusion

Using LaTeX as the source for documents presents some clear advantages and some interesting challenges. TeX and LaTeX were designed from the start to work as print typesetting languages and the quality of the printed result is clearly better than what we can get from HTML alone.

Particularly with make4ht there are many questions left to answer. Some of the questions that merit additional research:

  • Would the output of the tool using the staticsite extension be good enough for static sites other than Jekyll?
  • Is the output in TEI and DocBook good enough to feed to their corresponding processing toolchains? If not, what additional changes do we need to make?
  • Is it worth learning Lua just to automate one type of task for one tool?


Generating HTML and PDF from Markdown

Markdown is my favorite way to write. It is lightweight, requiring only a few characters to convey the meaning of the text. It’s supported in many places, including GitHub and WordPress (via Jetpack), so I don’t need to change the way I write to publish in different places. And it only needs a text editor to create (for me, so does HTML, but that’s another story). In this article, we’ll look at different ways to take Markdown input and convert it to HTML for web view.

Markdown to HTML in a build process

This process is part of the build for my writing process and covers both HTML and PDF generation from the same Markdown source.

I’ve created an HTML template to place the Markdown-produced HTML inside of. It does four things:

  1. Defines the CSS that the document will load
  2. Defines the document metadata: charset, viewport, and title
  3. Defines the container and the placeholder for the generated HTML
  4. Defines the scripts we want the page to run at the bottom of the document. We could also place them on the head and use defer but we don’t really need to

<html lang="en" dir="ltr" class="no-js lazy">

<head>
  <!-- 1 -->
  <link rel="stylesheet" href="css/normalize.css">
  <link rel="stylesheet" href="css/main.css">
  <link rel="stylesheet" href="css/image-load.css">
  <link rel="stylesheet" href="css/video-load.css">
  <link rel="stylesheet" href="css/prism.css">
  <!-- 2 -->
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title></title>
</head>

<body>
<!-- 3 -->
<article class="container">
  <%= contents %>
</article>
<!-- 4 -->
<script src="scripts/lazy-load.js"></script>
<script src="scripts/vendor/clipboard.min.js"></script>
<script src="scripts/vendor/prism.js"></script>
<script src="scripts/vendor/fontfaceobserver.standalone.js"></script>
<script src="scripts/load-fonts.js"></script>
<script src="scripts/lazy-load-video.js"></script>
</body>
</html>

Before we run the build file we need to make sure that all the dependencies for Gulp are installed and updated. I’m lazy and haven’t updated the code to work with Gulp 4.0 so I’m sticking to 3.9 for this example.

npm install gulp@3.9 gulp-newer gulp-remarkable \
gulp-wrap gulp-exec remarkable

The first step is to load the plugins as we would in any other Gulp file or Node project.

const gulp = require('gulp'); // Gulp
const newer = require('gulp-newer'); // Newer
const markdown = require('gulp-remarkable'); // Markdown plugin
const wrap = require('gulp-wrap'); // Wrap
const exec = require('gulp-exec'); // Exec

Then we define the first task, markdown, to generate HTML from our Markdown sources.

We take all the Markdown files and, if they are newer than files in the target directory, we run them through the Remarkable Gulp Plugin.

gulp.task('markdown', () => {
  return gulp.src('src/md-content/*.md')
    // Compare each .md source against its .html counterpart in the target
    .pipe(newer({dest: 'src/html-content/', ext: '.html'}))
    .pipe(markdown({
      preset: 'commonmark',
      typographer: true,
      remarkableOptions: {
        typographer: true,
        linkify: true,
        breaks: false,
      },
    }))
    .pipe(gulp.dest('src/html-content/'));
});
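
The "only if newer" check boils down to comparing modification times. The following is a minimal sketch of that idea in plain Node, not gulp-newer's actual implementation; needsRebuild and the temporary file names are made up for the demonstration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: true when the target is missing or older than the source
function needsRebuild(source, target) {
  if (!fs.existsSync(target)) {
    return true;
  }
  return fs.statSync(source).mtimeMs > fs.statSync(target).mtimeMs;
}

// Demonstration in a throwaway directory
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'newer-'));
const md = path.join(dir, 'post.md');
const html = path.join(dir, 'post.html');
fs.writeFileSync(md, '# Title\n');

const beforeBuild = needsRebuild(md, html); // the target does not exist yet

fs.writeFileSync(html, '<h1>Title</h1>\n');
// Set explicit timestamps so the target is clearly newer than the source
const now = new Date();
const minuteAgo = new Date(now.getTime() - 60 * 1000);
fs.utimesSync(md, minuteAgo, minuteAgo);
fs.utimesSync(html, now, now);
const afterBuild = needsRebuild(md, html);

console.log(beforeBuild, afterBuild);
```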

Remarkable doesn’t generate full, well-formed documents; it just converts the Markdown into HTML. Since a Markdown file doesn’t describe a complete HTML document (that’s not what it was designed for), we only get the body of the document.

To make it into a well-formed HTML document we need to put the generated HTML inside an HTML document. We use the gulp-wrap plugin to do so. The result is that for each Markdown file we converted to HTML we now have a well-formed HTML document with links to stylesheets and scripts ready to be put in production.

gulp.task('build-template', ['markdown'], () => {
  // Return the stream so Gulp knows when the task has finished
  return gulp.src('./src/html-content/*.html')
    .pipe(wrap({src: './src/templates/template.html'}))
    .pipe(gulp.dest('./src/'));
});
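
To make the wrapping step less of a black box, here is a rough stand-in for what happens to each file: the generated fragment is dropped where the <%= contents %> placeholder sits in the template. This is a simplified sketch in plain Node, not gulp-wrap's own code, and wrapFragment is a hypothetical name.

```javascript
// Hypothetical stand-in for the wrapping step: put the fragment where the
// <%= contents %> placeholder sits in the template.
function wrapFragment(template, fragment) {
  // A replacer function keeps "$" sequences in the fragment from being interpreted
  return template.replace(/<%=\s*contents\s*%>/, () => fragment);
}

const template = [
  '<article class="container">',
  '  <%= contents %>',
  '</article>',
].join('\n');

const page = wrapFragment(template, '<h1>Hello</h1>');
console.log(page);
```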

PDF? Why Not?

We can use a similar technique to generate PDF files from our content. We’ll leverage the same framework that we did for generating HTML with a different template and using third party tools.

We need to be careful not to insert HTML markup into the Markdown we want to use to generate PDF. The PDF generators tend not to be very happy with videos and, to my dismay, fail completely rather than ignoring the markup they don’t understand.

The template is smaller as it doesn’t require the same number of scripts and stylesheets.

Two things to note:

  • We’re using a different syntax highlighter (Highlight.js instead of Prism)
  • We chose not to add the stylesheet here

<html lang="en">

<head>
  <link rel="stylesheet" href="../paged-media/highlight/styles/solarized-light.css">
  <script src="../paged-media/highlight/highlight.pack.js"></script>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width,minimum-scale=1,maximum-scale=1">
  <title></title>
</head>

<body data-type="article">
<div class="container">
    <%= contents %>
</div>

</body>
</html>

The first step is to create the HTML files using the appropriate template after we generate the HTML from the Markdown content.

gulp.task('build-pm-template', () => {
  // Return the stream so build-pdf waits for the wrapped files
  return gulp.src('./src/html-content/*.html')
    .pipe(wrap({src: './src/templates/template-pm.html'}))
    .pipe(gulp.dest('./src/pm-content'));
});

The next step is where the differences lie. Instead of just generating the HTML and being done with it, we have to push the HTML through a CSS paged media processor.

I’ve used PrinceXML to generate PDF from multiple sources with different inputs (XML, HTML, and XSL-FO) so we’re sticking with it for this project.

I use a second stylesheet that has all the font definitions and styles for the document. I’ve made article-styles.css available as a GitHub Gist.

The final bit is how we run PrinceXML in the build process. I know that gulp-exec is not well liked in the Gulp community but none of the alternatives I’ve found do quite what I need, so gulp-exec it is.

The idea is that, for each file in the source directory, we run prince with the command given.

gulp.task('build-pdf', ['build-pm-template'], () => {
  return gulp.src('./src/pm-content/*.html')
    .pipe(newer('src/pdf/'))
    .pipe(exec('prince --verbose --input=html --javascript --style ./src/css/article-styles.css <%= file.path %> '))
    .pipe(exec.reporter());
});
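
If you would rather build the Prince command in code than in a template string, a helper along these lines is one option. It is only a sketch: princeArgs is a hypothetical name, and the -o output flag and the src/pdf target directory are assumptions that the task above does not use.

```javascript
const path = require('path');

// Hypothetical helper: build the Prince arguments for one HTML file.
// The -o flag and the src/pdf output directory are assumptions for illustration.
function princeArgs(htmlFile, outDir = 'src/pdf') {
  const pdfName = path.basename(htmlFile, '.html') + '.pdf';
  return [
    '--verbose',
    '--input=html',
    '--javascript',
    '--style', './src/css/article-styles.css',
    htmlFile,
    '-o', path.posix.join(outDir, pdfName),
  ];
}

console.log(['prince', ...princeArgs('src/pm-content/about.html')].join(' '));
```

The array can be passed to child_process.execFile('prince', args, callback), so file paths with spaces don't need shell quoting.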

So we’ve gone from Markdown to HTML and Markdown to PDF. A next step may be to look at how we can populate Handlebars or Dust templates from our Markdown sources.