Semantics and HTML
HTML documents don’t have many features for including additional semantics. With (X)HTML we could use namespaces, but nowadays everyone seems to hate anything related to XML, and the [functionality of namespaces](https://www.w3.org/TR/html5/infrastructure.html#namespaces) seems to have changed considerably since [XHTML 1.0](https://www.w3.org/TR/xhtml1/).
But there are ways to add semantics to an HTML page through custom attributes, extensions to HTML or a combination of both. We’ll explore some of the machine-readable semantic additions to HTML:
- Microformats
- RDFa Lite
- JSON-LD
- Microdata
- `data-` attributes
We’ll also explore some semantics derived from the WAI and WAI-ARIA specifications along with their digital publishing extensions.
Regardless of the format we choose, the real trick of working with structured data is figuring out how to match our HTML to the vocabulary and format we’ve chosen to work with. I’ll discuss what vocabulary to use in a later section.
## Microformats
[Microformats](http://microformats.org/) are the oldest of these techniques for adding semantic meaning to HTML.
### h-card
Take the example below, a piece of straightforward markup describing a person and information about them:
```
Jane Doe
Professor
20341 Whitworth Institute
405 Whitworth
Seattle WA 98052
(425) 123-4567
jane-doe@illinois.edu
Jane's home page:
janedoe.com
Graduate students:
Alice Jones
Bob Smith
```
Humans can guess about the content of the div. Some of the inferences we can make, and that we’ll assume are correct, are:
- Jane Doe is a person
- The information in the div refers to Jane Doe
- Her address is in Seattle, WA
- Her email address is `jane-doe@illinois.edu`
A machine, however smart, cannot make those kinds of guesses without support. That’s where [h-card](http://microformats.org/wiki/h-card) comes in. It provides a vocabulary that uses class attributes to further describe the information they’re attached to.
One thing that HTML purists object to is that we may need to add additional markup to existing content, but most of that markup is semantically neutral `span` tags, so I’m OK with that.
Let’s look at what the example looks like marked up:
```
Jane Doe
Professor
20341 Whitworth Institute
405 Whitworth
Seattle
WA
98052
(425) 123-4567
jane-doe@illinois.edu
Jane's home page:
janedoe.com
Graduate students:
Alice Jones
Bob Smith
```
Even though we added a lot of semantic meaning to our person, the markup is invisible to the user; search engines see all the additional information and can act upon it.
If we look up `u-photo` in the wiki we see that it doesn’t have a description, so it’s safe to assume from the prefix what kind of value it holds: properties that start with `u-` hold a URL, and properties that start with `p-` hold plain text.
Addresses need a little more attention. If you look at the `p-adr` entry for h-card you’ll see that it suggests that you can `optionally embed an h-adr`. At first it threw me off until I looked at `h-adr` and what it would be used for. For individuals it is optional, as most individuals have one address. If this were a professional bio we could have more than one address, say one for home and one for work, and it would look something like this:
```
17 Austerstræti Reykjavík Iceland 107
```
The address also shows how we would add tags for semantic meaning to specific portions of the address. This is a common technique for adding semantics to specific parts of content. We need to make sure that whenever we add semantic attributes to content we do so appropriately; most of the time this will mean adding containers (`div` or `span` tags) to put the semantics on.
The Microformats wiki provides a list of [h-card Properties](http://microformats.org/wiki/h-card#Properties), which is handy when you're adding h-card properties to your document.
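Since h-card is just a set of class names, a sketch of what the marked-up person could look like may help. This is a reconstruction under assumptions rather than the original markup: the `href` values and the way the address is split into `p-street-address`, `p-locality`, `p-region` and `p-postal-code` are illustrative choices.

```html
<!-- Sketch only: href values and the address split are assumptions -->
<div class="h-card">
  <span class="p-name">Jane Doe</span>
  <span class="p-job-title">Professor</span>
  <!-- p-adr with an embedded h-adr, so each part of the address is labelled -->
  <div class="p-adr h-adr">
    <span class="p-street-address">20341 Whitworth Institute, 405 Whitworth</span>
    <span class="p-locality">Seattle</span>
    <span class="p-region">WA</span>
    <span class="p-postal-code">98052</span>
  </div>
  <span class="p-tel">(425) 123-4567</span>
  <a class="u-email" href="mailto:jane-doe@illinois.edu">jane-doe@illinois.edu</a>
  Jane's home page:
  <a class="u-url" href="http://janedoe.com/">janedoe.com</a>
</div>
```

Note how the visible text is unchanged; only class names and a few neutral `span` containers were added.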
## Microdata
We will use the following example to illustrate the implementation of different structured data vocabularies.
```
Jane Doe
Professor
20341 Whitworth Institute
405 Whitworth
Seattle WA 98052
(425) 123-4567
jane-doe@illinois.edu
Jane's home page:
janedoe.com
Graduate students:
Alice Jones
Bob Smith
```
Microdata is part of the WHATWG HTML Living Standard but is published as a separate specification by the W3C, I guess to make it easier to update and implement without having to update the full HTML specification. I chose to follow the W3C version.
The example marked up with Microdata looks like this:
```
Professor
20341 Whitworth Institute
405 Whitworth
Seattle WA 98052
(425) 123-4567
jane-doe@illinois.edu
Jane's home page:
janedoe.com
Graduate students:
Alice Jones
Bob Smith
```
# Custom Attributes in your HTML
HTML5 introduced custom `data-` attributes as a way to add arbitrary data to be consumed by the page it’s contained in. More specifically:
> Custom data attributes are intended to store custom data private to the page or application, for which there are no more appropriate attributes or elements. These attributes are not intended for use by software that is independent of the site that uses the attributes. Every HTML element may have any number of custom data attributes specified, with any value. [From HTML 5.1 specification](http://w3c.github.io/html/dom.html#embedding-custom-non-visible-data-with-the-data-attributes)
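To make this concrete, here is a minimal sketch of a `data-` attribute being read by a script in the same page. The element `id`, the attribute names and the values are hypothetical and chosen only for illustration.

```html
<!-- Hypothetical element: the id and the data-* names are made up -->
<div id="panel" data-initial-height="200" data-initial-opacity="0.5">Content</div>

<script>
  // dataset exposes data-* attributes under camelCased names; values are always strings
  const panel = document.getElementById('panel');
  const height = Number(panel.dataset.initialHeight);   // 200
  const opacity = Number(panel.dataset.initialOpacity); // 0.5

  // Restore the stored starting values, for example before replaying an animation
  panel.style.height = `${height}px`;
  panel.style.opacity = String(opacity);
</script>
```

The data never leaves the page: the markup stores it and the page's own script consumes it, which is the use the specification describes.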
## What can you use them for?
We can store data to use in both our CSS and JavaScript. Some examples include:
- Storing the initial attributes of an element (height, width, opacity) which might be required in later JavaScript or CSS animations
- Adding attributes that will feed scripts in the page
## What shouldn’t I use them for?
While they give a lot of flexibility, custom attributes can be easily abused. Some of the things we should avoid are:
- Using data attributes as replacements for microformats. Since data attributes can only be used with styles and scripts within the same page we cannot rely on them for information exchange
- Relying on a data attribute being present or absent to style content.
# Accessibility semantics
There are two attributes that add semantic meaning to the content they are attached to: `aria-labelledby` and `aria-describedby`. While not strictly structured data formats, they help provide structure to our HTML content, so I’m including them here.
## aria-labelledby
This attribute provides a link to a short description or label for the element for assistive technology use. We create the link by using the id of an existing element on the page. Assistive technology such as screen readers uses the text of the element identified by the value of the `aria-labelledby` attribute as the text alternative for the element with the attribute.
For images this does not replace the `alt` attribute. You should neither omit `alt` nor leave it empty (`alt=""`); it should contain the same text as the target of the `aria-labelledby` attribute.
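As a small sketch of that relationship, the image below points at its visible caption through `aria-labelledby` and repeats the same text in `alt`. The file name, the `id` and the caption text are hypothetical.

```html
<!-- Hypothetical markup: file name, id and caption are made up for illustration -->
<figure>
  <img src="janedoe.jpg"
       alt="Portrait of Jane Doe"
       aria-labelledby="portrait-caption">
  <!-- The id here is what aria-labelledby refers to -->
  <figcaption id="portrait-caption">Portrait of Jane Doe</figcaption>
</figure>
```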
Returning to Microdata, the marked-up version of the example begins like this:
```
Jane Doe
Professor
```
The first attributes to notice are `itemscope` and `itemtype`.
The `itemscope` attribute creates a new item, a group of name-value pairs, which extends until the next `itemscope` attribute starts a new item of its own.
The `itemtype` attribute, if specified, must contain a URL pointing to a valid vocabulary type, in this case the [Person](https://schema.org/Person) type from [schema.org](https://schema.org/). Inside our Person item we have another scoped item, a [PostalAddress](http://schema.org/PostalAddress) item, also defined in schema.org.
For all child elements we specify an `itemprop` attribute to add one or more properties to the enclosing item.
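To see the three attributes working together, here is a sketch of how the Jane Doe example might look with Microdata. It is a reconstruction under assumptions, not the original markup: the `href` values and the split of the address into `streetAddress`, `addressLocality`, `addressRegion` and `postalCode` are illustrative, and `colleague` is used for the graduate students only because it is an existing schema.org Person property.

```html
<!-- Sketch only: href values and property choices are assumptions -->
<div itemscope itemtype="https://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
  <span itemprop="jobTitle">Professor</span>
  <!-- A nested itemscope starts a second item that becomes the value of "address" -->
  <div itemprop="address" itemscope itemtype="https://schema.org/PostalAddress">
    <span itemprop="streetAddress">20341 Whitworth Institute, 405 Whitworth</span>
    <span itemprop="addressLocality">Seattle</span>
    <span itemprop="addressRegion">WA</span>
    <span itemprop="postalCode">98052</span>
  </div>
  <span itemprop="telephone">(425) 123-4567</span>
  <a itemprop="email" href="mailto:jane-doe@illinois.edu">jane-doe@illinois.edu</a>
  Jane's home page:
  <a itemprop="url" href="http://janedoe.com/">janedoe.com</a>
  Graduate students:
  <span itemprop="colleague" itemscope itemtype="https://schema.org/Person">
    <span itemprop="name">Alice Jones</span>
  </span>
  <span itemprop="colleague" itemscope itemtype="https://schema.org/Person">
    <span itemprop="name">Bob Smith</span>
  </span>
</div>
```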
## RDFa Lite
RDFa Lite is a W3C specification that defines a small subset of RDFa, five attributes (`vocab`, `typeof`, `property`, `resource` and `prefix`), for marking up content with vocabularies such as the ones described in schema.org. It is very similar to Microdata.
We will use the same example for RDFa Lite, JSON-LD and Microdata. The basic example looks like this:
```
Jane Doe
Professor
20341 Whitworth Institute
405 N. Whitworth Seattle, WA 98052
(425) 123-4567
jane-doe@xyz.edu
Jane's home page:
janedoe.com
Graduate students:
Alice Jones
Bob Smith
```
The example marked up for RDFa is shown below. Compare it with the same document marked up with Microdata. Also note the level of granularity, which is similar to what we did with Microformats at the start of the exercise.
```
Jane Doe
Professor
```
Since HTML5 does not support namespaces, RDFa uses custom attributes to achieve the same goal. `vocab` indicates the URL of the vocabulary we want to use and `typeof` indicates the specific type of object we are describing. Once we do that, the search engine knows what all these properties are about and can parse them properly.
`property` tells the RDFa parser which property of that object the element's content provides.
We’ll look at the nested div containing our address to see how the markup changes based on the type of content we are creating. Its `property` is `address` and its `typeof` is `PostalAddress`.
Here we also use span tags to refine which portions of the content get which semantic meaning. The span tag is neutral: “The `<span>` element is a generic inline container for phrasing content, which does not inherently represent anything.”
```
Professor
20341 Whitworth Institute
405 N. Whitworth Seattle, WA 98052
(425) 123-4567
jane-doe@xyz.edu
Jane's home page:
janedoe.com
Graduate students:
Alice Jones
Bob Smith
405 N. Whitworth Seattle, WA 98052
20341 Whitworth Institute
405 N. Whitworth Seattle, WA 98052
```
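Putting `vocab`, `typeof` and `property` together, a sketch of the Jane Doe example in RDFa Lite might look like the following. This is a reconstruction under assumptions, not the original markup: the `href` values and the choice of schema.org properties for each piece of text are illustrative.

```html
<!-- Sketch only: href values and property choices are assumptions -->
<div vocab="https://schema.org/" typeof="Person">
  <span property="name">Jane Doe</span>
  <span property="jobTitle">Professor</span>
  <!-- The nested div is both a property of the Person and a typed object of its own -->
  <div property="address" typeof="PostalAddress">
    <span property="streetAddress">20341 Whitworth Institute, 405 N. Whitworth</span>
    <span property="addressLocality">Seattle</span>,
    <span property="addressRegion">WA</span>
    <span property="postalCode">98052</span>
  </div>
  <span property="telephone">(425) 123-4567</span>
  <a property="email" href="mailto:jane-doe@xyz.edu">jane-doe@xyz.edu</a>
  Jane's home page:
  <a property="url" href="http://janedoe.com/">janedoe.com</a>
</div>
```

Compared with Microdata, `vocab` plus `typeof` play the role of `itemscope` plus `itemtype`, and `property` plays the role of `itemprop`.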
## JSON-LD
[JSON for linked data](http://json-ld.org/), or JSON-LD, is an extension to the JSON format that lets a single document combine data from different contexts or domains to build the metadata.
We’ll start with a definition of linked data, taken from [json-ld.org](http://json-ld.org/):
> Linked Data empowers people that publish and use information on the Web. It is a way to create a network of standards-based, machine-readable data across Web sites. It allows an application to start at one piece of Linked Data, and follow embedded links to other pieces of Linked Data that are hosted on different sites across the Web.
Once again, we’re using the same data we used for Microdata and RDFa. The HTML looks like this:
```
Jane Doe
Professor
20341 Whitworth Institute
405 Whitworth
Seattle WA 98052
(425) 123-4567
jane-doe@illinois.edu
Jane's home page:
janedoe.com
Graduate students:
Alice Jones
Bob Smith
```
Note that we attach the JSON-LD metadata to a `script` tag with the type `application/ld+json`. Because we can’t add JSON directly to HTML markup, the data lives in its own `script` block, separate from the visible content.
The next difference is that we use JSON syntax but keep the camel case for all property names. The names are the same as those in schema.org, which makes it easier to track what each property does across schema.org types.
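A sketch of what such a script block might contain for the Jane Doe example follows. It is a reconstruction under assumptions: the URL scheme for the home page, the split of the address, and the use of `colleague` for the graduate students are illustrative choices, not the original script.

```html
<!-- Sketch only: values mirror the visible example; property choices are assumptions -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Jane Doe",
  "jobTitle": "Professor",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "20341 Whitworth Institute, 405 Whitworth",
    "addressLocality": "Seattle",
    "addressRegion": "WA",
    "postalCode": "98052"
  },
  "telephone": "(425) 123-4567",
  "email": "mailto:jane-doe@illinois.edu",
  "url": "http://janedoe.com/",
  "colleague": [
    { "@type": "Person", "name": "Alice Jones" },
    { "@type": "Person", "name": "Bob Smith" }
  ]
}
</script>
```

`@context` plays the role that `vocab` plays in RDFa, and `@type` the role of `typeof`; the property names are the same camel-cased schema.org names used in the other two formats.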
As an `aria-labelledby` example, consider an image labelled by its caption:
```
Van Gogh's oil painting of sunflowers hangs in Amsterdam's Van Gogh museum.
```
## aria-describedby
`aria-describedby` is a complementary attribute to `aria-labelledby`. Where `aria-labelledby` points to the ID of content in your page that labels an element, `aria-describedby` points to a longer description elsewhere in your document. In the example below we associate form elements with the divs hosting their descriptions. These descriptions are much longer than the labels indicated with `aria-labelledby`.
```
Was emailed to you when you signed up
Introduction
Foreword: Lorem Ipsum
Part I: Getting Started
Chapter 1: The Tools
Chapter 2: The environment
Appendix A: References
Example