Skip to main content
Dublin Library

The Publishing Project

XML vs. JSON - Is XML Still Relevant??

 

XML (eXtensible Markup Language) has been a foundational technology for data representation and exchange since its inception in the late 1990s. However, with the rise of alternative data formats like JSON (JavaScript Object Notation) and YAML (YAML Ain't Markup Language), many developers question whether XML remains relevant in today's technology landscape.

This post explores the current relevance of XML, its strengths and weaknesses, the modern landscape of XML technologies, and the scenarios where it continues to be the superior choice.

What is XML? #

XML is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. It stores and transports data, allowing creators to define custom tags that describe the structure and meaning of that data.

This structure facilitates data interchange between different systems and platforms. It also allows for the creation of Domain-Specific Languages (DSLs) tailored to specific industries.

For example, where JSON might represent a book description like this:

{
  "book": {
    "title": "XML Fundamentals",
    "author": "Jane Doe"
  }
}

The same data in XML looks like this:

<book>
  <title>XML Fundamentals</title>
  <author>Jane Doe</author>
</book>

We can enrich the book data by adding tags like <publisher>, <year>, and <ISBN>, or add semantic depth with nested tags:

<book>
  <title>XML Fundamentals</title>
  <edition>First</edition>
  <author>
    <first-name>Jane</first-name>
    <last-name>Doe</last-name>
  </author>
  <publication-details>
    <publisher>Tech Books Publishing</publisher>
    <year>2020</year>
    <ISBN>123-4567890123</ISBN>
  </publication-details>
</book>

We can validate the structure and content of these documents using XML Schema, DTD, or Relax NG, ensuring the data strictly adheres to predefined rules.

Strengths #

  • Flexibility: You can define custom tags to represent highly complex or recursive data structures.
  • Extensibility: You can extend the format to accommodate new requirements without breaking existing parsers.
  • Validation: XML supports robust, contract-based validation mechanisms (XSD, Schematron).
  • Mixed Content: Unlike JSON, XML excels at "mixed content"—text interspersed with markup (e.g., This is <b>bold</b> text), making it superior for document-centric data.

Weaknesses #

  • Verbosity: XML documents are typically larger than JSON or YAML equivalents, increasing storage and transmission costs.
  • Complexity: The learning curve for the full XML stack (Namespaces, XSLT, Schema) is steep.
  • Performance: XML parsers are generally slower and more memory-intensive than JSON parsers.
  • Tooling Shift: Modern web tooling prioritizes JSON. While XML tooling is mature, it often feels "enterprise" or legacy compared to the vibrant JavaScript ecosystem.

Current Landscape of XML Technologies #

Foundational Technologies #

XML applications rely on several core technologies that enhance their capabilities.

  • XML Namespaces: Namespaces qualify elements and attributes to avoid naming conflicts. This allows you to combine XML vocabularies from different sources into a single document without ambiguity.
  • XPath: XPath is a syntax for selecting nodes or computing values within an XML document. It is the query language used to navigate the tree structure and is heavily utilized by other technologies like XSLT and XQuery.

Definition Languages #

These languages define the "contract" that an XML document must fulfill.

DTD (Document Type Definition) #

The original method for defining XML structure. It specifies allowed elements and attributes but lacks support for data types (like integers or dates) and namespaces.

<!DOCTYPE library [
  <!ELEMENT library (book*)>
  <!ELEMENT book (title, author, year, genre?)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT year (#PCDATA)>
  <!ELEMENT genre (#PCDATA)>
  <!ATTLIST book
    id CDATA #REQUIRED
    available (yes|no) "yes"
  >
]>

XML Schema (XSD) #

XSD serves as a comprehensive grammar for XML. Unlike DTD, XSD is written in XML syntax and supports strong typing (dates, booleans, numbers), namespaces, and complex constraints.

Here is the full conversion of the DTD above into XML Schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified">

  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="book" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string"/>
        <xs:element name="year" type="xs:string"/>
        <xs:element name="genre" type="xs:string" minOccurs="0"/>
      </xs:sequence>

      <xs:attribute name="id" type="xs:string" use="required"/>
      <xs:attribute name="available" default="yes">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="yes"/>
            <xs:enumeration value="no"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>

</xs:schema>

Relax NG #

RELAX NG focuses on simplicity and mathematical elegance. It supports both an XML syntax and a Compact Syntax (RNC). Many developers prefer it for its readability and its ability to handle mixed content more easily than XSD.

The example below uses the Compact Syntax:

start = library

library = element library { book* }

book = element book {
  attlist,
  element title { text },
  element author { text },
  element year { text },
  element genre { text }?
}

attlist =
  attribute id { text } &
  ## The "available" attribute is optional (?) because it has a default value in the DTD logic
  attribute available { "yes" | "no" }?

Query and Transformation Languages #

This is where XML truly shines. The ability to transform and query data with standardized languages is a key differentiator from JSON.

XSLT (eXtensible Stylesheet Language Transformations) #

XSLT is a functional language designed to transform XML into other formats (HTML, PDF, JSON, or other XML).

  • XSLT 1.0: The original version (1999). It is widely supported (including in browsers) but lacks advanced features like regular expressions and custom functions.
  • XSLT 2.0: A massive leap forward (2007). It introduced strong typing (integers, dates), user-defined functions (xsl:function), regular expressions, and the ability to output multiple files from a single source.
  • XSLT 3.0: The modern standard (2017). It adds streaming (processing files larger than memory), higher-order functions, and full JSON support (transforming JSON to XML and vice-versa).

XSLT in Browsers

Browsers generally only support XSLT 1.0. For modern XSLT 2.0/3.0 features, you must use server-side processors like Saxon (Java/JS/Node) or process the transformations on the server before displaying the changes.

Browsers are removing built-in XSLT support altogether, so relying on client-side XSLT is not future-proof since browsers only support XSLT 1.0 and even that old support is being taken away.

XQuery #

XQuery is to XML what SQL is to databases. It is designed to extract, manipulate, and reconstruct XML data.

  • XQuery 1.0: The base recommendation (2007), providing FLWOR expressions (For, Let, Where, Order by, Return).
  • XQuery 2.0: This version was skipped to align version numbers with XSLT and XPath.
  • XQuery 3.0: Added "Group By" clauses, "Try/Catch" error handling, and Windowing clauses for complex data analysis.
  • XQuery 3.1: The latest version (2017). It introduced Maps and Arrays, allowing XQuery to query and generate JSON just as easily as XML.

The following example uses XQuery 3.0 to group books by genre and calculate statistics:

xquery version "3.0";

<library_report>
{
  (: FOR: Iterate over books :)
  for $book in doc("library.xml")/library/book

  (: LET: Handle missing genres by defaulting to "Unclassified" :)
  let $genre_key := if ($book/genre) then $book/genre else "Unclassified"

  (: WHERE: Filter for available books only :)
  where $book/@available = 'yes'

  (: GROUP BY: XQuery 3.0 Feature :)
  group by $genre_key

  (: ORDER BY: Sort the groups alphabetically :)
  order by $genre_key ascending

  return
    <genre category="{ $genre_key }">
      <stats>
        <count>{ count($book) }</count>
        <average_year>{ avg($book/year) }</average_year>
      </stats>
      <book_list>
      {
        (: XQuery 3.0 Feature: Simple Map Operator (!) :)
        $book ! <item id="{ ./@id }">{ ./title/text() || " (" || ./author || ")" }</item>
      }
      </book_list>
    </genre>
}
</library_report>

Processing #

XProc #

XProc defines XML pipelines. It chains multiple steps (validate -> transform -> store) into a single, automated workflow.

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
  version="3.0"
  name="docbook-to-html">

  <p:input port="source" primary="true"/>
  <p:output port="result" primary="true" serialization="map{'method':'html', 'version':'5'}"/>

  <p:option name="css-filename" select="'theme.css'"/>

  <p:store href="{$css-filename}" method="text">
    <p:with-input>
      <p:inline content-type="text/css">
        body { font-family: sans-serif; margin: 2rem; }
        h1, h2, h3 { color: #2c3e50; }
      </p:inline>
    </p:with-input>
  </p:store>

  <p:xslt name="transform">
    <p:with-input port="stylesheet">
      <p:document href="https://cdn.docbook.org/release/xsl-ng/current/xslt/docbook.xsl"/>
    </p:with-input>
    <p:with-param name="html.stylesheet" select="$css-filename"/>
  </p:xslt>

</p:declare-step>

The "Hidden" XML You Use Every Day #

While JSON dominates web APIs, XML quietly powers the files and graphics we use daily. It is not just a legacy format; it is the active infrastructure for documents and media.

Scalable Vector Graphics (SVG) #

If you use an icon on a website or a chart in a dashboard, you are likely using XML. SVG is an XML-based vector image format. Because it is XML, it integrates directly into the web's DOM (Document Object Model), allowing developers to manipulate graphics with CSS and JavaScript—something impossible with binary image formats like PNG or JPEG.

<svg width="100" height="100" xmlns="http://www.w3.org/2000/svg">
  <circle cx="50" cy="50" r="40" stroke="black" stroke-width="3" fill="red" />
</svg>

Office Documents (OOXML and ODF) #

Modern office documents—such as Microsoft Word (.docx), Excel (.xlsx), and LibreOffice (.odt)—are not binary blobs. They are actually Zipped archives containing a collection of XML files.

If you rename a .docx file to .zip and unzip it, you will find a folder structure driven by XML. The text of your document lives in word/document.xml. This architecture allows organizations to programmatically generate reports or extract data from spreadsheets without needing the Office software installed.

RSS and Atom Feeds #

The backbone of content syndication remains XML. Podcasts, news aggregators, and blog readers rely on RSS (Really Simple Syndication) or Atom feeds.

Podcasting, in particular, relies heavily on XML namespaces. Apple Podcasts, for instance, extends the standard RSS XML with their own iTunes namespace to add cover art, duration, and explicit content flags.

<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <title>Tech Talk</title>
    <item>
      <title>Is XML Dead?</title>
      <itunes:duration>32:00</itunes:duration>
    </item>
  </channel>
</rss>

Use Cases Where XML Excels #

Is XML still relevant? Yes. While JSON has won the war for web APIs, XML remains the standard for document-centric data, complex schemas, and vector graphics.

Document Publishing (TEI, DocBook, DITA) #

Structured authoring is XML's stronghold.

  • TEI: The standard for digital humanities and historical texts.
  • DocBook & DITA: The standard for technical documentation. They allow "Single Source Publishing," where one XML source generates PDF, HTML, and eBook formats simultaneously.

Industry-Specific Standards #

Industries dealing with long-term archival or strict regulatory compliance prefer XML for its human readability and schema validation capabilities.

  • Scientific Research: SBML (Systems Biology) and CML (Chemistry).
  • Healthcare: HL7 and CDA for patient records.
  • Finance: FIXML and FpML for complex trading data.

Conclusion #

While XML is no longer the default for simple data interchange in web apps, it is far from dead. Its relevance has shifted from "universal format" to "specialist tool."

For lightweight APIs, use JSON. But for document publishing, complex regulatory data, vector graphics, and scenarios requiring strict validation or mixed content, XML remains the undisputed champion.

Edit on Github