Sanitizing HTML content

November 2, 2022

One of the biggest security issues with web applications is Cross-Site Scripting (XSS). In an XSS attack, malicious code is injected into HTML that the browser parses, so the browser renders and executes that code as part of the page. Let's assume we have the following template:

```js
// Build the markup by interpolating each review's fields directly, with no escaping
const html = reviews
  .map((review) => `
    ${review.title}
    ${review.text}
  `)
  .join('');
```

And then we feed it the following data:

```text
// Review 1
Title: Friendly and delicious!
Text: The restaurant is right in the center of town. It has top food and it's a very nice place with a friendly and professional staff.

// Review 2
Title: Kitchen nightmares
Text: Fine location, lots of parking. But those are the only good things.
```

Without any protection, the template renders the following HTML:

```html
Friendly and delicious!
The restaurant is right in the center of town...
Kitchen nightmares
Fine location, ... things.
```

The second review's markup triggers an error, so its `onerror` event handler fires and shows an alert, warning us that injected code is running. In this example the payload only calls `alert`, but genuinely malicious code could use the same hook to exfiltrate sensitive data or carry out other harmful actions. The browser does nothing to prevent this because we haven't told it to.

We will look at three different ways to sanitize the external content we feed to our templates and APIs, in essence telling the browser to filter potentially malicious content out of them. The idea behind all these techniques is the same: before the content reaches the page, remove `<script>` elements, event-handler attributes such as `onerror`, and any other markup we don't explicitly allow.

## Third-party libraries: DOMPurify

The first option is to use a sanitizer library like [DOMPurify](https://github.com/cure53/DOMPurify#readme). You import the DOMPurify library and then call its `sanitize` method on the text you want to "clean up". Using the default settings, the code to sanitize text looks like this:

```js
import DOMPurify from 'dompurify';

// Untrusted input
const dirty = 'HELLO<br>goodbye';
// Sanitize using DOMPurify's default configuration
const clean = DOMPurify.sanitize(dirty);
// Only the sanitized markup is inserted into the page
document.getElementById('sanitized').innerHTML = clean;
```

You can also pass a configuration object to let specific elements through the sanitizer. The example does the following:

1. Defines the dirty text
2. Specifies a configuration directive that allows only `<p>` elements
   * We also want to keep the `<p>` elements' text content, so we add `#text` too
3. Sanitizes the input
4. Places the sanitized input in the document
{.custom-ordered}

```js
// 1
const dirty = '

HELLO