Blocking Google from indexing your site

December 14, 2022
4 min. read

There are times when you don't want Google Search to index specific pages on your site. The pages may be private but, for some reason, not password protected, or the site may need to be available to a client during development without sitting behind a password.

This post will present two ways of blocking Google crawlers from indexing your site: server response headers and client-side meta tags.

Using Server Response Headers

The easiest way to add the header is through your server configuration. To disable search engine indexing globally, add the following response header:

X-Robots-Tag: noindex

The specific way of doing this will depend on what server you're running.

For Apache servers, make sure that mod_headers is installed and enabled.

Headers can be added in the following places:

- The default server configuration (adding it here makes the header global)
- Virtual host configuration
- Directory
- Location

The directive that sets the header and prevents Google from indexing is:

Header set X-Robots-Tag "noindex"

Nginx setup is a little more complex.

add_header adds the specified field to a response header provided that the response code equals 200, 201, 204, 206, 301, 302, 303, 304, 307, or 308.

add_header directives are inherited from the previous configuration level if and only if there are no add_header directives defined on the current level.

If the optional always parameter is specified, the header field will be added regardless of the response code.
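The inheritance and status-code rules are easy to get wrong, so here is a small Python model of them. It is only a sketch of the documented behaviour, not nginx itself; the names `Level`, `effective_headers`, and `headers_for_response` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field

# Status codes for which nginx adds headers when `always` is not set
# (as listed in the ngx_http_headers_module documentation).
DEFAULT_CODES = {200, 201, 204, 206, 301, 302, 303, 304, 307, 308}


@dataclass
class Level:
    """One configuration level: http, server, or location (hypothetical model)."""
    name: str
    # Each entry is (header_name, value, always).
    add_headers: list = field(default_factory=list)


def effective_headers(levels):
    """Return the add_header entries in force at the innermost level.

    `levels` is ordered outermost first. A level that defines any
    add_header directive replaces everything inherited from above.
    """
    current = []
    for level in levels:
        if level.add_headers:
            current = list(level.add_headers)
    return current


def headers_for_response(levels, status):
    """Return the headers this model would add to a response with `status`."""
    return {
        name: value
        for name, value, always in effective_headers(levels)
        if always or status in DEFAULT_CODES
    }


http = Level("http", [("X-Robots-Tag", "noindex", False)])
server = Level("server")
location = Level("location", [("Cache-Control", "no-store", False)])

# The server level defines nothing, so it inherits X-Robots-Tag from http.
print(headers_for_response([http, server], 200))
# The location level defines its own add_header, so X-Robots-Tag is lost.
print(headers_for_response([http, server, location], 200))
# Without `always`, a 404 response gets no header at all.
print(headers_for_response([http, server], 404))
```

The practical consequence is that a location block with its own add_header directives needs the X-Robots-Tag line repeated inside it, and error pages stay indexable unless always is set.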

add_header can be used in the following contexts: http, server, location, and if inside a location (but see If is Evil… when used in location context).

add_header X-Robots-Tag "noindex";
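Once the header is configured, it helps to confirm that the server actually sends it. The Python sketch below is one possible way to check; `blocks_indexing` and `fetch_x_robots_tag` are hypothetical helper names, and the parsing covers only the common forms of the header (a bare directive list, or a list prefixed with a user agent such as `googlebot:`).

```python
import urllib.request

# Directives that take a value after a colon, so a leading "name:" here
# is part of the directive and not a user agent prefix.
VALUED_DIRECTIVES = {
    "unavailable_after",
    "max-snippet",
    "max-image-preview",
    "max-video-preview",
}


def blocks_indexing(header_values, crawler="googlebot"):
    """Return True if any X-Robots-Tag value tells `crawler` not to index.

    `header_values` is a list of raw header values, for example
    ["noindex"] or ["googlebot: noindex, nofollow"].
    """
    for raw in header_values:
        agent, sep, rest = raw.partition(":")
        if sep and agent.strip().lower() not in VALUED_DIRECTIVES:
            # The value is scoped to one user agent, e.g. "googlebot: noindex".
            if agent.strip().lower() != crawler.lower():
                continue
            rules = rest
        else:
            rules = raw
        directives = {part.strip().lower() for part in rules.split(",")}
        # "none" is shorthand for "noindex, nofollow".
        if directives & {"noindex", "none"}:
            return True
    return False


def fetch_x_robots_tag(url):
    """Send a HEAD request to `url` and return its X-Robots-Tag values."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return response.headers.get_all("X-Robots-Tag") or []
```

A call such as `blocks_indexing(fetch_x_robots_tag("https://example.com/"))` (a placeholder URL) would report whether the page is excluded for Googlebot. Note that `urlopen` raises an exception for error responses, so this sketch only inspects pages that load successfully.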

Using meta tags

If you have access to the server and are comfortable changing its configuration, then please use the response header. It's one less thing to remember and it'll work better in the long run.

But developers don't always have access to the server, or they may not be comfortable changing the server's configuration.

We can still prevent indexing by using meta tags in the head of the pages we want crawlers to skip.

To prevent most search engines from indexing a page on your site, place the following meta tag into the head section of your page:

<meta name="robots" content="noindex">

To prevent only Google web crawlers from indexing a page:

<meta name="googlebot" content="noindex">

Note that googlebot is the name of the Google search engine crawler. It is not the only crawler that Google uses to crawl your content.
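To check that a page really carries one of these tags, a short script can scan its markup. The following Python sketch uses the standard library's `html.parser`; the class `RobotsMetaParser` and the function `page_is_noindexed` are hypothetical names for this example, and for simplicity the parser looks at every meta tag in the document without verifying that it sits inside the head.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of robots-style meta tags, keyed by meta name."""

    def __init__(self, names=("robots", "googlebot")):
        super().__init__()
        self.names = {n.lower() for n in names}
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attributes without a value arrive as None; treat them as empty.
        attributes = {key: (value or "") for key, value in attrs}
        name = attributes.get("name", "").lower()
        if name in self.names:
            content = attributes.get("content", "")
            directives = {d.strip().lower() for d in content.split(",")}
            self.found.setdefault(name, set()).update(directives)


def page_is_noindexed(html, crawler="googlebot"):
    """Return True if the generic or crawler-specific meta tag says noindex."""
    parser = RobotsMetaParser(names=("robots", crawler))
    parser.feed(html)
    for name in ("robots", crawler.lower()):
        # "none" is shorthand for "noindex, nofollow".
        if parser.found.get(name, set()) & {"noindex", "none"}:
            return True
    return False


sample = '<html><head><meta name="googlebot" content="noindex"></head></html>'
print(page_is_noindexed(sample))
print(page_is_noindexed(sample, crawler="bingbot"))
```

The second call uses `bingbot` only as a stand-in for some other crawler: a tag addressed to `googlebot` does not apply to it, which mirrors the difference between the two meta tags shown above.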

So which Google crawler do we want to stop from indexing our content?

Google uses multiple crawlers to navigate through your content. Most website developers and owners only deal with the Googlebot, the main Google crawler, and, usually, this is fine.

However, there may be times when you want to block specific kinds of content without affecting the basic Googlebot settings. The table below lists Google's crawlers along with their user agent tokens and full user agent strings.

Name	User Agent Token	Full User Agent String	Notes
APIs-Google	APIs-Google	APIs-Google (+https://developers.google.com/webmasters/APIs-Google.html)
AdsBot Mobile Web Android	AdsBot-Google-Mobile	Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Mobile/15E148 Safari/604.1 (compatible; AdsBot-Google-Mobile; +http://www.google.com/mobile/adsbot.html)	AdsBot Mobile Web Android ignores the * wildcard. Checks Android web page ad quality.
AdsBot Mobile Web	AdsBot-Google-Mobile	Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Mobile/15E148 Safari/604.1 (compatible; AdsBot-Google-Mobile; +http://www.google.com/mobile/adsbot.html)	AdsBot Mobile Web ignores the * wildcard. Checks iPhone web page ad quality.
AdsBot	AdsBot-Google	AdsBot-Google (+http://www.google.com/adsbot.html)	AdsBot ignores the * wildcard. Checks desktop web page ad quality.
AdSense	Mediapartners-Google	Mediapartners-Google	The AdSense crawler visits your site to determine its content in order to provide relevant ads. Ignores the global user agent (*) in robots.txt.
Googlebot Image	Googlebot-Image Googlebot	Googlebot-Image/1.0	Used for crawling image bytes for Google Images and products dependent on images.
Googlebot News	Googlebot-News Googlebot	The Googlebot-News user agent uses the various Googlebot user agent strings.	Googlebot News uses Googlebot for crawling news articles, however it respects its historic user agent token Googlebot-News.
Googlebot Video	Googlebot-Video Googlebot	Googlebot-Video/1.0	Used for crawling video bytes for Google Video and products dependent on videos.
Googlebot Desktop	Googlebot	Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) Chrome/W.X.Y.Z Safari/537.36 Googlebot/2.1 (+http://www.google.com/bot.html)
Googlebot Smartphone	Googlebot	Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Mobile AdSense	Mediapartners-Google	(Various mobile device types) (compatible; Mediapartners-Google/2.1; +http://www.google.com/bot.html)
Feedfetcher	FeedFetcher-Google	FeedFetcher-Google; (+http://www.google.com/feedfetcher.html)	Caution: Feedfetcher doesn't respect robots.txt rules.
Google Read Aloud	Google-Read-Aloud	Current agents: Desktop agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36 (compatible; Google-Read-Aloud; +https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) Mobile agent: Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36 (compatible; Google-Read-Aloud; +https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) Former agent (deprecated): google-speakr	Caution: Google Read Aloud doesn't respect robots.txt rules.
Google StoreBot	Storebot-Google	Desktop agent: Mozilla/5.0 (X11; Linux x86_64; Storebot-Google/1.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36 Mobile agent: Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012; Storebot-Google/1.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Mobile Safari/537.36
Google Site Verifier	Google-Site-Verification	Mozilla/5.0 (compatible; Google-Site-Verification/1.0)	Caution: Google Site Verifier ignores robots.txt rules.
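The user agent token is the piece you plug into a meta tag name or an X-Robots-Tag prefix when you want to address one crawler only. As a rough Python sketch (the helper names `meta_tag_for` and `header_for` are hypothetical), the two forms can be generated like this. Whether a particular crawler in the table honours these directives is not something the code checks, so treat the output as a starting point and confirm against Google's documentation.

```python
from html import escape


def meta_tag_for(token, directive="noindex"):
    """Build a robots meta tag addressed to a single crawler token."""
    return f'<meta name="{escape(token.lower())}" content="{escape(directive)}">'


def header_for(token, directive="noindex"):
    """Build an X-Robots-Tag header line addressed to a single crawler token."""
    return f"X-Robots-Tag: {token.lower()}: {directive}"


# Keep pages out of Google News without changing the main Googlebot settings.
print(meta_tag_for("Googlebot-News"))
print(header_for("Googlebot-News"))
```

With the token `Googlebot-News`, the first line printed is a meta tag named `googlebot-news` and the second is the equivalent response header, both carrying `noindex`.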

In conclusion: yes, it is possible to prevent crawlers in general, and Google crawlers in particular, from indexing your site and showing its pages in search results.

However, I'm not certain how long it will take for the crawler to index your site after you remove the meta tag or the response header, or whether it will do so at all.

Links and Resources

Robots meta tag, data-nosnippet, and X-Robots-Tag specifications
Block Search indexing with noindex
Overview of Google crawlers (user agents)
Googlebot. Google's main search crawler
