Dublin Library

The Publishing Project

Blocking Google from indexing your site

 

There are times when you don't want Google search to index specific pages on your site. It may be because these are private pages that, for some reason, are not password protected; or the site needs to be available to the client during development and you don't want to put it behind a password.

This post will present two ways of blocking Google crawlers from indexing your site: server response headers and client-side meta tags.

Using Server Response Headers #

The easiest way to add the header is through your server configuration. To disable search engine indexing globally, configure the server to send the following response header:

X-Robots-Tag: noindex

The specific way of doing this will depend on what server you're running.
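Whichever server you run, it helps to confirm that the header actually reaches the client. The following is a minimal Python sketch of such a check, not a definitive implementation. The function name `blocks_indexing` and its `crawler` parameter are hypothetical names chosen for illustration. It assumes the response headers are available as (name, value) pairs, that a value may carry an optional `crawler:` prefix, and that `none` should be treated like `noindex`.

```python
# Directives that legitimately contain a colon, so their name must not be
# mistaken for a crawler prefix such as "googlebot: noindex".
VALUED_DIRECTIVES = {
    "unavailable_after",
    "max-snippet",
    "max-image-preview",
    "max-video-preview",
}


def blocks_indexing(headers, crawler="googlebot"):
    """Return True if any X-Robots-Tag header tells `crawler` not to index.

    `headers` is an iterable of (name, value) pairs.
    """
    crawler = crawler.lower()
    for name, value in headers:
        if name.lower() != "x-robots-tag":
            continue
        prefix, sep, rest = value.partition(":")
        agent = prefix.strip().lower()
        if sep and "," not in agent and agent not in VALUED_DIRECTIVES:
            # The value is scoped to one crawler, e.g. "googlebot: noindex".
            if agent != crawler:
                continue
            directives = rest
        else:
            # The value applies to every crawler.
            directives = value
        tokens = {d.strip().lower() for d in directives.split(",")}
        if "noindex" in tokens or "none" in tokens:
            return True
    return False
```

The list returned by `http.client.HTTPResponse.getheaders()` already has this (name, value) shape, so it can be passed in directly.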

For Apache servers, make sure that mod_headers is installed and enabled.

Headers can be added in the following places:

  • The default server configuration
    • Adding it here makes the header global
  • Virtual host configuration
  • Directory
  • Location

The header that will prevent Google from indexing is:

Header set X-Robots-Tag: "noindex"

Nginx setup is a little more complex.

add_header adds the specified field to a response header provided that the response code equals one of the following codes:

  • 200
  • 201
  • 204
  • 206
  • 301
  • 302
  • 303
  • 304
  • 307
  • 308

add_header directives are inherited from the previous configuration level if and only if there are no add_header directives defined on the current level.

If the optional always parameter is specified, the header field will be added regardless of the response code.

add_header can be added in the following contexts: http, server, location, if (when inside a location, but see If is Evil… when used in location context)

add_header X-Robots-Tag "noindex";
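The inheritance and status code rules are easy to trip over, so here is a small Python model of them. It is only a sketch of the behavior described in this section, not Nginx code. The names `effective_directives` and `response_headers`, and the representation of a directive as a (name, value, always) tuple, are assumptions made for illustration.

```python
# Status codes for which add_header applies when "always" is not given.
DEFAULT_CODES = {200, 201, 204, 206, 301, 302, 303, 304, 307, 308}


def effective_directives(levels):
    """Pick the add_header directives that apply at the innermost level.

    `levels` is ordered from outermost (http) to innermost (location).
    Each level is a list of (name, value, always) tuples. A level inherits
    from the previous one only when it defines no add_header directives.
    """
    for directives in reversed(levels):
        if directives:
            return directives
    return []


def response_headers(levels, status):
    """Return the (name, value) headers added for a given status code."""
    added = []
    for name, value, always in effective_directives(levels):
        if always or status in DEFAULT_CODES:
            added.append((name, value))
    return added


# A header set at the http level disappears as soon as a location
# defines any add_header of its own.
http_level = [("X-Robots-Tag", "noindex", False)]
server_level = []
location_level = [("Cache-Control", "no-store", False)]

print(response_headers([http_level, server_level, location_level], 200))
print(response_headers([http_level, server_level, []], 200))
print(response_headers([http_level], 404))
```

In this model, the first call returns only the Cache-Control header, which shows why X-Robots-Tag has to be repeated in any block that declares its own add_header directives. The last call returns nothing because 404 is not in the default list and `always` is not set.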

Using meta tags #

If you have access to the server and are comfortable changing its configuration, use the response header approach. It's one less thing to remember and it'll work better in the long run.

But developers don't always have access to the server, or are not comfortable changing the server's configuration.

We can still prevent indexing using meta tags in the head of the pages we want crawlers to skip.

To prevent most search engines from indexing a page on your site, place the following meta tag into the head section of your page:

<meta name="robots" content="noindex">

To prevent only Google web crawlers from indexing a page:

<meta name="googlebot" content="noindex">

Note that googlebot is the name of the Google search engine crawler. It is not the only crawler that Google uses to crawl your content.
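To check that a page really carries one of these tags, you could scan its markup. The following Python sketch uses the standard library's `html.parser` to do so. The class `RobotsMetaParser` and the function `meta_blocks_indexing` are hypothetical names for this example, and treating `none` as equivalent to `noindex` is an assumption stated in the code.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the comma-separated content of every named meta tag."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # meta name -> set of lowercase directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").strip().lower()
        content = attributes.get("content") or ""
        if not name:
            return
        found = {d.strip().lower() for d in content.split(",") if d.strip()}
        self.directives.setdefault(name, set()).update(found)


def meta_blocks_indexing(html, crawler="googlebot"):
    """Return True if a robots or crawler-specific meta tag blocks indexing."""
    parser = RobotsMetaParser()
    parser.feed(html)
    relevant = parser.directives.get("robots", set()) | parser.directives.get(
        crawler.lower(), set()
    )
    # Assumption: "none" is treated the same as "noindex".
    return "noindex" in relevant or "none" in relevant
```

A tag named `robots` counts for every crawler, while a tag named `googlebot` only counts when the `crawler` argument is `googlebot`.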

So what Google crawler do we want to stop indexing our content? #

Google uses multiple crawlers to navigate through your content. Most website developers and owners only deal with the Googlebot, the main Google crawler, and, usually, this is fine.

However, there may be times when you want to block a specific kind of crawler, such as the one that fetches images or videos, without affecting the basic Googlebot settings.

Each crawler is listed with its name, user agent token, full user agent string, and notes.

  • APIs-Google
    • User agent token: APIs-Google
    • Full user agent string: APIs-Google (+https://developers.google.com/webmasters/APIs-Google.html)
  • AdsBot Mobile Web Android
    • User agent token: AdsBot-Google-Mobile
    • Full user agent string: Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Mobile/15E148 Safari/604.1 (compatible; AdsBot-Google-Mobile; +http://www.google.com/mobile/adsbot.html)
    • Notes: AdsBot Mobile Web Android ignores the * wildcard. Checks Android web page ad quality.
  • AdsBot Mobile Web
    • User agent token: AdsBot-Google-Mobile
    • Full user agent string: Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Mobile/15E148 Safari/604.1 (compatible; AdsBot-Google-Mobile; +http://www.google.com/mobile/adsbot.html)
    • Notes: AdsBot Mobile Web ignores the * wildcard. Checks iPhone web page ad quality.
  • AdsBot
    • User agent token: AdsBot-Google
    • Full user agent string: AdsBot-Google (+http://www.google.com/adsbot.html)
    • Notes: AdsBot ignores the * wildcard. Checks desktop web page ad quality.
  • AdSense
    • User agent token: Mediapartners-Google
    • Full user agent string: Mediapartners-Google
    • Notes: The AdSense crawler visits your site to determine its content in order to provide relevant ads. Ignores the global user agent (*) in robots.txt.
  • Googlebot Image
    • User agent tokens: Googlebot-Image, Googlebot
    • Full user agent string: Googlebot-Image/1.0
    • Notes: Used for crawling image bytes for Google Images and products dependent on images.
  • Googlebot News
    • User agent tokens: Googlebot-News, Googlebot
    • Full user agent string: The Googlebot-News user agent uses the various Googlebot user agent strings.
    • Notes: Googlebot News uses Googlebot for crawling news articles, however it respects its historic user agent token Googlebot-News.
  • Googlebot Video
    • User agent tokens: Googlebot-Video, Googlebot
    • Full user agent string: Googlebot-Video/1.0
    • Notes: Used for crawling video bytes for Google Video and products dependent on videos.
  • Googlebot Desktop
    • User agent token: Googlebot
    • Full user agent strings:
      • Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
      • Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) Chrome/W.X.Y.Z Safari/537.36
      • Googlebot/2.1 (+http://www.google.com/bot.html)
  • Googlebot Smartphone
    • User agent token: Googlebot
    • Full user agent string: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
  • Mobile AdSense
    • User agent token: Mediapartners-Google
    • Full user agent string: (Various mobile device types) (compatible; Mediapartners-Google/2.1; +http://www.google.com/bot.html)
  • Feedfetcher
    • User agent token: FeedFetcher-Google
    • Full user agent string: FeedFetcher-Google; (+http://www.google.com/feedfetcher.html)
    • Notes: Caution: Feedfetcher doesn't respect robots.txt rules.
  • Google Read Aloud
    • User agent token: Google-Read-Aloud
    • Full user agent strings (current agents):
      • Desktop agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36 (compatible; Google-Read-Aloud; +https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)
      • Mobile agent: Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36 (compatible; Google-Read-Aloud; +https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)
    • Former agent (deprecated): google-speakr
    • Notes: Caution: Google Read Aloud doesn't respect robots.txt rules.
  • Google StoreBot
    • User agent token: Storebot-Google
    • Full user agent strings:
      • Desktop agent: Mozilla/5.0 (X11; Linux x86_64; Storebot-Google/1.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36
      • Mobile agent: Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012; Storebot-Google/1.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Mobile Safari/537.36
  • Google Site Verifier
    • User agent token: Google-Site-Verification
    • Full user agent string: Mozilla/5.0 (compatible; Google-Site-Verification/1.0)
    • Notes: Caution: Google Site Verifier ignores robots.txt rules.
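If you want to see which of these crawlers is visiting, for example while reading access logs, you could match the user agent string against the tokens listed above. The following Python sketch does that. The function name `google_crawler_token` is hypothetical, and the case-insensitive, longest-token-first matching is a design choice made for this example rather than anything Google prescribes.

```python
# User agent tokens taken from the list above.
GOOGLE_TOKENS = [
    "APIs-Google",
    "AdsBot-Google-Mobile",
    "AdsBot-Google",
    "Mediapartners-Google",
    "Googlebot-Image",
    "Googlebot-News",
    "Googlebot-Video",
    "Googlebot",
    "FeedFetcher-Google",
    "Google-Read-Aloud",
    "Storebot-Google",
    "Google-Site-Verification",
]

# Try longer tokens first so that "Googlebot-Image" is not reported as
# plain "Googlebot", and "AdsBot-Google-Mobile" not as "AdsBot-Google".
_TOKENS_BY_LENGTH = sorted(GOOGLE_TOKENS, key=len, reverse=True)


def google_crawler_token(user_agent):
    """Return the first matching Google token, or None if there is none.

    Googlebot News crawls with the regular Googlebot strings, so it
    cannot be told apart from Googlebot this way.
    """
    lowered = user_agent.lower()
    for token in _TOKENS_BY_LENGTH:
        if token.lower() in lowered:
            return token
    return None
```

A user agent string is supplied by the client and can be set to anything, so treat the result as a hint rather than proof that the request came from Google.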

In conclusion: yes, it is possible to prevent crawlers in general, and Google crawlers in particular, from indexing your site and showing its pages in their search results.

However, I'm not certain how long it will take for the crawler to index your site after you remove the meta tag or the response header, or whether it will do so at all.
