SEO Guide: List of robot meta tags to identify various crawlers individually

You can adjust how Google presents your content in search results by using page-level and text-level settings.

Page-level settings are specified either in a meta element on an HTML page or in an HTTP header. Text-level settings are specified with the data-nosnippet attribute on HTML elements within a page.

Keep in mind that Google’s crawlers can read and obey these settings only if they are allowed to access the pages that contain them.

The tag <meta name="robots" content="noindex" /> applies to search engine crawlers. To block non-search crawlers, such as AdsBot-Google, you may need to add directives that target them specifically (for example, <meta name="AdsBot-Google" content="noindex" />).

Use of the robots meta tag:

The robots meta tag lets you take a granular, page-by-page approach to controlling how an individual page is indexed and served in Google Search results. Place the robots meta tag in the head section of the page, like this:

<!DOCTYPE html>
<html><head>
<meta name="robots" content="noindex" />
(…)
</head>
<body>(…)</body>
</html> 

In the example above, the robots meta tag tells search engines not to show the page in search results. The value of the name attribute (robots) indicates that the directive applies to all crawlers. To address a specific crawler, replace the robots value of the name attribute with the name of that crawler.

Specific crawlers are also known as user agents (a crawler uses its user agent to request a page). Googlebot is the user agent for Google’s standard web crawler. To prevent only Googlebot from indexing your page, update the tag as follows:

<meta name="googlebot" content="noindex" />

This tag now specifically tells Google not to show this page in its search results. The name and content attributes are not case sensitive.

Search engines may run different crawlers for different purposes; Google publishes a complete list of its crawlers. To show a page in Google’s web search results but not in Google News, use the following meta tag:

<meta name="googlebot-news" content="noindex" /> 

To address several crawlers individually, use multiple robots meta tags:

<meta name="googlebot" content="noindex">
<meta name="googlebot-news" content="nosnippet"> 
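To check which directives a page actually sends to each crawler, the meta tags can be collected programmatically. The following is a minimal sketch using Python’s standard html.parser module; the CRAWLER_NAMES set and the robots_directives helper are hypothetical names chosen for illustration, and the set should be extended with whichever user agents matter to you.

```python
from html.parser import HTMLParser

# Hypothetical set of crawler names to look for (an assumption of this
# sketch); extend it with the user agents you care about.
CRAWLER_NAMES = {"robots", "googlebot", "googlebot-news"}


class RobotsMetaParser(HTMLParser):
    """Collects robots directives from meta tags, grouped by crawler name."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Names and values are not case sensitive, so normalise to lowercase.
        name = (attributes.get("name") or "").strip().lower()
        content = (attributes.get("content") or "").lower()
        if name not in CRAWLER_NAMES:
            return
        values = [part.strip() for part in content.split(",") if part.strip()]
        self.directives.setdefault(name, []).extend(values)


def robots_directives(html):
    """Return a dict mapping crawler name to its list of directives."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives
```

Called on a page containing the two tags above, robots_directives should report noindex for googlebot and nosnippet for googlebot-news, while ignoring unrelated meta tags such as description.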

Using the X-Robots-Tag HTTP header

The X-Robots-Tag can be included in an HTTP header response for a specific URL. An X-Robots-Tag can include any directive that can be used in a robots meta tag. An HTTP response containing an X-Robots-Tag telling crawlers not to index a page looks like this:

HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
(…)
X-Robots-Tag: noindex
(…)

Multiple X-Robots-Tag headers can be combined within an HTTP response, or you can specify a comma-separated list of directives. The HTTP header response below combines a noarchive X-Robots-Tag with an unavailable_after X-Robots-Tag.

HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
(…)
X-Robots-Tag: noarchive
X-Robots-Tag: unavailable_after: 25 Jun 2010 15:00:00 PST
(…) 

The X-Robots-Tag may optionally specify a user agent before the directives. For example, the following set of X-Robots-Tag HTTP headers can be used to conditionally allow a page to appear in search results for different search engines:

HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
(…)
X-Robots-Tag: googlebot: nofollow
X-Robots-Tag: otherbot: noindex, nofollow
(…) 

Directives specified without a user agent apply to all crawlers. The HTTP header name, the user agent name, and the specified values are not case sensitive.
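The header rules above can be sketched as a small parser that groups directives by user agent. This is an illustration only, not how any search engine actually parses the header: the parse_x_robots_tag function, the "*" key for agent-less directives, and the rule for telling a user agent prefix apart from a directive that takes a value are all assumptions of this sketch.

```python
# Directive names that take a value after a colon. Any other token before
# the first colon is treated as a user agent name (an assumption of this
# sketch, not a documented parsing rule).
VALUED_DIRECTIVES = {
    "max-snippet",
    "max-image-preview",
    "max-video-preview",
    "unavailable_after",
}


def parse_x_robots_tag(header_values):
    """Group directives from X-Robots-Tag header values by user agent.

    Directives without a user agent are stored under "*", meaning they
    apply to all crawlers.
    """
    rules = {}
    for value in header_values:
        agent = "*"
        prefix, colon, rest = value.partition(":")
        token = prefix.strip().lower()
        # A user agent prefix is a single token that is not a directive name.
        if colon and "," not in token and token not in VALUED_DIRECTIVES:
            agent = token
            value = rest
        # Limitation: dates that contain commas would be split here.
        directives = [d.strip().lower() for d in value.split(",") if d.strip()]
        rules.setdefault(agent, []).extend(directives)
    return rules
```

Everything is lowercased because the user agent name and the values are not case sensitive. Feeding in the two headers from the example above yields nofollow for googlebot and noindex, nofollow for otherbot.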

Conflicting robots directives: In the case of conflicting robots directives, the more restrictive directive applies. For example, if a page has both max-snippet:50 and nosnippet directives, the nosnippet directive will apply.
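The "more restrictive directive applies" rule can be illustrated for snippets with a short function. This is a sketch under stated assumptions: effective_max_snippet is a hypothetical helper, and taking the smaller of two max-snippet values is this sketch’s own reading of "more restrictive", not a documented behaviour.

```python
def effective_max_snippet(directives):
    """Return the snippet limit after resolving conflicting directives.

    Returns 0 when no snippet may be shown and None when no limit was set.
    """
    limit = None
    for directive in directives:
        name, _, value = directive.strip().lower().partition(":")
        name = name.strip()
        if name == "nosnippet":
            return 0  # nosnippet is the most restrictive outcome
        if name == "max-snippet":
            try:
                number = int(value)
            except ValueError:
                continue  # no parseable number: the directive is ignored
            if number < 0:
                continue  # assumption: negative values impose no limit
            # Assumption: between two limits, the smaller one wins.
            limit = number if limit is None else min(limit, number)
    return limit
```

For the case described above, effective_max_snippet(["max-snippet:50", "nosnippet"]) returns 0, so no snippet would be shown.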

Directives for indexing

The following directives can be used in the robots meta tag and the X-Robots-Tag to control indexing and the serving of snippets. A snippet is a short text excerpt shown in search results to demonstrate how relevant a document is to the user’s query. The list below shows all the directives that Google honours and what they mean. Each value represents a specific directive. Multiple directives can be combined in a comma-separated list. These directives are not case sensitive.

Note: It is possible that these directives may not be treated the same by all other search engines.

List of Directives

all 

all: Indexing and serving are not restricted. This directive is the default value and has no effect if it is explicitly listed.

noindex

noindex: This page should not appear in search results. If this directive is not specified, the page may be indexed and shown in search results.

nofollow

nofollow: The links on this page should not be followed. If this directive is not specified, Google may use the links on the page to discover the linked pages.

none

none: Equivalent to noindex, nofollow.

noarchive

noarchive: A cached link should not be shown in search results. If you don’t specify this directive, Google may generate a cached page that users can reach through the search results.

nosnippet

nosnippet: No text snippet or video preview should be shown for this page in search results. A static image thumbnail (if available) may still be shown when it results in a better user experience. This applies to all forms of search results (at Google: web search, Google Images, Discover). If you don’t specify this directive, Google may generate a text snippet and a video preview based on the page’s content.

max-snippet: [number]

max-snippet: Use a maximum of [number] characters as a text snippet for this search result. (Note that a URL may appear as multiple search results within a single search results page.) Image and video previews are unaffected. This applies to all forms of search results (such as Google web search, Google Images, Discover, Assistant). The limit does not apply where a publisher has separately granted permission to use the content. For example, if the publisher supplies content as in-page structured data or has a licensing agreement with Google, this setting does not interfere with those more specific permitted uses. If no parseable [number] is given, this directive is ignored.

Examples:

To prevent a snippet from appearing in search results, do the following:

<meta name="robots" content="max-snippet:0">

To display up to 20 characters in the snippet, type:

<meta name="robots" content="max-snippet:20">

To specify that there is no limit on the number of characters that can be shown in the snippet, use the special value -1:

<meta name="robots" content="max-snippet:-1">

max-image-preview: [setting]

max-image-preview: Set the maximum size of an image preview in a search results page for this page. If the max-image-preview directive is not specified, Google may display an image preview of the default size.

Acceptable [setting] values include:

- none: No image preview is shown.
- standard: A default image preview may be shown.
- large: A larger image preview, up to the width of the viewport, may be shown.

For example:

<meta name="robots" content="max-image-preview:standard">

max-video-preview: [number]

max-video-preview: Use a maximum of [number] seconds as a video snippet for videos on this page in search results. If you don’t specify the max-video-preview directive, Google may show a video snippet in search results and decides for itself how long the preview may be.

This is true for all types of search results (at Google: web search, Google Images, Google Videos, Discover, Assistant). If no parseable [number] is given, this directive is ignored.

notranslate

notranslate: Translation of this page should not be offered in search results. If you don’t specify this directive, Google may show a link next to the result that helps users view a translated version of your page.

noimageindex

noimageindex: Images on this page should not be indexed. If this directive is not specified, images on the page may be indexed and shown in search results.

unavailable_after: [date/time]

unavailable_after: Do not show this page in search results after the specified date/time. The date/time must be given in a widely adopted format, such as RFC 822, RFC 850, or ISO 8601. If no valid date/time is given, the directive is ignored. By default, content has no expiration date, so if you don’t specify this directive, the page may be shown in search results indefinitely.

Googlebot will decrease the crawl rate of the URL considerably after the specified date and time.

For example:

<meta name="robots" content="unavailable_after: 2020-09-21">
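To see how an unavailable_after value might be evaluated, here is a minimal Python sketch that accepts ISO 8601 and RFC 822 style dates and treats an invalid value as "directive ignored". The function names are hypothetical, and reading dates without a time zone as UTC is an assumption of this sketch rather than documented search engine behaviour.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_unavailable_after(value):
    """Parse an unavailable_after value, or return None if it is invalid."""
    text = value.strip()
    try:
        moment = datetime.fromisoformat(text)  # ISO 8601, e.g. 2020-09-21
    except ValueError:
        try:
            moment = parsedate_to_datetime(text)  # RFC 822 style dates
        except (TypeError, ValueError):
            return None
    if moment.tzinfo is None:
        # Assumption of this sketch: dates without a zone are read as UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    return moment


def is_expired(value, now):
    """Return True if the page should no longer be shown at time `now`."""
    moment = parse_unavailable_after(value)
    # An invalid date means the directive is ignored, so nothing expires.
    return moment is not None and now >= moment
```

With the example tag above, is_expired("2020-09-21", now) is true for any timezone-aware now on or after 21 September 2020 (UTC, under the assumption stated above).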

Note that malicious crawlers are likely to ignore meta tags entirely. Also, you do not need to use both the robots meta tag and the X-Robots-Tag on the same page. All major search engines encourage the appropriate use of meta tags, and if you write helpful, descriptive tags, it is unlikely that any major search engine will penalise you for doing so. Moreover, the fact that a search engine “uses” meta description tags, for example, does not guarantee that they count as a positive ranking signal in the search results.
