Crawler directives are instructions website owners use to control how search engine crawlers access and index their site content. These directives, implemented through robots.txt files, meta tags, and HTTP headers, guide crawlers on which pages to crawl, index, or follow, helping manage crawl budget and keep low-value or private content out of search results.
Robots.txt Controls Site-Wide Access
This text file tells crawlers which sections of your site they can access. It's the first place search engines check when visiting your site.
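A minimal robots.txt sketch, assuming hypothetical /admin/, /cart/, and /staging/ paths and an example.com domain:

```text
# Rules for all crawlers
User-agent: *
Disallow: /admin/
Disallow: /cart/

# A crawler follows only the most specific group that matches it,
# so Googlebot obeys this group instead of the * group above
User-agent: Googlebot
Disallow: /staging/

# Sitemap location (full URL required)
Sitemap: https://www.example.com/sitemap.xml
```

The file must live at the site root (e.g., example.com/robots.txt) to be honored.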
Meta Robots Tags Provide Page-Level Instructions
These HTML tags give specific directives for individual pages, controlling indexing and link following. They offer more granular, page-level control than robots.txt, but crawlers can only read them on pages that robots.txt allows them to fetch.
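The tag goes in the page's head. A sketch of the common forms, where the directive values shown are illustrative choices:

```html
<!-- Apply to all crawlers -->
<meta name="robots" content="noindex, nofollow">

<!-- Or target one crawler by name -->
<meta name="googlebot" content="nosnippet">
```

The `content` attribute takes a comma-separated list of directives, so one tag can carry several instructions.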
X-Robots-Tag Works for Non-HTML Files
This HTTP header directive controls crawling for PDFs, images, and other file types that can't use meta tags.
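One common way to set the header, assuming an Apache server with mod_headers enabled (nginx uses `add_header` instead):

```apacheconf
# Keep all PDF files out of the index and stop link following
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
```

The matching responses then carry `X-Robots-Tag: noindex, nofollow`, which crawlers treat the same as a meta robots tag.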
Noindex Prevents Pages from Appearing in Search
This directive tells search engines not to include a page in their index. The page can still be crawled but won't show in search results.
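For an individual HTML page, the directive is a single meta tag in the head:

```html
<head>
  <meta name="robots" content="noindex">
</head>
```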
Nofollow Stops Link Equity Transfer
This instruction tells crawlers not to follow links on a page or not to pass authority through specific links, useful for user-generated content or paid links.
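Nofollow can be applied page-wide via the meta tag or per link via the `rel` attribute; the URL below is a placeholder:

```html
<!-- Page-level: do not follow any links on this page -->
<meta name="robots" content="nofollow">

<!-- Link-level: do not pass authority through this one link -->
<a href="https://example.com/partner" rel="nofollow">Partner site</a>
```

Google also recognizes `rel="sponsored"` for paid links and `rel="ugc"` for user-generated content, which state the reason for the nofollow more precisely.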
Crawl Budget Management Requires Strategic Implementation
Proper use of these directives helps search engines focus on your most important pages, preventing wasted resources on duplicate or low-value content.
How do robots.txt and meta robots tags differ?
Robots.txt blocks crawling at the site level before crawlers access pages. Meta robots tags control indexing and following after a page is crawled, offering more specific control.
Can I use multiple crawler directives on one page?
Yes, you can combine directives like "noindex, follow" to prevent indexing while still allowing crawlers to follow links. Different directives serve different purposes and work together.
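For example, a tag archive page might combine the two like this:

```html
<!-- Keep the page out of the index, but let crawlers follow its links -->
<meta name="robots" content="noindex, follow">
```

Since `follow` is the default behavior, `noindex` alone has the same effect; listing both simply makes the intent explicit.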
What happens if I block a page in robots.txt and use noindex?
Search engines can't see the noindex tag because robots.txt prevents crawling. This can leave already-indexed pages in search results. Use meta robots tags instead for deindexing.
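A sketch of the correct deindexing setup, assuming a hypothetical /old-page/ URL:

```html
<!-- robots.txt carries no Disallow rule for /old-page/, so crawlers can still fetch it -->
<!-- /old-page/ serves this in its <head>: -->
<meta name="robots" content="noindex">
```

Once the page has dropped out of the index, you can add a robots.txt Disallow rule if you also want to stop crawlers from fetching it.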
Should I use crawler directives on all ecommerce filter pages?
Strategic use helps prevent duplicate content issues from faceted navigation. Consider noindex for filter combinations while keeping important category pages crawlable to preserve crawl budget.
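A sketch of this pattern, assuming a hypothetical /shoes category with color and size filters:

```html
<!-- A filter combination like /shoes?color=red&size=10 serves: -->
<meta name="robots" content="noindex, follow">

<!-- The /shoes category page itself carries no restrictive directive,
     or states the default explicitly: -->
<meta name="robots" content="index, follow">
```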
Robots.txt
A text file in a website's root directory that instructs search engine crawlers which pages or sections to crawl or avoid. Robots.txt is a critical tool for managing crawl budget by keeping crawlers away from low-value pages; note that it blocks crawling, not indexing, so a blocked URL can still appear in search results if other sites link to it.
Meta Robots Tag
An HTML element that instructs search engines how to crawl and index a specific page. Common directives include noindex (don't index), nofollow (don't follow links), and noarchive (don't cache).
Noindex Tag
A meta robots directive that prevents search engines from including a page in their index. Noindex is used strategically to keep low-value pages — like tag archives or internal search results — out of search results.
Need help putting these concepts into practice? Digital Commerce Partners builds organic growth systems for ecommerce brands.
Learn how we work