The robots.txt file is the first file that a search engine requests when it crawls your site. This file lets you tell search engines which pages on your site not to crawl, and therefore which pages to keep out of their index. When you first set up your site, it is important to have your robots.txt file in place before you go live. This is especially important if you use faceted navigation, which can generate a large number of URLs for pages that appear to search engines to have the same content. Because duplicate content can have a negative impact on your search engine ranking, use the robots.txt file to control what is crawled and to keep search engines away from pages that appear to be the same. For information on creating the robots.txt file when you use faceted navigation, see Robots.txt with Categories and Facets.
Important: Test the Robots.txt File
Before taking a site live, it is extremely important to test the robots.txt file to confirm how the different URLs behave. A recommended tool for this test is the Robots Testing Tool in Google Webmaster Tools (since renamed Google Search Console).
How to Create the Robots.txt File
The robots.txt file is a plain text file. You can use any text editor to create it.
Robots.txt Common Commands
The following sample robots.txt files show some commonly used ways to allow or disallow crawling.
Allow all web crawlers to crawl all content:
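A minimal sketch of such a file, using standard robots.txt syntax. An empty Disallow value means that no path is blocked:

```text
# Applies to every crawler
User-agent: *
# Empty value: nothing is disallowed
Disallow:
```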
Block all web crawlers from all content:
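A minimal sketch of such a file. A Disallow value of / matches every URL path on the site:

```text
# Applies to every crawler
User-agent: *
# "/" is a prefix of every path, so everything is blocked
Disallow: /
```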
Block a specific web crawler from all content:
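A minimal sketch, using Googlebot purely as an illustrative crawler name. Substitute the user-agent token of the crawler you want to block:

```text
# Applies only to the named crawler (Googlebot is an example)
User-agent: Googlebot
# Block the whole site for that crawler
Disallow: /
```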
Block a specific web crawler from a specific facet and all its values:
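A sketch under stated assumptions: the facet name color is hypothetical, facet URLs are assumed to take the form /color/value, and Googlebot is only an example crawler. Because Disallow matches from the start of the path, this form covers URLs where the facet appears first:

```text
# Applies only to the named crawler (Googlebot is an example)
User-agent: Googlebot
# Hypothetical facet "color": blocks /color/ and every value under it
Disallow: /color/
```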
Block all crawlers from a specific facet disregarding the order in which it appears:
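A sketch under the same kind of assumptions: the facet name color is hypothetical, and facet URLs are assumed to be path segments such as /color/blue or /size/large/color/blue. The * wildcard is honored by the major search engines but may not be supported by every crawler:

```text
# Applies to every crawler
User-agent: *
# "*" matches any sequence of characters, so the hypothetical
# facet "color" is blocked wherever it appears in the path
Disallow: /*color/
```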
Allow all crawlers to crawl a specific facet value within a facet, disregarding the order in which it appears:
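A sketch that assumes the facet as a whole is otherwise blocked. The facet name color and the value blue are hypothetical. Crawlers that apply the most specific (longest) matching rule let the longer Allow pattern take precedence over the shorter Disallow pattern:

```text
# Applies to every crawler
User-agent: *
# Block the hypothetical facet "color" anywhere in the path
Disallow: /*color/
# ...except the value "blue", also anywhere in the path
Allow: /*color/blue
```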
Allow all crawlers to crawl a specific facet value within a facet only when this facet appears first:
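A sketch with the same hypothetical facet color and value blue. Here the Allow rule has no wildcard, so it matches only URLs whose path begins with the facet. A URL such as /size/large/color/blue would still be blocked by the Disallow rule:

```text
# Applies to every crawler
User-agent: *
# Block the hypothetical facet "color" anywhere in the path
Disallow: /*color/
# Allow the value "blue" only when "color" is the first path segment
Allow: /color/blue
```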
Block all web crawlers from adding items to cart by following ‘Add to Cart’ links:
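A sketch only: the path below is an assumption about what an 'Add to Cart' link looks like on a SiteBuilder site. Check the actual link URLs your site generates and substitute them before using this rule:

```text
# Applies to every crawler
User-agent: *
# Assumed add-to-cart URL path; replace with the path your site uses
Disallow: /app/site/backend/additemtocart.nl
```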
This is applicable only to SiteBuilder sites. Commerce web stores do not have ‘Add to Cart’ links available to web crawlers.
Robots.txt File Location
The robots.txt file should reside in the root folder of your website. You can create the robots.txt file on your local drive and upload it to the file cabinet.
To add the robots.txt file to the file cabinet
Go to Documents > Files > File Cabinet.
In the file cabinet, go to Web Site Hosting files > Live Hosting Files.
Click Add File.
Browse to the location of your robots.txt file and select it.
Click Open. This adds the file to the file cabinet.