Robots.txt is an optional web server file used to keep search engine robots and other web crawlers from accessing all or part of a website.
The Robot Exclusion Standard is not enforceable; however, most reputable crawlers will respect the directives in a robots.txt file.
Google will not crawl pages blocked by robots.txt, but it may still index a blocked page's URL if the page is linked to from elsewhere on the Web. A more reliable way to keep a web page out of Google's index is the noindex robots meta tag.
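For reference, the tag is placed in the page's <head> section. Note that the page must remain crawlable for this to work: if robots.txt blocks the page, crawlers will never see the tag.

<meta name="robots" content="noindex">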
A good argument for using a robots.txt file is to save server bandwidth. Disallowing crawler access to pages or images that don't need to be indexed saves bandwidth every time the site is crawled.
In fact, the mere presence of a robots.txt file can save bandwidth: since most web crawlers request robots.txt before crawling a site, a missing file triggers a 404 error on every visit. If the site serves custom error pages, those continual 404s consume additional bandwidth for nothing. Even an empty robots.txt file is enough to prevent them.
To create a robots.txt file, add rules to a plain text file named robots.txt and place it in the root directory of the domain or subdomain, so that it is served from, for example, https://example.com/robots.txt.
Disallow all web spiders for the entire site:
User-agent: *
Disallow: /
Allow all web spiders for the entire site:
User-agent: *
Disallow:
Disallow all web spiders for the images, cgi-bin, and tmp directories:
User-agent: *
Disallow: /images/
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow Googlebot for the about directory, with the exception of one particular file:
User-agent: Googlebot
Disallow: /about/
Allow: /about/staff.html
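The Allow directive is an extension beyond the original Robot Exclusion Standard, but Googlebot and most other major crawlers honor it. A crawler that does not recognize Allow will simply treat the entire /about/ directory as disallowed.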