There are some web pages that a webmaster might not want any search engine to crawl. A robots.txt file, placed in the root directory of a website, restricts crawling of specific pages.
Why Use It?
Certain web pages on a website are of no use to visitors. They may serve an administrative purpose only, and a webmaster wouldn't want them to appear in search engine results. Google's Webmaster Tools include a tool that makes creating a robots.txt file easier. Check Bill Lentis – uk.advfn.com.
This file is especially useful when a website uses subdomains and a webmaster doesn't want certain subdomains crawled. However, the main domain and each subdomain need their own separate robots.txt files.
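Because crawlers only look for the file at the root of the host they are visiting, each subdomain is covered by its own copy. Sketching this with a hypothetical example.com:

```text
https://www.example.com/robots.txt    → rules for www.example.com only
https://blog.example.com/robots.txt   → rules for blog.example.com only
```

A rule placed in the main domain's file has no effect on the blog subdomain, and vice versa.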
Common Terms In The File
There are five common terms in a robots.txt file:
User-agent: Names the crawler the following rules apply to, such as a search engine's bot
Disallow: The command that tells the user agent not to crawl a specific web page or URL
Allow: Recognised by Googlebot; it tells the crawler that a page or subfolder may be crawled even if its parent folder is disallowed
Crawl-delay: Tells a crawler how many seconds it should wait between successive requests, so that the server does not get overloaded
Sitemap: Points crawlers to the location of any XML sitemaps associated with the site; this is the command that should be written in the file to make them discoverable.
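Putting the five terms together, a minimal robots.txt file might look like this (the paths and sitemap URL are hypothetical):

```text
User-agent: *
Crawl-delay: 10
Disallow: /admin/
Allow: /admin/public/

Sitemap: https://www.example.com/sitemap.xml
```

Here every crawler is asked to wait ten seconds between requests, to stay out of the /admin/ folder apart from its public subfolder, and is told where the sitemap lives.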
Benefits Of Using A Robots.txt File
Only an experienced webmaster would know how to use the robots.txt file properly. Someone who doesn't understand the file at all can easily make the mistake of disallowing Googlebot from crawling every web page, or the whole website. Used properly, however, the file has some good uses:
It keeps duplicate content out of search engine results
It keeps private the sections of a site that a webmaster wants hidden
It keeps internal search result pages out of the SERPs
It shows crawlers the exact location of a sitemap
It stops search engines from indexing certain files on the website, such as images or PDF documents
It can delay crawlers for a while, so that a server doesn't become too busy; this might happen when a crawler requests many files at the same time
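To check how a well-behaved crawler would interpret a set of rules, Python's standard-library urllib.robotparser can be used. A minimal sketch, assuming hypothetical rules and URLs:

```python
from urllib import robotparser

# Hypothetical rules, as they might appear in a site's robots.txt file
rules = """User-agent: *
Disallow: /admin/
Disallow: /search/
""".splitlines()

parser = robotparser.RobotFileParser()
parser.parse(rules)  # against a live site, set_url() + read() would fetch the real file

# Admin and internal-search pages are blocked; ordinary pages are not
print(parser.can_fetch("*", "https://www.example.com/admin/settings.html"))  # False
print(parser.can_fetch("*", "https://www.example.com/blog/post.html"))       # True
```

Running a check like this before publishing the file is a quick way to catch an accidental rule that disallows the whole site.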
If a webmaster wants the links on a web page blocked from crawling, that can be done through this file. However, if the page is linked from another website, the file won't have an impact. It can't be relied on to block sensitive data, because other pages may link to the page containing the private information, and it can still get indexed.