How To Write a robots.txt File

Monday, May 9th, 2011

The Robots Exclusion Standard, also known as the Robots Exclusion Protocol or robots.txt protocol, is a convention that lets website owners tell search engine robots which parts of a site they may crawl. The instructions live in a plain text file named robots.txt. Keep in mind that search engines also dislike pages with hidden, keyword-stuffed tags or text, and pages that overuse keywords in their content; such pages may end up being penalized or even banned by search engines.

Robots are also known by several other names, such as:
a. spiders
b. bots
c. crawlers

Different search engines have different robots; a few of them are listed below:
Yahoo Slurp
Googlebot
MSNBot
Alexa (IA Archiver)
MSNBot-media
Ask

Although the robots have different names, their purpose is the same: crawling the content of web pages. These bots are programmed to move systematically through a website, and well-behaved ones crawl only the directories and files that the robots.txt file allows.
The robots.txt file must be placed in the root directory of the website; crawlers do not look for it anywhere else.
For example: http://www.yoursitename.com/robots.txt
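Because robots only look for the file at the root of a host, a crawler can work out where robots.txt lives from any page URL on the site. The Python sketch below uses the standard library's `urllib.parse.urljoin`; the helper name `robots_url` is hypothetical, chosen for illustration. Joining the absolute path `/robots.txt` onto a page URL keeps the scheme and host and replaces the path and query.

```python
from urllib.parse import urljoin


def robots_url(page_url: str) -> str:
    """Return the robots.txt location for the site hosting page_url.

    Hypothetical helper: an absolute path ("/robots.txt") passed to
    urljoin replaces the page's path and query but keeps scheme and host.
    """
    return urljoin(page_url, "/robots.txt")


if __name__ == "__main__":
    print(robots_url("http://www.yoursitename.com/blog/2011/05/post.html?id=7"))
    # prints http://www.yoursitename.com/robots.txt
```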

Format:
User-agent: robot name
Disallow: path of a file or directory
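Since the file is just groups of `User-agent:` and `Disallow:` lines, it can also be generated. A minimal Python sketch, assuming a hypothetical `format_robots` helper that takes a mapping from robot name to its list of disallowed paths and writes one group per robot, with a blank line between groups:

```python
def format_robots(rules: dict[str, list[str]]) -> str:
    """Render robots.txt text from {robot name: [disallowed paths]}.

    Hypothetical helper for illustration only.
    """
    groups = []
    for agent, paths in rules.items():
        lines = [f"User-agent: {agent}"]
        if paths:
            lines.extend(f"Disallow: {path}" for path in paths)
        else:
            # An empty Disallow value means nothing is disallowed for this robot.
            lines.append("Disallow:")
        groups.append("\n".join(lines))
    # A blank line separates one robot's group from the next.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(format_robots({
        "*": ["/directoryname1/", "/directoryname2/"],
        "Scooter": ["/"],
    }), end="")
```

Passing an empty list for a robot emits a bare `Disallow:` line, which in the original standard means that robot may retrieve every URL.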

If you want to keep all search engine crawlers away from your whole domain, you can use this short snippet, but be sure you really need it:
User-agent: *
Disallow: /
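To see how a robot reads this rule, Python's standard `urllib.robotparser.RobotFileParser` can parse the lines directly; its `parse` method accepts a list of lines, so no network access is needed. A short sketch, assuming the example domain used above (the helper name `build_parser` is hypothetical):

```python
from urllib.robotparser import RobotFileParser

# The "block everything" rules from the example above.
BLOCK_ALL = ["User-agent: *", "Disallow: /"]


def build_parser(lines):
    """Parse robots.txt lines held in memory instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser


if __name__ == "__main__":
    rp = build_parser(BLOCK_ALL)
    for agent in ("Googlebot", "Slurp", "msnbot"):
        # "Disallow: /" matches every path, so every robot is refused.
        print(agent, rp.can_fetch(agent, "http://www.yoursitename.com/index.html"))
```

Each robot is reported as not allowed (`False`), because every path on the site starts with `/`.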

If you want to block only certain directories, list each of them on its own Disallow line:

User-agent: *
Disallow: /directoryname1/
Disallow: /directoryname2/
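Disallow values are matched as path prefixes. The sketch below, again using Python's `urllib.robotparser` with the placeholder directory names from the example, checks a few URLs against the two directory rules. Because of the trailing slash, the bare path `/directoryname1` (without a slash) is not covered by the rule.

```python
from urllib.robotparser import RobotFileParser

RULES = [
    "User-agent: *",
    "Disallow: /directoryname1/",
    "Disallow: /directoryname2/",
]

parser = RobotFileParser()
parser.parse(RULES)


def allowed(path):
    """True if a generic robot may fetch this path on the example site."""
    return parser.can_fetch("*", "http://www.yoursitename.com" + path)


if __name__ == "__main__":
    for path in ("/directoryname1/page.html",  # inside a blocked directory
                 "/directoryname2/",           # the blocked directory itself
                 "/directoryname3/page.html",  # not listed, so allowed
                 "/directoryname1"):           # no trailing slash: no prefix match
        print(path, allowed(path))
```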

Similarly, if you want to disallow particular files, enter the path of each file as shown below:

User-agent: *
Disallow: /directoryname1/filename.html
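A quick check with Python's `urllib.robotparser` (a sketch, assuming the placeholder file and site names from the example) shows that a file rule blocks only that file, while other files in the same directory stay crawlable:

```python
from urllib.robotparser import RobotFileParser

SITE = "http://www.yoursitename.com"

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /directoryname1/filename.html",
])

if __name__ == "__main__":
    # Only the named file is blocked; its neighbours stay crawlable.
    print(parser.can_fetch("*", SITE + "/directoryname1/filename.html"))  # False
    print(parser.can_fetch("*", SITE + "/directoryname1/other.html"))     # True
```

Because matching is by prefix, a path such as `/directoryname1/filename.html.bak` would also be blocked by this rule.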

If you do not want specific bots crawling your site, for example because they are useless for your site or just eat up your bandwidth, you can disallow them in the robots.txt file. For example, to stop the AltaVista bot (named Scooter) from crawling your whole website, use the following code:
User-Agent: Scooter
Disallow: /
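The sketch below checks this per-robot rule with Python's `urllib.robotparser`. That parser matches a group's robot name case-insensitively against the part of the crawler's User-agent string before the first `/`; the version string `Scooter/3.3` is made up for illustration. Robots not named in any group are allowed everything here, because the file has no `User-agent: *` group.

```python
from urllib.robotparser import RobotFileParser

URL = "http://www.yoursitename.com/index.html"

parser = RobotFileParser()
parser.parse([
    "User-Agent: Scooter",  # field names are case-insensitive
    "Disallow: /",
])

if __name__ == "__main__":
    for agent in ("Scooter", "Scooter/3.3", "Googlebot"):
        # Scooter (with or without a version) is refused; Googlebot is not named.
        print(agent, parser.can_fetch(agent, URL))
```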

Use the robots.txt file with care: a Disallow rule can keep the listed web pages or directories from appearing in search engine result pages. There are many crawlers on the Internet; most of them will respect your robots.txt file, but some may not, because obeying it is voluntary.

Note:
If you want to add comments to your robots.txt file, put a hash symbol (#) at the start of the line to be commented.
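As a sketch of how a parser treats such comments, Python's `urllib.robotparser` discards everything from `#` to the end of a line before reading the rule, so a commented-out Disallow line has no effect (the directory names are the placeholders used earlier):

```python
from urllib.robotparser import RobotFileParser

SITE = "http://www.yoursitename.com"

COMMENTED = """\
# Keep robots out of the second directory only
User-agent: *
# Disallow: /directoryname1/
Disallow: /directoryname2/
""".splitlines()

parser = RobotFileParser()
parser.parse(COMMENTED)

if __name__ == "__main__":
    # The commented-out rule is ignored, so directoryname1 stays crawlable.
    print(parser.can_fetch("*", SITE + "/directoryname1/page.html"))  # True
    print(parser.can_fetch("*", SITE + "/directoryname2/page.html"))  # False
```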

Article written by
Donna – BodHost.com