Using robots.txt, you can ban specific robots, ban all robots, or block robot access to specific pages or areas of your site. If you are not sure what to type, look at the bottom of this page for examples.
An example of an SEO-optimized robots.txt file (it should work on most blogs… just edit the sitemap URL):
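A minimal sketch of such a file, assuming a standard WordPress install at the domain root; the Disallow paths and the sitemap URL are placeholders to adjust for your own site:

```
# Keep crawlers out of WordPress system directories (example paths)
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/

# Replace with the URL of your own sitemap
Sitemap: http://example.com/sitemap.xml
```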
When robots (like the Googlebot) crawl your site, they begin by requesting your robots.txt file (e.g. http://example.com/robots.txt) to see which parts of the site they may crawl.
Following are a few simple examples of what you might type in your robots.txt file. For more examples, read the robots.txt specification. (In the specification, look for the “What to put into the robots.txt file” heading.) Please note the following points:
Important: Search engines look for robots.txt files only at the root of a domain or subdomain. So this plugin will only help you if typing in http://example.com brings up WordPress. If you have to type http://example.com/blog/ to bring up WordPress (i.e. it is in a subdirectory, not in a subdomain or at the domain root), this plugin will not do you any good. Search engines do not look for robots.txt files in subdirectories, only at the root of domains and subdomains.
Ban all robots
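This record tells every crawler to stay out of the entire site:

```
User-agent: *
Disallow: /
```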
Allow all robots
To allow any robot to access your entire site, you can simply leave the robots.txt file blank, or you could use this:
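If you prefer an explicit file over a blank one, an empty Disallow line permits everything:

```
User-agent: *
Disallow:
```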
Ban specific robots
To ban specific robots, use the robot’s name. Look at the list of robot names to find the correct name. For example, Google is Googlebot and Microsoft’s search engine is MSNBot. To ban only Google:
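The ban names Googlebot in the User-agent line and pairs it with a site-wide Disallow:

```
User-agent: Googlebot
Disallow: /
```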
Allow specific robots
As in the previous example, use the robot’s correct name. To allow only Google, use all four lines:
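These four lines give Googlebot free rein while blocking everyone else; robots obey the most specific User-agent record that matches them, so Googlebot uses the first record and all other robots fall through to the second:

```
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
```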
Ban robots from part of your site
To ban all robots from the page “Archives” and its subpages, located at http://yourblog.example.com/archives/, use:
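Because robots.txt rules are matched against the URL path as a prefix, disallowing /archives/ covers that page and everything beneath it:

```
User-agent: *
Disallow: /archives/
```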