The importance of robots.txt

Although the robots.txt file is a very important file if you want to have a good ranking on search engines, many Web sites don't offer this file. If your Web site doesn't have a robots.txt file yet, read on to learn how to
create one. If you already have a robots.txt file, read our tips to make sure
that it doesn't contain errors.

What is robots.txt?

When a search engine crawler comes to your site, it looks for a special file called robots.txt. That file tells the search engine spider which Web pages of your site should be indexed and which should be ignored.

The robots.txt file is a simple text file (no HTML) that must be placed in your root directory, for example:

    http://www.yourwebsite.com/robots.txt

How do I create a robots.txt file?

As mentioned above, the robots.txt file is a simple text file. Open a simple text editor to create it. The content of a robots.txt file consists of so-called "records".

A record contains the information for a specific search engine. Each record consists of two fields: the User-agent line and one or more Disallow lines. Here's an example:

    User-agent: googlebot
    Disallow: /cgi-bin/

This robots.txt file would allow the "googlebot", which is the search engine spider of Google, to retrieve every page from your site except for files from the "cgi-bin" directory. All files in the "cgi-bin" directory will be ignored by googlebot.

The Disallow command matches by prefix, much like a wildcard. If you enter

    User-agent: googlebot
    Disallow: /support

both "/support.html" and "/support/index.html" as well as all other files in the "support" directory would not be indexed by googlebot.

If you leave the Disallow line blank, you're telling the search engine that all files may be indexed. In any case, you must enter a Disallow line for every User-agent record.

If you want to give all search engine spiders the same rights, use the following robots.txt content:

    User-agent: *
    Disallow:

Where can I find user agent names?

You can find user agent names in your log files by checking for requests to
robots.txt. Most often, all search engine spiders should be given the same
rights. In that case, use "User-agent: *" as mentioned above.

Things you should avoid

If you don't format your robots.txt file properly, some or all files of your Web site might not get indexed by search engines. To avoid this, do the following:
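
One practical safeguard is to test a draft robots.txt file before you upload it. The following is a minimal sketch, assuming Python and its standard urllib.robotparser module. The helper name is_allowed, the DRAFT record and the sample paths are placeholders chosen for illustration, and Python's parser only approximates how a real search engine spider reads the file.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if this parser would let user_agent fetch path."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# Draft record to test (replace with the content of your own file).
DRAFT = """\
User-agent: googlebot
Disallow: /cgi-bin/
"""

if __name__ == "__main__":
    # Hypothetical sample paths; use real paths from your own site.
    for path in ("/index.html", "/cgi-bin/search.pl"):
        verdict = "allowed" if is_allowed(DRAFT, "googlebot", path) else "blocked"
        print(f"{path}: {verdict}")
```

If a page you want indexed shows up as blocked, check the Disallow lines of the matching record first, since a path that is too short blocks every file that starts with it.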
Tips and tricks:

1. How to allow all search engine spiders to index all files

Use the following content for your robots.txt file if you want to allow all search engine spiders to index all files of your Web site:

    User-agent: *
    Disallow:

2. How to prevent all spiders from indexing any file

If you don't want search engines to index any file of your Web site, use the following:

    User-agent: *
    Disallow: /

3. Where to find more complex examples

If you want to see more complex examples of robots.txt files, view the robots.txt files of big Web sites.

Your Web site should have a proper robots.txt file if you want to have good rankings on search engines. Search engines can only give you a good ranking if they know what to do with your pages.

Copyright Axandra.com