How to Write a Correct robots.txt

Robots.txt is a text file containing instructions that tell search engines how to index a site. It lets you block the indexing of individual pages or entire sections of the site, point to the preferred mirror of the domain, and specify the path to the sitemap. This is far from the full list of options the robots.txt file offers, only the ones used most often.

Before indexing a site, search engines examine its robots.txt, so including this file is highly important, and getting its contents right even more so. Keep in mind that robots.txt directives are advisory: well-behaved crawlers honor them, but nothing forces a robot to comply.

Robots.txt is a plain text file that can be created in any text editor and must be placed in the root directory of the site. The file name must be entirely lowercase; the names Robots.txt and ROBOTS.TXT are incorrect.
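For example, a minimal robots.txt that lets every crawler index everything looks like this (served from the site root, e.g. https://example.com/robots.txt, where example.com is a placeholder domain):

User-agent: *
Disallow:

An empty Disallow value means nothing is blocked.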

When creating a robots.txt file, pay close attention to its syntax. The standard defining the instructions robots follow was developed in early 1994 and has not changed since; as a result, many bots also support commands that go beyond the standard.

Basic commands for robots.txt:

1. User-agent specifies the name of the robot to which the set of commands applies. If the instructions are meant for all bots, use an asterisk instead of a name (User-agent: *).

2. Disallow and Allow deny or allow access to certain pages of the site. Note that Allow is outside the original standard, yet it is supported by Google.

3. Host tells search engines which mirror of the site should be treated as the primary one; it is supported only by some search engines (most notably Yandex). Here "mirrors" means the versions of the domain with and without the www prefix. The domain should be given without the http:// protocol prefix and without a trailing slash. This directive must be specified after all the Disallow commands.

4. Sitemap declares that a sitemap is available and gives its location. This directive is independent of User-agent sections and may appear anywhere in the file.
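Putting these commands together, a robots.txt might look like the following sketch (example.com, the /admin/ paths, and the sitemap URL are all placeholders):

User-agent: *
Disallow: /admin/
Allow: /admin/public/
Host: example.com

Sitemap: http://example.com/sitemap.xml

Note that Host follows the Disallow commands, as required, and that the Allow line carves a public subfolder out of the blocked directory for engines that support it, such as Google.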

Rules for writing instructions in the robots.txt file:

• Each Disallow line must name only one file or directory

• The file name must be entirely lowercase

• The User-agent line cannot be left empty. To address all robots, use an asterisk (*)

• Wildcard characters, as in Disallow: file*.html, should not be used in the Disallow directive

• Comments should each be written on a single line

• The Disallow instruction is mandatory; if there is nothing to disallow, simply leave its value empty (see the sketch after this list)

• When blocking the indexing of a directory, be sure to include the slashes (Disallow: /folder/)

• Blank lines are used only to separate sections
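To illustrate these rules, here is a small sketch with two records, where the paths and the bot name ExampleBot are hypothetical. The first record blocks one directory per line for all robots; the second uses an empty Disallow to give one specific bot full access; a blank line separates the sections:

User-agent: *
# one directory per Disallow line
Disallow: /private/
Disallow: /tmp/

User-agent: ExampleBot
# empty Disallow: nothing is blocked for this bot
Disallow: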

To avoid duplicate content on WordPress blogs, the following pages may be closed to indexing:

• Tag pages: Disallow: /tag/

• Archive pages: Disallow: /archives/

• Category pages: Disallow: /category/

Whether to close these pages is up to you. After all, each of them may rank for a keyword query in its own right, and if you sell links through link exchanges, such pages can generate decent income.
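For a blog that does choose to block all three page types, the resulting robots.txt might look like this sketch (the sitemap URL is a placeholder):

User-agent: *
Disallow: /tag/
Disallow: /archives/
Disallow: /category/

Sitemap: http://example.com/sitemap.xml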
