In early summer, internet marketers celebrated an anniversary: the robots.txt file turned 20. To mark the occasion, Google expanded Webmaster Tools with a tool for checking robots.txt files. Experienced marketers know perfectly well what the file is and how to work with it. Beginners will get the basics from this article.
Why robots.txt is needed
The robots.txt file contains information that search robots use when scanning a site. In particular, robots.txt tells crawlers which sections of the site, page types, or specific pages they should not scan. With this file, you keep content that you do not want search engines to show out of their index. You can also block the indexing of duplicate content.
Using robots.txt incorrectly can cost you dearly. An erroneous scanning ban will exclude important sections, pages, or even all of your content from the index. In that case, you can hardly count on successful website promotion.
How to work with robots.txt
The robots.txt text file contains instructions for search engine robots. Usually it is used to prohibit scanning of service sections of the site, duplicate content or publications that are not intended for the entire audience.
If you do not need to close any content from scanning, you can leave robots.txt without any restrictions. In this case, the file entry looks like this:
User-agent: *
Disallow:
If for some reason you want to block the site completely for search robots, the entry in the file will look like this:
User-agent: *
Disallow: /
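To see the difference between an unrestricted file and a full block, here is a minimal sketch using the `urllib.robotparser` module from Python's standard library. The helper name `can_crawl` and the example URL are assumptions made for illustration; they do not come from the article or from Google's tool.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


EMPTY_FILE = ""                                # a blank robots.txt
ALLOW_ALL = "User-agent: *\nDisallow:\n"       # empty Disallow: nothing is closed
BLOCK_ALL = "User-agent: *\nDisallow: /\n"     # the whole site is closed

URL = "https://example.com/blog/post.html"     # hypothetical URL

for name, text in [("empty", EMPTY_FILE), ("allow all", ALLOW_ALL), ("block all", BLOCK_ALL)]:
    print(name, "->", "crawlable" if can_crawl(text, URL) else "blocked")
```

Only the last variant reports the page as blocked. Keep in mind that this checks how Python's parser reads the file, which may differ in details from how a particular search engine reads it.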
To use robots.txt properly, you need to understand the levels at which directives work:
- Page level. In this case, the directive looks like this: Disallow: /primerpage.html.
- Folder level. At this level, directives are written like this: Disallow: /example-folder/.
- Content type level. For example, if you do not want robots to index .pdf files, use the following directive: Disallow: /*.pdf.
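The three levels above can be sketched as a small pattern matcher. This is a hand-rolled approximation, assuming the wildcard behavior that major crawlers commonly support: a rule is matched from the start of the path, `*` stands for any run of characters, and a trailing `$` anchors the end. The function name `rule_matches` and the sample paths are hypothetical.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Check whether a Disallow pattern covers a URL path (assumed semantics)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and let each '*' match any run of characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, which gives prefix matching.
    return re.match(regex, path) is not None


RULES = ["/primerpage.html", "/example-folder/", "/*.pdf"]
PATHS = [
    "/primerpage.html",
    "/example-folder/page.html",
    "/docs/price-list.pdf",
    "/blog/post.html",
]

for path in PATHS:
    blocking = [rule for rule in RULES if rule_matches(rule, path)]
    print(path, "->", blocking or "allowed")
```

Each of the first three paths is caught by exactly one rule, one per level, while the last path is not covered by any of them.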
Remember the most common mistakes encountered when compiling robots.txt:
- Complete ban on site indexing by search engines
In this case, the directive looks like this:
Disallow: /
Why create a website if you don't allow search engines to crawl it? This directive is appropriate only while the resource is under development or undergoing a major overhaul.
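Because a full ban is easy to leave in place after development ends, it can be worth checking for it automatically before launch. The following is a minimal sketch under simplifying assumptions: it looks only at groups addressed to all robots (`User-agent: *`) and ignores any Allow rules that might partially reopen the site. The function name `blocks_entire_site` is hypothetical.

```python
def blocks_entire_site(robots_txt: str) -> bool:
    """Return True if a 'User-agent: *' group contains 'Disallow: /'."""
    agents = []        # user agents named by the current group
    in_rules = False   # True once the current group has started listing rules
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a rule was already seen, so a new group begins
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/" and "*" in agents:
                return True
    return False


print(blocks_entire_site("User-agent: *\nDisallow: /\n"))
print(blocks_entire_site("User-agent: *\nDisallow: /admin/\n"))
```

A check like this could run as part of a release checklist, so that a development-time ban does not reach the live site unnoticed.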
- A ban on scanning content that should be indexed
For example, a webmaster may prohibit scanning of folders with videos and images:
Disallow: /images/
Disallow: /videos/
It is difficult to imagine a situation in which banning the scanning of content meant for the index would be justified. Typically, such actions deprive the site of traffic.
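- Using the Allow directive
On its own, this makes no sense. By default, search engines scan all available content. With robots.txt you can disable scanning, but you do not need to explicitly allow anything to be indexed. Allow is only worth using to open an exception inside a section that a Disallow rule closes.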
Robots.txt file verification tool
In mid-July, Google introduced a tool for checking the robots.txt file, available in Webmaster Tools. To find it, use the "Site Toolbar - Scan - robots.txt file verification tool" menu.
The new tool lets you:
- View the current version of the robots.txt file.
- Edit the robots.txt file and check its correctness directly in Webmaster Tools.
- View old versions of the file.
- Check blocked URLs.
- View error messages for the robots.txt file.
If Google doesn't index individual pages or entire sections of your site, the new tool will help you check in a few seconds whether robots.txt errors are the cause. According to Google expert Asaph Arnon, the tool highlights the specific directive that blocks content from being indexed.
You can make changes to robots.txt and check its correctness. To do this, simply specify the URL that interests you and click the "Check" button.
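To get a feel for what such a check involves, here is a rough local sketch that reports which Disallow line blocks a given path. It is not how Google's tool works internally. It assumes plain prefix matching, considers only groups addressed to `User-agent: *`, picks the longest matching rule, and ignores Allow rules and wildcards. The function name `find_blocking_rule` and the sample file are hypothetical.

```python
def find_blocking_rule(robots_txt: str, path: str):
    """Return (line_number, rule_text) of the longest Disallow rule for
    'User-agent: *' that prefixes `path`, or None if no rule applies."""
    best = None
    applies = False     # does the current group target all robots?
    seen_rule = False   # has the current group started listing rules?
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # the previous group has ended
                applies, seen_rule = False, False
            applies = applies or value == "*"
        elif field in ("allow", "disallow"):
            seen_rule = True
            if field == "disallow" and applies and value and path.startswith(value):
                if best is None or len(value) > len(best[2]):
                    best = (number, line, value)
    return best[:2] if best else None


SAMPLE = "User-agent: *\nDisallow: /admin/\nDisallow: /images/\n"

for path in ["/images/logo.png", "/blog/post.html"]:
    print(path, "->", find_blocking_rule(SAMPLE, path) or "not blocked")
```

For a blocked path, the result names the line number and the text of the responsible rule, which is the kind of pointer that makes an accidental ban quick to fix.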
Google spokesman John Mueller recommends that all site owners check their robots.txt file with the new tool. According to the expert, by spending a few seconds on the check, a webmaster can identify critical errors that keep Google's crawlers from scanning the site.
To properly use ...
... the robots.txt file, you need to understand its practical meaning. This file is used to restrict search engines' access to the site. If you want to prevent robots from scanning a page, a section of the site, or a content type, add the appropriate directive to robots.txt. Verify that the file is used correctly with the new tool available in Google Webmaster Tools. This will help you quickly detect and eliminate errors, as well as make the necessary changes to robots.txt.