Robots.txt is a plain text file that webmasters create to instruct robots (typically search engine crawlers) how to crawl pages on their website. It is where you can grant or deny permission for all or specific search engine robots to access certain pages or the site as a whole. The file must live in the site’s root directory, and it can also be used to reference the sitemap.xml file.
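As an illustration, a minimal robots.txt might look like the sketch below; the user-agents, paths, and sitemap URL are placeholders rather than recommendations for any particular site.

```text
# Rules for all crawlers
User-agent: *
Disallow: /admin/
Disallow: /checkout/

# Rules for a specific crawler; a bot that matches a named group
# follows only that group, so these replace the rules above for Googlebot
User-agent: Googlebot
Allow: /

# Point crawlers at the XML sitemap
Sitemap: https://www.example.com/sitemap.xml
```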
There are several best practices that should be followed when managing a robots.txt file:
- As a general rule, the robots.txt file should never be used to handle duplicate content.
- Disallow statements within the robots.txt file are hard directives, not hints, and should be treated as such; that said, they are only honored by bots that actually fetch and respect the file before crawling the domain (a sketch of such a check follows this list).
- No link equity (a.k.a. page-level authority) is passed through URLs blocked by robots.txt. Keep this in mind when dealing with duplicate content.
- Using robots.txt to disallow URLs will not necessarily prevent them from being displayed in Google’s search results: the file only blocks crawling, not indexing, so a blocked URL can still be indexed and shown (typically without a description) if other pages link to it.
- The file should reference the XML sitemap with a Sitemap directive (as shown in the example above).
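To illustrate how a compliant crawler consults the file before requesting any other URL, here is a minimal sketch using Python’s standard-library urllib.robotparser; the domain and paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical site; a compliant crawler fetches /robots.txt from the
# root of the host before requesting any other URL on that host.
ROBOTS_URL = "https://www.example.com/robots.txt"

parser = RobotFileParser()
parser.set_url(ROBOTS_URL)
parser.read()  # download and parse the Disallow/Allow rules

# Check whether a given user-agent is allowed to crawl each URL.
for url in (
    "https://www.example.com/blog/some-post/",
    "https://www.example.com/admin/report.html",
):
    allowed = parser.can_fetch("Googlebot", url)
    print(f"{url} -> {'crawl' if allowed else 'skip'}")
```

Note that this check is voluntary: only well-behaved crawlers perform it, which is why Disallow rules should not be relied on to hide sensitive content.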