Robots.txt is a plain-text (not HTML) file placed in the root of your site to tell search robots which pages they should and should not visit. Search engines are not obliged to follow the instructions in robots.txt, but they generally obey what they are asked not to do.
Website owners use the /robots.txt file to give web robots instructions about their site. This convention is called the Robots Exclusion Protocol.
It works like this: a robot wants to visit a website URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt and finds your robots.txt file.
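That first step can be sketched in Python, assuming the robot simply keeps the scheme and host of the page it wants and replaces the rest of the URL with /robots.txt. The function name `robots_url_for` is hypothetical, chosen for illustration:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL a robot would check before fetching page_url.

    Hypothetical helper: keeps the scheme and host (including any port)
    and replaces the path, query and fragment with /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url_for("http://www.example.com/welcome.html"))
# http://www.example.com/robots.txt
```

The same mapping applies to deep pages: http://www.example.com/sites/team/page.aspx still resolves to http://www.example.com/robots.txt.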
Creating a robots.txt file
1. Launch Notepad
2. Put the following in your robots.txt file:
User-agent: *
Disallow: /
3. Save the file as: robots.txt
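If you would rather script this step than use Notepad, the following Python sketch writes the same two lines as a plain UTF-8 text file. The function name `write_robots_txt` is hypothetical, and the target folder is whatever local folder you choose to stage the file in:

```python
from pathlib import Path

ROBOTS_TXT = "User-agent: *\nDisallow: /\n"


def write_robots_txt(folder: str) -> Path:
    """Write robots.txt into folder as plain UTF-8 text and return its path."""
    path = Path(folder) / "robots.txt"
    # newline="\n" keeps "\n" as-is instead of translating it to "\r\n" on Windows
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(ROBOTS_TXT)
    return path
```

Calling `write_robots_txt(".")` creates robots.txt in the current folder.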
The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.
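You can see how a rule-following robot interprets these two lines with Python's standard `urllib.robotparser` module. This is only a sketch: real crawlers use their own parsers, and the robot names below are just examples.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

for agent in ("Googlebot", "Bingbot", "*"):
    for url in ("http://www.example.com/", "http://www.example.com/welcome.html"):
        # Always False here: "*" covers every robot and "Disallow: /"
        # matches every path on the site
        print(agent, url, parser.can_fetch(agent, url))
```

An empty `Disallow:` value, by contrast, disallows nothing and lets robots visit the whole site.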
Adding the robots.txt file to the root of your public anonymous SharePoint site
The location of robots.txt is very important. It must be in the root directory, otherwise user agents (search engines) will not find it. They do not search the whole site for a file named robots.txt; they look only in the root directory (e.g. http://www.sitename.com/robots.txt). If they don't find it there, they assume the site has no robots.txt file and index everything they find along the way. So if you put robots.txt anywhere else, don't be surprised when search engines index your whole site.
To put the file in the root, you can follow my article "Add a file to the root of a SharePoint site using PowerShell".
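The fallback described above can be sketched in Python with `urllib.robotparser`. The helper name `rules_from_fetch`, and reducing "not found" to an HTTP 404 status, are assumptions for illustration. The point is that a missing root file leaves the robot with no rules, so every URL is allowed:

```python
from urllib.robotparser import RobotFileParser


def rules_from_fetch(status: int, body: str) -> RobotFileParser:
    """Build a parser from the result of requesting /robots.txt at the site root.

    Hypothetical helper: only 200 and 404 are handled in this sketch.
    """
    parser = RobotFileParser()
    if status == 200:
        parser.parse(body.splitlines())
    elif status == 404:
        # Record that the file was checked and contained no rules:
        # nothing is disallowed
        parser.parse([])
    else:
        raise ValueError(f"status {status} is not covered by this sketch")
    return parser


# A robots.txt saved in a subsite folder is never requested, so the root request 404s
missing = rules_from_fetch(404, "")
print(missing.can_fetch("*", "http://www.sitename.com/sites/private/page.aspx"))  # True

present = rules_from_fetch(200, "User-agent: *\nDisallow: /\n")
print(present.can_fetch("*", "http://www.sitename.com/sites/private/page.aspx"))  # False
```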
Ensure the file is accessible to search engines
To confirm that search engines can reach the file, browse to your site URL with "/robots.txt" appended.
Example: http://www.sitename.com/robots.txt
You can also use Robots.txt Checker for this.
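The same check can be scripted. This Python sketch requests /robots.txt at the root of whatever site URL you pass and reports the HTTP status, the Content-Type header and the body. The function name `check_robots_txt` is hypothetical. What you hope to see is a 200 status, ideally with a text/plain content type, since robots.txt is a plain-text file.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin


def check_robots_txt(site_url: str, timeout: float = 10.0) -> tuple[int, str, str]:
    """Request robots.txt at the root of site_url.

    Returns (HTTP status, Content-Type header, body text). Network failures
    such as DNS errors (urllib.error.URLError) propagate to the caller.
    """
    robots_url = urljoin(site_url, "/robots.txt")  # always the site root
    try:
        with urllib.request.urlopen(robots_url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status, resp.headers.get("Content-Type", ""), body
    except urllib.error.HTTPError as err:
        # e.g. 404: crawlers will behave as if the site has no robots.txt
        return err.code, err.headers.get("Content-Type", ""), ""


# Example (needs network access):
# status, content_type, body = check_robots_txt("http://www.sitename.com")
# print(status, content_type)
# print(body)
```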
There are two important considerations when using /robots.txt:
- Robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities and email address harvesters used by spammers will pay no attention to it.
- The /robots.txt file is publicly available. Anyone can see which sections of your server you don't want robots to use.
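To see how little the second point takes, here is a Python sketch that pulls every Disallow path out of a robots.txt body, which is exactly the list of sections you asked robots to stay out of. The function name `disallowed_paths` and the sample rules and paths are made up for illustration:

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return the non-empty Disallow values in a robots.txt body, in file order."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


SAMPLE = """\
User-agent: *
Disallow: /_layouts/
Disallow: /sites/hr/   # hypothetical private area
"""

print(disallowed_paths(SAMPLE))  # ['/_layouts/', '/sites/hr/']
```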