Software Engineer

Colombo, Sri Lanka
B.Sc.Special(Hons)in IT, MCTS, MCPD, MCP

Monday, May 27, 2013

Add a robots.txt file to SharePoint 2010

What is robots.txt?

robots.txt is a plain-text (not HTML) file placed in the root of your site to tell search robots which pages they should and should not visit or index. Search engines are not required to follow the instructions in robots.txt, but in general they respect what they are asked not to do.

Web site owners use the /robots.txt file to give instructions about their site to web robots. This is called the Robots Exclusion Protocol.
It works like this: a robot wants to visit a web site URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt and, if the file exists, reads the instructions in your robots.txt file.
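
The lookup described above can be sketched in a few lines of Python. This is only an illustration of how a robot might derive the robots.txt address from a page address; the function name robots_url is made up for this example, and real crawlers may differ in the details.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL a robot would check before visiting page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the path is always /robots.txt,
    # and any query string or fragment is dropped.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.example.com/welcome.html"))
# http://www.example.com/robots.txt
```

Note that the page's own path plays no part in the result: whatever page the robot wants, it asks for the same file at the root of the host.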

Creating a Robots.txt

    1. Launch Notepad.
    2. Put the following in your robots.txt file:

      User-agent: *
      Disallow: /

    3. Save the file as: robots.txt

The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site. 
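
If you want to see how a well-behaved robot would read these two lines, Python's standard urllib.robotparser module can evaluate them. The sketch below is just a quick check, not part of the SharePoint setup; the robot name "ExampleBot" is hypothetical, and the second rule set (an empty Disallow, which allows everything) is included only for contrast and is not from the file above.

```python
from urllib.robotparser import RobotFileParser


def load_rules(text):
    """Build a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# The file created in the steps above: every robot, whole site blocked.
block_all = load_rules("User-agent: *\nDisallow: /\n")

# For contrast only: an empty Disallow value blocks nothing.
allow_all = load_rules("User-agent: *\nDisallow:\n")

page = "http://www.example.com/welcome.html"
print(block_all.can_fetch("ExampleBot", page))  # False
print(allow_all.can_fetch("ExampleBot", page))  # True
```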

Adding the robots.txt file to the root of your public anonymous SharePoint site

The location of robots.txt is very important. It must be in the main directory, because otherwise user agents (search engines) will not be able to find it. They do not search the whole site for a file named robots.txt. Instead, they look only in the main directory (e.g. http://www.sitename.com/robots.txt), and if they don't find it there, they simply assume that the site does not have a robots.txt file and index everything they find along the way. So, if you don't put robots.txt in the right place, don't be surprised when search engines index your whole site.

To do this, you can simply follow my article "Add a file to the root of a SharePoint site using PowerShell".

Ensure the file is accessible to search engines
  
To check that the file is accessible to search engines, go to your site URL and append "/robots.txt".
      
      Example: http://www.sitename.com/robots.txt

You can also use Robots.txt Checker to do this.
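
The same check can be scripted. The following is a minimal sketch using only the Python standard library; the function name check_robots and its fetch parameter are made up for this example (fetch exists only so the download step can be swapped out), and www.sitename.com is the placeholder address used above, not a real site.

```python
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin
from urllib.request import urlopen


def check_robots(site_url, fetch=urlopen, timeout=10):
    """Request /robots.txt for site_url and return (status, body).

    status is the HTTP status code, or None if the server could not be
    reached at all. body is the file text, or "" on any failure.
    """
    # A leading slash makes urljoin replace the whole path with /robots.txt.
    robots_url = urljoin(site_url, "/robots.txt")
    try:
        with fetch(robots_url, timeout=timeout) as response:
            text = response.read().decode("utf-8", errors="replace")
            return response.status, text
    except HTTPError as err:
        # The server answered, but with an error such as 404 or 401.
        return err.code, ""
    except URLError:
        # DNS failure, refused connection, timeout and similar problems.
        return None, ""
```

Calling check_robots("http://www.sitename.com/") against your own site should give a status of 200 and the text you saved earlier. On an anonymous public site, a 401 or 403 would suggest the file is there but robots are not allowed to read it.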

There are two important considerations when using /robots.txt:

  • Robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers, will pay no attention to it.
  • The /robots.txt file is publicly available. Anyone can see which sections of your server you don't want robots to visit.
So don't try to use /robots.txt to hide information.
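
To make the second point concrete, here is a small sketch of how little effort it takes to turn a robots.txt file into a list of the paths its owner wanted left alone. The function name disallowed_paths and the sample paths are hypothetical, chosen only for illustration.

```python
def disallowed_paths(robots_text):
    """Return every non-empty Disallow value found in robots_text."""
    paths = []
    for line in robots_text.splitlines():
        # Drop comments, then split "Field: value" at the first colon.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        value = value.strip()
        if sep and field.strip().lower() == "disallow" and value:
            paths.append(value)
    return paths


sample = """\
User-agent: *
Disallow: /hypothetical-admin/
Disallow: /hypothetical-drafts/  # not ready yet
"""

print(disallowed_paths(sample))
# ['/hypothetical-admin/', '/hypothetical-drafts/']
```

Anything that really needs protecting should be secured with permissions on the site itself rather than listed here.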
