The robots.txt file is a handy way to control whether and how search engines visit your site. The Official Google Webmaster Central Blog has a useful chart showing what you can and can’t do to keep search engines out, in its post Improving on Robots Exclusion Protocol:
… there are some cases in which publishers need to communicate more information to search engines — like the fact that they don’t want certain content to appear in search results. And for that they use something called the Robots Exclusion Protocol (REP), which lets publishers control how search engines access their site: whether it’s controlling the visibility of their content across their site (via robots.txt) or down to a much more granular level for individual pages (via META tags). …
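(A quick aside from the quoted post: the site-wide half of REP can be tried out locally. Below is a minimal sketch using Python's standard `urllib.robotparser` module. The rules in `ROBOTS_TXT`, the `ExampleBot` user agent, the `example.com` URLs, and the `is_allowed` helper are all made up for illustration; they are not from Google's post.)

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example:
# every crawler is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/index.html",
                "https://example.com/private/report.html"):
        print(url, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

Against a live site you would normally call `set_url()` and `read()` on the parser instead of passing the text in by hand. Per-page control via META tags is a separate mechanism that this sketch does not cover.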
The following list are all the major REP features currently implemented by Google, Microsoft, and Yahoo!. With each feature, you’ll see what it does and how you should communicate it.