How to keep robots away from your site

THE ROBOTS.TXT FILE

You know that search engines were created to help people find information quickly on the net, and they gather much of their information through spiders (also called crawlers or robots) that fetch web pages for them.

These spider or robot programs explore the web, collecting a myriad of data. They usually start with URLs submitted by people, with links they find on other sites, with sitemap files, or with the top level of a site.

Once the robot accesses the home page, it recursively accesses all pages linked from that page. A robot can also try to visit every page it can find on a particular server.
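To make the idea concrete, here is a minimal sketch of that recursive process in Python. It is only an illustration: the starting URL is a placeholder, and a real spider would add politeness delays, obey robots.txt, and use far more robust parsing.

import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkParser(HTMLParser):
    # Collect the href attribute of every <a> tag on a page.
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(url, seen, depth=2):
    # Recursively visit a page and every page linked from it,
    # stopping at a fixed depth and skipping pages already seen.
    if depth == 0 or url in seen:
        return
    seen.add(url)
    try:
        html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
    except Exception:
        return
    parser = LinkParser()
    parser.feed(html)
    for link in parser.links:
        crawl(urljoin(url, link), seen, depth - 1)

crawl("http://www.example.com/", set())  # placeholder start page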

After the robot finds a page, it indexes the title, the keywords, the text, and so on. But sometimes you may want to keep search engines from indexing some of your pages, such as news posts or specially designated pages (for example: affiliate pages). Keep in mind, though, that whether an individual robot complies with these conventions is purely voluntary.

THE ROBOTS EXCLUSION PROTOCOL

So if you want robots to stay out of some of your web pages, you can ask them to ignore the pages you don't want found. To do that, place a robots.txt file in the root directory of your web site.

For example, if you have a directory called e-books and you want to ask robots to keep out of it, your robots.txt file should read:

User-agent: *
Disallow: /e-books/
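The first line says the rule applies to every robot, and the second names the path they should stay out of. Two other common variations: to keep all robots out of the entire site, use

User-agent: *
Disallow: /

and to shut out one particular robot while leaving the rest alone (BadBot here is just a placeholder for the robot's real name), use

User-agent: BadBot
Disallow: /

A well-behaved robot fetches and honors this file before crawling. Python's standard library even ships a parser for it, so a quick check of your own rules might look like this sketch (the URLs are placeholders):

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")
rp.read()

# False under the e-books rule above, True for unrestricted pages.
print(rp.can_fetch("*", "http://www.example.com/e-books/index.html"))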

If you don't have enough control over your server to create such a file, you can instead add a META tag to the head section of any HTML file.

For example, a tag like the following tells robots not to index a particular page and not to follow its links:

<meta name="robots" content="noindex, nofollow">

Support for the META tag among robots is not as widespread as support for the Robots Exclusion Protocol, but most of the main web spiders currently support it.
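For context, here is a minimal sketch of where the tag sits in a page (the title and body text are placeholders):

<html>
<head>
<title>A page the robots should skip</title>
<meta name="robots" content="noindex, nofollow">
</head>
<body>
...
</body>
</html>

To block indexing but still let robots follow the links, you would use content="noindex, follow" instead.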

NEWS POSTINGS

If you want to keep search engines out of your news postings, you can include an 'X-no-archive' line in the postings' headers:

X-no-archive: yes
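In a full posting the header block would then look something like this (the address, newsgroup, and subject are placeholders):

From: someone@example.com
Newsgroups: alt.test
Subject: A posting the archives should skip
X-no-archive: yes

The body of the message follows the blank line. Some archives also accept the line as the first line of the message body, which is a common workaround when a client won't let you edit headers.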

Some common news clients let you add an X-no-archive line to your postings' headers, but many of them don't allow you to do so.

The problem is that many search engines assume that any data they find is public unless it is marked otherwise.

So be careful: although the robot and archive exclusion standards can help keep your material out of the major search engines, there are others that respect no such rules.

If you are very concerned about the privacy of your e-mail and Usenet postings, you should use anonymous remailers and PGP. You can learn about them here:

http://www.well.com/user/abacard/remail.html

http://www.io.com/~combs/htmls/crypto.html

http://world.std.com/~franl/pgp/

Keep in mind that anything you write is likely to be indexed and archived somewhere for eternity, so use the robots.txt file as much as you want, even if you are not especially worried about privacy.

Compiled by Dr. Roberto A. Bonomi.
