Spider
From SiteRay wiki
A spider (also known as a crawler) is a program that browses webpages automatically. Search engines such as Google use spidering to discover webpages they can search, and SiteRay spiders sites to test them.
A website is said to be spiderable if it can be spidered successfully, i.e. if the pages in the website can be discovered through automated means.
How a spider works
The basic principle is simple:
1. Download the first page
2. Find any links on that page
3. Repeat for each new link found
For the majority of spiders, step 2 consists of finding HTML links and meta-refresh tags on a webpage. As a result, links embedded in technologies such as Flash, JavaScript or Java go unnoticed. Similarly, content that can only be reached by submitting a form is typically unspiderable.
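The three steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not SiteRay's implementation: the names `LinkExtractor`, `extract_links` and `spider` are made up for this example, and `fetch` is a hypothetical callable that returns a page's HTML for a URL (or `None` if the page cannot be downloaded). Step 2 looks only at `<a href>` links and meta-refresh tags, so a URL that appears only inside a script is never discovered.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the targets of <a href> links and meta-refresh tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self._add(attrs["href"])
        elif tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            # The content attribute looks like "5; url=/next.html".
            content = attrs.get("content") or ""
            _, _, target = content.partition(";")
            name, _, value = target.strip().partition("=")
            if name.strip().lower() == "url" and value.strip():
                self._add(value.strip().strip("'\""))

    def _add(self, href):
        # Resolve relative links and drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(self.base_url, href))
        self.links.append(absolute)


def extract_links(base_url, html):
    """Step 2: find the links on one page."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def spider(start_url, fetch, max_pages=100):
    """Return the URLs visited, in the order they were downloaded.

    fetch is an assumed callable: url -> HTML text, or None on failure.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)                        # step 1: download the page
        if html is None:
            continue
        visited.append(url)
        for link in extract_links(url, html):    # step 2: find its links
            if link not in seen:                 # step 3: repeat for new links
                seen.add(link)
                queue.append(link)
    return visited
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library and makes it easy to try against a dictionary of pages. A real spider would usually also restrict itself to the site being crawled and respect robots.txt, both of which are omitted here.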
Why it matters
SEO
Non-spiderable pages typically won't be discovered by search engines and, as a result, cannot contribute to search engine rankings. One of the most basic requirements of SEO is that as much content as possible should be spiderable by, and hence visible to, search engines. A site that cannot be spidered at all is effectively invisible to search engines.
See SEO.
Accessibility
Pages which can't be spidered usually depend on technologies which are fundamentally inaccessible, or not guaranteed to be accessible. As a result, a non-spiderable site often suffers from accessibility problems.
See Accessibility.
SiteRay spider
The SiteRay spider is relatively advanced and is specialised for testing websites, which it does more aggressively than the spiders used by search engines. In particular, it can spider aspects of a site traditionally considered non-spiderable, such as Flash and JavaScript.
See SiteRay spider.
In SiteRay
SiteRay tests spiderability via the Spiderability test.
