Broken links test
From SiteRay wiki
| SiteRay Test | |
|---|---|
| First appeared: | Sitescore 1.0 |
| Applies to: | All versions |
| Type: | Individual test |
| Scored: | Always |
What does it do?
Checks for links to web addresses that don't exist, or that return an error. These are known as broken links.
Why is it important?
Broken links are very common and affect almost every website, usually because one party changes or removes a page that another party has linked to, without the linking party knowing. Testing links regularly with an automated tool is the quickest and easiest way to catch them.
Example results
Note that in this example web addresses have been concealed.
How is it measured?
Conventional HTML links and Meta Refresh links are checked by this test.
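As an illustration of the two link types named above, here is a minimal Python sketch that collects the targets of `<a href>` links and Meta Refresh tags from a page. It is not SiteRay's own code: the `LinkExtractor` class and `extract_links` function are hypothetical names, and the regular expression for the refresh target is a simplifying assumption.

```python
import re
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects targets of <a href> links and Meta Refresh tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Conventional HTML link
            self.links.append(attrs["href"])
        elif tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            # Meta Refresh content looks like "5; url=/new-page"
            match = re.search(r"url\s*=\s*['\"]?([^'\";]+)",
                              attrs.get("content") or "", re.IGNORECASE)
            if match:
                self.links.append(match.group(1).strip())


def extract_links(html):
    """Return link targets in the order they appear in the HTML."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links
```

Relative targets such as `/about` would still need resolving against the page address (for example with `urllib.parse.urljoin`) before they can be requested.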
Each link within the website is tested to see whether it returns a valid response. There are four potentially 'bad' responses:
- Page not found (ungraceful) - when asked for the page, the webserver simply replied saying 'that page was not found' (an HTTP 404 response). No HTML was sent by the server, so the error is displayed by the user's web browser, which is usually a relatively poor user experience.
- Page not found (graceful) - when asked for the page, the webserver replied with a full HTML page, but marked it as 'not found' (an HTTP 404 response). This is the best way to handle broken links, as the user sees something - ideally a professionally made and useful error message explaining the problem.
- Host not found - the website itself was not found to exist (the hostname was not found). For example, a link to www.this-domain-does-not-even-exist.com would fail in this manner. Such errors are always handled by the user's web browser, which cannot be avoided.
- Broken 404 header - when asked for the page, the webserver simply replied saying 'that page was not found' (an HTTP 404 response), but when the page was downloaded it did exist. Technically, it replied with a 404 to a HEAD request but with a valid response to a GET request. This usually means the code behind the website doesn't handle HEAD requests properly: poor practice, but not disastrous.
The score is calculated as follows:
    score = 10 - ((brokenLinkPercentage * 2)
                  + (definiteBrokenLinkPercentage * 7)
                  + (headerOnlyBrokenPercentage))
    if (definiteBrokenLinkCount > 0 AND score > 7)
        score = 7
    if (brokenLinkCount > 0 AND score > 8)
        score = 8
    if (headerOnlyBrokenLinkCount > 0 AND score > 9.9)
        score = 9.9
brokenLinkPercentage covers links which are broken in any way, except for those which return Broken 404 header (which, in the vast majority of cases, indicates broken HEAD response handling rather than a broken link).
definiteBrokenLinkPercentage covers links which return a pure 404 response, or no host. These are indisputably broken links.
headerOnlyBrokenPercentage covers links which return Broken 404 header, as explained above.
Note that even a very low percentage of broken links - say 1% - can therefore result in a terrible score for this test. This is by design, as broken links are damaging even in small doses and typically make up a very small percentage of overall link volume.
The limiters on maximum score exist because some very large sites exhibit a small percentage of broken links which are nevertheless significant. Awarding 9.99 (rounded up to 10) for a site with some minor flaws is less appropriate than awarding 9.9.
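To make the arithmetic concrete, the calculation can be sketched in Python as below. This is an illustration, not SiteRay's actual implementation. The function name `broken_links_score` and its parameters are hypothetical. Percentages are assumed to be on a 0-100 scale, which is consistent with 1% of broken links producing a very low score. Clamping at zero and returning 10 for a site with no links are further assumptions the text does not state.

```python
def broken_links_score(total_links, broken, definite, header_only):
    """Sketch of the score calculation; all arguments are link counts.

    broken      - links broken in any way except 'Broken 404 header'
    definite    - links returning a pure 404 or host not found
    header_only - links returning 'Broken 404 header'
    """
    if total_links == 0:
        return 10.0  # assumption: nothing to penalise

    def pct(count):
        return 100.0 * count / total_links  # assumption: 0-100 scale

    score = 10 - (pct(broken) * 2 + pct(definite) * 7 + pct(header_only))

    # Limiters on the maximum score
    if definite > 0 and score > 7:
        score = 7
    if broken > 0 and score > 8:
        score = 8
    if header_only > 0 and score > 9.9:
        score = 9.9

    return max(score, 0.0)  # assumption: scores do not go below zero
```

For example, `broken_links_score(100, 1, 1, 0)` describes a site where one link in a hundred is definitely broken: the penalty is 1 × 2 + 1 × 7 = 9, leaving a score of 1. A site with a single header-only problem among 10,000 links would compute 9.99 and be limited to 9.9.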
Technical explanation
- SiteRay first attempts a quick HTTP HEAD request for each page. If this returns a success code, the page is not checked any further.
- If the hostname is not found, that error is flagged.
- If the HEAD request returned a 404, a second check is made with a GET request. Depending on the outcome and HTML returned, an appropriate error is flagged.
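The steps above can be sketched in Python roughly as follows. This is a hedged illustration using only the standard library, not SiteRay's code. The names `http_fetch` and `classify_link` are hypothetical, treating any non-empty 404 body as a 'graceful' page is a crude assumption, and responses the text does not describe are simply labelled "other".

```python
import socket
import urllib.error
import urllib.request


def http_fetch(method, url, timeout=10):
    """Return (status_code, body_bytes) for a HEAD or GET request."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses arrive as exceptions that still carry a body
        return error.code, error.read()


def classify_link(url, fetch_fn=http_fetch):
    """Classify a link using a HEAD request, then GET if HEAD said 404."""
    try:
        status, _ = fetch_fn("HEAD", url)
    except OSError as error:  # URLError is a subclass of OSError
        if isinstance(getattr(error, "reason", None), socket.gaierror):
            return "host not found"
        return "other"
    if status < 400:
        return "ok"
    if status != 404:
        return "other"  # not covered by the description above

    # HEAD returned 404: confirm with a full GET request
    try:
        status, body = fetch_fn("GET", url)
    except OSError:
        return "other"
    if status < 400:
        return "broken 404 header"
    if status != 404:
        return "other"
    # Assumption: any non-empty body counts as a graceful error page
    if body.strip():
        return "page not found (graceful)"
    return "page not found (ungraceful)"
```

Passing the fetch function as a parameter keeps the classification logic testable without network access, since a stub returning fixed `(status, body)` pairs can stand in for `http_fetch`.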
Common problems
SiteRay says I have a broken link, but it works fine for me
Possible explanations:
- The link was broken when SiteRay tested it (e.g. if the website was down).
- The webpage may be returning an error code known as a "404", which effectively says "Page not found", even if it looks like a valid webpage. This is a technical problem with that website which would negatively impact SEO and should be fixed.
- The link is to a page which you can access, but SiteRay cannot. E.g. the page may only be available on your corporate network.
SiteRay did not find a broken link
Possible explanations:
- SiteRay may not have tested the page containing the broken link. Click on the "xxx pages were tested" link at the top left of your report, then "Advanced options". Try searching for the URL containing the broken link to confirm. If it wasn't tested, your spidering settings may be wrong. See "What to do if your website won't test".
- The link is to an external website, and testing of external broken links is not enabled. To check, view the website settings, click "Configuration", and look at the "Use default broken link settings?" option. Make sure that "Test external links" is checked. Note that testing external websites will slow down this test considerably for some websites.
How to improve this score
Review the list of broken links and fix them, either by removing the link, or pointing it to the correct address.
How to use this test effectively
This test should be run regularly and used as a key quality control mechanism.
