Tools for checking your website



I've already mentioned checking your website using a different browser.  There are other ways of checking it.

Link checker

This checks all the links in your website, both internal and external.  No doubt they all worked when you set them up, but things change.  The page you're linking to may have been moved or may no longer exist at all.  You may have deleted one of your own pages and forgotten that there was still a link to it from your site map.  So go to http://home.snafu.de/tilman/xenulink.html and download a free copy of Xenu Link Checker.  Open the zip file and run setup.exe.  You can accept the offered location for the program or choose your own.  Now run the program.  Click File | Check URL, enter the address of your website and press Enter.  Eventually it will say “Link sleuth finished.  Do you want a report?” and when you press Enter it will display in your browser a web page showing all the broken links and the pages which contain them.  Occasionally it will report trouble with a link which works fine when you try it yourself, perhaps because the server took too long to respond; but in my experience, if it says a page doesn't exist it's never wrong.  Near the bottom of the report is HTML which you could grab for a Site Map if you want one.  Below this is a list of broken links within your pages, the Anchor business I've already mentioned in the section on Links.  Very useful stuff: don't hesitate!

HTML Validator

There are many of these, free and otherwise.  I used to use the web-based validator at www.htmlhelp.org/tools/validator but the site hasn't been updated for many years and the validator no longer works.

So I now use the HTML Validator at http://validator.w3.org/, but it will only validate the specified page, not an entire site.  Having spent a long time cleaning things up using the first validator I mentioned, I then tried one of my pages on this one and got:

No Character Encoding Found!  Falling back to UTF-8.

The document located at https://colinhume.com/american.htm was checked and found to be tentatively valid XHTML 1.0 Transitional.  This means that with the use of some fallback or override mechanism, we successfully performed a formal validation using an SGML or XML Parser.  In other words, the document would validate as XHTML 1.0 Transitional if you changed the markup to match the changes we have performed automatically, but it will not be valid until you make these changes.

What does this mean?  Well, in my page index.htm (which did validate successfully) I have a line in the Head section:  <Meta Charset="utf-8"> but I didn't have this in any other page; WebEdit now automatically adds it to all of them.  The “U” in “UTF-8” comes from the Universal Coded Character Set (Unicode), so that's the character set I recommend; if you're writing a Chinese website seek further information elsewhere.
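
If you're not sure where that line goes, here's a minimal sketch of the top of a page with the character encoding declared; the title and language are just placeholders for illustration:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>My page</title>
</head>

Put the declaration right at the start of the Head, before the title or anything else containing text, so that the browser knows the encoding before it meets any characters which need it.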

CSS Validator

I used to use two.  The first was from the same out-of-date site:

http://www.htmlhelp.org/tools/csscheck/

but I'm now using lots of things that it doesn't understand.  However, some of its messages are still relevant.  I got a lot of output: mainly warnings, but also a couple of errors which I then corrected.  Here are a few of the warnings.

Warning: The shorthand background property is more widely supported than background-color.

Warning: To help avoid conflicts with user style sheets, background and color properties should be specified together.

Warning: To help avoid conflicts with user style sheets, background-image should be specified whenever background-color is used.  In most cases, background-image: none is suitable.

background is a shorthand property which allows you to give background-color, background-image, background-repeat, background-attachment and background-position in one go.  You don't need to give all the values in the list, and they don't need to be in this sequence.  So to avoid the first warning, instead of background-color: red you can use background: red.

The second warning points out that users can have their own style sheets; if I set a red background and take it for granted that body text will be black, a user whose style sheet specifies color: red won't see anything!  I'd do better to say explicitly that I want black text.  The third warning is more far-fetched: would a user really add a background image to everything?  And if he really wants to, should I stop him?  But since I'm now using the shorthand, I might as well say background: red none and have done with it.
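
To make that concrete, here's a small before-and-after sketch; the class name is made up for illustration:

.notice { background-color: red; }

.notice { background: red none; color: black; }

The second version uses the shorthand, states the text colour explicitly and says there is no background image, which deals with all three warnings at once.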

The second CSS Validator is at:

http://jigsaw.w3.org/css-validator/validator.html

W3.org is the World Wide Web Consortium, the group which maintains web standards, so if anyone is in charge of the web, they are.  One of their directors is Tim Berners-Lee, inventor of the World Wide Web.  So their validator should be right, though it's interesting that each validator gives warnings that the other doesn't.  In fact you can run the same validator as for HTML and it will automatically call this one if your file is CSS.

WebEdit will call these validators automatically for a single file or the whole site using Web|On-line Validate one and Web|On-line Validate all, and will abstract the information you need, so you don't have to scroll through lots of “Congratulations, no errors!” messages.

Firefox

The Firefox browser has lots of free add-ins.  One of the best was called Firebug; its features are now built into Firefox, and you press F12 to activate them.  It's amazing!  In particular, if you select an element in your HTML it will tell you which CSS rules are affecting it, and it will put a horizontal line through declarations which have been overridden.  So you can see that a particular element has picked up color: white from one rule and ignored color: black from a more general rule.  This could save you a lot of time!  Other browsers now have similar tools, but I still prefer Firefox.
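
As a small illustration of the kind of override it shows you (the class name here is made up):

<style>
  body       { color: black; }   /* general rule */
  .highlight { color: white; }   /* more specific rule wins for this element */
</style>
<p class="highlight">This paragraph picks up white text; the inspector shows color: black struck through.</p>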

Log file analyser

If you are on a server that you pay for, you may well have log files generated automatically, and it's worth looking at them.

I found I had a folder called “logs” on my server.  This had no files in it, just another folder with a random-looking name, presumably so that no-one else could guess where to find my log files.  That folder contained many files with names like ex120229.log, which are log files recording which parts of my website were accessed on that date (29th February 2012).  After a time the log files are zipped to save space, so they will have names like ex120229.zip, but the program I'm recommending copes with those too.  This is how you can analyse the data.

Find your log files using the Server menu, and allow WebEdit to create the necessary folders.  Right-click in the main area and click Select All.  Now right-click on any of the selected log files and click Download.  Eventually all the log files will be downloaded to your local drive.

Go to http://www.weblogexpert.com/lite.htm and download WebLog Expert.  There used to be a free Lite version; sadly there no longer is, but you can try out the full version for a month.  Install it and run it.  (Once you're using it successfully you can delete the Sample profile and also the file sample.log in the program directory.)  Click New and give the requested information.  On the next screen, give the path to your log files on the local drive, for instance C:\External\Colin\Logs\W3SVC1198\*.* (there's a “Browse” button so you don't have to remember and type in the path).

Click Analyze and in a few seconds you will see lots of interesting information!  For instance, the Referrers item of the Contents shows you how people got to your website and what they were looking for.  I found that the most popular search engine (by a factor of 200) was Google, which I would have predicted.  I discovered that by far the most common search phrase was “waltz steps”, which I certainly would not have predicted; “dance technique” came second, and most of the others were to do with waltz.  So maybe I should devote more attention to this page.  Maybe I should produce a downloadable video on waltzing and charge people for it.  Whatever reason you had for creating your website, this is where you can see why people really visit it.

I'm afraid the search-phrase part of this no longer works.  When I tried it in 2020 I found that Google and Bing had stopped passing the keyword information, to protect the searcher's privacy, they say.  So you can still see what pages people come to, but not what they were searching for.

All of the sections are worth looking through.  For instance, look at Browsers and decide which you can ignore and which you need to concentrate on.

And what about the final “Not found” section?  In the space of 21 days I had 6,475 occurrences of “Code 404: Not found”; the other codes occurred 57 times or fewer.  What's going on?  What were people looking for and not finding?  The program doesn't tell you this, but now that I've realised I need to know, I can search all the log files for “404” using WebEdit.  What I found was that most of the lines containing “404” were of the form

“2006-09-28 02:35:49 10.2.5.20 GET /robots.txt - 80 - 219.142.118.81 - - 404 0 64”

robots.txt is a file telling search-engine robots (crawlers) which pages you do not want them to index.  I checked the official documentation and found

The presence of an empty “/robots.txt” file has no explicit associated semantics; it will be treated as if it was not present, i.e. all robots will consider themselves welcome.
which was what I was hoping for.  So I created a robots.txt file in my root directory containing a single blank line.  In future I'll be able to find the 404 errors that really matter.
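
If you'd rather not rely on an empty file, the standard way of saying the same thing explicitly is a robots.txt containing:

User-agent: *
Disallow:

An empty Disallow value means nothing is disallowed, so all robots are still welcome; either form stops the 404 errors, because the file now exists.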

Later I discovered that some browsers look for a file called favicon.ico in the root directory.  I do have one of these files, but I'd tidily moved it to my images directory.  I moved it back!  This is the icon which is used if someone puts a shortcut to one of your website pages on their desktop, and may appear in the address bar for the page and in bookmarks.  Read about it at http://en.wikipedia.org/wiki/Favicon.
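
Alternatively, if you'd rather keep the icon in your images directory, you can tell browsers where to find it with a line in the Head of each page; the path here is just an example:

<link rel="icon" href="/images/favicon.ico">

Without such a line, browsers fall back to requesting /favicon.ico from the root, which is why the file needs to live there.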

When you've finished studying the logs, you can reclaim the space on the server and your local drive.  Back in the Server display of the log files, right-click the main page, click Select All, right-click any selected file and click Delete.  The log files will be deleted — apart from the one currently in use by the server.

Google Analytics

If you don't have log files automatically generated (or even if you do) there are web sites which will do a similar job.  Read the Wikipedia article on Google Analytics and decide whether you want to use it.  If so, go to http://www.google.com/analytics/ and create an account.  You'll then be shown the code which you need to paste into each of your web pages (or all those you want to keep track of) just before the end of the Body.  Don't immediately check that it has done anything; it will say the code hasn't been found.  Wait a day or so and then check: log in and click on the appropriate “View report”.
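
The snippet itself comes from your Analytics account, so I won't reproduce it here, but the placement looks like this:

  ...the rest of your page...
  <!-- paste the tracking code Google gives you here -->
</body>
</html>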

You can also use the Google Webmaster Tools at https://www.google.com/webmasters/tools.  I think to use this you need to upload a sitemap to Google; there's an option in WebEdit to scan your sitemap.htm and generate sitemap.xml, which is the form Google wants the information in.  You can then see, for instance, the two websites which refer to a page you have deleted, and you can email the owners of these sites asking them to change or remove the failing link.
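
For reference, sitemap.xml is just a list of page addresses in a standard XML wrapper; a minimal sketch with a made-up address looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/index.htm</loc>
  </url>
</urlset>

Each page gets its own <url> entry; only <loc> is required, though you can add optional extras such as <lastmod>.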

Another site is www.statcounter.com and there are many more, but I suspect Google is the best (as usual).

Website Grabber

If you want to see how somebody else's website works (or if you've been asked to take over a website and have no idea what it consists of), a very useful free tool is WebReaper from www.webreaper.net.  This will download the entire site to your hard disk so that you can search for particular items or whatever.  When I ran it I found it was missing the file msvcr70.dll, so I downloaded this from www.dll-files.com and unzipped it into C:\Windows\System32 without any problems.

After you've downloaded the site, you will find all the pages and images under Desktop (from the WebEdit Open dialogue click the Windows Open button), then Reaped Sites, then the URL of the website.

Website Grader

If you're hoping to make money from your site, or if you're simply hoping that people will find it, you might find it worthwhile running the free Website Grader at marketing.grader.com.  It will give you lots of hints about Search Engine Optimisation, and will compare your site with any rival sites you specify.

Image compressors

Large images slow down your page loading as well as taking up more space on your server.  What you want is something that will compress your images without losing any of the quality.  Here are two options.

There's an on-line image compressor for JPEGs and PNGs at www.websiteplanet.com/webtools/imagecompressor.

What I use is the File Optimizer from nikkhokkho.sourceforge.io/static.php?page=FileOptimizer, which you install on your machine.  This runs a number of optimisation routines on the files you give it.  It will also integrate into WebEdit; see the pop-up menu on the Server screen.