Website Testing on Steroids
Article written by a human: Mike Cardwell
In a recent post I spoke briefly about how I use browser tests to test this website. Things have come a long way since I wrote that post, so I thought I would detail exactly what I'm doing and why.
I wanted to be able to confidently change the CSS, JavaScript, templates and code on this website without breaking anything. I wanted to be able to detect regressions caused by code changes, or by the simple passage of time (e.g. broken external links).
I wanted a simple Makefile where I could just run make test-$TYPE to run a
particular set of tests, plus some other make commands to separate the tests I
run after every change from those I only run occasionally, after relevant
changes. Here's what I've come up with.
Unit Tests #
make test-unit
I've written a custom framework in Go which does all sorts of useful stuff: template rendering, dynamically generating security headers on a per-response basis, converting blog posts written in Markdown to HTML pages, searching the website, inlining CSS and JavaScript where it makes sense to, injecting HTTP 103 Early Hints into HTTP responses, and hashing assets. That gives me a whole host of functions which can be isolated and tested individually and quickly.
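The real functions are private to my codebase, but a table-driven Go test of the kind I mean might look like this. The slugify helper here is a hypothetical stand-in, not one of the site's actual functions:

```go
package main

import (
	"fmt"
	"strings"
)

// slugify is a hypothetical helper of the sort these unit tests cover:
// it turns a post title into a URL slug by lowercasing, keeping letters
// and digits, and mapping spaces to hyphens.
func slugify(title string) string {
	s := strings.ToLower(strings.TrimSpace(title))
	s = strings.Map(func(r rune) rune {
		switch {
		case r >= 'a' && r <= 'z', r >= '0' && r <= '9':
			return r
		case r == ' ' || r == '-':
			return '-'
		}
		return -1 // drop anything else
	}, s)
	return strings.Trim(s, "-")
}

func main() {
	// Table-driven style, as is idiomatic with Go's testing package.
	cases := []struct{ in, want string }{
		{"Website Testing on Steroids", "website-testing-on-steroids"},
		{"  HTTP 103 Early Hints  ", "http-103-early-hints"},
	}
	for _, c := range cases {
		got := slugify(c.in)
		fmt.Printf("%q -> %q (want %q)\n", c.in, got, c.want)
	}
}
```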
Integration Tests #
make test-integration
These go one step further and actually build and start the HTTP server, and communicate with it via HTTP. Here I can test that the correct 103 Early Hints response is actually sent, that missing pages actually return a 404 status code, the correct security and caching headers are sent, the correct 301 redirects happen, rate limiting is working, etc. These tests also fetch the sitemap.xml, RSS and Atom feeds, and verify all of the headers are correct and content is valid.
HTML and CSS Validation #
make test-integration (part of the integration tests)
This is where I start going a bit further than is common, I think. The integration tests also visit every single page on the website and run the rendered HTML and CSS content through vnu (the Nu HTML Checker). This is the W3C's own validation tool and also powers validator.w3.org. It lets me know if the HTML or CSS of any of my pages has mistakes. Producing valid HTML and CSS matters to me in its own right, but this also catches simple errors like typos in CSS properties and HTML tags, and other mistakes that might render OK in some browsers but not others.
It occasionally reports false positives for very new CSS features that vnu hasn't caught up with yet, but those can be filtered out by pattern.
Browser Tests #
make test-browser
I use Playwright to visit every page with Firefox, WebKit and Chrome. The tests fail when there is any error or warning in the console, e.g. a Content Security Policy violation, a hash mismatch for inline CSS/JavaScript, a JavaScript error, a failed network request, or duplicate network requests.
These tests don't just load a page and look at the errors. They also interact with the page to test component functionality, such as whether or not modals open when the correct element is clicked, or a keyboard shortcut is pressed.
Accessibility Tests #
make test-accessibility
I also have accessibility tests which run in the headless browsers. They visit each page on the website, inject an accessibility testing engine called axe-core into the page, then run a full WCAG 2.1 AA audit. This has so far detected missing aria attributes, missing image alt text, and even places where the contrast between the font colour and the background colour was not high enough.
Visual Regression Tests #
make test-visual
I want to be able to make CSS and JavaScript changes without worrying about potential side effects. The tests visit various pages in real browsers and take screenshots. I maintain a "golden" image directory with these screenshots. When I run the visual tests, the screenshots they produce are compared against the golden images, and if any pixel differs between the two, I am alerted via a failed test. I can then compare the before and after screenshots; if the change in visible output is expected, I just overwrite the old golden image with the new one. If the change is not expected, I fix the problem.
I run these tests at various viewport sizes, as the site is responsive, and in Firefox, WebKit and Chrome in case a rendering error is browser specific. Screenshots are generated for all combinations.
One of the interesting things I learnt whilst doing this is that different screenshots of the same page may not match pixel for pixel. Normal browser rendering involves anti-aliasing, font hinting, caret blinking, and animation timing that can all vary between runs, so certain features need to be disabled in the browser to get consistent results.
Of course, some of my pages change over time anyway. My front page, for
example, says things like Latest Blog Post: Blog Post Title, 1 week ago. That
blog post title will change as I write new blog posts, so the test will fail.
That's OK; I can just create a new golden image each time. However, the
1 week ago text changes based purely on the passage of time and nothing
else, which is annoying. To deal with that, the test can inject JavaScript
into the page in question to delete that text, or modify it to something
which persists across test runs, before taking the screenshot.
Some test cases include pre-screenshot interactions, like opening the mobile navigation menu or triggering a modal, to verify that interactive states render correctly too.
Spelling and Grammar #
make test-spelling, make test-grammar
I run two tools across all blog posts and static pages to check for spelling and grammar errors. I have only just started doing this, so there are a lot of things for me to go and manually fix still.
For spell checking I use GNU Aspell. I extract the text
parts from each page, pipe them through aspell list, and then filter out
various reported misspellings based on capitalisation. That is, if a word
contains a capital letter in the middle, or multiple capital letters, it's
likely to be an abbreviation or the name of a thing, like "CardDAV" or "NATS"
or "CentOS". I also maintain an allow list.
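That capitalisation filter is a one-liner's worth of logic. A sketch in Go (my real filter runs in the shell pipeline, so this is illustrative only):

```go
package main

import (
	"fmt"
	"unicode"
)

// looksLikeName reports whether a word flagged by aspell should be ignored:
// a capital letter after the first character, or more than one capital,
// usually means an abbreviation or product name ("CardDAV", "NATS", "CentOS").
func looksLikeName(word string) bool {
	caps := 0
	for i, r := range word {
		if unicode.IsUpper(r) {
			caps++
			if i > 0 {
				return true
			}
		}
	}
	return caps > 1
}

func main() {
	for _, w := range []string{"CardDAV", "NATS", "CentOS", "teh", "Website"} {
		fmt.Printf("%s -> ignore=%v\n", w, looksLikeName(w))
	}
}
```

A word with only an initial capital ("Website") is still checked, since that pattern is just as likely to be an ordinary word at the start of a sentence.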
For grammar/prose checking, I use Vale. It understands Markdown already, so skips code blocks, inline code etc automatically. I use proselint and write-good style packages, with some noisy rules filtered out.
Link Checking #
make test-links-broken, make test-links-upgrade
Here, I check all external links across all blog posts for link rot. After
~17 years of blogging, lots of websites have been redesigned, businesses have
closed, and technology has been superseded. These tests extract every URL from
both the raw Markdown and the rendered HTML (stripping out <code> and
<pre> blocks to avoid false positives from example URLs), then make HTTP
HEAD requests to see if they still exist, falling back to GET on failure.
There's a separate test that finds HTTP URLs which could be upgraded to HTTPS.
It takes the http:// URL, rewrites it to https://, and checks if the HTTPS
version works. This has helped me upgrade quite a lot of links. HTTPS is a lot
more common today than it was 17 years ago. Anyone remember
Firesheep?
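The rewrite step is trivial; the interesting part is fetching the result. A sketch of just the rewrite (the real test then requests the rewritten URL to confirm the HTTPS version actually works):

```go
package main

import (
	"fmt"
	"strings"
)

// upgradeCandidate rewrites an http:// URL to https:// and reports whether
// a rewrite happened. URLs that are already https:// are left alone.
func upgradeCandidate(url string) (string, bool) {
	if !strings.HasPrefix(url, "http://") {
		return url, false
	}
	return "https://" + strings.TrimPrefix(url, "http://"), true
}

func main() {
	fmt.Println(upgradeCandidate("http://example.com/post"))
	fmt.Println(upgradeCandidate("https://example.com/post"))
}
```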
I've fixed/changed/upgraded a lot of links thanks to these tests, but there are still a load left. I need to manually go through them at some point to find alternative links, or to somehow mark them visually as broken.
Dependency Vulnerability Scanning #
make test-vuln
As I've said before, this website has a Go backend. There is a tool called
govulncheck which checks your dependencies against the
Go vulnerability database. What's nice about this tool
is that it only reports vulnerabilities in functions your code actually
calls. make test-vuln triggers this for me.
Static Analysis #
make test-staticcheck
There is a tool, staticcheck, which catches bugs that the compiler misses: things like unused function results, deprecated API usage, incorrect format strings, and many others.
Performance Testing #
make test-perf
I want to know if I introduce something to the website which slows it down unexpectedly. For this I measure Web Vitals using Chromium. Before each page load, I inject a script that sets up PerformanceObserver listeners for Largest Contentful Paint (LCP), Cumulative Layout Shift (CLS), and First Contentful Paint (FCP). I also monitor Time to First Byte (TTFB) using the Navigation Timing API.
I open each page under test three times in a fresh browser and take the median time. I then compare that value against a baseline, which I calculated during the first run. I allow a tolerance for each metric; if the median lands outside that tolerance, the test fails. I even want the tests to fail when performance improves, so I know to update the baseline figures.
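The median-and-tolerance comparison is the whole trick here. A sketch in Go, with illustrative numbers rather than my real baselines:

```go
package main

import (
	"fmt"
	"sort"
)

// withinTolerance takes three measurements of one metric, computes the
// median, and checks it against a stored baseline. It fails in both
// directions: regressions and improvements both need attention.
func withinTolerance(runs [3]float64, baseline, tolerance float64) (float64, bool) {
	s := []float64{runs[0], runs[1], runs[2]}
	sort.Float64s(s)
	median := s[1]
	diff := median - baseline
	if diff < 0 {
		diff = -diff
	}
	return median, diff <= tolerance
}

func main() {
	// Within tolerance of an 800ms baseline.
	median, ok := withinTolerance([3]float64{812, 790, 805}, 800, 25)
	fmt.Printf("LCP median=%.0fms ok=%v\n", median, ok)

	// A regression well outside the tolerance.
	median, ok = withinTolerance([3]float64{1010, 995, 1020}, 800, 25)
	fmt.Printf("LCP median=%.0fms ok=%v\n", median, ok)
}
```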
This works because I run the tests on my own hardware, so results are fairly consistent. If I were running them from a GitLab worker or something, the performance might vary too much.
Test Coverage #
make test-coverage
Go has built in support for test coverage. I use it to get a summary of which code is tested poorly so I know where to look next.
Summary #
That's a lot of testing. I now feel much more confident making changes to the website. I've learnt a lot setting this up and will likely take that to other projects. If you think there's something else I could test for on top of this, please fire off an email as I'm interested to learn. I hope you got some useful ideas from reading this post.