In my recent article on expressive test handling, I touched briefly on a closely related principle – ensuring that the tests themselves are expressive; that is, that the code defining each test communicates effectively and efficiently to the people reading it. I outlined some principles and promised to come back to them in more detail later. Time to make good on that promise…
Why bother making test code expressive?
Tests aren’t deployed, so surely their code doesn’t matter that much? We don’t need to read the test code, right? We just run the tests and hope everything goes green!
Well…no. The tests get read multiple times, and usually with questions in mind:
- The reviewer during code review
  - What is this test doing/checking?
- You, a week later, when you find bugs in UAT!
  - I thought this test was going to catch that failure mode?
- The next developer modifying the code being tested
  - What cases do I need to add new tests for?
- The next developer modifying one of the less important setup dependencies
  - Does this particular input matter? Is that value passed in because you must pass something, or is that the entire point of this test case?
- The developer who discovers they broke these tests when refactoring some core code, miles away
  - What is this test even trying to do? Does it matter that I completely changed the usages of this method – is it setup? Or a core part of what’s getting tested?
Unreadable test code is tech debt just as much as unreadable production code.
What makes tests expressive?
Well, that’s easy enough to answer. Expressive tests are written in code that:
- is easy to read
- leads its reader clearly through its process
- tells its reader what’s important and what’s not
It sounds trite, but the code defining a good test will have as much as is needed to express the details relevant to that test case, and nothing more.
Here are a variety of specific ways to achieve that:
Choose self-explanatory test names
Test method names should be long!
The guidelines that govern appropriate names for production code methods do not apply to test methods. Test method names should generally be much longer than production method names, sometimes with multiple “clauses” being articulated to explain the identifying specifics of what that test is examining.
When combined with the encompassing class name, test method names should tell you what’s interesting in the case being examined by that test.
If you have multiple tests exploring subsets of behaviour then there will probably be a bunch of tests whose names start the same, but then differ at a later clause. Separating the test names at that fork point with “_” or some other language-appropriate delimiter is a sensible approach to make it easy to spot what’s important in this test, or to quickly find the test’s definition from an error output.
Unhelpful names:
- OperatorServiceTests.Test1
- OperatorServiceTests.CheckOperatorServiceWorks
- OperatorServiceTests.OperatorServiceGivesExpectedOutput
Helpful names:
- OperatorServiceTests.ReturnsAllRecordsByDefault
- OperatorServiceTests.ReturnsTargetedRecord_WhenFilteredByExactName
- OperatorServiceTests.ReturnsTargetedRecords_WhenFilteredByWildCard
- OperatorServiceTests.Throws_IfDbRecordHasNoParent
Naturally, as with all naming advice, these examples won’t be absolute classifiers. Occasionally “PrimaryHappyPathThroughOperatorWorkflowThrowsNoErrors” will be the best way to articulate what’s in the test, but always check whether better options are available.
Clearly separate the functional sections of tests
Within a particular test method there should be a clear distinction between the bits of code that are:
- Setting up the scenario
- Actually running the code that’s being tested
- Checking that the code we ran did the right thing.
There are two extremely common patterns or mnemonics for labelling these three functional “sections” of a test:
- “Arrange; Act; Assert” (often AAA)
- “Given; When; Then” (often GWT)
It doesn’t really matter which you use[1]: they both do the same job and it’ll be completely clear what’s meant whichever words you use. In fact, in 99% of cases, you could phrase a test as:
“Given that I’ve Arranged this state, When I Act like this, Then I Assert that I should see this expected state”
invoking both versions simultaneously. But personally, I prefer[2] the AAA form.
GWT is great for articulating behaviours or requirements at a user- or Business Analyst-level with a client, or for writing user stories and epics during sprint planning. Plenty of tests fall nicely into those terms…but not all of them. Where GWT doesn’t quite make sense – and I’d say this is 1-10% of tests – you’ll end up bending the meaning of the labels to “justify” why an AAA structure fits into the GWT phrasing.
Conversely, I’d assert that AAA applies to the structure of every test you write: it’s a slightly better fit for the level of abstraction that most test code (especially integration test code) is written at.
Whichever format you use, what matters is that you clearly signpost the different kinds of code in your tests. I’d strongly argue for every test having header comments on each section, delineating the Arrange code from the Act code, and then the Act code from the Assert code[3].
This becomes especially important if some of the setup uses production code to create the required state. It’s common in integration tests for one of the Arrange steps to be “I used the API to create a booking”, with the Action being “I used the API to cancel the booking”. Without clear labelling it wouldn’t necessarily be clear to the next dev whether the booking creation is part of the point of the test (and therefore mustn’t be changed without considering the implications for coverage), or whether it was simply one particular way to set up the pre-conditions for the tests (and therefore could be changed to “I just created it in the DB directly” if desired).
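For instance, here’s a minimal sketch of that booking test with labelled sections (the BookingApi and Db helpers, and the BookingStatus type, are hypothetical):

[Test]
public void CancellingABooking_ReleasesItsTimeSlot()
{
    // Arrange: creating the booking via the API is just a convenient way to
    // reach the required state – the creation itself is not what's under test
    var booking = BookingApi.CreateBooking(customerId: 42, date: "2024-06-01");

    // Act
    BookingApi.CancelBooking(booking.Id);

    // Assert (fetching the record from the DB is part of the Assert section)
    var stored = Db.GetBooking(booking.Id);
    stored.Status.Should().Be(BookingStatus.Cancelled);
}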
Repeating functional sections in a single test is fine if you label them clearly
Mostly a test will have a bunch of Arranging, followed by single block of Action, followed by some Assertions. But occasionally it may be sensible for a single test to contain multiple iterations of Acting and Asserting, then Acting again and making further rounds of Assertions.
Common cases for this could be:
- Testing a class that manages some sort of workflow
- A process that is executed regularly and interacts with the state left behind by an earlier execution
In both these cases, the Arrange section for some later tests may be exactly “Do the Arrange and Act sections of the earlier tests”. Here it’s perfectly acceptable to write a single test that performs a sequence of different Actions, stopping in between to make Assertions about the expected state, both to check that the previous Action did what was expected and to check that we’ve successfully Arranged the state for the next Action.
This is fine – but it makes it even more critical that you comment the sections clearly, indicating when you’re switching between Acting and Asserting, or even back to some mid-flow Arranging.
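A sketch of that shape, for a hypothetical OrderWorkflow class:

[Test]
public void OrderWorkflow_ProgressesThroughEachStage()
{
    // Arrange
    var workflow = new OrderWorkflow(orderId: 7);

    // Act: submit the order
    workflow.Submit();

    // Assert: submission worked – and this doubles as Arranging for the next Action
    workflow.State.Should().Be(OrderState.AwaitingApproval);

    // Act: approve the order
    workflow.Approve();

    // Assert: the workflow reached its final state
    workflow.State.Should().Be(OrderState.Confirmed);
}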
Some would point at the common aphorism that “a test should only do one thing”, to dispute this point. That’s a good rule of thumb, but tests must also be useful, convenient and as fast as reasonable – if that requires doing multiple things in a single test, then so be it. Never stand on principle over value.
Commonise irrelevant setup values to make significant test data obvious
Particularly for integration tests, there’s often a lot of setup required that isn’t specifically relevant to what you’re testing right now; a lot of it is usually “I need one standard customer object, with most of the usual data present”.
Copious setup can make it hard to spot the values that are particularly pertinent to this specific test case, or what’s different in this case versus the previous case – a crucial value can end up buried in a few dozen indiscernible lines. Even if the next dev does spot that one line is different, it’s not necessarily obvious whether it’s:
- the key difference for the test
- intentionally different, but only for the sake of passively exercising more cases – not the main purpose of the test
- a value that got missed during a broader update
- just a coincidence; not thought about at all
You could solve this with comments, but a much better approach is to create a dedicated “set up a standard customer for tests in this class” method, with optional parameters for the notable values. Your test setup code then reads as “I need a customer, and specifically one with this notable property”. Sometimes that setup will be unique to that test class; sometimes the same base setup code will be used repeatedly throughout the suite of tests, which also makes test-writing faster overall.
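A minimal sketch of such a method (the Customer type and its defaults are hypothetical):

private static Customer CreateStandardCustomer(
    string name = "Standard Customer",
    bool isActive = true,
    int bookingLimit = 10)
{
    // All the "usual data" gets sensible defaults; tests override only what matters
    return new Customer
    {
        Name = name,
        IsActive = isActive,
        BookingLimit = bookingLimit,
    };
}

// In a test, only the notable value stands out:
var customer = CreateStandardCustomer(bookingLimit: 0);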
This is something that comes up so often that I and some others at Softwire have written and open-sourced a dedicated library for it: LochNessBuilder. This is a fluent-style library which allows Builder objects to be progressively enhanced so that you can:
- define base builders for a type, which configure standard properties and can be reused throughout the test suite
- take one of the base builders and extend it in a particular test class, to set up the properties that are going to be repeatedly used for this set of tests
- then (per test) take the class’s builder as a baseline and add in the specific values that are pertinent only to that test
The end result is setup code that clearly expresses “for this test we have a general customer, with these specific modifications”, helping the next reader to understand the why of your Arrange section.
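In outline, the layered usage looks something like this (a generic illustration of the idea, rather than LochNessBuilder’s exact API):

// Suite-wide baseline: standard properties for any Customer
var baseCustomerBuilder = Builder<Customer>.New()
    .With(c => c.Name, "Standard Customer")
    .With(c => c.IsActive, true);

// Per-class extension: properties every test in this class relies on
var vipCustomerBuilder = baseCustomerBuilder
    .With(c => c.Tier, CustomerTier.Vip);

// Per-test: only the value that is pertinent to this specific test
var customer = vipCustomerBuilder
    .With(c => c.BookingLimit, 0)
    .Build();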
Parameterise your inputs and outputs to make different scenarios understandable at a glance
An extrapolation of the previous point: sometimes you’ll find yourself writing a group of tests with extremely similar patterns, where all that’s varying is a small number of the inputs and corresponding expected outputs. This is particularly common when testing an algorithmic section of your codebase.
Possibly it’s as simple as “depending on what two numbers I feed in, I get a different string out”. Or the nature of the input could be more complex, with multiple lists of values, used in a variety of Arrange and Act steps, combined into one final desired output from the test’s last Action.
Whichever extreme you’re at, using parameterised tests can clarify what’s going on to the reader. This structure exists in almost every test framework, hooking into the language’s existing parameterisation behaviours: for example, TestCase in NUnit, InlineData in xUnit, or pytest’s parametrize in Python. Each one runs the test code repeatedly with the various sets of values. The reader engages with a single set of test code to understand its behaviour in the abstract, and can then follow through the test cases focussing solely on the interplay between the various inputs and their expected outputs.
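At the simple end, a sketch using NUnit’s TestCase attribute (the FormatSum method is hypothetical):

[TestCase(1, 2, "1 + 2 = 3")]
[TestCase(5, -5, "5 + -5 = 0")]
[TestCase(0, 0, "0 + 0 = 0")]
public void FormatSum_ProducesExpectedString(int a, int b, string expected)
{
    var result = Formatter.FormatSum(a, b);
    result.Should().Be(expected);
}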
For more complex test scenarios, you may want dedicated objects that hold an entire complex set of inputs and expected outputs. That object is then passed to the dedicated test method, which knows how to turn that scenario definition into a sequence of Arrange, Act and Assert steps to exercise that scenario as a test.
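A sketch of that scenario-object approach using NUnit’s TestCaseSource (the scenario type and helper methods are hypothetical):

public record BookingScenario(int ExistingBookings, int NewRequests, int ExpectedAccepted);

private static IEnumerable<TestCaseData> BookingScenarios()
{
    yield return new TestCaseData(new BookingScenario(2, 3, ExpectedAccepted: 3));
    yield return new TestCaseData(new BookingScenario(9, 3, ExpectedAccepted: 1));
}

[TestCaseSource(nameof(BookingScenarios))]
public void BookingRequests_AreAcceptedUpToTheLimit(BookingScenario scenario)
{
    // Arrange
    var service = CreateServiceWithExistingBookings(scenario.ExistingBookings);

    // Act
    var acceptedCount = service.RequestBookings(scenario.NewRequests);

    // Assert
    acceptedCount.Should().Be(scenario.ExpectedAccepted);
}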
Use dedicated helper methods to ensure Assertions are easily readable
Writing readable Assertions has two parts, and the first is super simple: if you have an assertion library which provides pre-built methods articulating complex Assertions in simple English phrases, then use it.
In .NET that’s FluentAssertions, and I discussed it with some examples in my previous article. Libraries like this aid the readability of test code, by condensing complex multiline examinations of values and objects down into single instantly understandable expressions. They also generally improve the readability of the failure outputs too, as discussed in that previous article.
You can extrapolate this concept by writing your own custom Assertion helpers. This takes more effort, but can also provide correspondingly more readability and value. Generally, this is most valuable either when you have complex code calculating an Assertion that’s easy to summarise in words, or when you have the same sets of checks happening repeatedly against a domain object.
Create yourself a method to commonise those checks. This could just be a standalone method in your test class that takes the object(s) and runs the Asserts used in that particular class: AssertMeetingScheduleHasNoOverlaps(). Or it could be a reusable extension to your testing framework, since most of them support that.
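For instance, a sketch of that standalone helper (the Meeting type is hypothetical):

private static void AssertMeetingScheduleHasNoOverlaps(IEnumerable<Meeting> meetings)
{
    var ordered = meetings.OrderBy(m => m.Start).ToList();
    for (var i = 1; i < ordered.Count; i++)
    {
        // Each meeting must start no earlier than the previous one ends
        ordered[i].Start.Should().BeOnOrAfter(ordered[i - 1].End,
            "because a valid schedule never double-books a time slot");
    }
}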
As an example, I was recently writing tests to check and document exactly how Timeout behaviour was being handled in my codebase. FluentAssertions already has a myMethod.Should().Throw<TException>() Assertion, but I extended its concept to support checking the time that the method took to run or error; a few variations on myMethod.Should().RunInExactly(5.Seconds()) and myMethod.Should().ThrowExceptionAfterAtLeast(2.Seconds()).
These extension methods weren’t necessary; I could have done the timing explicitly in the test. But that version of the test is just harder to read:
[Test]
public void SqlTimeoutWorksWithoutScopes_WithExtensionHelper()
{
    DbQueryWithTimeouts(null, 1, null, 2)
        .ShouldThrowExceptionAfterRoughly<SqlException>(TimeSpan.FromSeconds(1), overheadGracePeriod);
}

[Test]
public void SqlTimeoutWorksWithoutScopes_WithExplicitTiming()
{
    var timer = new Stopwatch();
    timer.Start();
    DbQueryWithTimeouts(null, 1, null, 2).Should().Throw<SqlException>();
    timer.Stop();
    timer.Elapsed.Should().BeCloseTo(TimeSpan.FromSeconds(1), overheadGracePeriod);
}
and the effect gets magnified as the tests become more complex and cover more cases.
Ideally helper methods would be clear about what they’re testing, like the previous Meeting example. But even an AssertCustomerStateIsValid() method is valuable – it makes it clear, when reading any single test, which lines are general “check everything is broadly fine” Assertions, versus which lines are “check that this very specific value that I put in whilst Arranging has had the expected impact on the output”.
Use variables judiciously in test code
The use of variables in test code is complex and honestly relies mainly on experience and judgement.
Using variables and calculations can:
- make it clear how and where the expected outputs are derived from parts of the inputs (see the point above about highlighting what’s relevant)
- ensure that related tests stay in sync when values are changed
- avoid silly typo mistakes
An example of sensible use could be in a situation where you’re running one process three times, and another process twice, and then Asserting there should now be 15 records in a particular table.
When the next dev reads that, how do they know where “15” came from? Storing input values in variables (rather than hard-coding them) allows you to explicitly show that the expected output is in fact 3(m+n) records when the setup processes ran m and n times respectively. This is clearer to the reader and makes future updates easier.
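A sketch of that test, assuming hypothetical process-running helpers where each run creates three records:

[Test]
public void SetupProcesses_CreateExpectedNumberOfRecords()
{
    // Arrange
    const int recordsPerRun = 3;
    const int importRuns = 3;
    const int syncRuns = 2;

    // Act
    for (var i = 0; i < importRuns; i++) RunImportProcess();
    for (var i = 0; i < syncRuns; i++) RunSyncProcess();

    // Assert: the derivation of "15" is now explicit
    var expectedRecords = recordsPerRun * (importRuns + syncRuns);
    GetRecordCount().Should().Be(expectedRecords);
}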
But there can be a fine and complex line between the beneficial version of those calculations…and going too far with it, so that you’re just reimplementing the logic of the production code in the test.
Reimplementing logic in a test is a very bad pattern – all it tests is that you can implement the same logic twice. It’s remarkably easy to end up with the same broken code in both the production class and the test class, and the test then still passes because the two copies of the code agree. Further, when a later dev breaks the test, it’s much harder to understand which of the two different versions of the logic is correct. Is the bug here “we inadvertently changed the production logic and shouldn’t have”? Or is it “we changed the production logic, and now the test logic needs updating”?
This isn’t an area with hard and fast rules; it’s definitely more about judicious application of honed instincts. But certainly, sometimes the right answer is to use variables to provide clarity on how values have evolved in the tests.
Record why the expected output is expected
Most of the time no active effort is required for it to be clear why the expected output is what it is – plenty of tests are completely self-evident about that. Indeed, following the other principles here will do a lot of the work towards explaining what a test is doing.
But occasionally you’ll still be left with code that’s clear about what it’s testing, and how it’s doing so, but still not clear about why the output is expected. If it’s at all non-obvious, then document it!
At a minimum, add a comment to the line explaining it. But better: if your Assertion Library supports it, add the explanation as part of the Assertion itself. This is often available as a because string parameter, which will be printed as part of the test output if it fails.
An example:
rejectedBookings.Should().BeEmpty("because users created before that date are exempt from the usual booking limitations.")
Summary of the principles for writing expressive tests
You can boil down my recommendations for how to write expressive tests into these general principles:
- Use good test names
- Signpost your test methods
- Signpost your test data
- Separate the test implementation details from the test’s intent and different cases, where appropriate
- Use sub-methods and variables for clarity
- Report errors clearly
Eagle-eyed readers may notice a distinct similarity between these principles and some of the principles of Clean Code[4]. It turns out that “test code”…is in fact also code, and thus should also be clean!
1 I won’t always be opinionated.
2 I lied.
3 Note that the “Assert” section may well have lines that aren’t just literal code Assertions – the code to fetch the checked records from the DB is also part of your Assert section.
4 Martin, R. C. (2008), Clean Code: A Handbook of Agile Software Craftsmanship, Prentice Hall, Upper Saddle River, NJ.