TDD Pro-Tip:
I stay very aware of my testing context’s data, and specifically of what data is opaque and what data is transparent.
Those terms, transparent and opaque, need a little explanation. Sometimes the code I’m testing doesn’t vary based on the entirety of its input.
A trivial example: the function that validates the date of an order DOES NOT CARE what any other field in that order is or does. It only cares about the date field. I would say the date is transparent to that function, and the other fields and subfields are opaque.
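To make that concrete, here's a minimal Kotlin sketch. Order, validateOrderDate, and the "no orders dated in the past" rule are all invented for illustration:

    import java.time.LocalDate

    data class Order(
        val date: LocalDate,
        val customer: String,
        val items: List<String>,
    )

    // Only the date matters to this check; customer and items are opaque to it.
    fun validateOrderDate(order: Order): Boolean =
        !order.date.isBefore(LocalDate.now())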
This kind of thing happens a lot. When you test a renderer by typing a random name and description into it, you are implicitly recognizing that the name and the description are opaque to the renderer: it doesn't care what those values are, and neither do you; you only care whether they appear.
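In code, that might look like the sketch below, where render and its fields are hypothetical stand-ins for whatever you're actually rendering:

    import kotlin.test.Test
    import kotlin.test.assertTrue

    // stand-in for the renderer under test
    fun render(name: String, description: String): String =
        "<h1>$name</h1><p>$description</p>"

    class RenderTest {
        @Test
        fun `name and description appear in the output`() {
            val html = render(name = "any-name", description = "any-description")
            // The values are opaque; all we assert is that they show up.
            assertTrue("any-name" in html)
            assertTrue("any-description" in html)
        }
    }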
When I’m testing code like that, I use bogus values for the irrelevant fields, values that are enough to get the Order constructed, but are not intended to simulate the real values — other than that date, of course.
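Continuing the hypothetical Order sketch from above, such a test might read:

    import java.time.LocalDate
    import kotlin.test.Test
    import kotlin.test.assertFalse

    class OrderDateTest {
        @Test
        fun `rejects an order dated in the past`() {
            val order = Order(
                date = LocalDate.of(1999, 1, 1),  // the signal: clearly in the past
                customer = "whatever",            // noise: opaque to this validator
                items = listOf("junk"),           // noise: opaque to this validator
            )
            assertFalse(validateOrderDate(order))
        }
    }

The bogus "whatever" and "junk" are enough to get the Order constructed without pretending to be real data.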
Why bogus values, why not simulated ones? Because I am trying to make the noise look like noise and the signal look like signal.
An example: this test starts with

    val extract = Extract("date", "market", "region", "deliveryPoint", "bid", "offer", "type", "startDate", "endDate")
It then invokes the serializer on that extract and asserts that the roundtripped Extract has the same values.
Not one of the values in that test resembles the real working values, except in being a string. That's because the serialization doesn't use that data except for knowing that it's a string.
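Spelled out, and supposing the serializer under test is kotlinx.serialization with its plugin applied (an assumption; substitute whatever serializer you're actually testing), the whole thing is about this big:

    import kotlinx.serialization.Serializable
    import kotlinx.serialization.decodeFromString
    import kotlinx.serialization.encodeToString
    import kotlinx.serialization.json.Json
    import kotlin.test.Test
    import kotlin.test.assertEquals

    @Serializable
    data class Extract(
        val date: String, val market: String, val region: String,
        val deliveryPoint: String, val bid: String, val offer: String,
        val type: String, val startDate: String, val endDate: String,
    )

    class ExtractRoundtripTest {
        @Test
        fun `extract survives a serialization roundtrip`() {
            val extract = Extract("date", "market", "region", "deliveryPoint",
                "bid", "offer", "type", "startDate", "endDate")
            val roundtripped = Json.decodeFromString<Extract>(Json.encodeToString(extract))
            assertEquals(extract, roundtripped)
        }
    }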
I also will use strings like "dontcare", tho only if there's just one of them. With numerics, I'll tend to make sure the values are relatively prime to each other, but otherwise any old number will do.
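A tiny example of that numeric habit; Trade and its fields are invented:

    data class Trade(val quantity: Int, val lots: Int, val batchSize: Int)

    // 3, 7, and 11 are pairwise relatively prime: if the code under test
    // swaps or multiplies the wrong fields, no accidental coincidence
    // can make the wrong answer look right.
    val trade = Trade(quantity = 3, lots = 7, batchSize = 11)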
(A caution: never use obscene language in this way. 🙂 There are stories behind that rule that I am not prepared to share at this time, but trust me, it's a bad idea.)
I use the same technique for more complex data structures, tho in those cases I prefer to have a builder lying around to help me. We need another muse with code to help us talk about building data, so that's for another day.
Sometimes, my test needs a known bad value. If my code is spozed to freak when there aren't exactly five comma-delimited tokens, you'll see me use "not,enough,tokens" and "more,tokens,than,there,should,be" and "this,has,five,tokens,exactly".
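A sketch of that shape, with a made-up parseRecord standing in for the code under test:

    import kotlin.test.Test
    import kotlin.test.assertFailsWith

    // hypothetical: a parser that demands exactly five comma-delimited tokens
    fun parseRecord(line: String): List<String> {
        val tokens = line.split(",")
        require(tokens.size == 5) { "expected 5 tokens, got ${tokens.size}" }
        return tokens
    }

    class ParseRecordTest {
        @Test
        fun `freaks on too few tokens`() {
            assertFailsWith<IllegalArgumentException> { parseRecord("not,enough,tokens") }
        }

        @Test
        fun `freaks on too many tokens`() {
            assertFailsWith<IllegalArgumentException> { parseRecord("more,tokens,than,there,should,be") }
        }

        @Test
        fun `accepts exactly five tokens`() {
            parseRecord("this,has,five,tokens,exactly")
        }
    }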
With things that aren’t strings, I often introduce test-level constants I can use. It’s not obvious that "Feb-18" isn’t legal in the current context, but it’s very obvious that the variable startsTooSoon isn’t.
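Something like this, where the actual boundary date is invented for the sketch:

    import java.time.LocalDate

    // test-level constants whose names carry the intent
    val earliestLegalStart: LocalDate = LocalDate.of(2024, 3, 1)  // made-up boundary
    val startsTooSoon: LocalDate = earliestLegalStart.minusDays(1)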
I spoze what this is all about, at base, is me having learned that my tests perform more functions than just proving that my shipping code does the thing: they are part of the communicative, intentional, shared text that is the source base.
I could use legit values, maybe even exact values captured from previous runs, and I'd still prove to myself that the shipping code does the thing. What I wouldn't be doing, though, is making my intention as a coder nearly as obvious to the world.
I will even go so far as to use the exact values from a live run and then go back during the refactor and munge the irrelevant ones.
Knowing which parts of test data are relevant and which parts aren’t is a big deal for me. I call the relevant ones transparent and the others opaque, and I use garbage data for the opaque ones. This is a tiny way to increase the signal.
Have a lovely Thursday evening!