TDD & The Lump Of Coding Fallacy

Hey, it’s GeePaw, and if you’re just starting to look at TDD, refactoring, the modern technical synthesis, we need to start with a couple of minutes about the Lump Of Coding fallacy.

You’re a working geek: you spend your days coding for money to add value to your company. And one day some random schmoe like me comes up to you and says, hey friend you really ought to try TDD, because that value that you’re adding, you could have added even faster if you used TDD. So, you check it out, because sometimes random schmoes actually have a clue, and you see pretty quickly that TDD means making lots of automated tests against your code. So what do you do?

Well, you back away slowly and you casually slide your hand into your pocket to get that little reassuring grip on your pepper spray, just in case.

Why is that? Well, it’s pretty simple really.

Look, you say to me, I already spent all day rolling code to add value. But if I do TDD, I do automated tests, and try to keep up with me here, doing automated tests is also rolling code.

So, if I adopt TDD, I suppose you think I’ll just spend all night coding the tests that cover the value I spend all day adding. Look, random schmoe, in the immortal words of Sweet Brown Wilkins,

“Ain’t nobody got time for that.”

OK. OK, my random schmoe response to this is to suggest that your analysis is off. And specifically, that it suffers from what we call the lump of coding fallacy. You think of what you do all day as a single behavior, coding. A large undifferentiated lump of work. And you think that TDD means adding to that one lump of coding a whole different, equally large lump in order to make the tests.

Three Things We Do During A Coding Day

Here’s the thing, I don’t think what you do all day is a single behavior. I think it’s made up of several different behaviors. Let’s take a look.

  • One thing we do, we actually program the computer, and there’s two parts to that. There’s actually changing the source and then there is what we call designing, which is imagining forward how we’re about to change the source. That’s programming the computer. For many of us, it’s the best part of our day.
  • Next, we study. It’s not possible to change the source without knowing something about it. And the studying is how we know something about it. Again, there’s a couple of different parts. We scan source, which is flickering quickly from place to place. And then we read source, which is far more intense. It’s more line by line, a deep analysis of what’s really going on in the code. So that’s study.
  • The third thing we do during a coding day is what we call GAK activity. GAK is an acronym. It means geek at keyboard. And what it means is, running the program in order to accomplish some end or another. In GAK inspection, we’re running the program to see how it works right now. In GAK testing on the other hand, we’re running the program again, but this time to see whether some source change that we just made had the desired effect. And of course, there’s always GAK debugging, where we’re running the program, this time in debug mode, or print statements, or whatever, to see why that source change did not have the desired effect.

Two Points For Later

Now, before we go any further, I want to make sure you know two key aspects of the TDD world. First, when you’ve adopted TDD, every source file really becomes a pair of files. One holds the source code we ship, and the other holds the source code for the microtests. And we use both of these files all the time, because they both offer us a great deal of information. That testing code forms a kind of scaffolding around the shipping code, and we’re going to take advantage of that.

Second, TDD uses very small, very fast tests called microtests, and it runs them separately from running your entire application. The reason we can get away with testing only parts of the app, is because what matters most about our work is the branching logic that’s in it. And that’s the part we test most heavily using microtests. We run them in a separate app for speed, selectability, and ease of use.
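To make those two points concrete, here’s a minimal sketch of what such a pair of files might look like. The function, the file names, and the tests are all hypothetical, invented just to show the shape:

```python
# shipping code (hypothetical file: discount.py)
def discount(subtotal, is_member):
    """Branching business logic: members get 10% off orders over $50."""
    if is_member and subtotal > 50:
        return round(subtotal * 0.90, 2)
    return subtotal


# its paired microtests (hypothetical file: discount_test.py)
# each one runs in milliseconds: no app startup, no login, no database
def test_member_over_threshold_gets_discount():
    assert discount(100.00, is_member=True) == 90.00

def test_non_member_pays_full_price():
    assert discount(100.00, is_member=False) == 100.00

def test_member_under_threshold_pays_full_price():
    assert discount(40.00, is_member=True) == 40.00
```

A microtest runner picks up those `test_` functions and runs the whole suite in well under a second, which is exactly what makes the test file such cheap, fast scaffolding around the shipping file.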

So, take those two points and set them aside. They’re going to become important here in a minute. Hold onto them.

The Proportions Of The Three Activities

OK, let’s go back to our three activities. So, take these three things together, changing code, studying code, and GAK activity, and you see there isn’t just one solid lump called coding. There’s all these activities. Of course, they’re totally intermingled throughout the day. And that’s why we think of it as a big lump. The truth is, they actually take up very different proportions of our programming day.

Programming the computer, the best part of the day, is often the very smallest part. The GAK activity, much of which is just waiting around for things to run, or clicking through screens and typing in data in order to get to the part where you wanted to see something, that is the largest part of the day by quite a bit. And studying, the scanning and the reading, well, it’s somewhere in the middle. So those are your basic proportions.

When TDD tells you that writing automated tests will make your life better, the lump of coding analysis of the idea is both right and wrong. Let’s grow this picture a little bit. We’ll call what we’ve got now before TDD, and then I’m going to disappear, and we’ll put after TDD over on the right.

The After Picture

The lump of coding fallacy is absolutely right about one thing, automated tests are more code that has to be written. Somewhere between half again as much and twice as much as you write now. Let’s say that part of our day doubles. On the other hand, the lump of coding fallacy is totally wrong about the rest of the picture.

First, study time will go down after TDD. It’s not that we have to study any less code in the after picture than in the before. Rather, it’s that studying the same amount of code gets faster. Why? Because of those twin files we talked about, one with shipping code and one with testing code. It’s almost like the test code forms a kind of Cliff’s Notes for the shipping code, a scaffolding that makes it easier for us to study, and this makes it far easier to tell what’s going on. This cuts our code study time roughly in half.

Finally, we come to the GAK time, and this is the big payoff. TDD reduces the amount of time you spend in GAK by 80% or 90%. Because TDD tests run in that special toolkit, they’re fast. They don’t fire up your application. They don’t depend on things like logins, or database permissions, or waiting around for the web to load. They are built to be fast, small, and grouped into convenient suites. Nothing completely eliminates the need for GAK work, but TDD slashes the amount of time you spend GAK-ing during the course of the workday.

So, when we look at the left and the right, from before TDD to after it, you can see it for yourself. We write more code, automated tests. And far from losing our productivity, we actually gain it. All this takes is for you to look past that single lump of coding fallacy.

GeePaw Advises

OK, it’s time for my advice. The first advice is the same advice I always give. Notice things. In your next full working day, notice how much time you spend in the three behaviors, actual programming, code study, and various GAK activities. Once you see it yourself, you might want to consider making some changes.

The second thing, well, TDD doesn’t come in a day. It takes some lessons and some practice. There’s a lot of course material out there, including mine. Watch some videos. Read a little, and really try the various exercises. Start with some toy code. Almost anything will do. Then find a small problem in your day job that has few or no dependencies on other classes. Do this two or three times. And again, notice what happens. If you like the result, well, at that point, you’re ready to get serious about TDD. And then, well, we can take it from there.

So, I’m GeePaw.

Drop that Lump Of Coding fallacy,

and I’m done!

How Long (Technique Remix)

how long (redux)?

in the technical side of the modern synthesis, we develop code by writing and passing tests then re-working the code to make it as change-enabled as we can.

the key “how long” aspect to this: how long does the code stay not working? that is, how long in between the times we could push the code straight to production? the desideratum in the modern synthesis is that we try to measure that number on a scale using only two digits of minutes.

it’s as if we’re playing a game, and the object is to keep the application running correctly even as we are changing it. over time, as you get stronger and stronger at it, this becomes an obsession for the geek. it’s a great fun game, btw, and an old hand like me derives much pleasure from the challenges it presents. like many others, i not only like playing it for real in my codebase, i also enjoy playing in my imagination for other people’s code. 🙂

but of course, it’s not *there* just for the kicks it gives us. there are several benefits it provides for a team that seeks to ship more value faster.

  • first and most importantly, it narrows mental scope. recall that managing this, the number of conceptual balls you’re juggling in your mind, is a key part of being able to move quickly in code.
  • second, still valuable, it reinforces a deep commitment to “always shippable”. the drive for ways to change the code in small chunks without breaking the app frees us from complex source-branching strategies, and lets the value-defining teammates “turn on a dime”.
  • third, it has its own indirect effects on the code base. code that’s built this way, over repeated applications of the process, almost insensibly becomes fully change-enabled into the future.

anyway, the effects of working this way are profound, and to some extent they define the technical side of the modern synthesis. this lunch isn’t free, of course, it involves coming up with solutions to at least three different kinds of problem.

the first problem is controlling how the change is exposed to the user. large-scale changes in code don’t happen in ten minutes. that means there’ll be a time when a given new feature or changed behavior is building in the codebase, but shouldn’t be visible to non-dev users.

in a pure “add” situation, there’s a pretty basic solution: some kind of feature or rollout toggle that makes the added functionality invisible in one state and available in the other.
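a minimal sketch of that basic solution, with a hypothetical flag name and an in-memory flag store standing in for whatever config file, database, or flag service you actually read toggles from:

```python
# hypothetical in-memory flag store; real systems usually read toggles
# from config, a database, or a feature-flag service
FEATURE_FLAGS = {"export_to_csv": False}  # off for non-dev users

def menu_items(flags=FEATURE_FLAGS):
    items = ["Open", "Save"]
    if flags.get("export_to_csv"):
        # the pure "add": invisible in one toggle state, available in the other
        items.append("Export to CSV")
    return items
```

while the feature is still building in the codebase, the flag stays off and users never see it. flipping one value exposes the finished work.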

pure adds are relatively rare, though. most of the time, practicing evolutionary design, the code we’re changing is spread through multiple locations in the code. here things become more interesting.

two of the original design patterns come in handy pretty frequently here. 1) the Strategy pattern makes it easier for me to supply ‘pluggable’ behavior in small units. 2) the Factory pattern lets client code magically get the *right* strategy depending on toggle values.
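here’s a small sketch of those two patterns working together. the strategy classes, the toggle name, and the numbers are all made up for illustration:

```python
# strategy: each small unit of 'pluggable' behavior is its own class
class OldShippingCost:
    def cost(self, weight):
        return 5.00 + 0.50 * weight

class NewShippingCost:
    def cost(self, weight):
        return 4.00 + 0.75 * weight

# factory: client code asks for *a* strategy; the toggle value decides which
def shipping_strategy(toggles):
    if toggles.get("new_shipping"):
        return NewShippingCost()
    return OldShippingCost()

# client code never branches on the toggle itself
strategy = shipping_strategy({"new_shipping": False})
total = strategy.cost(10)  # old behavior until the toggle flips
```

the nice property: the branching on the toggle lives in exactly one place, the factory, and everything downstream just uses whatever strategy it was handed.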

a caution: the road to hell is lined on either side with “one ring to rule them all” strip malls that will make you want to only ever use one technique for controlling user exposure. we know what lies at the end of that road. 🙂

the second problem is the certainty problem. how can we be certain we have not broken the app as we make our changes?

the toggle-ish solutions from before can aid us here, *if* we have very high confidence that the toggling itself is idempotent. that is, if putting the toggle mechanism in doesn’t *itself* break the app.

a preferable solution, usually faster, once you’ve set yourself up for microtesting and refactoring, is to bring the app’s business logic under a high degree of scrutiny with microtests.

microtests are exceptionally cheap & powerful in asserting “this is how it works now”. if the awkward parts of what’s being tested are isolated, we can microtest around considerable chunks of the code. they give us *tremendous* confidence that we can’t go blindly backwards.
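a sketch of that “this is how it works now” move. the function here is a made-up stand-in for some bit of legacy business logic; the microtests pin its current behavior, quirks included, before we start reworking it:

```python
# stand-in legacy logic, quirks and all
def parse_quantity(text):
    text = text.strip()
    if not text:
        return 0
    return max(0, int(text))  # current rule: negatives clamp to zero

# pinning microtests: they assert what the code does NOW, so a rework
# that blindly goes backwards fails immediately
def test_blank_means_zero():
    assert parse_quantity("   ") == 0

def test_negative_clamps_to_zero():
    assert parse_quantity("-3") == 0

def test_plain_number_parses():
    assert parse_quantity(" 7 ") == 7
```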

(note: that’s not certainty. there is no certainty. that’s confidence, though, and confidence goes a long way.)

the third problem we have to solve is sequencing. how can we chain together a bunch of steps so that a) we get there eventually, and b) the outcome of each step doesn’t break the app and c) we can still follow a broadly coherent path?

sequencing is the meat and potatoes for the refactorer, and when we change code, we very often also have to refactor what’s already there.

sequencing very often involves inventing steps that are “lateral” rather than “towards”. what i mean is, effective sequencing often involves steps that don’t immediately aim at the endpoint.

an example: it’s not at all uncommon to do things like add a class to the code that you intend to remove from the code four steps down the line.

instead of twenty-five steps, all towards the endpoint, we take one side-step, four steps towards, and one more side-step at the end. fewer steps is better if they’re all the same size. (forthcoming video of an example, in my copious free time.)

learning the modern synthesis — the technical side at any rate — is very much about just learning the spirit of this approach and mastering the various techniques you need to move it from on-paper to in-practice. it’s not an instant fix. learning this takes time, energy, and great tolerance for mis-stepping. but the payoff is huge, and in some of these cases it can also be nearly immediate.

it all begins with this: “how long?”

how long will the code be not working?

pro-tip: shorter is better.

How Long?

how long?

what amount of time passes between you saving the file you just changed and you seeing the results of that change?

if that answer is over 10 seconds you might want to wonder if you can make it shorter. if that answer is over 100 seconds, please consider making a radical change in your approach.

thinking is the bottleneck, not typing. we’ve been over this a bunch. but waiting you can bypass — *all* waiting you can bypass — is waste.

i know firmware folks who routinely wait 20 minutes from save to run. why? because they are running on the target, and that means sending the image to the target, stashing it in the NVRAM, and rebooting. every time. wow, that must be some intricate h/w-dependent rocket science stuff they’re working on, eh? and sometimes the answer is yes. but i’ll let you in on a secret. not usually. usually it’s logic that’s got almost nothing to do with the peculiarities of their h/w target. i’m not joking. http and ftp servers, things like scheduling — *calendar* scheduling, not multi-task scheduling.

before we all pile on, tho, maybe we better check ourselves on this.

  • do you have to bounce a server or a service inside a container to see your results? do you have to switch to a browser? do you have to login and step through three pages to get to the right place?
  • do you have to manually or automatedly clear the cache or wipe the database? do you have to go look up stuff in a special place?
  • do you have to edit a config file, or just quick like a bunny bounce to the shell and type in those same 87 characters again (except for the three in the middle)?

hmmmmmm.

and checking the results. do you do that in the browser, too? or maybe you study the log output. or, again, bounce to the shell and tweak 3 characters of the 87. of course, maybe you don’t check the results at all, one does see that from time to time. “it ran, so it worked.”

if i don’t know whether my change did what i wanted in 10 seconds i get grouchy. sometimes i suffer it for an hour, cuz the change i’m making is the one that will free me to get the faster feedback. but generally, long wait-states between change & results turn me into Bad GeePaw.

the answer, nearly every time, is to use one of several techniques to move your code-you-want-to-test from a place where that takes time to a place where that’s instant. sometimes doing this is a heavy investment the first time. most non-tdd’ed code is structured in one of several ways that make this difficult. demeter wildness, intermixing awkward collaboration with basic business logic, god objects, dead guy platonic forms, over-generalization, primitive obsession, all of these work against readily pulling your code apart for testing it.

the trick is always one step at a time. remember the conversation the other day about supplier and supplied? start there.

you’re either adding a new function or changing an old one, yeah? if you’re greenfield in your tiny local function’s context, try this: put it in a new object that only does that tiny thing. it sits by itself. declare one in a test and call it. lightning speed.
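a sketch of that greenfield move, with hypothetical names. the tiny thing gets its own object, and the test declares one and calls it, no server, no login, no database:

```python
# the tiny new behavior lives in an object that does only that one thing
class LateFeeCalculator:
    def __init__(self, daily_rate=0.25, cap=10.00):
        self.daily_rate = daily_rate
        self.cap = cap

    def fee(self, days_late):
        return min(self.cap, self.daily_rate * max(0, days_late))

# declare one in a test and call it. lightning speed.
def test_late_fee():
    calc = LateFeeCalculator()
    assert calc.fee(4) == 1.00      # 4 days at a quarter a day
    assert calc.fee(100) == 10.00   # the cap applies
    assert calc.fee(-2) == 0        # not late yet, no fee
```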

if you’re brownfield, you can do the same thing, but it’s harder. the key is to first extract just exactly as much of the brown field code you need to its own method. then generally pass it arguments rather than using fields. now rework it for supplier/supplied changes. finally, once again, pull it to a new class. note: you don’t have to change the existing API to do this. that API can new the object and call it, or if it’s of broad use, new it in the calling site’s constructor.
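here’s that brownfield sequence sketched end-to-end with made-up names. the logic gets extracted, handed arguments instead of reading fields, and pulled to a new class, while the existing API keeps its signature and just delegates:

```python
# the extracted logic, now in its own class, testable at lightning speed
class PriceCalculator:
    def total(self, subtotal, tax_rate):
        return round(subtotal * (1 + tax_rate), 2)

class Order:
    """the existing brownfield class: its public API is unchanged"""
    def __init__(self, items, tax_rate):
        self.items = items
        self.tax_rate = tax_rate

    def total(self):
        # the old API news the extracted object and calls it
        subtotal = sum(self.items)
        return PriceCalculator().total(subtotal, self.tax_rate)
```

callers of `Order.total()` never notice the change, but the branching logic now sits in a class a test can declare and call directly.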

the hardest problem you’ll meet: high fan-out methods or classes. these are classes that depend directly on every other package in your universe. for these, well, don’t start there. 🙂

legacy rework is not for sissies.

My TDD Isn’t Judging Your Responsibility

An old blog of mine, We’re In This For The Money, recently got some attention, delighting me, of course. A full h/t to Jim Speaker @jspeaker for that! He quoted me thus:


Too busy to #TDD?
Too busy to pair?
Too busy to #refactor?
Too busy to micro-test?

I call bullshit. My buddy Erik Meade calls it Stupid Busy, and I think that nails it pretty well. All you’re really saying is that you’re too busy to go faster.

~@GeePawHill


Most of the responses were RT’s and likes, both lending me fresh energy. A modest number of folks chatted with me, and that gives even more juice. (Readers please note: the sharers of these things you enjoy need to hear that you enjoy them, early and often. It is what gives us the will to share.)

I Believe What I Said…

Since that article, from 2014, I’ve reworked the ideas in it many times. I have developed theory to back it up, experiments by which someone can prove it to themselves, and a great deal of teaching material. I have written and spoken repeatedly on the topic of shipping more value faster. It’s part of almost everything I write about the modern software development synthesis. About the best recent statement, though I am still formulating new takes on it all the time, is in the video (w/transcript) Five Underplayed Premises Of TDD.

If you want to ship more value faster, you might want to consider TDD, as it will very likely help you do that, for nearly any definition of value you wish, provided only that the value involves writing logically branching code. I genuinely believe that the modern synthesis is really the fastest way to ship value that we have at this time.

…But I Don’t Agree With Everyone Who Believes What I Said

And yet. I can easily see how one might infer from that quote — even from that whole blog, or from other parts of my output — that I am suggesting that using the full modern synthesis is a responsibility for a professional developer. Or to reverse the sense, that I am saying it is irresponsible to be a working programmer who works in another way.

Some who responded took me as saying that and opposed that idea, others took me as saying that and favored that idea. I’ve no idea what the RT’s and likes were inferring.

I am not saying that it is irresponsible of a working programmer to not do TDD et al, because I do not believe that.

Frankly, the idea gives me the willies. So I feel it’s incumbent on me to give you some detail about what I would actually say about this question of responsibility.

(Writers don’t get to decide what readers make of their ideas. You’ll take the money premise and all the rest of my output you decide to bother with, and you’ll make of it what you will. All the same, if I can see how you might think I was saying that, and if I disagree strongly with that, I feel my own — normal, individual, human — sense of responsibility to do what I can to make my ideas more transparent.)

Near-Universal, Individual, And Fraught With Internal Conflict

Nearly everyone develops a sense of what they are obliged, by their own description of who and what they are, to do in this world. We normally call this the sense of responsibility. For most of us, the twin pillars of that sense are 1) our deep beliefs about what is morally right or wrong, 2) our deep connections in the many networks of humans in which we find community.

Those senses of responsibility are rich, vague, staggeringly complex, and full of outliers, odd twists, preferences, and, when put into language, startling contradictions.

Above all, they are individual. There are aggregate and statistical trends, of course, and if one could find an N-dimensional space big enough to hold the variables, one would see bell-curve like manifolds all through it. I have a sense of responsibility, and the odds are overwhelming that you do, too. Our respective responsibility-senses may appear to be similar, even highly similar. But they are not. It only appears that way because of the limits of language and the locality of the context we use to frame them. Our responsibility-senses are different because we are different.

If you read widely, you will have seen hundreds, maybe thousands, of situations in which one responsibility a person senses is pitted against another responsibility that same person senses. If you don’t read at all but you are old and thoughtful enough to seriously review your past, you will also have seen hundreds, maybe thousands, of situations like this in your own personal history, and that of others you know well.

Cutting To The Chase

Boy oh boy was I ever about to sprawl out and produce a nine-volume treatise on ethics and responsibility. From my simple dog-like goodness, I will spare us all. Instead, I will sum up my beliefs:

  • that the modern synthesis is the best current way we have of shipping more value faster, whenever that value depends on logically branching code with lifetime longer than half a day.
  • that I don’t honor my own sense of responsibility at all times, because I am weak or foolish, or because I carry conflicting senses simultaneously.
  • that it is not for me to say what is responsible or irresponsible — global words — about any local context involving people who are strangers to me.
  • that “doing your job well” is very often not dependent on shipping more value faster.
  • that “doing your job well” can mean many things that are not good for you or the company you work for.
  • that “doing your job well” is in any case not particularly more important to me than any number of other responsibilities I carry.

If you are a working programmer, you are a grown-up.

You are in the difficult position of balancing dozens of competing needs, using only your heart and body and mind, and I am well aware that this is a task far more demanding in real life than it seems, on paper, in a blog, or spoken by an idol. I won’t do that for you because I can’t do that for you. I only sometimes manage to do it for me.

In One (Long) Sentence

Please don’t take me, even in a crabby mood, as someone who assesses your sense of responsibility, commitment, or heaven forbid, morality, on the basis of whether or not you do TDD or any other part of the modern synthesis in your job.