Pro-Tip (Refactoring): Prioritize reading the code over learning about the application domain.
Not so long ago, I worked with a team that was hacking a humongous (>3 MLOC) Perl codebase, with performance as their main focus. I set out to show them that the crazy crap I'd been teaching them about TDD and refactoring would be useful.
The critical call was to a method that had to update patient status using a variety of pretty complex logic. We’d brought just that chunk under about 20% test the week before, so I forged in boldly.
At one point, we got a particular subset of the data. We then passed the IDs of the three records that formed that data. The code we called called other code, and other, and other. And at every level the entire subset of data was re-fetched using those three IDs.
(It was cached, but I found one common path in the code that hit that cache for those values 300 times.) So, hardly rocket science, I refactored to pass the entire dataset instead of just the three IDs needed to reconstruct the dataset over and over again.
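Here's a rough sketch of the shape of that change. It's in Python rather than the team's Perl, and every name in it (`fetch_subset`, `update_status_before`, and so on) is a hypothetical stand-in I made up for illustration, not the real code. The fetch is faked with an in-memory dict and a log so the repeated reads are visible.

```python
# Hypothetical stand-ins: none of these names come from the real codebase.
fetch_log = []  # records every (pretend) cache hit


def fetch_subset(ids):
    """Pretend cache lookup that rebuilds the subset from its IDs."""
    fetch_log.append(tuple(ids))
    return {i: {"id": i, "status": "active"} for i in ids}


# Before: every level receives the IDs and re-fetches the same subset.
def update_status_before(ids):
    subset = fetch_subset(ids)
    return [check_record_before(ids, rec_id) for rec_id in subset]


def check_record_before(ids, rec_id):
    subset = fetch_subset(ids)  # same IDs, same data, fetched again
    return subset[rec_id]["status"]


# After: fetch once at the top and pass the dataset itself down.
def update_status_after(ids):
    subset = fetch_subset(ids)
    return [check_record_after(subset, rec_id) for rec_id in subset]


def check_record_after(subset, rec_id):
    return subset[rec_id]["status"]
```

In this toy version the "before" path fetches once per level per record and the "after" path fetches once, while both return the same result. The real call chain was much deeper, which is how you get to 300.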
My team was troubled. How did I know it was okay to re-use those prior data reads? This led to a violent riot around the relevant domain concepts, and when and how they could change. There were angry placards, tear gas, face paint, you might have seen it on CNN.
It was great.
So how did I know? Welllllllll, because I read the code.
Riots like these — sometimes they just happen in one person’s head as she’s working — are commonplace. They are the result of prioritizing our understanding of the domain over our just reading what the code does.
See, it doesn’t matter whether those reads were safe to be re-used in all possible contexts in the domain, because I could prove that they were being re-used in all possible contexts in the code.
If that was a defective strategy in domain terms, well, it was a defective strategy both before I started and after I finished. The only difference was that my defective strategy had 300-odd fewer cross-process cache-hits in it per call.
The code we’re operating on when we refactor exists in a context. That context is made up entirely of all the code that calls it and all the code it calls. And we can — we need to — put a period at the end of that statement.
I use a lot of early returns in my code. I will argue some other day about why that’s a very good thing, in spite of the hangover the trade still has from the structured design movement.
My point here, though, is just this: whether you prefer early returns or a string of endif-tokens at the end of your method, going back and forth between them requires absolutely zero knowledge of the application domain.
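To make that concrete, here's a small sketch of the same logic written both ways, in Python for illustration. The function names and the `order` fields are hypothetical, invented for this example. Notice that you don't need to know what an order or a member is to see that the two forms do the same thing.

```python
# Hypothetical example: names and fields are invented for illustration.
def discount_nested(order):
    """Single exit, conditions nested inside one another."""
    result = 0
    if order is not None:
        if order["total"] > 0:
            if order["member"]:
                result = order["total"] // 10
    return result


def discount_early(order):
    """Same logic, with each guard returning as soon as it fails."""
    if order is None:
        return 0
    if order["total"] <= 0:
        return 0
    if not order["member"]:
        return 0
    return order["total"] // 10
```

Each guard in the second form is just the negation of one nested condition in the first. Checking that takes code-reading, nothing more.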
And the larger issue is that all the low-hanging refactorings in your code base have the same property: they can be used freely without the slightest knowledge of the domain. They require only a knowledge of a) programming language cleverness, and b) entirely context-free code-reading.
(I call out programming language cleverness. An example would be re-ordering two mutating methods in a lazy-evaluating conditional in your language. One does have to know about such things in one’s language.)
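A sketch of that kind of trap, in Python, where `and` short-circuits. The two functions `reserve` and `notify` are hypothetical, made up for this example, and each one mutates shared state by appending to a log.

```python
# Hypothetical example: both functions have a visible side effect.
log = []


def reserve():
    log.append("reserve")
    return False


def notify():
    log.append("notify")
    return True


def original():
    # `and` short-circuits: notify() never runs when reserve() is falsy.
    return reserve() and notify()


def reordered():
    # Same boolean result here, but now both side effects happen.
    return notify() and reserve()
```

Both versions return the same value, yet `original()` leaves only the `reserve` entry in the log while `reordered()` leaves both. Swapping the operands changed what the code does, so it isn't a refactoring, and spotting that takes language knowledge, not domain knowledge.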
When I am refactoring, I am changing the code without changing what it does. If I am changing the code and changing what it does, that’s something else, it’s not refactoring. (It’s not evil, either, it’s just not refactoring. I’ll come to that at the close.)
It stands to reason, then, that when I am refactoring the only thing that matters is what the code does. It doesn’t matter whether the code is "right" in domain terms.
Whatever wrongness is in it when I start, it’s gonna be in it when I finish, and that is A Good Thing[tm]. Why? Because it is one more trick for narrowing the mental bandwidth required to change code. Our bandwidth limit is the single most severe constraint there is in geekery.
You remember that I regard a lot of what I do as a geek as being a professional translator, from the schmooshy domain of humans to the deterministic domain of computers? This role, as translator, can make it hard to be a good refactorer.
Those twin domains, the human one of the overall app, and the computer one of just the code we have, are naturally intertwingled in the mind of a geek, and she couldn’t be a successful one if she didn’t have the knack for living in both at the same time.
But when you’re refactoring, you have to suppress your knowledge of the domain and focus on just what is in the code right now, what calls it, what it calls, and the logic and calculations it performs inline. That is the secret of great refactorers.
Okay, one loose end to tie up, and we’re out of here. When you change code w/o changing what it does, you’re refactoring. When you change code while changing what it does, you’re "!refactoring".
I said before, !refactoring is hardly evil: it’s actually what they pay us for, changing code and changing what it does at the same time.
Sometimes I am just and only refactoring. I do this whenever I can, because it narrows my problem. Other times I’m !refactoring. I do this whenever I can, because it makes my daily bread. How we mix these two modes can make all the difference in the world.
I resolve it this way: I preferentially refactor before !refactoring. That is, when I have a task in front of me that is definitely going to involve !refactoring, I first try to do all the refactoring I can do, and only then do I start changing what the code actually does.
That’s a whole ’nother muse or many, but I wanted to resolve the tension and throw in some foreshadowing.
When I’m refactoring, I do my best work when I am entirely domain-blind, and am working with and thinking of all and only what the code that is actually there actually does. Prioritize reading code over studying the application domain.
Have a pleasantly odd Sunday!