A Simple Optimization Problem
Hey, it’s GeePaw!
Got a little optimization issue for you. Let’s look at some code. It’s easy stuff, so I’ve just used pseudocode here. Take a look.
The program starts by entering a loop that runs 50 times. Inside that loop, it calls the Read method. It then enters a nested loop, this one running 100 times, and in that inner loop it calls the Scan method. Finally, we exit both loops and call the Write method. And then, the program is over.
Now, each of those three methods — Read, Scan, and Write — eats about 100 milliseconds per call. The total run time for this particular app is a little bit under eight and a half minutes or so.
Now, suppose we set out to improve that performance. We want the run to take much less than eight minutes.
Given what we’ve learned just so far, and nothing else, let me ask you this — where would you start? Would you investigate the Read method, the Scan method, or the Write method first?
I didn’t think this puzzle was going to give you much trouble. Of course, you’d go charging right after that Scan method, wouldn’t you?
Our Rationale
I know. Most of you got that instantly. But in case, you know, you were distracted and had to look away from the screen, let's explain the logic. See, the Scan method is called way more times than the Write method: 5,000 calls to Write's one, in fact.
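To check those counts, here's a minimal Python sketch of the program described at the top. It's a sketch under stated assumptions: instead of really spending 100 milliseconds per call, each method just records that it ran, and a fixed `COST_MS` constant stands in for the timing. The names `read`, `scan`, `write`, `program`, and `COST_MS` are my own, not GeePaw's pseudocode.

```python
from collections import Counter

COST_MS = 100  # assumed cost of a single Read, Scan, or Write call

calls = Counter()  # tallies how many times each method runs


def read():
    calls["Read"] += 1


def scan():
    calls["Scan"] += 1


def write():
    calls["Write"] += 1


def program():
    for _ in range(50):        # outer loop: 50 iterations
        read()
        for _ in range(100):   # nested loop: 100 iterations per outer pass
            scan()
    write()                    # runs once, after both loops finish


program()
total_ms = sum(calls.values()) * COST_MS

print(dict(calls))                         # {'Read': 50, 'Scan': 5000, 'Write': 1}
print(f"{total_ms / 60_000:.2f} minutes")  # 8.42 minutes
```

Those 5,051 calls at 100 ms apiece come to 505.1 seconds, which is the "a little bit under eight and a half minutes" figure.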
Suppose we went to the Write method and shaved off 10 milliseconds of its run time. How much overall time would we have saved? 10 milliseconds. Write’s only called one time. Any savings we get by fixing Write, we only get one time.
If we shaved 10 milliseconds off the Scan method, we'd save 50 whole seconds of runtime, not quite a minute, in one shaving. Because that 10 milliseconds isn't saved once, it's saved 5,000 times. Of course, Read is somewhere in the mix, too: with only 50 calls, the same shave there saves half a second.
So our priorities would be to squeeze hard first on the Scan, then on the Read, and finally on the Write.
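Here's the same arithmetic as a quick Python sketch. The 10-millisecond shave is the hypothetical from the text; the `CALL_COUNTS` dictionary just restates how often each method runs in the program GeePaw describes, and the function name `total_savings_ms` is my own.

```python
# How often each method runs: Read 50 times, Scan 50 * 100 times, Write once.
CALL_COUNTS = {"Read": 50, "Scan": 50 * 100, "Write": 1}
SHAVE_MS = 10  # hypothetical per-call saving for whichever method we optimize


def total_savings_ms(counts, shave_ms):
    """Overall time saved if every call to a method gets shave_ms faster."""
    return {name: n * shave_ms for name, n in counts.items()}


savings = total_savings_ms(CALL_COUNTS, SHAVE_MS)
# Optimize first where the same per-call improvement pays off the most.
priority = sorted(savings, key=savings.get, reverse=True)

print(savings)   # {'Read': 500, 'Scan': 50000, 'Write': 10}
print(priority)  # ['Scan', 'Read', 'Write']
```

The per-call saving is identical for all three methods; only the call count changes the payoff, which is why the ranking comes out Scan, Read, Write.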
So that's a simple optimization problem. Now, I'm going to tweak this just a little bit, and I'm going to repeat it.
Optimizing Programming?
This time, instead of a program, let’s talk about programming. In programming, we scan code, we read code, and we write code. And it’s my absolute conviction we do way more scanning than reading. And we do a lot more reading than writing.
Of course, it’s not linear or mathy. It’s, after all, work going on in our flickery, mind-moving heads. But the linearity, remember, isn’t what matters. What matters is the relative proportion. If we are to optimize programming, instead of just a program, we need to start by optimizing for the scanning operation.
What’s Scanning?
What, even, is scanning? Well, it’s a kind of high-speed, tight-filtered way of seeing. The two properties, speed and filter, can’t be divorced from one another. It’s fast because it’s filtered. It’s filtered so it’ll be fast.
The filters can be different. Sometimes, we’re scanning a whole package of some kind, just scrolling quickly through the code. We’re filtering for the skeleton, the bones of the thing, like an X-ray. At other times, we’re bouncing between packages, and our filter is in the shape of a simple question whose answer we need for whatever next step we’re doing inside our main thread.
Now, did you ever try Where’s Waldo? Or, if you’re not in North America, you might know it as Where’s Wally? Waldo is that doofus with the red-and-white-striped shirt and stocking cap, and he’s got the distinctive round glasses. He’s usually waving. He’s hidden somewhere in a rich visual forest of people and things, many of which are themselves possessed of red and white stripes and little round things that might, in fact, be glasses.
When you’re solving a Where’s Waldo?, you are scanning.
Your mind becomes a Waldo detector. It makes a Waldo filter on the fly, and it's jumping around the picture in quest of something, anything, that triggers that filter.
When the filter lights up, we focus tightly and quickly decide whether we have the real Waldo or not. Waldo or Wally puzzles are actually quite a bit of fun. True story: while prepping this video, I did, in fact, spend about two hours solving Waldo puzzles on the web, and I had a great time when I should have been working on the video.
Martin Handford, the creator of Waldo, is a master of creating things that are hard to scan. He’s the anti-scanner. But when we jump back to our optimization problem, that’s the exact opposite direction that we want to go in. That direction, the opposite of a Waldo puzzle, is what we call "optimizing for code scanning."
Optimizing For Scanning
So how do we optimize for scanning? Well, the three most important aspects of easy scanning are sizing, grouping, and naming. And these factors aren’t really separable. Each one relates to and drives the others, and no one of them is quite good enough to solve the problem all by itself. If that seems really complicated, it’s because we’re talking, here, about design. Design is just about the most complicated thing a person can do.
OK, the first thing about a Waldo puzzle is its size. Each puzzle is a double-page spread in a large book. Off the top of my head, I’m guessing there’s something like 80 by 40 possible places Waldo could fit. The sheer number of things you have to move your filter over is enormous.
Now, if the pages were smaller, there would be fewer places to look for Waldo. Ha! Easy peasy, right? Sizing is exactly as simple as that. And it’s exactly as difficult as that, too.
Well, think about it. Making a page smaller means making more pages. If we just published a Waldo puzzle by drawing a grid on top of the real one and splitting it up into little, tiny, arbitrary subpages, we wouldn't really have improved our performance much, if at all.
No. The pages have to be grouped. Grouping is dividing the pages along lines that, in some way, make sense. Grouping means putting related things on the same page and unrelated things on a different page.
And, of course, that’s going to lead us, in turn, right to naming. If all the things aren’t all on the same page, we need ways to clue us in about what things are on which page. Those clues are, overwhelmingly, the names that we give.
So we’ve got sizing the pages to make them smaller. We have grouping of the things that fit on a page so that closely-related things are together. And we have naming of the pages to give us the clues to help us get to the right page as quickly as possible.
Easy Scannability *Is* Design
The problem of making things easier to scan is the basic design problem. And look, there's no way that's going to fit in a dumb old GeePaw video. In the meantime, though, let me offer you two pieces of advice, just to get you started.
First, notice. The next time you’re geeking out on a problem, watch yourself work. You will find yourself scanning. You’ll know exactly what we’ve been talking about here. You’ll scan and scan and scan, bouncing here and there, taking things in at a glance.
In particular, notice the times when a scan fails, when you intended to check something really quick and get back to your main work, but you don’t find it. Just notice this. And then think a little bit about what it means.
Second, change things. Do you know how Google came to be so effective as a scanning engine? Because that's what it is: a scanning engine. It got there by changing its page rankings based on detecting when a scan worked well and when it didn't. They changed their organization. When scanning doesn't work, change the thing being scanned until scanning works better.
We have a name for this, by the way. We call it refactoring. We're changing what the code says without changing what it does. When you refactor, you're actually optimizing for scannability.
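To make "changing what the code says without changing what it does" concrete, here's a small, hypothetical before-and-after in Python. The order-report domain, the field names, and the bulk-discount rule are all invented for illustration; this is one way such a refactoring might look, not a prescribed recipe. The second version sizes the work into smaller functions, groups each rule in one place, and names each piece, while producing exactly the same output.

```python
# Hypothetical example: the order fields ("id", "status", "qty", "price")
# and the bulk-discount rule are invented for illustration.

# BEFORE: one function that mixes filtering, pricing, and formatting.
def report(orders):
    lines = []
    total = 0
    for o in orders:
        if o["status"] == "shipped":
            amt = o["qty"] * o["price"]
            if o["qty"] >= 10:
                amt = amt * 0.9
            total += amt
            lines.append(f'{o["id"]}: {amt:.2f}')
    lines.append(f"TOTAL: {total:.2f}")
    return "\n".join(lines)


# AFTER: the same behavior, split into small, grouped, named pieces.
BULK_QUANTITY = 10   # orders of this many units or more get the discount
BULK_DISCOUNT = 0.9  # multiplier applied to bulk orders (10% off)


def is_shipped(order):
    return order["status"] == "shipped"


def line_amount(order):
    """Price for one order, with the bulk discount applied if it qualifies."""
    amount = order["qty"] * order["price"]
    if order["qty"] >= BULK_QUANTITY:
        amount *= BULK_DISCOUNT
    return amount


def format_line(order):
    return f'{order["id"]}: {line_amount(order):.2f}'


def report_refactored(orders):
    shipped = [order for order in orders if is_shipped(order)]
    lines = [format_line(order) for order in shipped]
    total = sum(line_amount(order) for order in shipped)
    lines.append(f"TOTAL: {total:.2f}")
    return "\n".join(lines)


SAMPLE_ORDERS = [
    {"id": "A1", "status": "shipped", "qty": 2, "price": 5.0},
    {"id": "B2", "status": "pending", "qty": 1, "price": 99.0},
    {"id": "C3", "status": "shipped", "qty": 10, "price": 3.0},
]

# Refactoring changes what the code says, not what it does.
assert report(SAMPLE_ORDERS) == report_refactored(SAMPLE_ORDERS)
print(report_refactored(SAMPLE_ORDERS))
```

Someone scanning `report_refactored` with a little question like "which orders count?" or "where's the discount rule?" can find the answer from the names alone, without reading through the arithmetic.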
As programmers trying to ship more value faster, we want to optimize first around the part of the work that we spend the most time on, and that is scanning code: formulating little, tiny questions and flipping through the code looking for the answers. We optimize for scanning by focusing our attention on sizing, grouping, and, above all else, naming.
So there you have it. Now, get back to work. Well, I mean, as soon as you’re done looking at Where’s Waldo? puzzles on the web, get back to work. Go back out there, start noticing yourself scanning, evaluating how it goes, and changing the code to make it easier for yourself. I’m GeePaw, and I’m done.