Rethinking the world... and making it a better place: September 2011

26 September 2011

Feature-driven, design-guided, and tests-in.

I just typed an answer to a new comment on my last post. Apparently it became too long to be submitted as a comment itself, so here I am turning it into a new post. It starts out with some "loud thinking" but ends with some nice insights.

Hi James, thanks a lot for your comment. You're asking just the right questions, and those questions help me see more clearly what "my problem" with TDD is.

Now, ten days after I wrote that post and through your comment, I realize that there's actually a big gap between how TDD is summarized (especially Uncle Bob's version with the three laws) and how TDD is actually successfully practiced. I find that when the three rules are taken literally (we tried that in some dojos), the development and the actual design become cluttered with the details of each test case, and the work is less feature-driven as well as less pattern-guided than I would like. On the other hand, if I look at successful agile development with lots of unit tests, the three rules are just not visible in the process.
I think that maybe two social processes are at work here: on the one hand, good practices spread through pair programming and people reading a lot of open-source code, but those practices often don't have catchy names. On the other hand, there's a very catchy concept called TDD and very simple "three rules" and people saying that just by following those rules and refactoring, everything else will follow. For example, some people say that good design automatically follows from testability because only loosely coupled systems are easily testable.

So, the reason I wrote this blog post is that the simple, catchy way TDD is explained just won't work. It's also just not true that TDD gives you an easy way to tell when you're done. I am currently working on a medium-complex system (roughly developed by a four-person team over two years) with high unit and integration test coverage, and we repeatedly had incidents just because we forgot to add something here or there which didn't get caught by the tests. However, our code is simple enough that those missing parts would become obvious if we just had a final code review after every iteration, where we check all production and test code against a (longish) list that defines the specific level of done for the project. (This includes error handling, logging, monitoring, etc.) That review is what we now regularly do. Sometimes we find missing things in the tests, sometimes in the source; in either case it's easily fixed before going live. So the seemingly obvious claims like "TDD always gives you 100% coverage" or "with TDD you always know when you're done" are just not relevant in practice.

My conclusion after working in a "high unit-test coverage" project is that tests should not come first; instead, the design of very small parts (a method or a small class) should come first. The design is primarily guided by the user (caller) of that unit. Design is always about finding a sweet spot between a desired feature on the one hand and technical considerations such as available technologies, efficiency, and (of course) testability on the other hand. I don't think it matters whether you write the implementation (of a small unit) or its tests first, as long as you get all tests to pass before you tackle the next unit. (Personally I prefer implementing it first, because the implementation often is a more holistic description of the problem. Only for complex algorithms, which I find to be rather rare, does writing tests first seem to give a better start at properly understanding the problem.) By starting with the design (which most often is an interface specification), I find it much easier to think about the method or class in a holistic fashion and also to figure out a set of test cases that's small yet covers everything I need. Would you say that this process is still TDD?
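To make that order of work concrete, here is a minimal Java sketch. Everything in it is a hypothetical illustration, not code from the project mentioned above: the `TextTruncator` interface, the `EllipsisTruncator` class, and the contract itself are assumptions. The interface with its documented contract comes first, and the small set of checks is read directly off that contract. Whether the implementation or the checks get typed first makes no difference to the result.

```java
// Hypothetical example: all names and the contract are assumptions for illustration.

// Step 1: design the unit from the caller's point of view (interface plus contract).
interface TextTruncator {
    /**
     * Returns text unchanged if it has at most maxLength characters.
     * Otherwise returns its first (maxLength - 3) characters followed by "...".
     * Throws IllegalArgumentException if maxLength is less than 3.
     */
    String truncate(String text, int maxLength);
}

// Step 2: implement the contract.
final class EllipsisTruncator implements TextTruncator {
    private static final String ELLIPSIS = "...";

    @Override
    public String truncate(String text, int maxLength) {
        if (maxLength < ELLIPSIS.length()) {
            throw new IllegalArgumentException("maxLength must be at least 3");
        }
        if (text.length() <= maxLength) {
            return text;
        }
        return text.substring(0, maxLength - ELLIPSIS.length()) + ELLIPSIS;
    }
}

// Step 3: one check per clause of the contract; all must pass before the next unit.
final class TruncatorChecks {
    static void check(boolean condition, String description) {
        if (!condition) {
            throw new AssertionError("failed: " + description);
        }
    }

    public static void main(String[] args) {
        TextTruncator truncator = new EllipsisTruncator();
        check(truncator.truncate("hello", 5).equals("hello"),
                "text at the limit is unchanged");
        check(truncator.truncate("hello world", 8).equals("hello..."),
                "long text is cut and marked");
        check(truncator.truncate("abcd", 3).equals("..."),
                "smallest limit leaves only the marker");
        boolean rejected = false;
        try {
            truncator.truncate("abcd", 2);
        } catch (IllegalArgumentException expected) {
            rejected = true;
        }
        check(rejected, "limit below 3 is rejected");
        System.out.println("all checks passed");
    }
}
```

Four checks cover the three clauses of the contract plus one boundary, which is the kind of small-yet-complete set that is easy to find once the interface is written down.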

Fast tests with high coverage are very important to me, not least because refactoring is very important to me. But I don't like the term "test-driven", because the driver of development is always some external (non-technical) need, such as a feature or some resource restriction ("make it faster"). Tests are just a technical tool (albeit an important one), and it's the design that creates interfaces which fulfill both customer needs and technical standards. I think of my development rather as "feature-driven", "design-guided", and last but not least "integrated-testing" (because tests are an integral part of the code). Maybe the term "tests-in" is more catchy? As long as it isn't "driven...". After all, model-driven also didn't work that well... ;-)

15 September 2011

How to write good software and why baby-step TDD is a scam

First off, I am obviously not going to tell you all about writing good software in a single blog post about TDD. Writing good software takes a lot of learning and a lot of practice. There have been countless books written on the subject, and since this post isn't about a book list for software engineers either, I'll just mention one to give you an idea: Object-Oriented Software Construction by Bertrand Meyer.
The company I work at has quite a large software development department with quite good leadership. Our managers promote autonomy (developers choose the technologies and methods they think are best suited for the work) and learning on and off the job. For example, we have regular (voluntary) coding dojos (practice sessions) where a bunch of developers sit together to solve some simple problems with some new approaches. This is certainly an important part of writing good software.
Recently, we experimented with Test-Driven-Development (TDD), which some people also read as Test-Driven-Design. TDD as my colleagues introduced it to the rest of us is based on the following three rules:

1. You are not allowed to write any production code unless it is to make a failing unit test pass.
2. You are not allowed to write any more of a unit test than is sufficient to fail; and compilation failures are failures.
3. You are not allowed to write any more production code than is sufficient to pass the one failing unit test.

(Something most proponents of TDD would add is a fourth step to refactor the code while the tests are green, but when TDD is introduced and defined this step is usually not mentioned.)
Our company dojos and some reflection upon them have taught me that this is plain bullshit, and here's why. In the last two decades, the profession of software development has embraced methods like automated (unit and integration) testing, iterative development, early testing (also called "tests first"), merciless refactoring, design patterns, automated builds and many more. All of those practices are great if done right. Now TDD comes along and claims to condense many of them into an integrated framework based on the above rules. Going back and forth between tests and code is obviously iterative. Tests obviously have to be automated. You obviously need refactoring, because otherwise TDD will produce terrible code. So TDD dresses itself up as the natural evolution of agile development. But the truth is: TDD is a perversion of agile which over-applies agile principles in a way that doesn't make any sense any more.
Somebody who claims to do TDD either doesn't follow the three rules above or is doing hopelessly bad development. TDD is a scam because it contributes nothing new to the set of agile practices. If someone using “TDD” succeeds in writing good code, it is due to the other agile practices, not due to the three rules above. TDD even obscures and ignores a lot of other important methods. Scrum, for example, tells us to define minimal features and implement them including production code, automated tests, and all that's needed to deploy and run the feature live. Scrum offers a lot of advice on what a minimal feature is, how to split stories, and what's small enough not to need any further splitting. TDD, on the other hand, splits iterations too much, ignoring Scrum's advice. Design by Contract tells us how to write minimal interfaces by considering the needs of both the client and the provider and describing the interface succinctly in code. TDD, on the other hand, says that interfaces should emerge, while in practice they drown in a plethora of special cases. Finally, testing methods teach us how to design good (and minimal) test cases, get good coverage, and test most where it is needed most. TDD, on the other hand, says nothing about where you start, how to continue, or when you are done. Tests are always green, but when do you have enough tests?
Think about that: there have been countless example demos of TDD on the internet, at conferences, and in practice sessions, but have you ever seen the development of even a small program finished with TDD? On the contrary, the only things I see are epic failures. (Thanks, Fred, for the great link!)
So, can we please forget about this exaggerated baby-step TDD, stick to established best practices, and move on to writing good software?

Addendum, months later: I saw a good example of TDD in Freeman & Pryce's book "Growing Object-Oriented Software". Their interpretation is much better than the baby-step TDD seen in blogs. The book starts by summarizing established best-practice OO design. Their example study is much more elaborate, and the problem domain is actually related to the kind of software that professional Java developers are writing for money. If you want to know about the real thing, you have to take the time to read something longer than a couple of blog posts.

13 September 2011

Refactoring examples: little steps and big smells

My friendly coworker shared a video of Uncle Bob live refactoring some code. Since I love refactoring, I was very excited to watch it, but a few minutes into the video my excitement turned into horror, disappointment and anger. Uncle Bob refactors a piece of smelly code, but instead of removing the cause of the complexity (namely too many things being done at once), he just spreads the complexity out into many different methods which communicate with each other via member variables. The result looks cleaner and certainly has good naming and short methods, but it still has way too much complexity. What's worse, with everything spread out in so many pieces, it's much harder to refactor it further and really simplify it to the core. And worst of all: even forty years after the invention of such useful principles as "command-query separation", "separation of concerns", and functional programming, Uncle Bob happily violates all those great principles to clumsily cultivate complexity, call the result "Clean Code", and sell it for money. Follow the jump to see the code, good and bad.
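The code from the video is not reproduced here. As a stand-in, the following hypothetical Java sketch (all class and method names are assumptions, and non-empty input is assumed) contrasts the two styles in miniature: short methods that communicate through member variables, and pure queries whose inputs and outputs are visible in their signatures.

```java
import java.util.Locale;

// Hypothetical example, not the code from the video. Assumes non-empty input.

// Style criticized above: short methods that talk to each other through fields.
final class StatefulSummary {
    private int[] values;
    private int sum;
    private double average;

    String summarize(int[] input) {
        values = input;
        computeSum();      // must run before computeAverage(); nothing enforces that
        computeAverage();
        return String.format(Locale.ROOT, "sum=%d avg=%.1f", sum, average);
    }

    private void computeSum() {
        sum = 0;
        for (int value : values) {
            sum += value;
        }
    }

    private void computeAverage() {
        average = (double) sum / values.length;
    }
}

// Alternative: pure queries; every dependency is a parameter or a return value.
final class PureSummary {
    static int sum(int[] values) {
        int total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    static double average(int[] values) {
        return (double) sum(values) / values.length;
    }

    static String summarize(int[] values) {
        return String.format(Locale.ROOT, "sum=%d avg=%.1f", sum(values), average(values));
    }
}

final class SummaryDemo {
    public static void main(String[] args) {
        int[] data = {1, 2, 3, 6};
        System.out.println(new StatefulSummary().summarize(data)); // sum=12 avg=3.0
        System.out.println(PureSummary.summarize(data));           // sum=12 avg=3.0
    }
}
```

Both variants produce the same text, but in the second one each method can be read, tested, and moved on its own, because no hidden call order has to be respected.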

5 September 2011

Cleaner Code

My team of software developers at work has decided (with some consultation by our team leader) to have a biweekly gathering to discuss a chapter of “Clean Code”. I am on vacation just now and had to miss the first meeting, but I am just reading the book on the train home and here's a little insight I want to share. I am talking about the last example of Chapter 2 in the section "Add meaningful context".

I think that the general strategy of giving a bunch of variables a context by putting them in a separate class is good, so I don't object to the point of the book.
However, I also think that this particular example can be improved in another way, which gets rid of the variables altogether by making the code simpler and shorter.

First of all, the naming of the method is wrong. Most of it is concerned with formatting the guess statistics, so I'd rename it "formatGuessStats" and move the call that prints the result out to the calling method. This will also rid us of the dependency on however the statistics are printed.

Now, let's recognize that the method actually does two things: first, it handles the plural, which applies to all numbers but "1" and results in a different verb and a plural "s"; and second, it replaces the number "0" with the word "no". Instead of flattening those two choices into three cases, we should separate the concerns.

private String formatGuessStats(char candidate, int count) {
    final String number = count==0 ? "no" : Integer.toString(count);
    if (count == 1) {
        return String.format("There is 1 %s", candidate);
    } else {
        return String.format("There are %s %ss", number, candidate);
    }
}

Maybe you'll think that I introduced bad redundancy by repeating the word "There ". I, however, think that such a little bit of redundancy does no harm, especially since in this case it helps us remove abstraction and see more directly what the code is doing. I also think that the redundancy is only accidental and mirrors redundancy in the English language, which is what we convert to here. If, for example, our PO decides that the singular case should read "There's" instead of "There is", our simplified (yet redundant) variant will be a bit easier to change.

Now, let's look at some further minor improvements of this code. Observing that the "number" variable is only used in the second part, we can move it down into the else block.

private String formatGuessStats(char candidate, int count) {
    if (count == 1) {
        return String.format("There is 1 %s", candidate);
    } else { 
        final String number = count==0 ? "no" : Integer.toString(count);
        return String.format("There are %s %ss", number, candidate);
    }
}

We could also simplify some more and use the handy "%d" instead of the wordy "Integer.toString". And if you are tempted to add a comment to the else block saying something like "// handle plural case", you might as well factor it out into a second method.

private String formatGuessStats(char candidate, int count) {
    if (count == 1) {
        return String.format("There is 1 %s", candidate);
    } else { 
        return formatPluralGuessStats(candidate, count); 
    }
}

private String formatPluralGuessStats(char candidate, int count)  {
    if (count == 0) {
        return String.format("There are no %ss", candidate);
    } else {
        return String.format("There are %d %ss", count, candidate);
    }
}

Incidentally, this leaves us with code that doesn't contain any local variables any more at all. Given that it is so simple now, we could go back to using just one method and sort the cases in increasing order of "count":

private String formatGuessStats(char candidate, int count) {
    if (count == 0) {
        return String.format("There are no %ss", candidate);
    } else if (count == 1) {
        return String.format("There is 1 %s", candidate);
    } else { 
        return String.format("There are %d %ss", count, candidate);
    }
}

Admittedly we now have reintroduced the three cases from the original code. But isn't it so much more direct and clear?
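For anyone who wants to try the final variant, here is a self-contained sketch. The wrapper class `GuessStatsDemo`, the `static` modifiers, and the caller `printGuessStatistics` are assumptions added to make it runnable, and the zero case is written with a space after "no", as in the earlier variants. It also shows the print call living in the caller, as suggested at the beginning.

```java
// Sketch only: the class name, the static modifiers, and printGuessStatistics
// are assumptions added so that the snippet runs on its own.
final class GuessStatsDemo {

    // Final variant: three cases, sorted in increasing order of count.
    static String formatGuessStats(char candidate, int count) {
        if (count == 0) {
            return String.format("There are no %ss", candidate);
        } else if (count == 1) {
            return String.format("There is 1 %s", candidate);
        } else {
            return String.format("There are %d %ss", count, candidate);
        }
    }

    // The caller decides where the text goes; the formatter knows nothing about output.
    static void printGuessStatistics(char candidate, int count) {
        System.out.println(formatGuessStats(candidate, count));
    }

    public static void main(String[] args) {
        printGuessStatistics('a', 0); // There are no as
        printGuessStatistics('a', 1); // There is 1 a
        printGuessStatistics('a', 2); // There are 2 as
    }
}
```

Because the formatter returns a string instead of printing, three one-line checks (for 0, 1, and 2) are enough to cover every branch.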

Which variant do you prefer? The original, the final, or any of the intermediate ones?

PS: When I continued reading the book, I found that some of the principles I used in this refactoring are also introduced in the book. Apparently not all of the examples in it comply with all the rules it gives. In particular, I got very upset about the use of a parameter for output in a later example and went on to write a long rant about why this is bad and how it can be avoided. Two chapters later, Uncle Bob himself states that this is bad and gives the same alternative techniques for avoiding the problem. I guess this means that at least Uncle Bob agrees with my own principles of coding...

PPS: Blogger's new composition interface almost doesn't suck anymore. Good job, guys! Keep it up!
