13 September 2011

Refactoring examples: little steps and big smells

My friendly coworker shared a video of Uncle Bob live refactoring some code. Since I love refactoring I was very excited to watch it, but a few minutes into the video my excitement turned into horror, disappointment and anger. Uncle Bob refactors a piece of smelly code, but instead of removing the cause of complexity (namely too many things being done at once), he just spreads the complexity out into many different methods which communicate with each other via member variables. The result looks cleaner and certainly has good naming and short methods, but it still has way too much complexity. And what's worse, with everything spread out in so many pieces, it's much harder to refactor to really simplify it to the core. And what's the worst of worst: even forty years after the invention of such useful principles as "command-query-separation", "separation-of-concerns", and functional programming, Uncle Bob happily violates all those great principles to clumsily cultivate complexity and call the result "Clean Code" and sell it for money. Skip the jump to see the code, good and bad. Here's the code in question (taken from FitNesse Git repo, slightly different from the video):
  private boolean addChildHtml(StringBuffer buffer, int depth) {
    boolean addedTag = false;
    boolean lastAddedWasNonTag = false;
    int i = 0;
    for (HtmlElement element : childTags) {
      if (element instanceof HtmlTag) {
        if ((i == 0 || lastAddedWasNonTag) && !isInline)
          buffer.append(endl);
        buffer.append(((HtmlTag) element).html(depth + 1));
        addedTag = true;
        lastAddedWasNonTag = false;
      } else {
        buffer.append(element.html());
        lastAddedWasNonTag = true;
      }
      i++;
    }

    return addedTag;
  }
Uncle Bob repeated uses the automatic refactorings "extract method" and "convert local variable to member" to dissect this function. But that's entirely the wrong way! The existing mess of nested blocks and local mutable variables is just lifted on a different level of methods and member variables. Those refactorings are so mechanic that they could be done almost entirely automatically – and they add nothing to simplify and clarify the code. (Except for the additional names which are good – but with better structure, we'll get even better names.) Looking at the above method, we first note the violation of command-query-separation. The method at once modifies the "buffer" argument and returns a boolean result. As Bob states in his own book it's bad style to modify arguments and even worse to modify one thing and compute another. So the first refactoring should pull those two concerns apart. To do this, we'll copy the method, change the signatures of the two clones and then delete lines of code in each which belong to the other. Of course, we'll have to refactor the clients, too, and that will be made easy by the fact that one method will be a simply query that clients can call as often as needed without any side-effects. Here's the result of the first refactoring:
  private void addChildHtml(StringBuffer buffer, int depth) {
    boolean lastAddedWasNonTag = false;
    int i = 0;
    for (HtmlElement element : childTags) {
      if (element instanceof HtmlTag) {
        if ((i == 0 || lastAddedWasNonTag) && !isInline)
          buffer.append(endl);
        buffer.append(((HtmlTag) element).html(depth + 1));
        lastAddedWasNonTag = false;
      } else {
        buffer.append(element.html());
        lastAddedWasNonTag = true;
      }
      i++;
    }
  }

  private boolean childrenContainTag() {
    boolean result = false;
    for (HtmlElement element : childTags) {
      if (element instanceof HtmlTag) {
        result = true;
      }
    }
    return result;
  }
Here's how to adapt the client code from old:
boolean tagWasAdded = addChildHtml(buffer, depth);
if (tagWasAdded && !isInline) addTabs(depth, buffer);
to new:
addChildHtml(buffer, depth);
if (childrenContainTag() && !isInline) addTabs(depth, buffer);
As you see, command-query-separation even saves us another local variable! Now, the new method "childrenContainTag" is already pretty minimal (never mind that it would be just one line of vastly more readable code if java had first-order functions), so let's continue refactoring "addChildHtml". In the middle of the loop, there's an "if" that does three different things:
  • set lastAddedWasNonTag
  • call slightly different variants of element.html(..)
  • append an endl just in a special case of a tag.
We can separate all three concerns:
  • set lastAddedWasNonTag to a straight-forward expression
  • generalize the interface of HtmlElement to just one variant of the html(..) method
  • then the conditional will only be there for adding the optional endl
The result:
  private void addChildHtml(StringBuffer buffer, int depth) {
    boolean lastAddedWasNonTag = false;
    int i = 0;
    for (HtmlElement element : childTags) {
      if (element instanceof HtmlTag && (i == 0 || lastAddedWasNonTag) && !isInline) {
          buffer.append(endl);
      }
      lastAddedWasNonTag = !(element instanceof HtmlTag);
      buffer.append(element.html(depth + 1));
      i++;
    }
  }
The changes to the HtmlElement class hierarchy to support this refactoring are really easy to make and I omit them here for brevity. (I did it in Eclipse and all unit tests still pass.) Our next move will be to look at local variable i. it's only used in one place and only in disjunction (logical or) with lastAddedWasNonTag. In fact, we can see that i==0 only happens in the first round of the loop where lastAddedWasNonTag has its initial value of false. If we just initialize lastAddedWasNonTag with true, we can simply drop the variable i altogether! To simplify a bit more, we'll invert the meaning of the boolean to a positive spelling (without the "non"). Here's the result:
  private void addChildHtml(StringBuffer buffer, int depth) {
    boolean lastAddedWasTag = false;
    for (HtmlElement element : childTags) {
      boolean thisChildIsTag = element instanceof HtmlTag;
      if (thisChildIsTag  && !lastAddedWasTag && !isInline) {
          buffer.append(endl);
      }
      lastAddedWasTag = thisChildIsTag;
      buffer.append(element.html(depth + 1));
    }
  }
Removing all the clutter now actually makes clear what the method does: concatenate all the element's html in the buffer and optionally insert endl in between. And you see the exact condition for when the endl is inserted. Don't you think this comes much closer to clean code?! For one thing, it has a lot less indirection and a lot less relying on names. After Uncle Bob's refactoring in the video the same code is spread into several different methods and some new member variables. Imagine you have to make a little modification to that: where would you even start? How many different places would you have to edit? Finally, to not turn this into a flame war, let me say that while the example done in the video is really bad, I generally think Uncle Bob is a good teacher of best practices overall. I have read his book and found many defects, but also a lot of good explanations of good stuff. In particular, the book also explains command-query-separation (on page 45) and Uncle Bob does it really well on other examples. So please, keep watching his videos and reading his books, but don't take everything as a final world. There's always more to if you study and apply the first principles of fine programming.

3 comments:

Fred Blasdel said...

Did you see the bit about Ron Jefferies trying to write a Sudoku solver with TDD and refactoring?

These guys are charlatans.

Thomas Eyde said...

@Fred, I haven't seen anyone else writing a Sudoku solver with TDD and refactoring. So what's your point? That none of the things these guys promote are for real?

Robert Jack Wild said...

I have searched the interwebs long and wide and not found a single conding example where TDD improved the design or test coverage of a solution when compared to conventional practice.

In coding dojos done at our company I found that teams using TDD usually were slower and had less quality (cleanliness, good design) in their resulting code.

In short, all the evidence suggests that TDD as described by the three principles does not work in practice. It sounds very attractive but it doesn't solve any problem of design or test coverage that other methods don't solve better.

People who are successful with TDD usually rely on those other methods, but can't say it because "conventional wisdom" just doesn't have name to call it by.

Post a Comment