Refactoring in a GIT world
2011-04-15 9 Comments
Mike McQuaid recently said in a mail regarding good style in GIT
Treat committing as being slightly more granular than saving a file. Do it very, very often.
This is a very succinct way of expressing something that you feel happening to yourself when you migrate from a CVCS to a DVCS: Commits become smaller. And so they should!
Take Refactoring as a specific case-in-point. We all say we’re refactoring all the time, but if we’re honest, we’re not really doing all the compile-and-test cycles that the book mandates after every minor step. Nor should we: they’re a waste of time, and a band-aid for the lack of proper tool support.
Automated tests, where available, are not usually designed for speed of execution. So it’s not uncommon for tests to run an unacceptably long time. And for a unit test, everything that exceeds one second is something that I consider too slow. Too long, to run each time you move a single line of code around.
Many will argue for a refactoring browser as a proper tool for the job. It can certainly help avoiding dropping-the-brick mistakes. But a refactoring browser cannot deal with more complicated refactorings, or semantic failures that a refactoring introduces. We still need the tests. We still need to limit the amount of changes in between tests lest we waste even more time searching for the specific change that broke a test (assuming we have it).
In fact, since a refactoring browser does the whole multi-step process in one big stride, it’s actually harder to find where it introduces a regression. There are probably refactoring browsers that optionally record each step of a refactoring in order to catch regressions; use them.
The problem also extends to the time after publishing the refactoring. In real-world projects, well-meaning refactorings still introduce regressions because there’s no 100% test coverage in that area of the code, or some users of the code use it outside of its design parameters. So, regressions will often show up further down the line.
Now, git-bisect is a very good tool for finding the commit that introduced a regression, but it doesn’t help much if the commits are too large. But what is a good commit size?
The Refactoring book gives a very clear answer to that: just replace compile-and-test by compile-and-commit, or just ‘commit’. Once the whole refactoring is done, you let your extensive test suite run on the result. If you broke something, you use git-bisect to find the commit that broke it. Since each commit is a trivial change, you’ll never search for the bugger, git-bisect will find it for you.
Even better: when you don’t squash the series of commits that make up your refactoring when pushing upstream, any regressions that you may have introduced and that were not caught by your test suite will still be pinpointed by git-bisect, in a trivial manner.
Here is a summary for doing a refactoring in GIT:
- Indicate that you’re doing a refactoring, and which one, by using a common prefix for all summary lines of the commits involved. Example: “FooBar: Extract Method doCalculate(), pt.N: use new doCalulate() in doStuff()”
- Commit at least as often as the book says you should compile-and-test.
- Run the tests only once, at the end
- If something breaks, use git-bisect to find the bad commit, and rewrite history.
- Don’t squash the commits when pushing. If colleagues complain about your tiny commits, complain about their large commits, and point them to this post 🙂
Is it worth it? Is the extra time you spend writing commit messages offset by the time you save when something goes wrong? Immediately, or later on? I don’t know yet. But if the experience written down in Fowler’s book is to be trusted, there’s a high probability it should.