A Case for a Descriptive Git History
Git is a really, really fascinating toolset. There's so much to it (see any article on octopus merges in the Linux kernel, or look up the filter-repo tool), and yet it's so simple. There's a change; you get a message. What could be simpler?
It's such a useful way to save your progress that during development our commits often come out looking like this:
fixed the button
where are the strings?
database was messed up
don't know why this has to happen but whatever
or, more often:
changed another thing
Some teams have policies where each self-contained chunk of work is a commit (maybe it's one commit per scrum story, etc.). Or, maybe it's as few commits as are sensible.
I advocate for this policy on teams I work on; but so many times I hear developers saying, "Why do we need to care, Emma? Who cares what the commits look like? It's just a way to save our changes, anyway."
Come with me on a journey
Picture yourself debugging some code you've never seen, or maybe you wrote it and you've completely forgotten how it works. You stumble upon a line that looks out of place. "That doesn't look like it belongs here," you say. "Maybe I'll just remove it and do what it's trying to do over in that other file, where this sort of line of code should live." Or maybe it looks like a line that's going to be a no-op, and you think to yourself, maybe I'll just remove this. You look at the history for the change in your favorite Git client, and the commit message says "another change". Well, that's not much to go on.
Flash back two years. There's a high-priority crash that's affecting a million people in production, and another developer (maybe it was you) is feverishly making changes locally, getting closer and closer to the problem. They make a variety of changes that help with the bug, and finally stumble upon the file that's causing the biggest problem. Relieved, they add a commit for a change in that file that says "fixed the bug", get it through QA, and release it to the world. Problem solved!
Back in the present, you're looking at their change spread out over a series of commits. You have no way of knowing whether all of these commits have to do with the same bug, because they all have commit messages like "another fix", "stopped it happening in the view model", and "another change". Clueless, and seeing no reason why the line of code in the "another change" commit needs to be there at all, you remove it as part of a small set of logic changes for a feature you're working on. Two weeks later, your code is in production, and a member of your team files a bug. Looks like a small resurgence of a two-year-old crash. The release contains hundreds of lines of changes, and none of them look very consequential: little fixes here and there, some features. What happened?
Descriptive Git histories are about business continuity and audit trails; if you can't understand why a change was made, or you can't contextualize the change a year or two later, you've lost all institutional knowledge around that piece of logic in your company's product. Descriptive Git histories are about saving you time and energy hunting down regressions; without context for a change, it's very easy to reintroduce bugs without knowing why. And perhaps most importantly, descriptive Git histories are about a codebase that explains itself in plain terms. A codebase that stands up unapologetically and says, "This is me, world. I'm a collection of features and bug fixes, and I have clear explanations for every single one."
The other, more boring reason for having self-contained Git commits that contain all of the work and context for a feature or bug fix is in case you have to revert that feature or bug fix later. If your team uses merge commits heavily, selecting 15 commits that make up one feature and reverting them can get very confusing very quickly if merge commits are interlaced in the history with those 15 commits. (Not to mention: if those 15 commits all have three-word, nondescript messages, are you really sure they're the only commits you need to revert?) Having a single commit for a feature, or maybe two or three commits comprising a few self-contained components of that feature, lets you think less and get more done.
At the end of the day, work is hard, and coding is hard. The only person with all of the context for a change is the developer writing the change, while they're writing it. I know from personal experience — and I know I'm not alone in this — that I can write a large change in the afternoon, and in the morning be looking at my commit that says "almost done" and thinking ...where was I?
Building your descriptive commit message as you're working on the changes, rather than using nondescript commits in flight and writing a good message later, also lets you think less and get more done. Writing code is hard, and reducing your cognitive load should be a top-level goal, to improve your quality of life. And again, the less you have to wonder about what's happening with your and other people's code, the lazier you can be, while still being productive. "Wow," your coworkers will say. "You sure get a lot done and always seem to know what's going on with our team's code. And your commit messages are so descriptive and helpful!" And you will smile quietly and say, "Thanks! It's because I'm really lazy."
Epilogue: An example of a descriptive commit
A good rule of thumb is to include the reference number of the work item you're working on; if your team uses Jira, that would be the story/bug/task number, like AN-123. Another thing to keep in mind is that Git commit messages are free-form, which means they can be structured however you want! But not so fast. Git tools, like GitHub or Tower, are opinionated. If your commit message doesn't follow the normal structure, most of them will truncate it when displaying it, making it harder to read. Git etiquette says:
This is the commit title, and it has 50 characters

After a blank line (two newline characters), you can write your novel. This is what's referred to as the commit description. The title is expected to be 50 characters or fewer; after it comes the blank line, then your commit description, which can be as long as you want.
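The convention above is simple enough to check mechanically. Here is a minimal sketch in Python, assuming only the two rules just described (a title of 50 characters or fewer, then a blank line before any description). The function name `check_commit_message` is made up for illustration; it isn't part of Git or any Git tool.

```python
from typing import List

MAX_TITLE_LENGTH = 50  # the title limit described above


def check_commit_message(message: str) -> List[str]:
    """Return a list of problems found in a commit message (empty if none).

    Hypothetical helper: it only checks the title length and the blank
    line that separates the title from the description.
    """
    problems = []
    lines = message.strip("\n").split("\n")
    title = lines[0]

    if not title.strip():
        problems.append("title is empty")
    elif len(title) > MAX_TITLE_LENGTH:
        problems.append(
            f"title is {len(title)} characters; "
            f"expected {MAX_TITLE_LENGTH} or fewer"
        )

    # A description is optional, but if there is one, the line right
    # after the title must be blank.
    if len(lines) > 1 and lines[1].strip():
        problems.append("no blank line between title and description")

    return problems
```

Something like this could run from a commit hook or a CI step, so the structure gets checked while the context is still fresh. Whether that's worth enforcing is a team decision; the sketch just shows that the rules are small.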
I like to structure my commit messages something like this:
AN-123 Resolves the database inconsistency

- It turns out that we were accessing the database without any locking mechanism. This was causing a read/write race condition that ended up with inconsistent behavior for the client. Resolved this by using the built-in merge resolution functionality provided by the OS.
- We were using references to concrete implementations of the database throughout the stack, so there was no way to validate changes. Resolved this by adding an abstract interface that's now vended to callers.
- Adds unit testing to all of the major classes that were using the database, using a mocked database behind the new abstract interface.
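One side benefit of leading every title with the work item number: it becomes easy to find all the commits that belong to the same story or bug. Here's a rough sketch in Python, assuming keys shaped like AN-123 at the very start of the title. The pattern and the helper names (`work_item_key`, `group_by_work_item`) are my own illustration, not something Git or Jira provides.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

# Assumption: work item keys look like "AN-123" and lead the commit title.
WORK_ITEM_PATTERN = re.compile(r"^([A-Z][A-Z0-9]*-\d+)\b")


def work_item_key(title: str) -> Optional[str]:
    """Return the leading work item key of a commit title, or None."""
    match = WORK_ITEM_PATTERN.match(title)
    return match.group(1) if match else None


def group_by_work_item(titles: List[str]) -> Dict[Optional[str], List[str]]:
    """Group commit titles by work item key; titles without one go under None."""
    groups = defaultdict(list)
    for title in titles:
        groups[work_item_key(title)].append(title)
    return dict(groups)
```

Feed it a list of commit titles and everything tagged AN-123 lands in one bucket, while titles like "another change" end up under `None`, which is a fair summary of how much they tell you.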