So my challenge this week (well, one of them) is to figure out a source control model that supports Continuous Deployment. Continuous Deployment is, of course, the operational model in which code changes are deployed to production automatically (or nearly automatically, for the risk-averse) several, dozens, or even hundreds of times a day, in response to a change in the source code, resources, configuration values, or persistence structures.
The simplest workflow is:
- Developer commits a change to source
- A continuous integration (CI) server picks up the change from the VCS and runs one or more builds
- The builds pass all the tests
- The CI server or some other process picks up the build artifacts and pushes them to production.
- Repeat!
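To make that loop concrete, here’s a minimal Python sketch. It assumes each commit arrives with the result of building and testing the whole tree at that revision; `Commit`, `continuous_deploy`, and the `deploy` callback are hypothetical names for illustration, not any CI server’s API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Commit:
    revision: str
    # Result of building and testing the whole tree at this revision
    # (a stand-in for what the CI server would actually report).
    tests_pass: bool


def continuous_deploy(commits: List[Commit], deploy: Callable[[str], None]) -> List[str]:
    """Process commits in order, deploying every revision whose builds pass.

    Returns the revisions that were pushed to production.
    """
    deployed: List[str] = []
    for commit in commits:          # CI picks up each change from the VCS
        if not commit.tests_pass:   # a failing build never reaches production
            continue
        deploy(commit.revision)     # hypothetical hook that pushes the artifacts
        deployed.append(commit.revision)
    return deployed


# Usage: record deployments in a list instead of touching real servers.
shipped: List[str] = []
continuous_deploy(
    [Commit("r1", True), Commit("r2", False), Commit("r3", True)],
    shipped.append,
)
print(shipped)  # ['r1', 'r3']
```

Note that nothing in the loop involves a human; “Repeat!” is just the next iteration.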
It gets a little tricky when trying to design the proper VCS workflow. In the past, I worked in a slower environment that paid close attention to VCS workflows, and our process revolved around the so-called stable trunk paradigm: features were developed in branches, and then, over a period of several hours, a series of manual steps merged the tested changes back into trunk, which was then branched to a “release branch” from which the actual deployment artifacts were produced. During subsequent feature development, if a production bug was discovered, we’d fix it in the release branch and merge the fix back to trunk and then to all open feature branches.
It worked well, but with Continuous Deployment as the goal, stable trunk seems untenable. Automatically managing three release branches a day, two of which would be rendered obsolete, seems like a pain in the ass.
So, unstable trunk? Trunk always contains releasable code? That makes my gut churn a bit:
- Release A
- Developer “Hacky” commits change A’, which fails some tests
- While A’ is building and testing, a bug is discovered in A, which requires a fix A’’ to trunk.
- We can’t release A’’ until Developer “Hacky” fixes her problems in A’.
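The problem with the scenario above is that on a linear trunk every revision contains all the changes before it, so one broken change poisons everything committed after it. A toy Python model of that, where `latest_releasable` is a hypothetical helper and the model deliberately ignores later commits that repair the breakage:

```python
from typing import List, Optional, Tuple


def latest_releasable(trunk: List[Tuple[str, bool]]) -> Optional[str]:
    """Return the newest revision on a linear trunk that is safe to release.

    Each entry is (revision, change_is_good). Every revision includes all
    earlier changes, so the first bad change blocks everything after it.
    (This toy model ignores commits that fix an earlier breakage.)
    """
    releasable = None
    for revision, change_is_good in trunk:
        if not change_is_good:
            break  # everything from here on includes the broken change
        releasable = revision
    return releasable


trunk = [("A", True), ("A'", False), ("A''", True)]
print(latest_releasable(trunk))  # A, because the fix A'' is stuck behind the broken A'
```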
Well, we could, but only by going back to feature branches, where each feature is developed in its own branch and merged back into trunk once it’s done, tested, and through QA.
I still don’t know how to avoid the window of a few minutes during which the merge has happened but the tests haven’t yet been confirmed as passing. If we need to fix a bug in production RIGHT NOW, that seems like a weak spot, unless we introduce something like:
- branch trunk T to T’
- merge feature branch into T’
- run tests on T’
- merge T’ back to trunk T
but if someone has committed to T in the meantime, what do you do? Run the whole process over again, I guess. It gets to a point where it’s turtles all the way down.
One thing CD seems to eliminate is the promotion model, where a human decides when code gets promoted from staging to production. Or, if you keep that model, promotion becomes automatic and implicit in the deployment process.
It also means, I think, that you should use feature branches for bigger features whose code you don’t necessarily want to push to production until the entire feature is complete. I know there’s a whole sub-literature on introducing “hidden features” into production, features that are either partially complete or outright broken, but I don’t like that.
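For comparison, the “hidden features” approach usually comes down to shipping the unfinished code path to production behind a switch that defaults to off. A minimal Python sketch, assuming flags are read from hypothetical `FEATURE_<NAME>` environment variables (a config file or service would work just as well); `feature_enabled` and `checkout_page` are made-up names:

```python
import os


def feature_enabled(name: str) -> bool:
    # Assumption: flags live in environment variables named FEATURE_<NAME>,
    # and anything other than "on" (including unset) keeps the feature hidden.
    return os.environ.get(f"FEATURE_{name.upper()}", "off") == "on"


def checkout_page() -> str:
    # Both code paths ship to production; the flag decides which one users see.
    if feature_enabled("new_checkout"):
        return "new checkout"
    return "old checkout"
```

The trade-off is exactly the one that bothers me: the half-built code is in production, just unreachable until someone flips the switch.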
I found a few articles on Continuous Deployment very worthwhile:
A short non-technical article about Etsy’s CD setup, written by Fred Wilson
The key takeaway from this was: we don’t roll back failures, we fix them. That’s interesting. The calm, dispassionate, conservative part of me quails slightly at the thought of not being able to roll back, but…maybe? Maybe?
Eric Ries’s “Continuous Deployment in 5 Easy Steps”
This one has an interesting dimension: stopping the line via a commit check script, e.g., if a build breaks, block all new commits until it’s fixed. VCS-wide, no less. That seems a bit heavy-handed, but it probably makes developers very disciplined about what they commit. To the detriment of speed? Maybe. Hm.
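Here’s a minimal sketch of such a commit check written as a server-side Git pre-receive hook in Python. It’s my own illustration, not Ries’s script, and it assumes the CI server records the latest build result (“green” or “red”) in a hypothetical status file:

```python
import sys
from pathlib import Path
from typing import TextIO

# Assumption: the CI server writes "green" or "red" to this (hypothetical) file
# after every build. Adjust to however your CI exposes build status.
STATUS_FILE = Path("/var/ci/build_status")


def read_status(path: Path) -> str:
    """Return the recorded build status, or "unknown" if it can't be read."""
    try:
        return path.read_text().strip().lower()
    except OSError:
        return "unknown"


def line_is_stopped(status: str) -> bool:
    # Fail closed: anything other than an explicit "green" stops the line.
    return status.strip().lower() != "green"


def main(stdin: TextIO = sys.stdin, status_file: Path = STATUS_FILE) -> int:
    """Entry point for a Git pre-receive hook; a nonzero return rejects the push."""
    stdin.read()  # Git feeds "<old> <new> <ref>" lines; this sketch ignores them
    if line_is_stopped(read_status(status_file)):
        print("The build is broken, so the line is stopped. Fix it before pushing.",
              file=sys.stderr)
        return 1
    return 0
```

To try it, save it as `hooks/pre-receive` in the central repository, make it executable, add a `#!/usr/bin/env python3` line at the top, and end the file with `sys.exit(main())`. Git relays the hook’s stderr to whoever is pushing, so they see why they were turned away.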
Timothy Fitz on IMVU’s Continuous Deployment process
He talks a bit about a personal bugaboo of mine, the intermittently failing test. I HATE those, and I don’t use the word “hate” lightly. I have spent many man-months of my life figuring out how to write tests (and thus, code) that don’t mysteriously blow up in the presence of weird input or context, and I like that he (correctly) notes that as the CD process scales, you MUST address this issue or you will be a dead duck.
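One cheap way to hunt down intermittently failing tests (not necessarily what IMVU does) is to rerun each test many times and flag any that give mixed results. A minimal Python sketch; `classify_test` is a hypothetical helper, and a “test” here is just a callable returning True on pass:

```python
from collections import Counter
from typing import Callable


def classify_test(test: Callable[[], bool], runs: int = 20) -> str:
    """Run a test repeatedly and report "pass", "fail", or "flaky".

    A test that both passes and fails across identical runs is flaky. Note the
    asymmetry: a rarely failing test can pass all `runs` times, so "pass" only
    means no flakiness was observed, not that none exists.
    """
    outcomes = Counter(test() for _ in range(runs))
    if len(outcomes) > 1:
        return "flaky"
    return "pass" if outcomes[True] else "fail"


# Usage: a deterministic stand-in for a test that fails every seventh run.
calls = 0


def fails_every_seventh_run() -> bool:
    global calls
    calls += 1
    return calls % 7 != 0


print(classify_test(fails_every_seventh_run))  # flaky
print(classify_test(lambda: True))             # pass
```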