The Rise Of Version Control Systems

The history of version control systems (VCS) says interesting things about usability and about why a piece of software succeeds. I have seen a few generations of version control systems so far, each one improving on its predecessor in major ways, with sometimes surprising consequences.

Disclaimer: This piece has no intention of being comprehensive or particularly accurate, though I researched it a bit. It’s a bit of history seen from my personal experience.

A picture of a 3D-printed version of GitHub's Octocat, in front of a computer screen

The first version control system I (briefly) used was CVS. It was created in the 1980s as a frontend to an even older - and clunkier - system called RCS. CVS was centralized: a central repository was necessary, and in the usual client-server setup committing a change required network communication. It was clearly designed for enterprise use cases where the server would be set up by the IT department, then used by engineers. The typical usage loop was: check out, make a change, record the change on the server. Using CVS was somewhat difficult, but it provided a lot of value: centralized backup, code sharing between people, change tracking.

Then came Subversion in the early 2000s, and it was a significant step forward. In my opinion that was for a single reason: change tracking was done for an entire directory structure. RCS and CVS attached history to individual files, based on their names. CVS had some support for directory-level commits, but that was not much more than a loop over all the files that had changed. That meant two things: file renames couldn't be tracked; and when you committed a directory-level change and one of the files had a conflict, you'd end up with a partially committed change - changes to some files were recorded and not others. Subversion was much easier to use because the mental model was much simpler: what you were committing was a particular state of the entire directory. If there was a conflict anywhere, nothing was recorded on the server. The usage loop was the same as with CVS, but with stronger guarantees. As a result, Subversion quickly overtook CVS and became the default version control system, at least for free software projects.
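To make the difference concrete, here is a toy Python sketch. It is not how CVS or Subversion are actually implemented: the "server" is just a dictionary mapping file names to contents, and the function names (commit_per_file, commit_whole_tree) are made up for illustration. It only shows why a loop over files can leave a half-recorded change, while checking the whole tree first cannot.

```python
class Conflict(Exception):
    """Raised when the server copy of a file differs from the copy we started from."""


def commit_per_file(server, base, changes):
    """CVS-like sketch: commit each changed file independently, in a loop."""
    for name, content in changes.items():
        if server.get(name) != base.get(name):
            # Files handled earlier in the loop are already recorded.
            raise Conflict(name)
        server[name] = content


def commit_whole_tree(server, base, changes):
    """Subversion-like sketch: check every file first, then record all or nothing."""
    conflicts = [n for n in changes if server.get(n) != base.get(n)]
    if conflicts:
        raise Conflict(", ".join(conflicts))
    server.update(changes)


def try_commit(commit, server, base, changes):
    """Run a commit function on a copy of the server and return the resulting state."""
    state = dict(server)
    try:
        commit(state, base, changes)
    except Conflict:
        pass
    return state


# What I checked out, and what the server holds now (a colleague changed b.txt).
base = {"a.txt": "v1", "b.txt": "v1"}
server = {"a.txt": "v1", "b.txt": "v2"}
changes = {"a.txt": "mine", "b.txt": "mine"}

per_file_result = try_commit(commit_per_file, server, base, changes)
whole_tree_result = try_commit(commit_whole_tree, server, base, changes)

print(per_file_result)    # a.txt was recorded, b.txt was not
print(whole_tree_result)  # nothing was recorded
```

With the per-file loop, the server ends up holding my a.txt next to my colleague's b.txt, a combination nobody ever had on their machine. With the whole-tree check, the server is left untouched until I resolve the conflict.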

The next step was distributed version control systems (DVCS), which I started using around 2008 with Bazaar (released in 2005, like Git and Mercurial). Suddenly the central server was not necessary anymore and it became trivial to start using version control: bzr init. That was possible because hard disks had become large enough for every engineer to store the full history locally and, more importantly, because we had tools to merge concurrent changes effectively. The importance of this latter point cannot be overstated: RCS avoided merge conflicts entirely by requiring engineers to lock a file in the shared repository before editing it (!). CVS and Subversion didn't have that restriction, but DVCS went one step further: you could not only change files concurrently but also commit them concurrently. This had a massive consequence: version control became a programming tool, like the compiler or the code editor.

The next major innovation came from GitHub, which made it trivial to host, share and combine code. That was all thanks to DVCS and what they enabled: the pull request. A DVCS can be used with a central server, as with Subversion, but that comes with the downside that someone has to decide who has write permission on it. The pull request method is much more flexible: the repository's owner can decide on a case-by-case basis whether a change gets merged - or they can give full write access to a person they trust. It also made code review a natural part of the process: in its most basic form, code review can be performed with the same tools you use to merge your own local branches. Web-based platforms only simplified the communication by making it possible to add comments on top of the diff.

Looking at the history of version control systems, it's clear that Git and Mercurial did not invent the idea of DVCS. GNU Arch (2001) and BitKeeper (2000) had been around for a while. What made Git and Mercurial so successful? The questionable licensing choice that BitKeeper made in 2005 and the popularity of Linus Torvalds certainly played a role, but I suspect there are technical reasons as well. Let's speculate. GNU Arch's version of a 'repository' was stored outside the source code directory tree, which feels like having an external server: when you check out a project, the code is written in one directory and the history in another. It looks like it'd be easy to delete one and not the other, likely leading to problems. GNU Arch also made it possible to split the history between different repositories, which sounds like exposing too much detail. BitKeeper checked out the code read-only and required you to 'unlock' files before editing them. This looks like the anti-pattern I talked about in A tooling tidbit: forcing the user to think ahead of time. When you're working on a feature, your mind is on the code, not the version control system. Having to unlock a file imposes an unnecessary context switch. Git and Mercurial get this right: you can edit files the way you want, then interact with the VCS only when you need it.

A nice read if you want to learn a bit more about the history of VCS: “A Survey of Version Control Systems”, A. Koc and A. U. Tansel, 2011.

Photo by Roman Synkevych on Unsplash
