Subversion Re-education

If you’ve been using Subversion, you'll be happy to learn that Bazaar can work as an almost drop-in replacement. However, if you want to get the most out of Bazaar, you'll have to learn some new concepts that may seem confusing at first. If you’ve never used Subversion, just skip ahead and learn Bazaar from the ground up.

You’ve probably been using Subversion for a long time, or maybe a similar centralized version control system like CVS. If you start working with Bazaar and expect it to work the same way as Subversion, you’re going to be confused. Before you can wrap your mind around Bazaar, and distributed version control in general, you have to understand what’s different about these two approaches to managing your work.

Before I get into how Bazaar works, let’s take a moment to review a typical Subversion workflow:

  • Check out / update code from the central repository, which is usually stored on a remote server.
  • Make a bunch of changes to the code.
  • Commit your changes back into the central repository.

It's not a very complicated scenario, but it comes with a lot of headaches. The first problem is that if your central repository is on some remote server and you're offline, you can't commit your changes.

Even if you are online, you still may not want to commit any changes. In Subversion's model, every time you commit a change it gets pushed back to the central repository. That means any time anyone updates from the central repository, they will immediately see these changes. If there are no bugs, then it's not a problem. If there are bugs, all of your teammates will have to deal with the errors you've introduced.

You could choose not to commit any changes at all until all of your work is completely debugged, but then you could be working for hours or even weeks before you make a single check-in. What good is version control if you can't commit your work as often as you'd like out of fear of breaking the build or other people's work?

Of course, you could always branch your code and work in your own sandbox. The problem with this approach is that Subversion only tracks enough information to remember a simple snapshot of your files, so when it comes time to merge your branch back into the one you branched from, you end up with a bunch of merge conflicts, many of which aren't really conflicts at all. Merging in Subversion is so difficult that most users either keep their branches extremely short-lived or never attempt branching at all.

To understand why Subversion has these limitations, let's review how things work with Subversion:

In Subversion, the repository, which contains all of the history for the project, is on the remote server. Each developer has a working tree on their computer that holds their current work. If anyone commits their work, it is sent directly to the repository on the server.

Key Difference #1
Every developer works in their own branch stored in a local copy of the repository.

Unlike common Subversion workflows, all work in Bazaar is done in the context of a branch. The "trunk" of the project is just a branch. Each developer has their own branch forked from the trunk, and that branch is stored in a local repository with the complete history.

A developer pulls (bzr pull) changes into their branch from the remote one. If their branch has diverged a great deal, they may need to merge (bzr merge) revisions from the remote trunk to combine their work. They can commit (bzr commit) changes from their working tree to their branch as often as they like without impacting anyone else. When a developer is ready to share their code, they can push changes (bzr push) from their branch to the remote branch. Until they push, all of their changes are local and not seen by the rest of the team.
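To make that cycle concrete, here is a minimal sketch that plays it out on a single machine. Everything in it is an assumption for illustration: the directory names trunk and fred are made up, a local directory stands in for the remote branch, and BZR_EMAIL is only there to give the demo commits an identity.

```shell
#!/bin/sh
# Sketch only: a local "trunk" directory stands in for the team's remote branch.
# The names trunk and fred, and the file contents, are hypothetical.
command -v bzr >/dev/null 2>&1 || { echo "bzr is not installed; skipping" >&2; exit 0; }
set -e
export BZR_EMAIL="Fred <fred@example.com>"   # identity for the demo commits

work=$(mktemp -d)
cd "$work"

# Create the shared trunk with one revision in it.
mkdir trunk
cd trunk
bzr init -q
echo 'puts Time.now' > time.rb
bzr add -q time.rb
bzr commit -q -m "initial import"
cd ..

# Fork a personal branch; it carries the complete history.
bzr branch -q trunk fred
cd fred

# Get up to date with trunk (nothing new yet, so nothing is pulled).
bzr pull -q ../trunk

# Commit locally. The trunk does not change.
echo '# timezone support' >> time.rb
bzr commit -q -m "added timezone support to time.rb"
fred_revno=$(bzr revno)
trunk_before_push=$(bzr revno ../trunk)
echo "fred:  revision $fred_revno"
echo "trunk: revision $trunk_before_push"

# Share the work only when it is ready.
bzr push -q ../trunk
trunk_after_push=$(bzr revno ../trunk)
echo "trunk after push: revision $trunk_after_push"
```

The point to notice is that the trunk's revision number only moves when bzr push runs, not when bzr commit does.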

Key Difference #2
Commits to a branch stored in a local repository are not immediately shared.

In this model, committing code is divorced from sharing code.

Additionally, everyone has a copy of the project history. The only thing that is "central" about the remote branch is that everyone agrees that it is. You could take any developer's branch and decide it's the authoritative branch. The remote repository could be lost to a server failure, and the project history would still exist on each developer's computer.

Other differences between Subversion and Bazaar

In Subversion, branching and merging are difficult. This is because a revision in Subversion is a simple snapshot of the state of the file system when it was created.

Bazaar also has revisions, but revisions in Bazaar capture a lot more information, including metadata describing the revision, parent revisions it's derived from, and a globally unique identifier.

Key Difference #3
Revisions in Bazaar store more information to make merging easy.

Because of these properties, it's very easy to branch in Bazaar, and to merge branches back together. Bazaar knows how to recognize the same revision across multiple branches, and each revision remembers its own lineage. When it comes time to merge, Bazaar can easily see at which point things diverged and the best way to weave them back together.

In the diagram above, Fred and Olivia's branches are divergent. Fred has made a number of changes to existing files and added a new one. Olivia has also made a bunch of changes, and removed a file.

If these were branches in Subversion and we attempted to merge them, Subversion would look at the latest snapshot of both branches and try to reconcile them. Anyone who has experience merging Subversion branches knows that very divergent branches lead to complicated and painful merges.

If we made the same merge with Bazaar, it would know exactly the series of changes that each developer took to get to their current state. It would also be able to figure out where and how the developers diverged so it wouldn't waste time trying to reconcile parts of the histories that were the same.
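Here is a rough sketch of that scenario with two local branches. The names fred and olivia come from the example above, but the files (time.rb, old.rb, zone.rb), their contents, and the local trunk directory are assumptions I've made up so the script is self-contained.

```shell
#!/bin/sh
# Sketch only: Fred and Olivia diverge from a shared trunk, then Fred merges.
# File names, contents, and the local "trunk" directory are hypothetical.
command -v bzr >/dev/null 2>&1 || { echo "bzr is not installed; skipping" >&2; exit 0; }
set -e
export BZR_EMAIL="Demo User <demo@example.com>"   # identity for the demo commits

work=$(mktemp -d)
cd "$work"

# Shared starting point with two files.
mkdir trunk
cd trunk
bzr init -q
echo 'puts Time.now' > time.rb
echo 'legacy helper' > old.rb
bzr add -q time.rb old.rb
bzr commit -q -m "initial import"
cd ..

bzr branch -q trunk fred
bzr branch -q trunk olivia

# Olivia removes a file and shares her work first.
cd olivia
bzr remove -q old.rb
bzr commit -q -m "removed old.rb"
bzr push -q ../trunk
cd ..

# Fred changes an existing file and adds a new one.
cd fred
echo '# timezone support' >> time.rb
echo 'zones = []' > zone.rb
bzr add -q zone.rb
bzr commit -q -m "added timezone support"

# A plain pull is refused because the branches have diverged...
if bzr pull -q ../trunk 2>/dev/null; then
    echo "unexpected: pull succeeded"
else
    # ...so merge the trunk's revisions, then record the result.
    bzr merge -q ../trunk
    bzr commit -q -m "merged trunk"
fi

# Olivia's revision shows up in Fred's history under its own revision id.
bzr log --show-ids -n0 | grep revision-id

# Now that Fred's branch contains everything in trunk, he can push.
bzr push -q ../trunk
```

Because the two sets of changes touch different files, the merge applies cleanly: Fred's tree ends up with his new file and without the file Olivia removed.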

Key Difference #4
Revisions can be shared directly, not just via a central repository.

Because Bazaar knows the parents of revisions and can identify the same revisions across any branch, branching and merging becomes easy and is the default way of working. Not only can a developer share changes via a "central" repository as you would with Subversion, developers can share revisions directly between their branches without passing revisions through a central location.

Key Difference #5
You can have as many branches as you want, arranged any way you prefer.

This kind of flexibility lets your team have as many branches and repositories as they want, not just a central one. Every developer can have their own branch. A whole team can decide to have a separate team branch and fork from there. Individual developers can have branches per feature. You can use a branch just as a sandbox to try out ideas. If the code works, you can push it to the remote branch, and if not, you can discard it without affecting anyone else.
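One possible arrangement, sketched below, keeps a trunk, a feature branch, and a throwaway sandbox side by side in a shared repository created with bzr init-repo. The layout and the names project, feature-timezones, and sandbox are purely my own assumptions; nothing about Bazaar requires them.

```shell
#!/bin/sh
# Sketch only: one shared repository holding several branches.
# The layout (project/trunk, project/feature-timezones, project/sandbox) is hypothetical.
command -v bzr >/dev/null 2>&1 || { echo "bzr is not installed; skipping" >&2; exit 0; }
set -e
export BZR_EMAIL="Demo User <demo@example.com>"   # identity for the demo commits

work=$(mktemp -d)
cd "$work"

# A shared repository lets the branches inside it share revision storage.
bzr init-repo -q project
bzr init -q project/trunk
cd project/trunk
echo 'puts Time.now' > time.rb
bzr add -q time.rb
bzr commit -q -m "initial import"
cd ..

# One branch per feature, plus a throwaway branch for experiments.
bzr branch -q trunk feature-timezones
bzr branch -q trunk sandbox

# The experiment did not pan out: delete the branch. The trunk never saw it.
cd sandbox
echo '# risky idea' >> time.rb
bzr commit -q -m "experiment"
cd ..
rm -rf sandbox

# The feature worked: push it to the trunk.
cd feature-timezones
echo '# timezone support' >> time.rb
bzr commit -q -m "added timezone support to time.rb"
bzr push -q ../trunk
echo "trunk is at revision $(bzr revno ../trunk)"
```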

Bazaar lets you organize your work however you like, and you no longer have to fear Subversion-style merges.

Key Difference #6
Bazaar commands operate on the entire working tree, not just a part of the tree.

Another key difference between Subversion and Bazaar is that Subversion commands tend to work within a single directory whereas Bazaar commands work on the entire project tree. For example, if you were to commit changes in a subdirectory in your project, Subversion would only commit changes in the current directory and subdirectories. In Bazaar, your commit would record pending changes anywhere in the tree.

This is not a big difference, but it might have some impact on how you structure repositories. If you have a single Subversion repository where each subdirectory represents a separate project, you may find that having one repository per project is a better way to go in Bazaar.
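The whole-tree behavior of bzr commit is easy to see in a small experiment. The sketch below uses a made-up project with lib and docs directories; it also shows that you can still commit just part of the tree by naming paths explicitly.

```shell
#!/bin/sh
# Sketch only: the project layout (lib/, docs/) and file contents are hypothetical.
command -v bzr >/dev/null 2>&1 || { echo "bzr is not installed; skipping" >&2; exit 0; }
set -e
export BZR_EMAIL="Demo User <demo@example.com>"   # identity for the demo commits

work=$(mktemp -d)
cd "$work"
mkdir project
cd project
bzr init -q
mkdir lib docs
echo 'puts Time.now' > lib/time.rb
echo 'User guide' > docs/guide.txt
bzr add -q
bzr commit -q -m "initial import"

# Change files in two different directories.
echo '# timezone support' >> lib/time.rb
echo 'Timezones' >> docs/guide.txt

# Committing from inside lib/ still records the docs/ change.
cd lib
bzr commit -q -m "timezone support and docs"
cd ..
pending_after_full=$(bzr status --short)

# To commit only part of the tree, name the paths explicitly.
echo '# more code' >> lib/time.rb
echo 'More docs' >> docs/guide.txt
bzr commit -q -m "lib only" lib
pending_after_partial=$(bzr status --short)
echo "still pending after the partial commit:"
echo "$pending_after_partial"
```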

Using Bazaar as a Drop-In Replacement for Subversion

The Bazaar tutorial hasn't even started yet, and you may already feel a little overwhelmed by how different Bazaar is from Subversion. Moving to a distributed version control system brings a lot of benefits including working offline, easy branching and merging, and the freedom to structure repositories as you'd like. That said, Subversion is a simpler model. You don't have to worry about pushes, pulls, merges, and branches. The Subversion model is basically checkout, update, commit.

The sooner you let go of the Subversion way of doing things, the easier it will be to learn Bazaar. That said, Bazaar doesn't force you to abandon your old workflow if you don't want to. In fact, you can have a few members of a team working with Bazaar in a completely distributed fashion while others stick with the Subversion way of doing things. Some nontechnical members of the team, like designers who know enough to commit updated images to the central repository, don't have to change the way they work at all.

For example, if you wanted to "checkout" code from a remote branch like you would from a Subversion repository, you'd use this command:

Creates a branch that is tightly bound to another branch, known as a "checkout". This also creates a repository to store the branch if one does not already exist.
~ fmccann$ bzr checkout bzr+ssh://
~ fmccann$ ls
~ fmccann$ cd trunk
trunk fmccann$ ls

That's right: you just use bzr checkout! This creates a checkout of the code that behaves much like a Subversion checkout. Let's say some other developer on the team committed a change to time.rb and you wanted to get up-to-date with the "central" remote branch. In that case, you'd do this:

Updates the working tree to a new revision. In the case of a checkout, this command consults the bound branch for the latest revision.
trunk fmccann$ bzr update
 M  time.rb
All changes applied successfully.
Updated to revision 3 of branch bzr+ssh://

You use bzr update. Starting to see a pattern? Let's say you made a bunch of changes to time.rb and you want to commit them back to the remote branch. You'd do this:

Records the current state of the tracked files and directories as a new revision in the branch.
trunk fmccann$ bzr commit -m "added timezone support to time.rb"
Committing to: bzr+ssh://
modified time.rb
Committed revision 4.

That's right, bzr commit. At this point you may be wondering why I spent so much time telling you how Bazaar doesn't work like Subversion when it looks pretty darned similar. How is this distributed? When you commit the changes, it's going right back to the central repository on the server!

Bazaar is a completely distributed version control system, but the people who created Bazaar realized that different people work different ways, and for some teams, a completely centralized approach is fine. As I said, it could be the case that only certain team members need a simple, lock-step centralized approach, while others may not. Bazaar will let you do that.

While it looks like Bazaar works the same way as Subversion in this instance, there is a difference. When we bzr checkout the trunk branch from the central repository, we're still creating a local branch with all of the history copied to the client. However, a checkout is tightly bound to the parent branch on the remote server. That means when you bzr commit changes, Bazaar will make sure that the local branch is up-to-date with the remote branch, and it will commit changes to the remote branch before it commits the same changes to the local branch.

Bazaar is a distributed version control system, but it can mimic the behavior of centralized version control. Because Bazaar can work like Subversion, it allows you or a few team members to dip your toe into the waters of distributed version control before diving in head first. You can have a simple centralized model and still get the benefits of working offline (see bzr commit's --local option) and easy branching and merging.
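Here is a small sketch of the difference between a normal commit in a checkout and one made with --local. As before, the directory names trunk and fred-checkout are hypothetical, and a local directory stands in for the central branch on a server.

```shell
#!/bin/sh
# Sketch only: a local "trunk" directory stands in for the central branch.
# The names trunk and fred-checkout are hypothetical.
command -v bzr >/dev/null 2>&1 || { echo "bzr is not installed; skipping" >&2; exit 0; }
set -e
export BZR_EMAIL="Fred <fred@example.com>"   # identity for the demo commits

work=$(mktemp -d)
cd "$work"

# The "central" branch with one revision in it.
mkdir trunk
cd trunk
bzr init -q
echo 'puts Time.now' > time.rb
bzr add -q time.rb
bzr commit -q -m "initial import"
cd ..

# A checkout is a local branch bound to the trunk.
bzr checkout -q trunk fred-checkout
cd fred-checkout

# A normal commit goes to the trunk and to the local branch.
echo '# timezone support' >> time.rb
bzr commit -q -m "added timezone support to time.rb"
trunk_after_bound=$(bzr revno ../trunk)

# A --local commit is recorded in the local branch only.
echo '# more work done offline' >> time.rb
bzr commit -q --local -m "offline work"
trunk_after_local=$(bzr revno ../trunk)
local_revno=$(bzr revno)

echo "trunk after the normal commit: revision $trunk_after_bound"
echo "trunk after the --local commit: revision $trunk_after_local"
echo "local branch: revision $local_revno"
```

The normal commit advances the trunk right away, while the --local commit leaves it where it was and only moves the local branch forward.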

That's the last time I'll mention the Subversion ways of doing things. If you are interested in sticking with a centralized approach, refer to this tutorial. In the rest of this tutorial, I'll make the case that you should jump in head first and learn the distributed way from the ground up!
