A Binary Problem

So, as evidenced by some earlier posts, I'm completely enamored with distributed version control, especially Mercurial. This past week, I was finally able to use Mercurial the way it's supposed to be used. We had to do a quick fix for a client while on-site, and now that we're back, I've pulled the fix onto my local box, where I can write the unit and integration tests to make sure we really have the problem fixed.

However, this trip has also confirmed something that I kind of knew was a problem with Mercurial and distributed version control in general: it doesn't handle assets well at all. Mercurial even gives you a warning when you attempt to add files larger than 10MB.

"Files over 10MB may cause memory and performance problems"

This isn't really a problem for most projects (though there are libs that can certainly get up to 10MB in size), but it's a serious problem for games, where version controlling assets is a significant issue. In addition, because of the way Mercurial handles files, there are limits on file size. If anything, this could be a deal breaker that would (and honestly should) prevent distributed version control from being adopted widely in the game development community.
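To get a feel for how many assets in a tree would trip that warning, a quick scan helps. This is only a sketch: the `find_large_files` name is mine, and the threshold is just the 10MB figure from the message above (I'm assuming 10 * 1024 * 1024 bytes, which may not be the exact number Mercurial checks against).

```python
import os

# Threshold based on the warning quoted above. The exact byte count is an
# assumption, not a value taken from Mercurial itself.
LARGE_FILE_BYTES = 10 * 1024 * 1024


def find_large_files(root, limit=LARGE_FILE_BYTES):
    """Return sorted (relative_path, size) pairs for files larger than limit."""
    large = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Don't descend into the repository's own metadata directory.
        if ".hg" in dirnames:
            dirnames.remove(".hg")
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size > limit:
                large.append((os.path.relpath(path, root), size))
    return sorted(large)
```

Running `find_large_files(".")` from the top of a working copy lists the files that would probably need different handling.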

But is it a solvable problem?

In some ways, yes it is. The simple solution is to use another piece of version control software for your assets. This is actually more common than you might think, as certain pieces of version control software offer things like asset diffs, something a generalized piece of version control software usually can't do. That said, having your team (at least your programmers and producers) learn two different pieces of version control software can be a pain, and configuring them to both play nice can be even more of a pain. In addition, if you're doing something similar to the Kernel Practice (having subsystem maintainers, leads, or "lieutenants" in charge of reviewing systems before pushing them to an "authoritative" repository) or something similar to Controlled practice (which is a hierarchical development model), having a second piece of version control software that doesn't support these does you absolutely no good.

So the best solution is still to use Mercurial for your assets, but how?

Unfortunately, there's still no good answer for this. Certain extensions (like Forest, Trimming History, Shallow Clone, and Overlay Repository) look promising for avoiding performance problems on a developer box, but not for avoiding the problem on the server. However, those extensions, in combination with Rebasing, could allow you to keep only portions of history in a given repository, though this would have to occur behind the scenes as much as possible.

The problem is that Mercurial is designed for lots of small files, so the generalized extensions are just specializations for lots of small files. So the solution, really, is to create an extension for Mercurial specifically designed for assets and large files, one that understands that full histories are not always an option (or a necessity), that cloning full asset histories is expensive, and that a standard diff algorithm can be hugely expensive and should be replaced by incremental diffing. And its handling of all of this should be largely transparent to users, especially since, let's face it, most artists can't be forced to deal with this stuff.
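To make "incremental diffing" a little more concrete, here is one very simple way to approximate it: hash a file in fixed-size blocks and only look at the blocks whose hashes changed. This is my own illustration, not how Mercurial or any existing extension works, and the function names and block size are made up for the example.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # hypothetical block size, pick what suits your assets


def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash each fixed-size block of a bytes object."""
    return [
        hashlib.sha1(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Return indices of blocks in new that differ from, or are absent in, old."""
    old_hashes = block_hashes(old, block_size)
    new_hashes = block_hashes(new, block_size)
    changed = []
    for index, digest in enumerate(new_hashes):
        if index >= len(old_hashes) or old_hashes[index] != digest:
            changed.append(index)
    return changed
```

Fixed-size blocks only work well when an asset is edited in place. Inserting bytes near the start shifts every later block, so a real extension would likely need something smarter.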

Will I make this extension? Unfortunately, no, not any time soon. I'm a programmer at a middleware company, and we don't have many assets to version, so Mercurial works fine for our needs. That said, if I ever get back into game programming, you know step one will be to make sure I can use Mercurial for assets.

Mercurial Second Impressions

I have a bit more to say on distributed version control, including some theories on how it might be used in large agile teams, but I don't necessarily want to touch on that today. I've decided I really need to read the rest of the Mercurial book (I read most of it!), as there are things in there that help explain how the extension mechanism and hook system work, and how you can use them to promote pre-push process. Right now, though, I quickly want to respond to a few of my initial impressions of Mercurial:

  1. Local branches and rebasing are supported in Mercurial through extensions. The Transplant, Rebase, and Local Branches extensions are available and allow you to do everything I mentioned. This is the benefit of using a piece of source control that has a plugin architecture and is easy to extend. You get extensions for everything. That said, I'm convinced I wouldn't use any of them (save potentially rebasing) now that I've worked with Mercurial more.
  2. There's an extension for Mercurial that provides an ASCII graphical log of branches / merges. It's pretty awesome. I'm sure there are other GUIs available as well, and hey… maybe I'll make a WPF one in the future?
  3. The internal branches work fairly well, but after using them I've decided they're almost not worth using. Instead, I'm now using separate repositories, even for long-lived branches. This means not necessarily knowing which branch a change comes from, but I'm okay with that.

Also, in the comments, Programmer Joe said that he'd been thinking about distributed version control, but wants to use me as a guinea pig. Unfortunately, I'm on a very small team, so I don't think I'd be that useful as a guinea pig for large teams. I certainly have theories on how a large team structure should work (which I will post about), but nowhere to try them out. I'd love to do some large team work with Mercurial to see how it'd work out. Anyone know a large team looking to be adventurous? 😉

On Distributed Version Control

So I had a conversation last night with my good friend Steve about my decision to start using Mercurial. Talking to him, I realized I hadn't really posted much on what I think about distributed version control systems (DVCS), so the switch to Mercurial may have caught many people off guard. So, I wanted to spend a post (maybe two) talking about what I see as the advantages of DVCS over a traditional "central server" (VCS) mentality.

The Leap of Faith

Like many people, I find the idea of version control without a central server scary. I don't trust my own machine, or my developers' machines, to be in any way fault tolerant. My server, on the other hand, has a RAID 5 controller in it, and is backed up off-site weekly (though if I were checking in more I'd probably back up nightly). Having that central server keep track of my changes feels safer, so I shied away from DVCS, opting instead for centralized version control with Subversion with user branches (more on that here). Even with tools like svnmerge though, Subversion utterly fails at merge tracking, especially from multiple branches bidirectionally. It was just never built to do that. Since then, I've watched several videos on the design of distributed version control (two on Git (here and here), one on Mercurial) and I made the leap of faith to see what DVCS is like.

The Centralized Model, Distributed

Now, distributed version control is amazing for open source projects, but what about working in a company where team communication and process are key? Where coordination is a requirement? Well, the nice thing about distributed version control is that it can, if you need it to, support a centralized model. Even in open source projects, you have what you call an "upstream" server, which is what everyone consults for the latest and greatest "official" changes to a product. You could, if you wanted to, push every commit you made to the upstream server, and pull every so often to make sure you have the latest version. In that case it would be exactly like using centralized version control (except it takes more hard drive space, as you're holding a repository on your hard drive, not just the source code). Sure, you have to train people to commit, then push, but once they understand that, there's no issue. In addition, your centralized version of your tree can have pre/post-commit hooks just like any other version control system, allowing you to create check-in gauntlets should the need arise. That said, I'm sure other source control providers make it easier (the new Source Safe and Perforce, I believe, have GUIs for creating check-in gauntlets, but I could be wrong). Still, with an easily extended system (like Mercurial), these additions wouldn't be hard.
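As a rough idea of what a check-in gauntlet might look like, here is a small sketch. Everything in it is hypothetical: `Change` is a stand-in for an incoming changeset, the two checks are just examples, and the wiring into an actual server-side Mercurial hook is deliberately left out because I haven't worked through it here.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    """Hypothetical stand-in for an incoming changeset."""
    author: str
    message: str
    files: list = field(default_factory=list)


def has_message(change):
    """Reject changes with a blank commit message."""
    return None if change.message.strip() else "empty commit message"


def no_build_output(change):
    """Reject changes that check in compiler output (example extensions)."""
    bad = [f for f in change.files if f.endswith((".obj", ".pdb"))]
    return "build output checked in: " + ", ".join(bad) if bad else None


GAUNTLET = [has_message, no_build_output]


def run_gauntlet(change, checks=GAUNTLET):
    """Run every check and return the list of failure messages."""
    failures = []
    for check in checks:
        problem = check(change)
        if problem:
            failures.append(problem)
    return failures
```

A hook on the upstream repository would call something like `run_gauntlet` for each incoming change and refuse the push if the returned list isn't empty. Keeping each check as a plain function makes it easy to add or drop steps per project.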

Encouraging Experimentation and Checkpoints

So in this case, why use a DVCS if it's just going to work like centralized version control? Well, even in this situation, you now have more options available to you than you did before. First is the ability to checkpoint whenever you want, without affecting other developers. The key reason most people give for using version control is the Chinese proverb "The palest ink is better than the best memory." So why not increase your memory by encouraging your developers to commit as often as possible? And if they can do this even when their build is broken or when they're not completely finished with something, why not let them? In a centralized model, this is almost impossible (though packing apparently solves this partially). Then, when a developer is done, their commit can have all of the changes they just made, with a history of what they did. Of course, it doesn't have to (rebasing is always an option) but having it there can help you see potential problems or flaws in thought process.

In addition, distributed version control can encourage experimentation in branches. The problem I have had with branches in the past is that they are a pain in the ass to manage on a centralized server. Even Subversion, which makes branching easy, doesn't manage merging well at all, especially bi-directional merging. I've even had problems with Perforce merging in the past. Regardless of that, though, working off of branches and creating branches is never easy. Not as easy as it is in distributed version control environments anyway. In Mercurial, branches are just clones of the current repository. You can work in them, share them with others, and delete them without the central server being involved. This may sound bad, but it does encourage developer experimentation and sharing. If I can branch simply off of whatever my current state is, experiment a bit, show that change to other developers and get feedback, then merge back into my main development branch easily, that opens up a lot of opportunities for small team experimental work.

Here's an example of where I could have used this in the past. One of my former companies did not do and did not encourage unit testing, and I could not convince my coworkers (or the management) to let me try it in some of our newer libraries. I decided, though, to do unit testing on any new modules I wrote anyway. I made a copy of the library into a directory that had a unit test environment set up, worked on the library (writing new tests, making them pass, etc), then copied my changes back over when I was done, and committed them to the central server. The unit test folder, though, was completely unversioned. If I'd had a DVCS, I could have cloned the library, done my work, then pulled changes back, leaving the unit tests versioned in their own folder. In addition, any other developer that was also working on those libraries could have easily pulled the unit tests from me, thus spreading a new practice at the company. This may be an edge case, but it is an example where simple branching and merging would have encouraged experimentation between developers.

Shelving, Packing, Branching

This gets into the second option you have with distributed version control. DVCS makes it easy for developers to share changes with each other before those changes ever reach the central server. Other source control providers give you the option through "shelving" or "packing", but I can't see it being as easy as it is with distributed version control. I can pull changes from anyone, on any branch, into a cloned version of my own repository to test things out without talking to a central server. Those changes don't need to be based off of the same trunk and can easily be merged by whichever developer at whatever point, then pushed to the upstream server, all while retaining who made the original changes. This has to be experimented with to really see the full benefit, and honestly I haven't done it enough, but I can see where it would be useful. Unlimited simple branching with the ability to push and pull changes from any developer coupled with a strong revision history just sounds nice to me.

Fault Tolerance

The last thing I want to talk about is when things go wrong. We try to avoid it, but every so often, our central server goes down. Sure, we have backups off site, maybe we have a passive failover server, but if we don't, our central server going down is very problematic. I have had this happen to me, and at fairly large companies that can afford lots of nice servers. If this happens in a traditional VCS, development (basically) stops for the day. Not so much in distributed environments. Since I can commit to my local repository all I want and share with everyone else without the benefit of a central server, a dead upstream doesn't affect me at all. This, in my mind, is pretty awesome.

More…

I will say that I'm using a DVCS in a very small team environment, but looking at it, I believe it could scale very easily. I also think there are lots of interesting ways to use DVCS, especially in agile environments where small teams may need to communicate changes without affecting central efforts. I really want to go into these, but this is already too long, so maybe I'll talk about them tomorrow. Until then, let me know what you think.

Mercurial: Initial Impressions

So here at Orbus, I've decided to move some things over to Mercurial for a bit to try it out. Generally, the whole user branch thing I talked about way back when we started wasn't working. Setting up each user branch was a huge pain, and merging large changes back and forth was almost impossible, even with a helper tool like svnmerge. Svn supposedly fixed this with 1.5, but if they did it the same way they did it in svnmerge, I'm not going to bother attempting to use it. Bidirectional merges from multiple branches just don't work with svnmerge, and I don't recommend attempting them. Mercurial, I'm hoping, will at least partially solve this problem.

So, we've been converting things over the past few days, and I have some initial impressions. First, the bad:

  1. Repository cloning on large repositories, even with the addons that allow Windows to use hard links, is fairly slow. I wish Mercurial allowed lightweight local branches that could be developed on, and then merged back into a working copy, all in the same directory. This is the Git model of doing things. I understand why Mercurial doesn't support it, I just kinda wish it did.
  2. I'm sad there's no rebasing. I would like the ability to do work in a branch, commit, say, 20 changes in that branch, and then rebase it all to one change in the main trunk. Again, I understand why it's not there, but it'd be nice to have.
  3. I'd like to be able to see who pushed what revisions to an upstream server. Knowing who made the changes is great, but I'd also like to see who pushed them. I'm not sure this data is available to Mercurial though.
  4. All the interfaces need work. I would love graphical tools that could show branching and merging into various files. Not sure what other distributed software does for this, but it seems like a very hard problem.
  5. The internal branching system is extremely confusing, and I'm not sure whether merging single changes between branches will be any easier with Mercurial. Will have to see.

The good:

  1. Mercurial was really easy to set up, even integrating LDAP into the push permissions system. I like it when things are that easy.
  2. It makes collaboration with other programmers easier, just because of the way it works.

In general, I'm currently just using Mercurial like I'm using Subversion. The tiny bonus is now I can grab changes from other programmers, take a look at them and give comments, all without posting to a server, and that's really nice. Once I get some more experience, I'll make sure to hand out more impressions.