So, as evidenced by some earlier posts, I'm completely enamored with distributed version control, especially Mercurial. This past week, I was finally able to use Mercurial the way it's meant to be used. We had to make a quick fix for a client while on-site, and now that we're back, I've pulled the fix onto my local box, where I can write the unit and integration tests to make sure the problem is really fixed.

However, this trip also confirmed something I already suspected was a problem with Mercurial, and with distributed version control in general: it doesn't handle assets well at all. Mercurial even warns you when you attempt to add files larger than 10MB:

"Files over 10MB may cause memory and performance problems"

This isn't really a problem for most projects (though some libraries can certainly reach 10MB), but it's a serious problem for games, where version controlling assets is a significant issue. In addition, because of the way Mercurial handles files, there are practical limits on file size. If anything, this could be a deal breaker that would (and, honestly, should) prevent distributed version control from being widely adopted in the game development community.
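If you want to know ahead of time which files in a working copy would trigger that warning, a small Python script can list them before you run hg add. This is only a sketch and not part of Mercurial: the function name find_large_files is hypothetical, and the default threshold of 10,000,000 bytes is an assumption about how Mercurial measures "10MB".

```python
import os

# Assumption: Mercurial's "files over 10MB" warning is taken to mean more than
# 10,000,000 bytes. Change this if your version measures it differently.
DEFAULT_THRESHOLD = 10_000_000


def find_large_files(root, threshold=DEFAULT_THRESHOLD):
    """Return sorted (relative_path, size_in_bytes) pairs for files over threshold.

    Hypothetical helper, not a Mercurial API. The .hg metadata directory is
    skipped because it is not part of the working copy, and symlinks are
    ignored so a dangling link cannot raise an error.
    """
    large = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Removing ".hg" in place stops os.walk from descending into it.
        if ".hg" in dirnames:
            dirnames.remove(".hg")
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            size = os.path.getsize(path)
            if size > threshold:
                large.append((os.path.relpath(path, root), size))
    return sorted(large)
```

Calling find_large_files(".") from the root of a working copy returns the candidates as (path, size) pairs; passing a smaller threshold makes it easy to try out on a small directory.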

But is it a solvable problem?

In some ways, yes, it is. The simple solution is to use a separate version control system for your assets. This is more common than you might think, since some version control systems offer features like asset diffs, something general-purpose version control software usually can't do. That said, having your team (at least your programmers and producers) learn two different version control systems can be a pain, and configuring the two to play nicely together can be even more of a pain. In addition, if you're following something like the Kernel Practice (where subsystem maintainers, leads, or "lieutenants" review changes before pushing them to an "authoritative" repository) or the Controlled Practice (a hierarchical development model), a second version control system that doesn't support these workflows does you absolutely no good.

So the best solution is still to use Mercurial for your assets, but how?

Unfortunately, there's still no good answer for this. Certain extensions (like Forest, Trimming History, Shallow Clone, and Overlay Repository) look promising for avoiding performance problems on a developer box, but not on the server. However, combining those with Rebasing could let you keep only portions of history in a given repository, though this would have to happen behind the scenes as much as possible.

The problem is that Mercurial is designed for lots of small files, so the general-purpose extensions are also tuned for lots of small files. So the real solution is to create a Mercurial extension designed specifically for assets and large files: one that understands that full histories are not always an option (or a necessity), that cloning full asset histories is expensive, and that a standard diff algorithm can be hugely expensive on large files, so diffing should be done incrementally. And it should handle all of this largely transparently, especially since, let's face it, most artists can't be forced to deal with this stuff.
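The post doesn't spell out what incremental diffing would look like. One simple interpretation, sketched here purely as an assumption (this is not an existing Mercurial extension or API, and the names block_digests, changed_blocks, and BLOCK_SIZE are hypothetical), is to split each version of a file into fixed-size blocks, hash each block, and compare the hashes. Neither version ever has to be loaded into memory whole, and only the blocks that changed would need to be stored or transferred.

```python
import hashlib
from itertools import zip_longest

# Hypothetical block size; a real tool would tune this to its asset types.
BLOCK_SIZE = 64 * 1024


def block_digests(path, block_size=BLOCK_SIZE):
    """Yield a SHA-1 digest for each fixed-size block of the file, reading incrementally."""
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            yield hashlib.sha1(block).digest()


def changed_blocks(old_path, new_path, block_size=BLOCK_SIZE):
    """Return indices of blocks that differ between two versions of a file.

    Only one block from each file is held in memory at a time. Blocks present
    in only one version (because the file grew or shrank) count as changed.
    """
    changed = []
    pairs = zip_longest(block_digests(old_path, block_size),
                        block_digests(new_path, block_size))
    for index, (old, new) in enumerate(pairs):
        if old != new:
            changed.append(index)
    return changed
```

Comparing fixed-size blocks is the simplest form of this idea: an insertion near the start of a file shifts every later block, so all of them show up as changed. Tools such as rsync handle that case with a rolling checksum, at the cost of more complexity.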

Will I make this extension? Unfortunately, no, not any time soon. I'm a programmer at a middleware company, and we don't have many assets to version, so Mercurial works fine for our needs. That said, if I ever get back into game programming, you know step one will be making sure I can use Mercurial for assets.


2 comments

  1. Jakub Narębski @ 2008-10-24 12:13

    What about using Git (http://git.or.cz, http://git-scm.com) instead of Mercurial? I think it doesn’t have problems with large files, at least not with 10MB files.[1]

    The problem with Git might be working on MS Windows; you can use Git via Cygwin, or use the native version from msysGit. The equivalent of TortoiseHg, Git-Cheetah, is in very early stages of development. Additionally, Git is rumored to be more difficult than Mercurial; IMHO that is because Git is more powerful (cf. multiple branches and tags).

    [1] Dana How worked on better performance with large files (not all of the patches got accepted), but I think “large” there was an order of magnitude larger…

  2. I looked at Git originally, and you hit on my major problem with it (its lack of Windows support). I also feel like it’s been hacked together. Mercurial at least provides a very extensible interface through Python. That said, I’ve never officially ruled out Git as a possibility.

    The 10MB “limit” is just a warning, though. Mercurial handles larger files just fine; it only warns that they might be a problem. With ALL DVCSes, though, large assets are an issue. There’s no reason my artists should keep the entire repository on their drive; it just doesn’t make sense. Maybe change hashes, but that’s about it.
