The Database Upgrade Problem

This is a problem I think everyone hits at some point when working with databases that can’t lose data, specifically databases that are generated from an intermediary source file. The problem is this: when someone changes the source file from which the database is generated, how do you upgrade the database without losing any data, or at least losing as little data as possible?

In my mind, there are two approaches to this. The first is to run a diff between the new file and the old file and generate an alter script that can be run on the database. The alter script contains the statements to add new properties and delete old ones as needed, and can be stored in source control to be run in sequence if required. These generated alter scripts are also nice because, if there’s a problem with one of them (say, a property is renamed and the diff interprets it as a drop plus an add), you can fix it manually and leave the rest of the script alone.
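
To make the first approach concrete, here’s a minimal Python sketch of the kind of diff such a generator might do. The function name and the single-table, columns-only scope are my own simplifications, not anything from a real generator; a real one would parse both XML files and handle types, indexes, and so on.

```python
def diff_table(table, old, new):
    """Emit ALTER statements that turn `old` into `new` for one table.

    `old` and `new` map column name -> SQL type. This is a minimal
    sketch of the diff approach, not a real generator: note that a
    rename shows up here as a drop plus an add, which is exactly the
    failure mode worth hand-editing in the generated script.
    """
    statements = []
    for column, sql_type in new.items():
        if column not in old:
            statements.append(f"ALTER TABLE {table} ADD COLUMN {column} {sql_type};")
    for column in old:
        if column not in new:
            statements.append(f"ALTER TABLE {table} DROP COLUMN {column};")
    return statements
```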

The problem with this solution is that it requires the old version of the file as well as the new version, which causes some interesting workflow problems. For example, let’s say I’m a programmer who wants to add a new property to the database. Here’s what I want to do:

  1. Open my XML
  2. Add the property
  3. Regenerate the code
  4. Run the new scripts on the database

In actuality, here’s what I have to do:

  1. Copy my old XML to a backup.
  2. Add my property to my new XML.
  3. Run the generator with my old version and new version of my XML.
  4. Run the new scripts on the database.

So now I have my old version and my new version. I test what I’ve done and find that I added the property wrong, or that I need an extra property. What do I do? I can’t revert the database. I have to produce *another* copy of the XML, change it, and run the upgrade system, which leaves me with lots of old backed-up XMLs. It’s just not pretty. Then what happens when another programmer wants to change things? Provided you’re not checking in your alter scripts (and sometimes even if you are), he has to hold on to his old XML, update to the new one, run the generator, and run the alter script before he can get working. And that’s the best case. What if he’s mid-change when you check in? It gets ugly fast.

The other method (and I’ve seen this done) is to consult the database for its current state, diff it against what the XML expects, and change the database accordingly. Here you could, of course, also just generate an alter script, but then you again run into having lots of alter scripts for a single check-in. In addition, direct manipulation of the database (as mentioned before) can have unforeseen consequences: if your diffing system interprets a rename as a drop and re-add, you lose that data, and if you’re going directly to the database instead of through an alter script, that’s bad.
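
One way to blunt that risk is to split the detected differences into safe and destructive changes and require a human to sign off on the drops before anything touches the database. The Python sketch below is my own illustration; the dict-based schema representation is a stand-in for whatever the real system would read out of the database’s catalog.

```python
def plan_changes(expected, actual):
    """Classify differences between the schema the XML expects and the
    database's current state (both as column -> type maps for one table).

    Adds are safe to apply automatically; drops are only flagged,
    because a rename misread as a drop + add would silently lose data.
    """
    adds = [c for c in expected if c not in actual]
    drops = [c for c in actual if c not in expected]
    return {"apply": adds, "needs_review": drops}
```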

The only solution I can think of here, and it isn’t great, is to also generate a “revert” script alongside your alter script, then write a shell script for your day-to-day workflow that does the following:

  1. If it exists, run the revert script from your previous run of the upgrade generator (it would be nice to be able to tell whether this is actually necessary).
  2. Use your version control system to check the version of the XML you are developing against and grab the pristine copy of that file to a temporary location (most source control allows you to do this, if only for diffing purposes).
  3. Run the upgrade generator, which produces an alter and a revert script.
  4. Run the alter script on the database.

Then you can change your XML again (or merge in something another programmer did), and what you’d just done to the database is reverted before the new change is applied. Problems? You have to remember to delete the revert script when you check in, so that whatever you checked in doesn’t get undone. I’m sure there are ways to check whether you actually need to run the revert script, but I can’t think of a good one.
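
The wrapper script for that loop could be as simple as the sketch below. Every command name in it (`upgradegen`, the svn invocation, piping into mysql, the file names) is a placeholder I made up to show the shape of the loop, not a real tool from our setup.

```python
def plan_workflow(revert_script_exists):
    """Return the ordered shell commands for one pass of the
    day-to-day upgrade loop. All names are hypothetical stand-ins:
    `upgradegen` is the upgrade generator, `schema.xml` the working
    copy, and the svn call fetches the pristine checked-in version.
    """
    steps = []
    if revert_script_exists:
        # Step 1: undo whatever the previous local run did.
        steps.append("mysql mydb < revert.sql")
    # Step 2: grab the pristine copy of the file you're developing against.
    steps.append("svn cat -r BASE schema.xml > pristine.xml")
    # Step 3: produce alter.sql and revert.sql from the two versions.
    steps.append("upgradegen pristine.xml schema.xml")
    # Step 4: apply the new changes.
    steps.append("mysql mydb < alter.sql")
    return steps
```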

Anyone found other ways to solve this problem?

The Server Conundrum

The next big task at Orbus is to get our systems working with what I call “intermediary” servers: enterprise-style message queues and things like that. Basically, the idea is that the game can sit and asynchronously pump metrics to the queue without sacrificing performance in the game, and the queue will catch up during times of low server load. These systems are also easy to run in parallel and in high-availability modes, so if one messaging server does get overloaded (or dies), another is there to pick up the slack. Right now, we're looking at JBoss, since it supports all of that under the LGPL, which is nice.

Anyway, it’s time for me to install this server and start writing software against it. Of course, I’m planning on unit testing connections to it and making sure that everything actually works the way it’s supposed to. However, to do that, I’ll need to start up a JBoss server on my machine, on top of the three other servers currently running simply as multiplatform test beds. Additionally, our build / database machine is starting to get overloaded with servers: IIS, CCNet, MySQL, MSSQL, soon JBoss, maybe eventually Postgres, MSMQ, and a small possibility of other Java application servers thrown in for good measure.

It’s starting to get ridiculous. Although the machines can handle it, I’d much rather see a situation where this wasn’t necessary, especially when it comes to my own development machine. But it’s almost necessary to have each of those servers installed locally to develop against, as well as another set up on the build machine to run the pre-deploy tests against. And all of these systems have to be in sync: all of the developers’ databases need to match the database I developed against, and the test servers need to match mine before the tests are run. This should be done automatically, but for the time being it’s not (I’ll get around to it!).

So anyway, it’s a big fiasco. The way I see it, a perfect development / test server environment would have the following properties:

  1. You have copies of all the local servers installed, but they are only running when you’re developing against them. Otherwise they’re disabled.
  2. The build / test server can have all of the servers installed and running at once, so long as they don’t conflict with one another.
  3. All of these servers need the same names, usernames, and passwords, or the unit tests need to be able to generate equivalent ones.

I’ve been thinking about virtualization as a possible way to overcome this problem. Basically, have each of the server collections running in a separate virtual server. So, have my database servers running on a virtual server on my box that I can shut off when I don’t need it. Then have JBoss running on a separate virtual server, and MSMQ on another, and have them named along the lines of “hostname_dbs”, and “hostname_jboss”. The idea here is that the names of the servers could be generated in the unit test, and the server images copied (with the host name changed) to all the developers pretty easily.
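
Generating those names in a unit test is trivial. A sketch, assuming the “hostname_role” convention above (the function name is mine):

```python
import socket

def virtual_server_name(role):
    """Derive the virtual server name for a given role, e.g.
    "myhost_dbs" or "myhost_jboss", from the developer's own hostname,
    so unit tests can find the right image with no per-machine config.
    """
    return f"{socket.gethostname().lower()}_{role}"
```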

For now, I'll probably just install JBoss this way and see how it works. I'll let you know.

MySQL and Visual Studio

So, I’ve been working a lot with Visual Studio, and although it doesn’t do everything perfectly, I tend to like it. One of the things I like most about it is its ability to group lots of seemingly disparate technologies into a single solution file, so that you can have all the information about your “solution” right at your fingertips.

Today, in preparation for porting our technology to MySQL, I decided to attempt to make Visual Studio database projects and MySQL work nicely together. Believe it or not, this is insanely hard without purchasing an OLE DB provider, and I couldn’t find a single site through Google that told me how to get Visual Studio database projects working with MySQL. Even MySQL’s Visual Studio plugin doesn’t actually support everything you’d think it should (you can’t run scripts, for example, which was the major piece of the puzzle I wanted). So what’s a programmer to do? In this case, I found a kind of solution that doesn’t give you all of the features of the MySQL plugin in the Server Explorer, but it does give you the run feature, which I feel is much more important.

For those of you that are curious, here’s the solution. First, download and install the MySQL Connector/ODBC libraries. In my case, I used the 5.0 libraries because we’re developing against beta software anyway (the 5.1 release of MySQL), but I have a feeling the 3.51 release will work just as well. Next, open your Server Explorer and add a data connection. You’ll be presented with this window:

Add Connection window

Click “Change” next to the data source and select “Microsoft ODBC Data Source”, and the Add Connection window will change to this:

Add Connection for ODBC

Now, you can go through the trouble of setting up system or user data sources here, but you’re better off just using a connection string, so select “Use Connection String” and type in your ODBC connection string. For me, it was similar to:

Driver={MySQL Connector/ODBC v5};server=localhost;database=Database;

and add the user name and password below (note that if you’re using the older version, I believe the driver needs to be set to “MySQL ODBC 3.51 Driver”, though I’m not actually sure). Test the connection, and you’re all set: multi-line scripts should now run just fine, and you can develop as if you were working against SQL Server.
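
If you end up building that string in code anywhere (say, in the unit tests), a small helper keeps it in one place. This is my own sketch, not part of the setup above; `uid` and `pwd` are the standard ODBC keywords for credentials, and the default driver name matches the 5.0 connector used here.

```python
def odbc_connection_string(database, server="localhost",
                           driver="MySQL Connector/ODBC v5",
                           user=None, password=None):
    """Assemble an ODBC connection string like the one shown above.
    Pass driver="MySQL ODBC 3.51 Driver" for the older connector
    (my best guess at that driver's name; verify before relying on it).
    """
    parts = [f"Driver={{{driver}}}", f"server={server}", f"database={database}"]
    if user:
        parts.append(f"uid={user}")
    if password:
        parts.append(f"pwd={password}")
    return ";".join(parts) + ";"
```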

Edit: Although this works for executing multi-statement queries, it doesn’t work for things like creating procedures. MySQL chokes on the semicolons used inside multi-statement procedure bodies, and there’s no way to use the delimiter command to change the end-of-statement delimiter. In my opinion, this is a huge problem, and the whole concept of requiring you to change delimiters feels like a hack. I’m not sure if there’s a way around this, but you can be sure I’ll write about it if there is.
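
One possible angle, since DELIMITER is a feature of the mysql command-line client rather than the server, is to split the script yourself and send each statement individually, keeping BEGIN…END bodies intact so a procedure goes over the wire as a single statement. The Python sketch below is deliberately naive (it ignores strings and comments, and expects BEGIN/END at the start of a line), so treat it as a starting point, not a fix.

```python
def split_statements(script):
    """Split a SQL script on statement-ending semicolons, but keep
    everything between BEGIN and END together so a procedure body
    survives as one statement. Deliberately naive: no handling of
    strings or comments; BEGIN/END must start their own lines.
    """
    statements, current, depth = [], [], 0
    for line in script.splitlines():
        word = line.strip().upper()
        if word.startswith("BEGIN"):
            depth += 1
        current.append(line)
        if word.startswith("END"):
            depth -= 1
        if line.rstrip().endswith(";") and depth == 0:
            statements.append("\n".join(current).strip())
            current = []
    if current:
        statements.append("\n".join(current).strip())
    return statements
```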