The Database Upgrade Problem
This is a problem I think everyone has at some point when working with databases that can’t lose data, and specifically those that generated their database from another intermediary. The problem is this: when someone changes the source file from which the database is generated, how do you upgrade said database without losing any data? Or, losing as little data as possible.
In my mind, there are two approaches to this. The first is to attempt to run a diff between the new file and the old file, and generate an alter script that can be run on the database. The alter script contains the information to add new properties and delete old properties as needed, and can be stored in source control to run in sequence if required. These generated alter scripts are also nice, because if there’s a problem with one of them (say, a property is renamed and the diff believes it was a drop + add) you can fix it manually and still leave the rest of the script alone.
The problem with this solution is that it requires the old version of the file as well as the new version, causing some interesting workflow problems. For example, let’s say I’m a programmer that wants to add a new property to the database. Here’s what I want to do:
- Open my XML
- Add the property
- Regenerate the code
- Run the new scripts on the database
In actuality, here’s what I have to do:
- Copy my old XML to a backup.
- Add my property to my new XML
- Run the generator with my old version and new version of my XML
- Run the new scripts on the database.
So, now, I have my old version and new version, I test what I’ve done and find that I added the property wrong or I need an extra property. What do I do? I can’t revert the database. I have to produce *another* copy of the XML, change it and run the upgrade system… this leaves me with lots of old backed up XMLs… it’s just not pretty. Then what happens when another programmer wants to change things? Provided you’re not checking in your alter scripts (and, sometimes, even if you are) he has to hold on to his old XML, update to the new one, run the generator, run the alter script and now he can get working. That’s best case. What if he’s working when you check in? It’s just not pretty.
The other method (and I’ve seen this done) is to consult the database for its current state, and diff it against what the XML is expecting, then change the database accordingly. In this case, you could also, of course, just generate an alter script, but you run into the issue of, again, having lots of alter scripts for a single check-in. In addition, sometimes manipulation of the database (as mentioned before) can have unforeseen consequences. If your diffing system interprets a rename as a drop and re-add, you lose that data, and if you’re going directly to the db instead of an alter script, that’s bad.
The only solution here I can think of, which isn’t great, is to also have a “revert” script generated with your alter script, then write a shell script for your day to day workflow that does the following:
- If it exists, run the revert script from your previous run of the upgrade generator (here, it would be nice to also be able to tell if this were actually necessary)
- Use your version control system to check the version of the XML you are developing against and grab the pristine copy of that file to a temporary location (most source control allows you to do this, if only for diffing purposes).
- Run the upgrade generator, which produces an alter and a revert script.
- Run the alter script on the database.
Then, you can change your XML again (or merge with something another programmer did) and what you’d just done to the database is reverted before applying the new change. Problems? You have to remember to delete the revert script when you check in, so that whatever you checked in doesn’t get undone. I’m sure there are ways to check to see if you actually want to run the revert script, but generally, I can’t think of a good one.
Anyone found other ways to solve this problem?