Thinking About Database Revision Control

In a current database-driven project with multiple developers, the single biggest issue we have is version control for the database.

For us at ibrow, the number one practice that we engage in is Version Control/Revision Control. I personally use version control for almost everything, and not using it when multiple developers are working on a project sends shivers down my spine. We currently use Subversion; however, I’m thinking of moving over to Git, but there are also Bazaar and Mercurial, plus many, many others.

Whilst there is this vast array of version control systems for code, there doesn’t seem to be the same focus on version control for database schemas. Jeff Atwood has a couple of posts espousing the virtues of version control for databases. There is Rails’ Migrations, which seems to be the most advanced version control for databases to date, and there is a project to make a PHP version of Migrations. However, Migrations still has its flaws, which are discussed in Adam Wiggins’s blog post about this very subject. Adam offers some good insights into the problem and brainstorms possible solutions. But I think I’m coming at the problem from a slightly different angle to him.
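
For anyone who hasn’t come across Migrations before, the core idea is that every schema change is captured as a small script with an “up” step that applies it and a “down” step that undoes it. Something roughly like this (a Python-flavoured sketch with made-up names, not any framework’s actual API):

    # Sketch only: one schema change, with a way to apply it and a way to
    # undo it. "db" is assumed to be whatever connection object gets passed in.
    class AddUserEmailColumn:

        def up(self, db):
            # Apply the change.
            db.execute("ALTER TABLE users ADD COLUMN email VARCHAR(255)")

        def down(self, db):
            # Undo it again (not every change is this easy to reverse).
            db.execute("ALTER TABLE users DROP COLUMN email")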

As most of our projects are in constant development we do not, as such, have versions, but instead revisions. This is a key difference. If we check a database change in to version control, we don’t yet know its revision, especially if we have multiple developers working on the same project and possibly changing the schema of the database at the same time.

The ideal solution would use the following steps:

  1. Developer finishes their chunk of code, along with any database changes, and is ready to commit. The database changes include UP and DOWN changes.
  2. Developer commits their code changes and a revision number is assigned
  3. Based on this revision number, the database changes are committed (a rough sketch of how this could work follows below)
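
A very rough sketch of how steps 2 and 3 could be glued together on the client side, assuming an invented db/pending and db/migrations layout (the script itself is purely illustrative, not a working tool):

    import re
    import shutil
    import subprocess

    def commit_with_migrations(message):
        # 1. Commit the code changes and capture Subversion's output.
        result = subprocess.run(["svn", "commit", "-m", message],
                                capture_output=True, text=True, check=True)

        # 2. Parse the revision number svn assigned ("Committed revision N.").
        match = re.search(r"Committed revision (\d+)\.", result.stdout)
        if not match:
            return  # nothing was actually committed
        revision = match.group(1)

        # 3. File the pending UP/DOWN scripts under that revision number and
        #    commit them too (which creates a second revision, one of the
        #    wrinkles a real implementation would need to iron out).
        paths = []
        for direction in ("up", "down"):
            dest = "db/migrations/%s_%s.sql" % (revision, direction)
            shutil.move("db/pending/%s.sql" % direction, dest)
            paths.append(dest)
        subprocess.run(["svn", "add"] + paths, check=True)
        subprocess.run(["svn", "commit", "-m", "DB changes for r" + revision],
                       check=True)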

Obviously the developer should update their working copy before committing, just to see if there are any conflicts and to make sure they have the latest copy of the code.

On the updating side, the following steps should occur when the developer updates their code:

  1. The update process makes a note of the working copy’s current revision number
  2. Downloads all code changes via the SVN update command
  3. Downloads all the database changes made since that revision
  4. Applies all of those database changes to the developer’s local database (sketched below)
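
Again, a rough sketch of what that update step might look like, reusing the invented file layout from the commit sketch above and using sqlite3 purely as a stand-in for whatever database the project actually runs on:

    import glob
    import re
    import sqlite3
    import subprocess

    def current_revision():
        # Ask Subversion what revision the working copy is at.
        info = subprocess.run(["svn", "info"], capture_output=True,
                              text=True, check=True).stdout
        return int(re.search(r"^Revision: (\d+)", info, re.MULTILINE).group(1))

    def update_database():
        before = current_revision()                    # 1. note the current revision
        subprocess.run(["svn", "update"], check=True)  # 2. pull down the code changes
        after = current_revision()

        # 3. Collect the migration scripts committed since the last update...
        pending = []
        for path in glob.glob("db/migrations/*_up.sql"):
            revision = int(re.search(r"(\d+)_up\.sql$", path).group(1))
            if before < revision <= after:
                pending.append((revision, path))

        # 4. ...and apply them to the local database, oldest first. Reverting
        #    would walk the same list newest first, using the *_down.sql files.
        db = sqlite3.connect("local_dev.db")           # stand-in for the real DB
        for _, path in sorted(pending):
            with open(path) as script:
                db.executescript(script.read())
        db.commit()
        db.close()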

This process can be reversed for reverting or merging backwards.

Currently I can’t find anything out there that does this – which probably means one of 2 things:

  1. My Google-fu isn’t up to scratch
  2. It is actually a lot harder than I think!

I really need to get this right – it is possibly the most important problem for us to solve at the moment. It would be great to hear about any possible solutions or existing implementations out there.

Maybe this will be an interesting sub-project for me over the next few weeks?

(And obviously I haven’t even considered trying to keep the data up to date – but I guess that will be for another day.)

  • http://www.w-dev.net/ Pete

    I think the answer is 2 : It is actually a lot harder than you think ;-)

    Revision control systems are based on the premise that we’re mostly dealing with text files, and when there’s a conflict we can diff and merge. Obviously this doesn’t work with databases.

    One can keep a record of all changes as (an) SQL file(s), and when updating, import the latest changes.

    Problems are (apart from the possibility of forgetting to record a statement) that:

    – The statements aren’t easily automatically reversible when you need to do a revert. That would take a sophisticated SQL parser (that would also be DB server specific)

    – Some transactions aren’t simply reversible (DELETE, DROP etc)

    – It’s little help when there’s a conflict – i.e. two devs commit 2 very different changes to the database. The accompanying code (at least at the model level) is also too tied to the underlying DB structure.

    One solution (though inelegant and wasteful) is to force (or better script) that just prior to each commit, a full DB dump is made. When updating, the existing local DB is completely replaced by the dump. Still doesn’t help much with conflicts with mutually exclusive DB structure changes – there then would be a “winner” and a “loser”.
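
    In script form that could be as little as something like this (mysqldump and the names here are just placeholders):

        import subprocess

        def pre_commit_dump():
            # Dump the whole local database into the working copy so the
            # dump gets committed alongside the code.
            with open("db/full_dump.sql", "w") as out:
                subprocess.run(["mysqldump", "projectdb"], stdout=out, check=True)

        def post_update_restore():
            # Throw the local database away and replace it with the dump.
            with open("db/full_dump.sql") as dump:
                subprocess.run(["mysql", "projectdb"], stdin=dump, check=True)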

    Despite my fondness for things agile and extreme, I actually think it’s often good in the long-term to have the core DB structure made concrete before starting coding. It’s always a major PITA when core tables need to change half-way through a project. (Extra tables aren’t usually a problem)

    One non-technical solution is to have one dev designated as the DBA, and all changes to structure and required records would go atomically through them.

    • http://www.robsearles.com Rob Searles

      I think you’re right – it is a lot better to get the structure of the database specified at the start. Unfortunately this isn’t always possible, especially if you’re working on a project where you’re “feeling your way” at the start – either because you are using new techniques, or because the brief is an evolving one.

      The full DB dump might be good if it is also accompanied by update scripts for both up and down. This way an existing system can just have their tables updated, but a new system can have a clean install. However, this still doesn’t solve the problem of the 2 devs committing differing changes to the database – there will always be, as you say, a winner and a loser.

      The best solution is probably the non-technical one (which pains me to say!), where one dev is the DBA and all changes go through them. This could be coupled with an agile mentality of having daily meetings to discuss any structural changes needed to the database and why.

      Thanks for your thought-provoking comment.
