For a while now, I’ve been putting off a task to configure a sync process for a particular piece of enterprise data. This data is populated into a single table in a production environment on a nightly basis but also needed to be synced down into the test and development environments every now and then. Without going into too much detail about the nature of the data, it consists of about 700,000 records which change either via updates or insertions. Normally I don’t like taking production data down into other environment (there are simply better ways), but the nature of this data called for keeping the environments in sync so the developers could do their job.
I’d been sitting on this requirement for a while as I wasn’t relishing what could become a time consuming, laborious task. This was a table with over 40 columns including lots of foreign keys (one of them self-referential), and a calculated field so as well as being careful not to create any referential integrity problems, I needed it to happen fast so as not to play havoc with a production environment. In my mind, it was going to mean manually writing some form of ETL either directly in TSQL or via SSIS or even going down a SQL replication route. It was possibly even a SQL MERGE task but these particular environments were still stuck on SQL 05 so that route was out.
One quick caveat: I’m not a DBA, I’m a developer who works with databases. There may have been other angles to come at this from, but the solution I arrived at is fast, simple and easy to monitor. The fact that it ended up being a 15 minute job on my weekend and I didn’t mind giving up a little of my valuable Sunday on this particular task was a very nice result!
SQL Data Compare and the command line
Red Gate’s SQL Data Compare is rather adept at syncing databases across environments via a friendly UI. Enter the source and target servers and databases, run the compare, look at the differences, run the sync. Easy as that.
Thing is though, you can also instrument all this via the command line. It looks something like this:
SQLDataCompare.exe /s1:SourceServer /db1:SourceDb /s2:TargetServer
/db2:TargetDb /Include:table:\[MyTable\] /Include:Identical /v /sync
This is mostly pretty self-explanatory, it’s the four switches at the end that are the interesting ones: Only compare the table titled “MyTable” (the square braces are escaped by the preceding backslash), include identical tables in the comparison result so the process doesn’t return an error when everything matches, give me verbose output and actually sync the differences, don’t just compare them.
Because there’s a nice command line interface we can easily orchestrate this from TeamCity. The environment in question was already running the SQL Compare suite as it’s used for Automated database releases with TeamCity and Red Gate. Reviewing the project I created, it’s just a normal TeamCity project with no VCS settings (we’re not getting anything from source control), a command line build step, a couple of build parameters and a trigger. Actually, the command is a little trickier because I’ve extracted it into a template and used environment variables for the server names:
The reason I’ve done this is so that I can reuse the build between both production-to-test and test-to-development data syncs. This makes maintenance really easy if I want to change command line parameters later on. Here’s what the build parameters look like for the production-to-test sync:
The test-to-development sync is identical (I simply copied the production-to-test build), other than obviously setting the two parameters above according to the respective server names. I’ll trigger off the production-to-test sync well and truly out of business hours then on successful completion, trigger a test-to-development build. This way the window where the production server is being touched in minimised.
One last thing: make sure the account TeamCity is connecting to SSQL Server with has had the principle of least privilege actively applied. Put your security hat on and ask “What does this account absolutely, positively need to be able to do?” and then don’t let it do anything beyond this. For example, it only needs to read from production but it needs to be able to select, insert and update in the test and development environments (I’m assuming no redundant records are inserted directly into these environments and that deletions won’t be necessary).
Once a build runs successfully, the log looks something like this:
[10:39:14]: Skip checking for changes - there are no VCS roots defined [10:39:14]: Clearing temporary directory:
C:\TeamCity\buildAgent\temp\buildTmp [10:39:14]: Checkout directory: C:\TeamCity\buildAgent\work\db35110eeed7e0e5 [10:39:14]: Repository sources transferred [10:39:14]: Updating sources: server side checkout... [10:39:14]: Publishing internal artifacts (1s) [10:39:16]: [Publishing internal artifacts] Sending build.start.properties
file [10:39:14]: Starting: "C:\Program Files\Red Gate\SQL Data Compare 9
\SQLDataCompare.exe" /s1:MyProductionServer /db1:MyTable
/s2:MyTestServer /db2:MyTable /Include:table:\[MyTable\]
/Include:Identical /v /sync [10:39:14]: in directory: C:\TeamCity\buildAgent\work\db35110eeed7e0e5 [10:39:15]: Red Gate SQL Data Compare Command Line Utility V9.1.0.365 [10:39:15]: =============================================================== [10:39:15]: Copyright c Red Gate Software Ltd 1999-2011 [10:39:15]: Serial Number: XXX-XXX-XXXXXX-XXXX [10:39:15]: Comparing database MyProductionServer.MyTable with database
MyTestServer.MyTable... [10:39:15]: Registering databases [10:39:17]: Mapping [10:39:17]: Comparing databases [10:40:53]: Generating SQL scripts [10:40:54]: Synchronizing databases [10:41:15]: Finished [10:41:15]: Summary Information [10:41:15]: =============================================================== [10:41:15]: DB1 = MyProductionServer.MyTable [10:41:15]: DB2 = MyTestServer.MyTable
[10:41:15]: Object type Name Records DB1 DB2 [10:41:15]: --------------------------------------------------------------- [10:41:15]: Table [dbo].[MyTable] 683119 == == [10:41:15]: Table [dbo].[MyTable] 7379 <> <> [10:41:15]: Table [dbo].[MyTable] 3217 >> [10:41:15]: --------------------------------------------------------------- [10:41:15]: OK [10:41:15]: Process exited with code 0 [10:41:15]: Publishing internal artifacts [10:41:15]: [Publishing internal artifacts] Sending build.finish.properties
file [10:41:15]: Build finished
Let’s break this down a little: there are a total of 693,715 rows in the source, the vast majority of which are identical to the target other than the 7,379 which have some differences and the 3,217 which don’t exist in the target at all. All of this runs in a grand total of 2 minutes and 1 second but the actual sync script, that is the writing to the target environment, takes only 21 seconds. Nice.
Gotchas
There were two small sightly derailing gotchas in all this: Firstly, Red Gate had a little oversight with the earlier releases of version 9 of SQL Data Compare and managed to break the command line runner. There’s a patch available at that link which brings it up to version 9.1.0.365 which works just fine.
The other one was that TeamCity was initially failing the build with a message like this:
[11:09:21]: Comparing databases [11:09:26]: Unhandled Exception: System.InvalidOperationException: Showing a modal dialog box or form when the application is not running in UserInteractive mode is not a valid operation. Specify the ServiceNotification or DefaultDesktopOnly style to display a notification from a service application.
It turns out there was a dialog box sitting there on the TeamCity machine waiting for some action (from memory, it was one of those “Can we anonymously collect feedback” boxes). It would be nice to be able to instruct SQL Data Compare to run unattended but I couldn’t see anything in the command help that indicated this. Anyway, log on to the server, dismiss the dialog and that’s the end of that.
Summary
This is a pretty straight forward implementation but it’s doing something altogether more complex under the covers. Perhaps it could have been orchestrated from SQL Server using other means but I like the intuitive way SQL Data Compare handles it and I really like being able to see it in my TeamCity build list including the output describing what was affected. And of course I can get all the usual TeamCity notifications if something goes wrong.
But best of all, it’s literally a 15 minute job to setup and it does exactly what I need. That’s a very nice result indeed.