Migration Options: DB2 to SQL Server

I did absolutely nothing for my 70-433 prep this week because I was distracted by a new project at work that really has me excited. So excited, in fact, that I’ve been thinking about it, working on it, and even dreaming about it all week, minus a few potty breaks. Although I’m failing on my studying, I learned a ton.

The Mission:

Migrate a database from DB2 (UDB) to SQL Server (edition unknown).

Factors:

  • Portions of the schema are predefined.
  • Portions of the schema are dynamic. Many tables and their columns will vary based on configuration.
  • The database is large. (Exact size unknown, but likely several hundred GB.)
  • Time will be short. (Production DB, needs to be back online ASAP.)
  • There isn’t a DB2 option in SSMA. (Microsoft does offer a White Paper though.)

Options:

  1. Plain Old Export – Export the data to text files and use BULK INSERT to re-ingest it.
  2. Double Migration – Migrate from DB2 to Oracle using SQL Developer, then migrate from Oracle to SQL Server using SSMA.
  3. SSIS – Use SQL Server Integration Services to migrate the data.
  4. Something Else – What I ended up going with. We’ll get to that later.

What I Learned:

Since this blog is about what I learned, let’s explore the pros and cons of each of the options mentioned above. Some are obvious (like option #3) and some took a lot of digging, trial, and error to realize.

Option 1: Plain Ol’ Export.

Because some of the tables are dynamically created and named, and their column names and data types are also dynamic, my migration strategy needs to deal with not only the data, but the schema as well.

A DB2 command line utility called db2look was a great solution for this. The options are fairly flexible and create the requisite DDL in a text file, ready for execution in another system. Well, sort of ready.

The db2look reference offers several options to create DDL for specific schemas, for specific creating users, for generating views, for granting permissions, and more. I went with a fairly simple approach – just create the tables for my specified schema.

db2look -d mydatabase -z myschema -i username -w password
     -e -o myddlfile.sql

The resulting file contained the DDL for all tables in my schema, including index creation and constraints. Of course, the DDL used DB2’s data types, so I used a text editor to find and replace them with the ones I needed in SQL Server. I also used a text editor to comment out the index creation and constraint sections, since for performance reasons I didn’t want to create those until after the data migration. This wasn’t the most elegant solution, but it wasn’t terribly time consuming and it got the job done well enough to move on to the data migration portion.
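
That manual find-and-replace could also be scripted. Below is a minimal C# sketch of the idea, assuming a handful of illustrative type mappings and hypothetical file names – it is not a complete or authoritative DB2-to-SQL Server type conversion list.

// Sketch only: swap DB2 data types in the db2look output for SQL Server
// equivalents. The mappings and file names are illustrative assumptions,
// and a real script would need to avoid touching column names.
using System.Collections.Generic;
using System.IO;

class DdlTypeSwap
{
    static void Main()
    {
        var typeMap = new Dictionary<string, string>
        {
            { "TIMESTAMP", "DATETIME" },        // assumed mapping
            { "CLOB",      "VARCHAR(MAX)" },    // assumed mapping
            { "BLOB",      "VARBINARY(MAX)" },  // assumed mapping
            { "DOUBLE",    "FLOAT" }            // assumed mapping
        };

        string ddl = File.ReadAllText("myddlfile.sql");   // db2look output
        foreach (var pair in typeMap)
            ddl = ddl.Replace(pair.Key, pair.Value);

        File.WriteAllText("myddlfile_mssql.sql", ddl);
    }
}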

DB2 includes several command line utilities useful for exporting data to a file. There are features similar to the SQL*Plus spool command, but I primarily worked with the aptly named EXPORT command to dump the contents of my query to a file.

Along with the OF DEL parameter, DB2 EXPORT can be used to dump the results of any query to an ASCII text file. The chardel parameter specifies the character string delimiter if you want to use something other than the default (double quotes). Or, you can use nochardel to eliminate character string delimiters entirely. Non-default column delimiters are defined with the coldel parameter. (The default is a comma.) And, the DB2 timestamp format can be modified using the timestampformat parameter. (A full listing of the export modifiers can be found here.)

After working through several kinks with the Export options, I came up with a command that looked something like this:

db2 export to mytablename.txt of del modified by chardel''
     timestampformat="yyyy-mm-dd hh:mm:ss.uuu"
     select * from mytablename

This produced export files that looked something like this:

123    'my string'    '2011-11-01 09:23:12.000'    456    ''    789

Note the two single quotes in the 5th “column”. Those would turn out to be the bane of my existence with this option. You see, that’s not a null and it’s not a space. It’s an empty string.

When I first attempted to use BULK INSERT to ingest this data, it choked on the single quotes around the dates (the target columns are datetime, matching the original type). If I chose to export without the single quotes (i.e., using the nochardel parameter), the dates imported just fine into SQL Server, but that 5th column was interpreted as a null instead of an empty string.

I went back and forth with the export format and ingestion trying to get the dates to be dates, the strings to be strings and the empty strings to NOT be nulls but I just couldn’t get it right 100% of the time.

Finally, I turned to format files in an effort to explain the delimiters and file contents to BULK INSERT.

I ended up using the chardel'' option in DB2 EXPORT and a format file that looked something like this:

9.0
6
1       SQLCHAR       0       4       ",'"     1     entryID      ""
2       SQLCHAR       0       255     "','"    2     entryName    SQL_Latin1_General_CP1_CI_AS
3       SQLCHAR       0       24      "','"    3     entryDate    ""
4       SQLCHAR       0       4       "',"     4     entryValue   ""
5       SQLCHAR       0       100     ","      5     entryString  SQL_Latin1_General_CP1_CI_AS
6       SQLCHAR       0       4       "\r\n"   6     entryAltInt  ""

This method again got me around the single quotes with the non-empty strings and the dates, but even when I included an extra quote in columns 4 and 5’s delimiters, I’d end up with at least one single quote in the imported data. Better than a null, but still not correct. (And even if using a format file worked, I’d have to create one for every table – including the ones I won’t know about ahead of time.)
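
For reference, here’s a rough sketch of how an export file and a format file like the one above might be handed to BULK INSERT from a small C# driver. The paths, table name, and connection string are hypothetical placeholders.

// Sketch only: run BULK INSERT against one exported file using a format file.
// Paths, table name, and connection string are placeholders.
using System.Data.SqlClient;

class BulkInsertSketch
{
    static void Main()
    {
        const string sql = @"BULK INSERT dbo.mytablename
            FROM 'C:\migration\mytablename.txt'
            WITH (FORMATFILE = 'C:\migration\mytablename.fmt');";

        using (var conn = new SqlConnection("Server=.;Database=MyTarget;Integrated Security=true;"))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            cmd.CommandTimeout = 0;   // large loads can run past the default timeout
            cmd.ExecuteNonQuery();
        }
    }
}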

Since I wasn’t the first to deal with the dreaded ‘empty string vs. NULL‘ issue unsuccessfully, I decided to put this method on hold for a while and see what other options were out there.

Option 2: Double Migration

While the SQL Server Migration Assistant (SSMA) doesn’t offer a DB2 option, Oracle’s SQL Developer does. This strategy was actually suggested by a co-worker as it would probably be the most accurate in dealing with data type conversion, empty-strings vs. nulls and a host of other issues. While it does sound like a good choice for accuracy, it sounds absolutely terrible for efficiency.

I’ve never used SQL Developer to migrate a database but I have used SSMA to migrate a database from Oracle to SQL Server. Since it was a personal test system and I didn’t really care how long it took, I didn’t bother to disable indexes or do anything else that might make performance better. Still, the GUI portion for data type mapping and the schema creation took at least an hour and the data migration took at least 4-5 hours for a 1.7 GB database.

Given those numbers, the potential database size, and the fact that not one but two migrations would need to occur, I figured I was looking at about 60 days for a double migration (1.7 GB in 4 hours works out to roughly 300 GB in 705 hours; doubled, that’s nearly 1,400 hours), minus whatever time I could squeeze out by disabling indexes. (Of course, I’ll have to rebuild them afterward anyway, but I’ll have to do that in any scenario.) Because of this, I chose to save this option as a last resort and didn’t bother testing it.

Option 3: SSIS

Prior to this project, I had never used SQL Server Integration Services (SSIS) but I’d read plenty about the power of this tool. I gave myself a crash course on the UI, touched base with our resident SSIS expert for some pointers and got to work.

Since my schema was already created from testing Option 1 and since that method seemed fairly quick for that task, I chose to just attempt the data migration with SSIS.

First, I created a new SSIS project in Business Intelligence Development Studio (BIDS) 2008. I set up two data connections – one to my SQL Server 2008 “destination” and one to my DB2 UDB “source”. For the DB2 connection, I installed Microsoft’s OLE DB Provider for DB2, as it would be much faster than using ODBC. (I already had the IBM OLE DB Provider for DB2 but had read a few notes online indicating that the Microsoft one was faster.)

After this initial setup, I created a ‘Data Flow’ task in the Control Flow and configured the Data Flow to migrate data from the source database (DB2) to the destination database (SQL Server 2008 R2). In my first go, I selected a specific table for migration using the ‘Table or view’ option of the Data Access Mode. (Not unlike what you see here, although this is for Azure and also uses some ODBC.)

I did no data type mapping, just let the migration occur based on the data types of my self-created schema. I executed the package and found it was quite speedy against the largest table in my test database. The data looked correct too – DB2 timestamps were SQL Server datetimes, empty-strings were actually empty-strings, nulls were actually nulls. Very nice.

Next, I attempted to use SSIS to perform the migration for the entire schema. I’d noticed a ‘Table name or view name from variable’ option. I decided to use this option, along with a For Each Loop, to call my Data Flow task and loop through a list of tables.

I set a table name variable within the project and created a For Each Loop containing the list of tables and setting the variable on each iteration. Within the For Each Loop was my Data Flow task, which took data from the source (DB2) using the variable-set table name and migrated it to the destination (SQL Server 2008 R2) into the same variable-set table name. This worked great for the first table and then promptly came to a screeching halt, yelling at me about not being able to find a column in my second table.

As I investigated further, noticing that the “missing” column didn’t even belong in Table 2 (rather, it was the first column of Table 1), I found that the column names and data types are actually strongly typed within the SSIS package, which means they can’t be overridden at run time. So, while you can use SSIS to loop through a list of tables, each of those tables needs to have the same column structure and order. Rats. Foiled again.

So, unless I want to create a separate package for each table structure – including those tables and columns that I won’t know about ahead of time – it looks like SSIS might not be the right choice either.

(NOTE: While trying to find my reference links from the SSIS portion, I came across this post from Strate SQL, which reiterates my problem with the Data Flow task in SSIS. It looks like the article might have an alternative solution that I’ll have to look into.)

Option 4: Something Else Entirely

After having noted the speed with which SSIS was capable of migrating data between the two platforms, I was curious – why wasn’t there some way I could tap into this method? Or was it really the same as using BULK INSERT or bcp?

What I found is that SSIS actually uses INSERT BULK, “which is a mystery unto itself” according to this article. Thankfully, someone else had solved this “mystery” and reported that INSERT BULK is “produced by client bulk copy API. You cannot use the INSERT BULK syntax directly from client or server-side code. Instead you have to use one of the Bulk Insert APIs like .NET SqlBulkCopy …“.

Although I’ve wanted to do some “real” programming for a while, I’d never found the right project to really get me into it. It turns out, this project was just the motivation that I needed.

So, I fired up Visual C# 2010 and got to work writing a console application that would utilize the SqlClient SqlBulkCopy class to import data into SQL Server. While my code is rather simple, since it was written for work, I can’t share it directly. However, the sample C# code on the MSDN page for SqlBulkCopy was my starting point. I combined that with an OleDbConnection to the DB2 database and some text-reading components to loop through a list of tables and perform a migration of the entire schema.
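
To give a feel for the approach, here’s a minimal sketch along those lines – not my actual work code. The connection strings, table-list file, and OLE DB provider string are assumptions for illustration.

// Minimal sketch of the approach - not the actual work code. Connection
// strings, the table-list file, and the OLE DB provider name are assumptions.
using System.Data;
using System.Data.OleDb;
using System.Data.SqlClient;
using System.IO;

class Db2ToSqlServerCopy
{
    static void Main()
    {
        const string db2Conn = "Provider=DB2OLEDB;...";  // placeholder source connection
        const string sqlConn = "Server=.;Database=MyTarget;Integrated Security=true;";

        // Assumed: one table name per line; the target schema already exists.
        foreach (string table in File.ReadAllLines("tables.txt"))
        {
            using (var source = new OleDbConnection(db2Conn))
            using (var cmd = new OleDbCommand("SELECT * FROM MYSCHEMA." + table, source))
            {
                source.Open();
                using (IDataReader reader = cmd.ExecuteReader())
                using (var bulk = new SqlBulkCopy(sqlConn))
                {
                    bulk.DestinationTableName = "dbo." + table;
                    bulk.BulkCopyTimeout = 0;    // no timeout for large tables
                    bulk.BatchSize = 10000;      // arbitrary batch size
                    bulk.WriteToServer(reader);  // streams rows from DB2 straight into SQL Server
                }
            }
        }
    }
}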

Using this method, the entire 1.6 GB database transferred in less than 9 minutes. Following the same equation used for Option 2, this means my full migration will take approximately 28 hours (1.6 GB in 9 minutes == 300 GB in 28 hours). That’s pretty fast!

My next steps involve turning my console application into a Windows Forms application, making it more configurable (input and verify connection strings, specify a table list file, etc.), and possibly using this same method to generate and execute the schema creation DDL, as well as the indexes and constraints, after the migration is complete. I’ll also be bulking up the test database to at least 10 GB to do more performance testing of this method.

It was quite an exciting week, and I’m looking forward to getting back to the project after the weekend.