Tag Archives: SQL Server

Exam Eve: Pre-Test Thoughts on the 70-433

It’s the eve of my 70-433 exam and I’m not feeling terribly confident about my chances of passing the test tomorrow.

I’m sure a portion of my self-doubt is just normal pre-test jitters. It’s been nearly four years since I took a Microsoft exam and I recall being convinced I had failed the last one (70-290 – Windows Server 2003 Administration) until the nice little screen popped up and said that not only had I passed, but I’d only missed two questions!

The other portion of my anxiety is because I really haven’t focused on studying as much as I had hoped – despite setting a clear goal and mapping out time to accomplish it. I’m really good at procrastinating and in the midst of these few weeks of studying I also made a quilt, painted my home study and started playing a few new video games – I’m sure these were in some ways just to avoid studying.

Here are my pre-test thoughts on the test objectives:

Implementing Tables and Views (14 percent)
I feel pretty confident about this objective. I’m strong in creating and altering tables, views and indexes. I’m not quite as confident about spatial data types or partitioning, and I’m worried there will be nitpicky questions where they slip a command into a statement that isn’t valid because the table doesn’t have a PK, and I’ll miss some little detail like that.

Implementing Programming Objects (16 percent)
Overall, not as confident on this one. I’m good with stored procedures, transactions and error handling. Not quite as confident about CLR and UDFs.

Working with Query Fundamentals (21 percent)
I’m probably the most confident about this section – this is what I do all the time, and the new features I learned about (MERGE, OUTPUT, GROUPING SETS, PATINDEX, etc.) are things I could find practical applications for, so I really got into trying them out. Unless there are some nitpicky trick questions, I think this section will go well.

Applying Additional Query Techniques (15 percent)
I have medium confidence here. I’m not too worried about subqueries, controlling execution plans or the collation stuff, but although I’ve done a lot of practicing, CTEs (especially recursive ones) and the ranking functions might stump me.

Working with Additional SQL Server Components (11 percent)
I could set up Database Mail and get it to work, but I didn’t memorize all the ins and outs. Full-text search seems pretty straightforward, but I didn’t spend a lot of time on it and didn’t actually try it out at all. Not feeling great about Service Broker or change tracking either. This section probably won’t go well.

Working with XML Data (12 percent)
Ugh. I dread this section. I actually had a real use for some of the XML features in SQL Server, but I couldn’t get them to do what I wanted and found the syntax – especially OPENXML and XQuery – to be painful; I think it would be easier to write C# code to convert XML data to relational data. I am comfortable with XML in general, the FOR XML options, basic XPath and schemas. I doubt that will be enough knowledge though.

Gathering Performance Information (11 percent)
This section is also something I’m pretty familiar with. Although I don’t use them a lot in my current role, I spent several years needing to regularly use Profiler or read execution plans to diagnose issues. I’m not as experienced with the Database Tuning Advisor (but it’s pretty much just a wizard from what I can tell) and I can figure out which DMV or catalog to use when I need it but I certainly don’t have them memorized. Overall, I’m fairly confident with this section though.

The exam is at 10am tomorrow. I’m a fast test taker – either I know it or I don’t – but I’m going to try to go slow and really use all of the time I can to go over things in detail. I should have a verdict by 1pm.

MERGE and OUTPUT in SQL Server

I spent a decent chunk of this afternoon studying for the upcoming 70-433 exam (rescheduled for December 20th). Today I focused primarily on MERGE and OUTPUT clauses.

MERGE is pretty interesting – compare the contents of two tables and insert, update or delete rows in one based on the contents of the other. MERGE is new in SQL Server 2008. OUTPUT is also interesting, and I wish I’d known about it previously as it would’ve been very handy. OUTPUT lets you see the changes you made and (optionally) store them! OUTPUT has been around since SQL Server 2005.

I went through a tutorial on MERGE and then read up on both MERGE and OUTPUT on Books Online. Then I closed my browser and tested myself to see what I’d learned.

Here’s what I came up with.

/* I'm opening a store that sells only things I like */

CREATE SCHEMA store
GO
CREATE TABLE store.products
( product_id INT
, product_name VARCHAR(150)
, product_desc VARCHAR(400)
, product_price MONEY)
GO
INSERT INTO store.products VALUES
(1, 'Bacon', 'Tasty Bacon', 5.49)
, (2, 'Cheese', 'Sharp Cheddar', 4.79)
, (3, 'Beer', 'Sierra Nevada Pale Ale', 8.49)
GO
SELECT * FROM store.products
GO

/* We're making some changes to our products and have another table with updates to our products list. */

CREATE TABLE store.product_updates
( product_id INT
, product_name VARCHAR(150)
, product_desc VARCHAR(400)
, product_price MONEY)
GO
INSERT INTO store.product_updates VALUES
(1, 'Bacon (Original)', 'Tasty Bacon, original recipe', 5.49)
, (3, 'Beer - IPA', 'Sierra Nevada Pale Ale', 8.49)
, (4, 'Bacon - Peppered', 'Tasty Pepper Bacon (new!)', 5.79)
, (5, 'Beer - Cheap', 'Yuengling Lager', 4.99)
GO
SELECT * FROM store.product_updates;
SELECT * FROM store.products;
GO

/* Use a Merge Statement to update our products based on the information in the product_updates table
NOTE: Not all products are listed in our source table, only products that have been changed */

MERGE store.products AS TARGET
USING store.product_updates AS SOURCE
ON (TARGET.product_id = SOURCE.product_id)
WHEN MATCHED AND
/* Update Product IDs that are matched*/
TARGET.product_name <> SOURCE.product_name
OR TARGET.product_desc <> SOURCE.product_desc
OR TARGET.product_price <> SOURCE.product_price THEN
UPDATE SET TARGET.product_name = SOURCE.product_name
, TARGET.product_desc = SOURCE.product_desc
, TARGET.product_price = SOURCE.product_price
/* WHEN NOT MATCHED BY SOURCE THEN
The product has no update, don't do anything
If we wanted to, we could delete rows that weren't found in the updates table */
WHEN NOT MATCHED BY TARGET THEN
/* This must be a new product, insert it! */
INSERT (product_id, product_name, product_desc, product_price)
VALUES (SOURCE.product_id, SOURCE.product_name, SOURCE.product_desc, SOURCE.product_price);
GO
SELECT * FROM store.products;
GO

Pretty sweet! Our products table is now updated with the appropriate changes!

Keeping a history of your changes could be very useful. In this case, we could use them to track all sorts of things – like how sales compared when prices changed. In my daily life, it would be nice to have the changes recorded as a backup in case I needed to undo a particular change and didn’t realize it for a few hours or days. OUTPUT comes in handy here. It’s not limited to being used with MERGE either – it can be used with INSERT, UPDATE or DELETE, and it can either just display the changes or write them out to a table.
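Just to illustrate the display-only case (this wasn’t part of my test script, and the 10% price bump is purely made up for the example), an OUTPUT clause on a plain UPDATE against the same store.products table looks something like this:

/* Illustration only: raise every price by 10% and show the old and new values */
UPDATE store.products
SET product_price = product_price * 1.10
OUTPUT DELETED.product_id
, DELETED.product_price AS old_price
, INSERTED.product_price AS new_price;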

/* What if we wanted to keep a record of changes? Use an OUTPUT clause with the MERGE statement! */

-- undo changes from previous merge
DELETE FROM store.products;
GO
INSERT INTO store.products VALUES
(1, 'Bacon', 'Tasty Bacon', 5.49)
, (2, 'Cheese', 'Sharp Cheddar', 4.79)
, (3, 'Beer', 'Sierra Nevada Pale Ale', 8.49)
GO

/* Create a table to store the historical changes */
CREATE TABLE store.product_history
(change_date DATETIME2 --when change occurred
, change_type VARCHAR(100) --insert, update or delete
, old_product_id INT --beginning product_id
, old_product_name VARCHAR(150) --beginning name
, old_product_desc VARCHAR(400) --beginning description
, old_product_price MONEY --beginning price
, new_product_id INT --new product_id
, new_product_name VARCHAR(150) --new name
, new_product_desc VARCHAR(400) --new description
, new_product_price MONEY) --new price
GO

/* Same MERGE statement with an OUTPUT clause */

MERGE store.products AS TARGET
USING store.product_updates AS SOURCE
ON (TARGET.product_id = SOURCE.product_id)
WHEN MATCHED AND
/* Update Product IDs that are matched*/
TARGET.product_name <> SOURCE.product_name
OR TARGET.product_desc <> SOURCE.product_desc
OR TARGET.product_price <> SOURCE.product_price THEN
UPDATE SET TARGET.product_name = SOURCE.product_name
, TARGET.product_desc = SOURCE.product_desc
, TARGET.product_price = SOURCE.product_price
/* WHEN NOT MATCHED BY SOURCE THEN
The product has no update, don't do anything
If we wanted to, we could delete rows that weren't found in the updates table */
WHEN NOT MATCHED BY TARGET THEN
/* This must be a new product, insert it! */
INSERT (product_id, product_name, product_desc, product_price)
VALUES (SOURCE.product_id, SOURCE.product_name, SOURCE.product_desc, SOURCE.product_price)
OUTPUT
GETDATE() --date of change
, $action --type of change
, DELETED.* --old stuff
, INSERTED.* --new stuff
INTO store.product_history;
GO
SELECT * FROM store.product_history;

On a side note, I found that Oracle has its own MERGE statement too. Since it’s a combination of an UPDATE and an INSERT, it’s often nicknamed UPSERT! That sounds way more fun than MERGE.

Migration Options: DB2 to SQL Server

I did absolutely nothing for my 70-433 prep this week because I was distracted by a new project at work that really has me excited. So excited that I’ve been thinking about it, working on it and even dreaming about it all week, minus a few potty breaks. Although I’m failing at my studying, I learned a ton.

The Mission:

Migrate a database from DB2 (UDB) to SQL Server (edition unknown).

Factors:

  • Portions of the schema are predefined.
  • Portions of the schema are dynamic. Many tables and their columns will vary based on configuration.
  • The database is large. (Exact size unknown, ~several hundred GBs.)
  • Time will be short. (Production DB, needs to be back online ASAP.)
  • There isn’t a DB2 option in SSMA. (Microsoft does offer a White Paper though.)

Options:

  1. Plain Old Export – Export the data to text files and use BULK INSERT to re-ingest it.
  2. Double Migration – Migrate from DB2 to Oracle using SQL Developer, then migrate from Oracle to SQL Server using SSMA.
  3. SSIS – Use SQL Server Integration Services to migrate the data.
  4. Something Else – What I ended up going with. We’ll get to that later.

What I Learned:

Since this blog is about what I learned, let’s explore the pros and cons of each of the options mentioned above. Some lessons were obvious (like option #3) and some took a lot of digging, trial and error to realize.

Option 1: Plain Ol’ Export.

Because some of the tables are dynamically created and named, and their column names and data types are also dynamic, my migration strategy needs to deal with not only the data, but the schema as well.

A DB2 command line utility called db2look was a great solution for this. The options are fairly flexible and create the requisite DDL in a text file, ready for execution on another system. Well, sort of ready.

The db2look reference offers several options for creating DDL for specific schemas or creators, generating views, granting permissions and more. I went with a fairly simple approach – just create the tables for my specified schema.

db2look -d mydatabase -z myschema -i username -w password
     -e -o myddlfile.sql

The resulting file contained the DDL for all tables in my schema, including index creation and constraints. Of course, the DDL used DB2’s data types, so I used a text editor to find and replace them with the ones I needed in SQL Server. I also used a text editor to comment out the index creation and constraint sections, since I didn’t want to create those until after the data migration, for performance reasons. This wasn’t the most elegant solution, but it wasn’t terribly time consuming and it got the job done well enough to move on to the data migration portion.
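To give a rough sense of the kind of edits involved (the table and columns below are made up for illustration, not my actual schema), the SQL Server version of a converted table ended up looking something like this, with the original DB2 types noted in comments:

CREATE TABLE myschema.mytable
( entry_id INT NOT NULL
, entry_name VARCHAR(150)
, entry_date DATETIME       -- was TIMESTAMP in DB2
, entry_value FLOAT         -- was DOUBLE in DB2
, entry_notes VARCHAR(MAX)  -- was CLOB in DB2
);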

DB2 includes several command line utilities useful for exporting data to a file. There are features similar to the SQL*Plus spool command, but primarily I worked with the aptly named EXPORT command to dump the contents of my query to a file.

Along with the OF DEL parameter, DB2 EXPORT can be used to dump the results of any query to a delimited ASCII text file. The chardel modifier specifies the character string delimiter if you want to use something other than the default (double quotes). Or, you can use nochardel to eliminate character string delimiters entirely. Non-default column delimiters are defined with the coldel modifier. (The default is a comma.) And, the DB2 timestamp format can be modified using the timestampformat modifier. (A full listing of the export modifiers can be found here.)

After working through several kinks with the Export options, I came up with a command that looked something like this:

db2 export to mytablename.txt of del modified by chardel''
     timestampformat="yyyy-mm-dd hh:mm:ss.uuu"
     select * from mytablename

This produced export files that looked something like this:

123,'my string','2011-11-01 09:23:12.000',456,'',789

Note the two single quotes in the 5th “column”. Those would turn out to be the bane of my existence with this option. You see, that’s not a null and it’s not a space. It’s an empty string.

When I first attempted to use BULK INSERT to ingest this data, it choked on the single quotes around the dates (the destination columns are datetime, the original type). If I chose to export without the single quotes (i.e. using the nochardel modifier), the dates imported just fine into SQL Server, but that 5th column was interpreted as a NULL instead of an empty string.
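For reference, the kind of plain BULK INSERT I was starting from looked roughly like this (the table name, file path and delimiters here are placeholders, not my actual ones):

/* Simple BULK INSERT with delimiter options - no format file */
BULK INSERT myschema.mytablename
FROM 'C:\migration\mytablename.txt'
WITH ( FIELDTERMINATOR = ','
     , ROWTERMINATOR = '\n' );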

I went back and forth with the export format and ingestion trying to get the dates to be dates, the strings to be strings and the empty strings to NOT be nulls but I just couldn’t get it right 100% of the time.

Finally, I turned to format files in an effort to explain the delimiters and file contents to BULK INSERT.

I ended up using the chardel'' option in DB2 EXPORT and a format file that looked something like this:

9.0
6
1       SQLCHAR       0       4       ",'"     1     entryID      ""
2       SQLCHAR       0       255     "','"    2     entryName    SQL_Latin1_General_CP1_CI_AS
3       SQLCHAR       0       24      "','"    3     entryDate    ""
4       SQLCHAR       0       4       "',"     4     entryValue   ""
5       SQLCHAR       0       100     ","      5     entryString  SQL_Latin1_General_CP1_CI_AS
6       SQLCHAR       0       4       "\r\n"   6     entryAltInt  ""

This method again got me around the single quotes with the non-empty strings and the dates, but even if I included an extra quote in column 4 and 5's delimiters, I’d end up with at least one single quote in the imported data. Better than a NULL, but still not correct. (And, even if using a format file worked, I’d have to create one for every table – including the ones I don’t even know about yet.)
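For completeness, the format file version of the call looked roughly like this (again, the path and table name are placeholders):

/* BULK INSERT driven by the format file above */
BULK INSERT myschema.mytablename
FROM 'C:\migration\mytablename.txt'
WITH ( FORMATFILE = 'C:\migration\mytablename.fmt' );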

Since I wasn’t the first to deal with the dreaded ‘empty string vs. NULL‘ issue unsuccessfully, I decided to put this method on hold for a while and see what other options were out there.

Option 2: Double Migration

While the SQL Server Migration Assistant (SSMA) doesn’t offer a DB2 option, Oracle’s SQL Developer does. This strategy was actually suggested by a co-worker as it would probably be the most accurate in dealing with data type conversion, empty-strings vs. nulls and a host of other issues. While it does sound like a good choice for accuracy, it sounds absolutely terrible for efficiency.

I’ve never used SQL Developer to migrate a database but I have used SSMA to migrate a database from Oracle to SQL Server. Since it was a personal test system and I didn’t really care how long it took, I didn’t bother to disable indexes or do anything else that might make performance better. Still, the GUI portion for data type mapping and the schema creation took at least an hour and the data migration took at least 4-5 hours for a 1.7 GB database.

Given those numbers, the potential database size and the fact that not one but two migrations would need to occur, I figured I was looking at about 60 days for a double migration (1.7 GB in 4 hours == 300 GB in 704 hours; times two), minus whatever time I could squeak out by disabling indexes. (Of course, I’ll have to rebuild them afterward anyway, but I’ll have to do that in any scenario.) Because of this, I chose to save this option as a last resort and didn’t bother testing it.

Option 3: SSIS

Prior to this project, I had never used SQL Server Integration Services (SSIS) but I’d read plenty about the power of this tool. I gave myself a crash course on the UI, touched base with our resident SSIS expert for some pointers and got to work.

Since my schema was already created from testing Option 1 and since that method seemed fairly quick for that task, I chose to just attempt the data migration with SSIS.

First, I created a new SSIS project in Business Intelligence Development Studio (BIDS) 2008. I set up two data connections – one to my SQL Server 2008 “destination” and one to my DB2 UDB “source”. For the DB2 connection, I installed Microsoft’s OLE DB Provider for DB2 as it would be much faster than using ODBC. (I already had the IBM OLE DB Provider for DB2 but had read a few notes online indicating that the Microsoft one was faster.)

After this initial setup, I created a ‘Data Flow’ task in the Control Flow and configured the Data Flow to migrate data from the source database (DB2) to the destination database (SQL Server 2008 R2). In my first go, I selected a specific table for migration using the ‘Table or view’ option of the Data Access Mode. (Not unlike what you see here, although this is for Azure and also uses some ODBC.)

I did no data type mapping, just let the migration occur based on the data types of my self-created schema. I executed the package and found it was quite speedy against the largest table in my test database. The data looked correct too – DB2 timestamps were SQL Server datetimes, empty-strings were actually empty-strings, nulls were actually nulls. Very nice.

Next, I attempted to use SSIS to perform the migration for the entire schema. I’d noticed a ‘Table name or view name from variable’ option, so I decided to use it, along with a For Each Loop, to call my Data Flow task and loop through a list of tables.

I set up a table name variable within the project and created a For Each Loop containing the list of tables and setting the variable. Within the For Each Loop was my Data Flow task, which took data from the source (DB2) using the variable-set table name and migrated it into the destination (SQL Server 2008 R2) using that same variable-set table name. This worked great for the first table and then promptly came to a screeching halt, yelling at me about not being able to find a column in my second table.

As I investigated further, noticing that the “missing” column didn’t even belong in Table 2 (rather, it was the first column of Table 1), I found that the column names and data types are actually strongly typed within the SSIS package, which means they can’t be overridden at run time. So, while you can use SSIS to loop through a list of tables, each of those tables needs to have the same column structure and order. Rats. Foiled again.

So, unless I want to create a separate package for each table structure – including those tables and columns that I won’t know about ahead of time – it looks like SSIS might not be the right choice either.

(NOTE: While trying to find my reference links from the SSIS portion, I came across this post from Strate SQL which reiterates my problem with the Data Flow task in SSIS. It looks like the article might have an alternative solution that I’ll have to look into.)

Option 4: Something Else Entirely

After having noted the speed with which SSIS was capable of migrating data between the two platforms, I was curious – why wasn’t there some way I could tap into this method? Or was it really the same as using BULK INSERT or bcp?

What I found is that SSIS actually uses INSERT BULK, “which is a mystery unto itself” according to this article. Thankfully, someone else had solved this “mystery” and reported that INSERT BULK is “produced by client bulk copy API. You cannot use the INSERT BULK syntax directly from client or server-side code. Instead you have to use one of the Bulk Insert APIs like .NET SqlBulkCopy …“.

Although I’ve wanted to do some “real” programming for a while, I’d never found the right project to really get me into it. It turns out, this project was just the motivation that I needed.

So, I fired up Visual C# 2010 and got to work writing a console application that would utilize the SqlClient SqlBulkCopy class to import data into SQL Server. While my code is rather simple, since it was written for work I can’t share it directly. However, the sample C# code on the MSDN page for SqlBulkCopy was my starting point. I combined that with an OleDbConnection to the DB2 database and some text-reading components to loop through a list of tables and migrate the entire schema.

Using this method, the entire 1.6 GB database transferred in less than 9 minutes. Following the same equation used for Option 2, this means my full migration will take approximately 28 hours (1.6 GB in 9 minutes == 300 GB in 28 hours). That’s pretty fast!

My next steps involve turning my console application into a Windows Forms application, making it more configurable (input and verify connection strings, specify a table list file, etc.) and possibly using this same method to generate and execute the schema creation DDL, as well as the indexes and constraints, after the migration is complete. I’ll also be bulking up the test database to at least 10 GB to do more performance testing of this method.

It was quite an exciting week and I’m looking forward to getting back to the project after the weekend.