Focusing on setup and development using SQL Server products from 2000 to 2008 and beyond. Also about anything geeky that compels me to write a blog entry.

Monday, December 28, 2009

Redirecting SSIS Data Errors to Avoid Package Failure

When setting up a new Integration Services package, is it often advantageous to leave your errors untrapped so your package crashes and you can interpret the errors. But after you get past the first wave of setup, it is often best to trap for additional data errors in your code to prevent complete package failures.

So how do you do it? I'll cover it in broad strokes.

I start with my source table structure. In order to capture error output, I want to create a table that matches the field names of the source table, but not the data type. The idea is to have a table that is so wide open, it can handle any data you throw at it.

For example, the source table, lets call it T1 has these fields:
ID Int Not Null
Name varchar(30) Not Null
Addr varchar(40) Not Null

To handle error output, you would create a table (T2) with these fields:

ErrCode varchar(100) Null
ErrColumn varchar(100) Null
ID varchar(100) Null
Name varchar(200) Null
Addr varchar(200) Null

Let's go over the noteworthy differences. First I added the ErrCode and ErrColumn fields which will be populated with data from the SSIS error handling.

Most importantly, I changed the data types on all fields to varchar fields and made sure the size was big enough to handle even bad data. Depending on your situation, you may adjust these sizes much larger to accommodate the worst possible over sized data records.

Lastly, I allowed Nulls in all of the columns to ensure that every field of data in the row with errors can be stored in the error table, regardless of Nulls.

Now that the structure is in place, it is time to add to the SSIS package. In the Data Flow where you are pulling from your source table, in our case the nondescript T1 table, we must add the error output. Within the data flow section, drag a destination object from the toolbar that connects to your new T2 table. Then click on the Source object that connects to T1 and drag the red line from the bottom of the object to the T2 Destination box.

If you edit the Error Output in the T1 Data Source, you can specify what should be done with each column of data. You can have a column cause the package to fail on error, ignore the error, or redirect the error output, which is the option we want to use. By redirecting all columns on error, they become available to consume in your T2 table data destination.

Now that the red line is attached, you can edit the properties in your T2 destination to insure that the ErrCode, ErrColumn, and all other fields are mapped properly from your error output to your T2 fields. Once you have the mappings all established, you should be ready to run.

To me the great benefit of this type of error handling comes in dealing with bad data that you are pulling from production databases into secondary aggregate, data warehouse, or reporting databases. It allows you to get most of the data into the primary tables and the leisure to review the bad results afterward. Of course, if the data you are pulling has data requirements that would make partial data useless, you must consider this when considering the error output option. Regardless, using error output is an excellent way to troubleshoot as you are developing a new SSIS package.

No comments:

Post a Comment