Wednesday, August 17, 2011

Merge SQL tables

Existing system:
1. Download Xml data by Http request
2. SqlDataSet.ReadXml to load XML data into dataset
3. (Re)create empty Sql table based on the dataset schema - based on some code I found online
4. Use SqlBulkCopy to copy data from dataset into the new table

Table sizes are large... up to 200 columns, up to 10000 rows. Not manual by any means, and I'd say not even manually-scriptable since keeping track of 100 columns in script is pretty impossible.

Problem:
Several tables contain variations in Http request type that require me to get the table in pieces. So, I download 4 tables instead of 1, and they all have slightly different columns (Xml response omits NULL elements, and that causes some columns to be missing). Thankfully, I can rely that the columns with same data will always have same names. I need to UNION all rows from the 4 tables into 1 table adding nulls for missing columns, but Union does not work with missing columns. That means I need to JOIN all the columns first and then use Union. (Union works on rows, Join on columns). But when I try the full outer join or any other joins, I keep getting duplicate columns!! The internet is pretty full of Join/Union tutorials for a variety of problems, and it is frustrating to even try to read through all those search results. So far I haven't seen anything useful to me.

Let me formulate the problem with math maybe.
1. I want to find a 'universal' column space for all 4 tables - which is the set-union of their columns.
2. Place each of the 4 tables into this new space.
3. Once all tables are in common column space, it is easy to union them.

Solutions:
I'm sure there is some clever trick to join all columns together. But it is not obvious to me and thus useless.


- Suggested by Wenping, a coworker:
1. To find a common column space, I take the list of all columns from each of 4 tables. Since I know same columns will have same names, I can merge the lists, sort and remove all duplicates.
2. From this list I can create a SQL CREATE statement to make a new table
3. When I download tables, I can bulk copy them straight into the new common table or union into that table after downloading

- Before using SqlBulkCopy, check if the receiving table has all the columns in dataset. If not, I can use ALTER statement to add new columns and then bulk copy everything. This requires more coding work but will anticipate errors later on when suddenly new columns appear in the Xml response.