Systems Engineering and RDBMS

Insert data in one table with data from other table(s)

Posted by decipherinfosys on January 27, 2011

This post is in response to this question from a reader:

“Your post on updating data in one table with data from another table was very helpful.  I need to also do an insert into a table (a summary table) with data from a bunch of other tables which are in the same database and also in database(s) on other instances.  Right now, I have a very convoluted process of scheduled jobs, row by row scripts to help me do that.  There is some transformation (mostly lookups) involved as well.  Is there an easier/simpler way to do this?  Appreciate your thoughts on this”

The post that the question above referred to was this one – here.  In that post, we had covered how you can update the data in a table with the data from another table.  Doing an insert into a table with data from other tables (within the same schema or separate schemas or separate databases or separate databases on separate instances) is also fairly simple.  There are many ways of doing it – we will cover the “INSERT INTO …. SELECT FROM … ” approach in this blog post.

Let’s consider this hypothetical scenario to illustrate this (using SQL Server lingo though the same approach is available in Oracle and DB2 as well) – Say the summary table “tblEmpSummary” that this reader wanted to insert the data into resides in a database called EXECUTIVE in a schema called ANALYSIS.  It contains summary information on the employees – their name, their department, their performance review rating, their expense reports per week and the weekly benefit expenses for those employees.

And the source of the data are these:

a) Other table called EMP_PERF_REVIEW in the same schema in the same database (contains EMP_PERF_RATING column that we want and can join on EMP_MASTER_ID column),

b) Other table(s) called EMP_MASTER, EMP_DEPT_XREF and DEPT_MASTER from the schema EMPLOYEE on the same database (contains FIRST_NAME, LAST_NAME, DEPT_NAME information and we can join on EMP_MASTER_ID and DEPT_MASTER_ID),

c) Other table called EMP_EXPENSE_RPT from the EMP Schema in the ACCOUNTING database in the same SQL Server instance (contains WEEK_ENDING_DATE and TOTAL_AMOUNT fields that we are interested in and can join in using EMP_MASTER_ID column), and

d) A view called VW_EMP_PTO from a schema called BENEFIT from the HR database that resides on another instance (Contains PTO_ACCRUED and TOTAL_AMOUNT field for the benefits that we are interested in and can join in using EMP_MASTER_ID).

So, we have data coming from:

a) table in the same schema/same database,
b) tables in the same database but a different schema,
c) tables in a separate database on the same SQL Server instance, and
d) a view from a database in a separate SQL Server instance all together.

If suppose all of our data was coming from (a) alone, all we would have needed to do would be this:

INSERT INTO Analysis.tblEmpSummary (<list of column names here>)
SELECT <list of the column names from one source>
FROM Analysis.EMP_PERF_REVIEW
WHERE <filter conditions>
<any other conditions like GROUP BY, ORDER BY etc. that we need>

Please do note that the WHERE clause above can contain a NOT EXISTS check on the tblEmpSummary to ensure that no dupes are being inserted – needless to state, this INSERT code would be wrapped in a TRY…CATCH block with proper error handling.

If we now extend it to include the data from (b) as well, the above SQL would change to:

INSERT INTO Analysis.tblEmpSummary (<list of column names here>)
SELECT <list of the column names from the two sources – use the proper aliases>
FROM Analysis.EMP_PERF_REVIEW as EPR
INNER JOIN Employee.EMP_MASTER as EM ON EPR.EMP_MASTER_ID = EM.EMP_MASTER_ID
INNER JOIN Employee.EMP_DEPT_XREF as EDX ON EM.EMP_MASTER_ID = EDX.EMP_MASTER_ID
INNER JOIN Employee.DEPT_MASTER as DM ON EDX.DEPT_MASTER_ID = DM.DEPT_MASTER_ID
WHERE <filter conditions>
<any other conditions like GROUP BY, ORDER BY etc. that we need>

As can be seen from above, we have assumed that the select privileges are present and you would be able to join the tables easily then.

Now, if we have to include (c) scenario as well and assuming that we have select privileges in place already on that object, we can either create views/synonyms for those objects in our schema or simply refer to those objects if the permissions are available using a three part naming convention:

<database_name>.<schema_name>.<object_name>

So, the above SQL then gets modified to:

INSERT INTO Analysis.tblEmpSummary (<list of column names here>)
SELECT <list of the column names from the three sources – use the proper aliases>
FROM Analysis.EMP_PERF_REVIEW as EPR
INNER JOIN Employee.EMP_MASTER as EM ON EPR.EMP_MASTER_ID = EM.EMP_MASTER_ID
INNER JOIN Employee.EMP_DEPT_XREF as EDX ON EM.EMP_MASTER_ID = EDX.EMP_MASTER_ID
INNER JOIN Employee.DEPT_MASTER as DM ON EDX.DEPT_MASTER_ID = DM.DEPT_MASTER_ID
INNER JOIN Accounting.Emp.EMP_EXPENSE_RPT as EER ON EM.EMP_MASTER_ID = EER.EMP_MASTER_ID

WHERE <filter conditions>
<any other conditions like GROUP BY, ORDER BY etc. that we need>

Now, that leaves us with the last part – (d).  Since it resides in a database on a separate instance, we can first create a linked server to it first and then either reference it using a couple of ways as was shown in that blog post on linked server.  Or we can simply create a synonym for it and use it.  Assuming we are using a four part naming convention to reference it, the above SQL now becomes:

INSERT INTO Analysis.tblEmpSummary (<list of column names here>)
SELECT <list of the column names from the four sources – use the proper aliases>
FROM Analysis.EMP_PERF_REVIEW as EPR
INNER JOIN Employee.EMP_MASTER as EM ON EPR.EMP_MASTER_ID = EM.EMP_MASTER_ID
INNER JOIN Employee.EMP_DEPT_XREF as EDX ON EM.EMP_MASTER_ID = EDX.EMP_MASTER_ID
INNER JOIN Employee.DEPT_MASTER as DM ON EDX.DEPT_MASTER_ID = DM.DEPT_MASTER_ID
INNER JOIN Accounting.Emp.EMP_EXPENSE_RPT as EER ON EM.EMP_MASTER_ID = EER.EMP_MASTER_ID
INNER JOIN LinkedSrvHR.HR.Benefit.VW_EMP_PTO as VWP ON EM.EMP_MASTER_ID = VWP.EMP_MASTER_ID

WHERE <filter conditions>
<any other conditions like GROUP BY, ORDER BY etc. that we need>

This should cover all the scenarios that you can possibly run into.  You can also try this with MERGE command.  Any transformations like look-up of values that you can do in T-SQL, you can do over the record set that you get using the above SQL.  It is essentially now a data-set for you that is coming from 4 different sources.  What you do with that data-set is entirely up to you – use a CASE statement over it, do aggregation over it, dump it into a temporary table/table variable for further processing etc. – all the normal T-SQL operations you would be able to do including inserting it into the summary table.

In addition to the technique from above, if this is something that you need to do as a quick and dirty thing and you do not intend to keep the new table around post the analysis, you can also do:

select … into <new table> from <old table or set of tables from one or multiple sources> where <the where condition over here>

The “select into” is the same as CTAS in Oracle.  You can read more on that here.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: