Systems Engineering and RDBMS

Archive for the ‘DB2 LUW’ Category

New way of paging in SQL Server 2012 – Offset clause

Posted by decipherinfosys on March 21, 2012

For any web application, paging of records is a common requirement.  We have written several posts in the past on how to achieve paging in Oracle, SQL Server and DB2 LUW.  Those posts showed how paging can be achieved using a CTE/custom procedural code/the row_number() function (SQL Server), rownum (Oracle) or rownumber()/the fetch first clause (DB2).  Here are some of those posts to familiarize yourself with the different ways of paging in the different RDBMS:

Whitepaper on paging in Oracle, SQL Server and DB2 LUW (bit dated – wrote it in 2007): https://decipherinfosys.wordpress.com/2007/01/18/paging-data-in-oracle-ms-sql-server-and-db2-luw-using-database-code/

Paging of Record Sets and COUNT(STOPKEY) Optimization in Oracle: https://decipherinfosys.wordpress.com/2007/08/09/paging-and-countstopkey-optimization/

Paging of record sets in SQL Server 2005 or SQL Server 2008: https://decipherinfosys.wordpress.com/2007/08/10/paging-of-record-sets-in-sql-server-2005/

Dynamic SQL, Dynamic Ordering and Paging: https://decipherinfosys.wordpress.com/2008/04/22/dynamic-sql-dynamic-ordering-and-paging/

In today’s post, we will introduce a new clause available in SQL Server 2012 which makes the paging logic a lot simpler to write, read and implement.  Here is an example:


DECLARE
    @NbrofRecords INT = 15,
    @PageNbr INT = 3,
    @OrgTypeID INT = 23,
    @OrgName NVARCHAR(100) = NULL;

SELECT OM.ORG_MASTER_ID, OMTX.ORG_TYPE_LKUP_ID, OTL.ORG_TYPE_NAME
    , COALESCE(MFFM.FUND_FAMILY_NAME, OM.ORG_NAME) AS ORG_NAME, MFFM.MUTUAL_FUND_FAMILY_MASTER_ID
    -- single spaces between the name parts (empty string when a part is NULL)
    , COALESCE(CM.FIRST_NAME, '') + ' ' + COALESCE(CM.MIDDLE_NAME, '') + ' ' + COALESCE(CM.LAST_NAME, '') AS CONTACT_NAME
    , CM.CONTACT_MASTER_ID
FROM dbo.ORG_MASTER AS OM
INNER JOIN dbo.ORG_MASTER_TYPE_XREF AS OMTX
    ON OM.ORG_MASTER_ID = OMTX.ORG_MASTER_ID
INNER JOIN dbo.ORG_TYPE_LKUP AS OTL
    ON OMTX.ORG_TYPE_LKUP_ID = OTL.ORG_TYPE_LKUP_ID
LEFT OUTER JOIN dbo.ORG_TYPE_CONTACT_XREF AS OTCX
    ON OMTX.ORG_MASTER_ID = OTCX.ORG_MASTER_ID
    AND OMTX.ORG_TYPE_LKUP_ID = OTCX.ORG_TYPE_LKUP_ID
LEFT OUTER JOIN dbo.CONTACT_MASTER AS CM
    ON OTCX.CONTACT_MASTER_ID = CM.CONTACT_MASTER_ID
LEFT OUTER JOIN dbo.MUTUAL_FUND_FAMILY_MASTER AS MFFM
    ON OM.ORG_MASTER_ID = MFFM.ORG_MASTER_ID
WHERE OM.MARK_FOR_DELETION = 0
    AND OTL.ORG_TYPE_LKUP_ID = @OrgTypeID
    AND (OM.ORG_NAME LIKE @OrgName OR MFFM.FUND_FAMILY_NAME LIKE @OrgName OR @OrgName IS NULL)
ORDER BY OM.ORG_NAME ASC
OFFSET (@NbrofRecords * (@PageNbr - 1)) ROWS  -- skip the rows for pages 1..(@PageNbr - 1)
FETCH NEXT @NbrofRecords ROWS ONLY;           -- then return exactly one page
GO

As you can see from above, the only change was to append the OFFSET and FETCH NEXT clauses to your actual query and use two parameters – one to denote the page number that you want and the other to denote how many records you want per page.  So, say your query returns 1000 records, your requirement is 15 records per page and you want to display the data from the third page: @PageNbr will have a value of 3 and @NbrofRecords will have a value of 15, so the offset is 15 * (3 - 1) = 30, i.e. you skip the first 30 records using the OFFSET clause and then FETCH the NEXT 15 records, starting at the 31st.  Pretty easy to write, understand and implement, isn’t it?  Do remember that these clauses require an ORDER BY clause, i.e. you cannot use them on unordered data (you will get an error) – if you want data ordered in a particular way you should be using ORDER BY anyway, which is always the case when paging in web applications.
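To see the clause in isolation, here is a minimal sketch against a hypothetical dbo.ORDERS table (table and column names are illustrative, not from the query above):

DECLARE @PageNbr INT = 3, @NbrofRecords INT = 15;

SELECT ORDER_ID, ORDER_DATE
FROM dbo.ORDERS
ORDER BY ORDER_ID ASC                         -- mandatory: OFFSET/FETCH cannot be used without ORDER BY
OFFSET (@NbrofRecords * (@PageNbr - 1)) ROWS  -- skip the first 30 rows (pages 1 and 2)
FETCH NEXT @NbrofRecords ROWS ONLY;           -- return rows 31 through 45

If you comment out the ORDER BY line, SQL Server raises a syntax error rather than returning arbitrarily ordered pages.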

We will need to play around a bit to ascertain the performance impact for complex queries with different columns (from different tables) in the ORDER BY clause, or with conditional ordering, and to see what the difference is between the old and the new approach.  We will post the results on the blog when we do those tests.

Posted in DB2 LUW, Oracle, SQL Server | Leave a Comment »

OVER clause – much needed enhancements

Posted by decipherinfosys on February 23, 2011

For those of us who work in both Oracle and SQL Server, it is a natural reaction to look for something equivalent in the other RDBMS in our day to day work.  One of the areas in which Oracle has been far ahead of SQL Server is its rich support for analytics.  The OVER clause capabilities are just one of those things.  MVP Itzik Ben-Gan has posted several suggestions/requests on Connect for such enhancements – you can see them here.  They also have links from other readers which point to the rich functionality of Oracle and DB2 in that regard.  You should also read this whitepaper from Itzik and Sujata on this topic.

As per this update on Connect, these enhancements did not make it into SQL Server 2008 and might be included in the next release – Denali.

This came up again when a client who releases their product on both Oracle and SQL Server needed to simulate some of that rich analytic function capability in SQL Server.  There are always workarounds available and we have covered some of them in our previous blog posts (for example, the first_value()/last_value() workaround), but it is time SQL Server had that support natively.
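To make the gap concrete, here is a sketch against a hypothetical dbo.EMP_SALARY table: the running total that Oracle can express with an ordered OVER clause has to be simulated in SQL Server 2008 with a correlated sub-query.

-- With ordered OVER support (Oracle analytics today; expected in Denali):
-- SELECT EMP_ID, SALARY, SUM(SALARY) OVER (ORDER BY EMP_ID) AS RUNNING_TOTAL
-- FROM dbo.EMP_SALARY;

-- SQL Server 2008 workaround: a correlated sub-query that re-reads the table per row
SELECT ES.EMP_ID, ES.SALARY,
       (SELECT SUM(ES2.SALARY)
          FROM dbo.EMP_SALARY AS ES2
         WHERE ES2.EMP_ID <= ES.EMP_ID) AS RUNNING_TOTAL
FROM dbo.EMP_SALARY AS ES
ORDER BY ES.EMP_ID;

The workaround works, but the per-row re-read is exactly the kind of cost that native ordered aggregates avoid.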

Posted in DB2 LUW, Oracle, SQL Server | Leave a Comment »

High Availability and Disaster Recovery

Posted by decipherinfosys on January 28, 2011

These two terms are often used interchangeably, but they are not the same thing.

High Availability typically refers to solutions that use fault tolerance and/or load balancing in order to provide high availability of the applications.  HA is essentially the ability to continue operations when a component fails – a CPU, memory, a disk, or even a complete server.  With HA, there is usually no loss of service.  Clustering servers together is a form of HA; having redundant power supplies or redundant network controllers is a form of HA; having proper RAID arrays is a form of HA; having proper load balancing in place is a form of HA.  The primary goal of an HA environment is uptime – providing continuous service without disruption.

Disaster Recovery also increases availability, but in a different way.  It is the process/ability to restore the operations that are critical for the business after a disaster, whether human induced or from natural causes (for example, a power failure at the production site, floods, an earthquake or a hurricane).  The key difference between DR and HA is that the recovery time in the case of DR is typically longer than with HA.  With DR, there is a small loss of service while the DR systems are activated and take over the load at the DR site.

Here are some posts/articles which delve into these differences in more detail:

http://www.channeldb2.com/profiles/blogs/disaster-recovery-high

http://nexus.realtimepublishers.com/sgudb.php

http://www.drj.com/2010-articles/online-exclusive/understanding-high-availability-and-disaster-recovery-in-your-overall-recovery-strategy.html

http://gestaltit.com/all/tech/storage/bas/define-high-availability-disaster-recovery/

Posted in DB2 LUW, Oracle, SQL Server, Technology, Unix | 1 Comment »

Insert data in one table with data from other table(s)

Posted by decipherinfosys on January 27, 2011

This post is in response to this question from a reader:

“Your post on updating data in one table with data from another table was very helpful.  I need to also do an insert into a table (a summary table) with data from a bunch of other tables which are in the same database and also in database(s) on other instances.  Right now, I have a very convoluted process of scheduled jobs, row by row scripts to help me do that.  There is some transformation (mostly lookups) involved as well.  Is there an easier/simpler way to do this?  Appreciate your thoughts on this”

The post that the question above referred to was this one – here.  In that post, we had covered how you can update the data in a table with the data from another table.  Doing an insert into a table with data from other tables (within the same schema or separate schemas or separate databases or separate databases on separate instances) is also fairly simple.  There are many ways of doing it – we will cover the “INSERT INTO …. SELECT FROM … ” approach in this blog post.

Let’s consider this hypothetical scenario to illustrate this (using SQL Server lingo though the same approach is available in Oracle and DB2 as well) – Say the summary table “tblEmpSummary” that this reader wanted to insert the data into resides in a database called EXECUTIVE in a schema called ANALYSIS.  It contains summary information on the employees – their name, their department, their performance review rating, their expense reports per week and the weekly benefit expenses for those employees.

And the sources of the data are these:

a) Another table called EMP_PERF_REVIEW in the same schema in the same database (contains the EMP_PERF_RATING column that we want; we can join on the EMP_MASTER_ID column),

b) Other tables called EMP_MASTER, EMP_DEPT_XREF and DEPT_MASTER from the schema EMPLOYEE in the same database (contain the FIRST_NAME, LAST_NAME and DEPT_NAME information; we can join on EMP_MASTER_ID and DEPT_MASTER_ID),

c) Another table called EMP_EXPENSE_RPT from the EMP schema in the ACCOUNTING database on the same SQL Server instance (contains the WEEK_ENDING_DATE and TOTAL_AMOUNT fields that we are interested in; we can join using the EMP_MASTER_ID column), and

d) A view called VW_EMP_PTO from a schema called BENEFIT in the HR database that resides on another instance (contains the PTO_ACCRUED and TOTAL_AMOUNT fields for the benefits that we are interested in; we can join using EMP_MASTER_ID).

So, we have data coming from:

a) table in the same schema/same database,
b) tables in the same database but a different schema,
c) tables in a separate database on the same SQL Server instance, and
d) a view from a database on a separate SQL Server instance altogether.

If all of our data were coming from (a) alone, all we would have needed to do would be this:

INSERT INTO Analysis.tblEmpSummary (<list of column names here>)
SELECT <list of the column names from one source>
FROM Analysis.EMP_PERF_REVIEW
WHERE <filter conditions>
<any other conditions like GROUP BY, ORDER BY etc. that we need>

Please do note that the WHERE clause above can contain a NOT EXISTS check on tblEmpSummary to ensure that no dupes get inserted – and, needless to state, this INSERT code would be wrapped in a TRY…CATCH block with proper error handling.  A sketch of that pattern follows.
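Here is a concrete sketch – EMP_MASTER_ID and EMP_PERF_RATING are the columns mentioned above, but the exact shape of tblEmpSummary is an assumption on our part:

BEGIN TRY
    INSERT INTO Analysis.tblEmpSummary (EMP_MASTER_ID, EMP_PERF_RATING)
    SELECT EPR.EMP_MASTER_ID, EPR.EMP_PERF_RATING
    FROM Analysis.EMP_PERF_REVIEW AS EPR
    WHERE NOT EXISTS (SELECT 1
                      FROM Analysis.tblEmpSummary AS ES
                      WHERE ES.EMP_MASTER_ID = EPR.EMP_MASTER_ID);  -- skip rows already summarized
END TRY
BEGIN CATCH
    -- proper error handling goes here; we just re-raise for the sketch
    DECLARE @ErrMsg NVARCHAR(2048) = ERROR_MESSAGE();
    RAISERROR(@ErrMsg, 16, 1);
END CATCH;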

If we now extend it to include the data from (b) as well, the above SQL would change to:

INSERT INTO Analysis.tblEmpSummary (<list of column names here>)
SELECT <list of the column names from the two sources – use the proper aliases>
FROM Analysis.EMP_PERF_REVIEW as EPR
INNER JOIN Employee.EMP_MASTER as EM ON EPR.EMP_MASTER_ID = EM.EMP_MASTER_ID
INNER JOIN Employee.EMP_DEPT_XREF as EDX ON EM.EMP_MASTER_ID = EDX.EMP_MASTER_ID
INNER JOIN Employee.DEPT_MASTER as DM ON EDX.DEPT_MASTER_ID = DM.DEPT_MASTER_ID
WHERE <filter conditions>
<any other conditions like GROUP BY, ORDER BY etc. that we need>

As can be seen from above, we have assumed that the SELECT privileges are in place so that the tables can be joined directly.

Now, if we have to include scenario (c) as well – and assuming that we already have SELECT privileges on that object – we can either create views/synonyms for those objects in our schema or simply reference the objects using a three part naming convention:

<database_name>.<schema_name>.<object_name>

So, the above SQL then gets modified to:

INSERT INTO Analysis.tblEmpSummary (<list of column names here>)
SELECT <list of the column names from the three sources – use the proper aliases>
FROM Analysis.EMP_PERF_REVIEW as EPR
INNER JOIN Employee.EMP_MASTER as EM ON EPR.EMP_MASTER_ID = EM.EMP_MASTER_ID
INNER JOIN Employee.EMP_DEPT_XREF as EDX ON EM.EMP_MASTER_ID = EDX.EMP_MASTER_ID
INNER JOIN Employee.DEPT_MASTER as DM ON EDX.DEPT_MASTER_ID = DM.DEPT_MASTER_ID
INNER JOIN Accounting.Emp.EMP_EXPENSE_RPT as EER ON EM.EMP_MASTER_ID = EER.EMP_MASTER_ID
WHERE <filter conditions>
<any other conditions like GROUP BY, ORDER BY etc. that we need>

Now, that leaves us with the last part – (d).  Since it resides in a database on a separate instance, we can first create a linked server to it and then reference it in a couple of ways, as was shown in that blog post on linked servers.  Or we can simply create a synonym for it and use that (a sketch follows the SQL below).  Assuming we are using a four part naming convention to reference it, the above SQL now becomes:

INSERT INTO Analysis.tblEmpSummary (<list of column names here>)
SELECT <list of the column names from the four sources – use the proper aliases>
FROM Analysis.EMP_PERF_REVIEW as EPR
INNER JOIN Employee.EMP_MASTER as EM ON EPR.EMP_MASTER_ID = EM.EMP_MASTER_ID
INNER JOIN Employee.EMP_DEPT_XREF as EDX ON EM.EMP_MASTER_ID = EDX.EMP_MASTER_ID
INNER JOIN Employee.DEPT_MASTER as DM ON EDX.DEPT_MASTER_ID = DM.DEPT_MASTER_ID
INNER JOIN Accounting.Emp.EMP_EXPENSE_RPT as EER ON EM.EMP_MASTER_ID = EER.EMP_MASTER_ID
INNER JOIN LinkedSrvHR.HR.Benefit.VW_EMP_PTO as VWP ON EM.EMP_MASTER_ID = VWP.EMP_MASTER_ID
WHERE <filter conditions>
<any other conditions like GROUP BY, ORDER BY etc. that we need>
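And as a sketch of the synonym alternative mentioned above (the synonym name is illustrative) – once it is created, the remote view can be referenced like a local object:

CREATE SYNONYM Analysis.EMP_PTO FOR LinkedSrvHR.HR.Benefit.VW_EMP_PTO;

-- the last join above could then be written as:
-- INNER JOIN Analysis.EMP_PTO as VWP ON EM.EMP_MASTER_ID = VWP.EMP_MASTER_ID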

This should cover all the scenarios that you are likely to run into.  You can also try this with the MERGE command.  Any transformations, such as look-ups of values, that you can do in T-SQL, you can do over the record set that the above SQL returns.  It is essentially a data-set for you that is coming from 4 different sources.  What you do with that data-set is entirely up to you – use a CASE expression over it, do aggregation over it, dump it into a temporary table/table variable for further processing etc. – all the normal T-SQL operations are available to you, including inserting it into the summary table.

In addition to the technique above, if this is something that you need to do as a quick and dirty thing and you do not intend to keep the new table around after the analysis, you can also do:

select … into <new table> from <old table or set of tables from one or multiple sources> where <the where condition over here>

The “select into” approach is the SQL Server equivalent of CTAS (CREATE TABLE AS SELECT) in Oracle.  You can read more on that here.
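As a sketch (the scratch table name is illustrative), this creates and populates the new table in one statement:

SELECT EM.EMP_MASTER_ID, EM.FIRST_NAME, EM.LAST_NAME, EPR.EMP_PERF_RATING
INTO dbo.tblEmpSummary_Scratch   -- created on the fly; the statement fails if it already exists
FROM Employee.EMP_MASTER AS EM
INNER JOIN Analysis.EMP_PERF_REVIEW AS EPR
    ON EM.EMP_MASTER_ID = EPR.EMP_MASTER_ID;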

Posted in DB2 LUW, Oracle, SQL Server | Leave a Comment »

Do not use “Select *”

Posted by decipherinfosys on June 12, 2009

How many times have you seen code where a “select *” is being used?  Chances are that if you have been in the industry for some time, you have seen code with a “select *” in it (either application code or DB code).  It is not a good choice, for a variety of reasons –

Reason #1: Performance:

Consider these 2 queries:

Query 1: select * from tableA where colx = 5

vs say using

Query 2: Select Col1, Col2 from tableA where colx = 5

If there were a covering index on colx, col1 and col2, then the second query could retrieve all of its data from the index itself, while the “select *” query would need to go to the data pages as well.  Since the intention is only to return Col1 and Col2, specifically mention which columns you need.

In addition, the un-necessary data transfer is a performance issue of its own.  Just get the data that you need…nothing more, nothing less.
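For reference, a sketch of the covering index assumed above – colx is the key and Col1/Col2 ride along as included columns, so Query 2 can be answered from the index alone:

-- covering index: seek on colx, Col1 and Col2 available without touching data pages
CREATE NONCLUSTERED INDEX IX_tableA_colx
    ON dbo.tableA (colx)
    INCLUDE (Col1, Col2);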

Reason #2: Plain simple code maintenance issue:

Say you have inherited a stored procedure which defines a table variable or a temp table (SQL Server lingo) and then does an “insert into #temp select * from…”.  If a new column is later added to the source table, the “select *” returns an extra column and the insert statement fails.  In this case too, be explicit – mention which columns you are inserting and where those columns are coming from.
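A quick before/after sketch of that trap (the column names are hypothetical):

-- fragile: breaks the moment dbo.tableA grows a new column
INSERT INTO #temp
SELECT * FROM dbo.tableA;

-- robust: both column lists are explicit, so new columns in dbo.tableA are ignored
INSERT INTO #temp (Col1, Col2)
SELECT Col1, Col2 FROM dbo.tableA;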

Reason #3: When using it in an EXISTS/NOT EXISTS condition:

Read Conor Cunningham’s post here.  You will see that when using an EXISTS/NOT EXISTS condition, there is an advantage to using a “select 1” over a “select *”.  The reasons are well explained in Conor’s post.

Reason #4: Column Dependency in Oracle 11g:

In Oracle 11g, the dependency mechanism has been extended down to the column level.  So, what does that mean in relation to this post?  It means that if you are using explicit column names in your code, then the chances of the code getting invalidated and recompiled are lower.

So, as a sanity check, run through your code, see where you have a “select *”, and then work on fixing it.

Posted in DB2 LUW, Oracle, SQL Server | Leave a Comment »

Supporting a higher version of the RDBMS

Posted by decipherinfosys on May 18, 2009

We have seen this happen too many times – a vendor states in their presentation or marketing material that their product “supports” a newer version of the RDBMS.  A lot of the time, the vendor has just made sure that their product does not break when running on that newer version.  They do not take advantage of any of the new features of the RDBMS, but since their product’s regression test went through fine after the connection changes and whatever other trivial changes were needed, the claim makes its way into the marketing material.

That is a misrepresentation of the truth.  One should be clear in the message: does the product really support the newer version of the RDBMS, or has the vendor just made sure that it will run on it without taking advantage of any of the new features until a subsequent release?  One of our clients bought a product from a vendor based on such a claim, only to find out that all the DB code was still using features from two releases ago – because of backward compatibility it still worked, but that defeats the purpose.  Proper due diligence should be done if you want to ensure that you get what you pay for; otherwise you might end up buying and then implementing & supporting a product built on a legacy codebase.

Posted in DB2 LUW, Oracle, SQL Server | Leave a Comment »

Too many sub-selects

Posted by decipherinfosys on April 28, 2009

While tuning some of the reports at a client site, we noticed that almost all of them used too many sub-selects in the select list of their queries, i.e. select statements within a select statement, for example:

select col1, col2,
    (select max(col3) from x where x.col4 = t.col1),
    (select avg(col5) from y where y.col3 = t.col1)
from t
where t.col7 < 90;

There were also too many derived tables (SQL Server lingo) – also called inline views (Oracle)…this is when a select statement is part of the FROM clause, for example:

select col1, col2,
    (select max(col3) from x where x.col4 = t.col1),
    (select avg(col5) from x where col3 = t.col1)
from (select t.col1, t.col2, t.col7, t2.col3
        from t inner join t2 on t.col1 = t2.col5) as t
where t.col7 < 90;

Inline views/derived tables have their usefulness and we will cover them in a future blog post.  Back to the sub-selects…scalar sub-selects can sometimes be OK to use – especially when the aim is to return the top x records and the logic is complex enough to warrant a sub-select.  However, if the data set is large, these sub-selects can be a real drain on resources, since they get executed once per row returned by the main query.  In a majority of cases, usage of sub-selects is not an efficient way of writing SQL queries.

We had a similar situation at the client site and were able to re-write those report SQLs by either:

a) re-writing the query to make use of joins rather than sub-selects (see the sketch after this list), or
b) making use of analytic functions to by-pass the reason why the complex sub-selects were written, or
c) using a CTE or making use of derived tables.
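For the first example above, rewrite (a) would look something like this sketch – each correlated sub-select becomes a pre-aggregated derived table that is evaluated once instead of once per row (the left outer joins preserve the NULLs that the scalar sub-selects would have returned):

select t.col1, t.col2, xa.max_col3, ya.avg_col5
from t
left outer join (select col4, max(col3) as max_col3
                   from x
                  group by col4) as xa
    on xa.col4 = t.col1
left outer join (select col3, avg(col5) as avg_col5
                   from y
                  group by col3) as ya
    on ya.col3 = t.col1
where t.col7 < 90;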

Resources:

  • Great discussion on Tom Kyte’s site on the same topic – here.
  • Another example – here.

Posted in DB2 LUW, Oracle, SQL Server | Leave a Comment »

UnIndexed Foreign Keys – II

Posted by decipherinfosys on December 3, 2008

In some of our previous posts, we have covered the issues that unindexed foreign keys can cause. You can read these posts to get more information on those:

  1. Back to the Basics: Foreign Keys
  2. Un-Indexed Foreign Keys
  3. Blocking on Foreign Keys

A reader asked us whether it is always necessary to index the FKs, or whether there are guidelines for when it can be skipped.  There are some scenarios under which you do not need to have indexes on the FKs:

a) When you do not delete records from the parent table, especially when the “ON DELETE CASCADE” rule is not set.

b) When you do not join from the parent table to the child table.

c) When you do not update the unique key (the primary key in most cases) in the parent table.  This is rarely done anyway, since the need to update it would point to a bad choice of primary key column.

d) When you never query the child table directly on the FK column(s).

e) When the child table is small and is used only for lookups.

So, if all of the conditions above are met in your environment, there is no need to index those FK columns; otherwise, it is usually a good idea to index your foreign keys.
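As a sketch with illustrative names, the fix itself is a one-liner per foreign key:

-- index the FK column on the child table so deletes/updates on the parent
-- and parent-to-child joins do not have to scan the child table
CREATE INDEX IX_CHILD_TBL_PARENT_TBL_ID
    ON dbo.CHILD_TBL (PARENT_TBL_ID);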

Posted in DB2 LUW, Oracle, SQL Server | Leave a Comment »

Using DISTINCT … just because

Posted by decipherinfosys on November 21, 2008

We have blogged about the DISTINCT clause in the past (common mis-conceptions about the DISTINCT clause) and what it does and how it gets mis-used at times.  This is one of the common issues we have seen in queries written by junior folks, or at times even by senior folks not well versed in SQL: they add the DISTINCT clause to the select list just to make sure that “in case” there are duplicates, it will all be taken care of by this wonderful clause.

Most of the time, the reason they get duplicates in the first place lies in the join conditions.  For example, say you have a 3 table join where tableA : tableB :: 1:N (a one to many relationship) and tableB : tableC :: 1:N (again a one to many relationship).  The query only selects data from tableA and tableB, and that set of columns provides a unique set of data; however, the developer also adds tableC to the join because there is a filter condition (where clause) on it.  That results in duplicates whenever there are many records in tableC for a record in tableB, because of the 1:N relationship.  The developer then ends up adding a DISTINCT clause to the select list to get a distinct set of data.  The developer instead could have chosen to just use an EXISTS clause like this:

select …<columns from tableA and tableB>…
from tableA
inner join tableB
    on tableA.col1 = tableB.col2
where …<criteria on tableA and tableB – if being used based on the business logic>…
and exists (select 1 from tableC where tableC.col3 = tableB.col4)

and this eliminates the need for the DISTINCT clause.  There are many other such scenarios as well.  The problem with using DISTINCT un-necessarily is that the code incurs sorting costs for nothing, since the DISTINCT clause requires that the result set be sorted so that the duplicates can be eliminated from it.

We have seen even bigger issues when people use distinct aggregates even when they are not needed, or when there are better ways to write the query.  By distinct aggregates, we mean clauses like:

count(distinct invoice_nbr) or sum(distinct sales_qty) etc.

The problem is exacerbated when there is a lot of mixing of such distinct aggregates with non-distinct aggregates, for example:

select
    count(distinct invoice_nbr),
    sum(distinct sales_qty),
    count(ship_date),
    avg(sales_price),
    …etc.

Mixing these distinct aggregates and non-distinct aggregates in the same SQL statement’s select list leads to spooling and involves several re-reads of the intermediate results, which is very expensive – even more expensive than computing them separately in separate queries, as the sketch below illustrates.
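A sketch of what “computing them separately” means (dbo.SALES_TBL is an illustrative name):

-- the distinct aggregates in one statement...
select count(distinct invoice_nbr) as distinct_invoice_cnt,
       sum(distinct sales_qty)     as distinct_qty_total
from dbo.SALES_TBL;

-- ...and the plain aggregates in another, each a single pass over the data
select count(ship_date) as ship_date_cnt,
       avg(sales_price) as avg_sales_price
from dbo.SALES_TBL;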

Posted in DB2 LUW, Oracle, SQL Server | Leave a Comment »

Back to the Basics: Understanding the BETWEEN Operator

Posted by decipherinfosys on November 7, 2008

At one of the client sites, a junior developer asked a question which is worth posting over here.  The question pertained to the BETWEEN operator.  As you know, this operator is used to test whether a value falls between two expressions.  Here is the entry from BOL for the syntax:

test_expression [ NOT ] BETWEEN begin_expression AND end_expression

The key to understanding is this section (from BOL):

BETWEEN returns TRUE if the value of test_expression is greater than or equal to the value of begin_expression and less than or equal to the value of end_expression.

NOT BETWEEN returns TRUE if the value of test_expression is less than the value of begin_expression or greater than the value of end_expression.

What the developer was trying to do was use the criteria in any order … so, if we have this query against the AdventureWorks database:

select * from Production.Product where SellStartDate between '2001-06-29' and '2001-07-02'

It will return 72 records.  If the values are swapped:

select * from Production.Product where SellStartDate between '2001-07-02' and '2001-06-29'

it will return zero records, and the reason is that internally the query (as mentioned in BOL and per the ANSI SQL standard as well) gets translated to:

select * from Production.Product where SellStartDate >= @first_value and SellStartDate <= @last_value

So, with the swapped values in the second query, you will never find an intersecting set of data.  Please do note that equality ( = ) is also included when you use the BETWEEN clause, i.e. both boundary values are part of the range.  If the two bounds can genuinely arrive in either order, they need to be normalized first – see the sketch below.
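Here is one such sketch (the variable names are illustrative) – the CASE expressions normalize the bounds before BETWEEN sees them, so the criteria can be passed in any order:

DECLARE @d1 DATETIME = '2001-07-02', @d2 DATETIME = '2001-06-29';

SELECT *
FROM Production.Product
WHERE SellStartDate BETWEEN CASE WHEN @d1 <= @d2 THEN @d1 ELSE @d2 END
                        AND CASE WHEN @d1 <= @d2 THEN @d2 ELSE @d1 END;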

It would have been perfectly fine if one wanted to exclude all the records falling within a given range – in that case, one needs to use the “NOT BETWEEN” operator:

select * from Production.Product where SellStartDate not between '2001-07-02' and '2001-06-29'

This returns all 504 records in Production.Product – with the swapped bounds, no date can satisfy the BETWEEN condition, so NOT BETWEEN is true for every row.

Posted in DB2 LUW, Oracle, SQL Server | Leave a Comment »