Systems Engineering and RDBMS

Archive for the 'Oracle' Category


DBMS_ROWID

Posted by decipherinfosys on August 20, 2008

DBMS_ROWID is one of the supplied PL/SQL packages, which can be used to manipulate and retrieve rowid information. Rowid is one of the most famous pseudo columns along with the rownum. Rowid uniquely identifies the row in the table. Rowid can be used as data type as well. We will cover rowid/urowid and its usage in different blog post, but in this blog post, we will talk about DBMS_ROWID package and some of its useful subprograms. Using these subprograms, we can retrieve object number, file number, block number and row number information we are interested in. Let us first create table and populate it with some data.

CREATE TABLE TEST
(
 ID            NUMBER(9),
 OBJ_TYPE   VARCHAR(5),
 OBJ_NAME     VARCHAR(30),
 CREATE_DATE  DATE
);

INSERT INTO TEST(id,obj_Type,obj_name,create_Date)
SELECT object_id,object_Type,object_name,sysdate
  FROM user_objects
 WHERE object_Type = ‘TABLE’
   AND ROWNUM <= 5;

Commit;

Now let us retrieve rowid for the inserted rows.

SELECT ROWID, id,
       dbms_rowid.rowid_type(ROWID) rowid_type,
       dbms_rowid.rowid_object(ROWID) object_no
  FROM test;

ROWID                      ID ROWID_TYPE  OBJECT_NO
------------------ ---------- ---------- ----------
AABHPOABWAAAxUAAAA     242439          1     291790
AABHPOABWAAAxUAAAB     242441          1     291790
AABHPOABWAAAxUAAAC     242443          1     291790
AABHPOABWAAAxUAAAD     242445          1     291790
AABHPOABWAAAxUAAAE     242447          1     291790

 

This is not usual ROWID format. We know that ROWID has block.row.file format (BBBBBBB.RRRR.FFFFF).  This is extended ROWID format. In above sql, we have used rowid_type function to know the ROWID type. They are of two types. 1 indicates extended rowid (Oracle 8i and higher) and 0 indicated restricted rowid (Oracle 7 and less). One can convert extended rowids to restricted rowid (block.row.file) format.

Let us execute following sql to obtain restricted and extended rowids both.
SELECT id, dbms_rowid.rowid_to_restricted(ROWID,0) RESTRICTED ,
       dbms_rowid.rowid_to_extended
    ((dbms_rowid.rowid_to_restricted(ROWID,0)),
     ‘SCOTT’,'TEST’,1) extended
  FROM test;

        ID RESTRICTED         EXTENDED
---------- ------------------ ------------------
    242439 00031500.0000.0056 AABHPOABWAAAxUAAAA
    242441 00031500.0001.0056 AABHPOABWAAAxUAAAB
    242443 00031500.0002.0056 AABHPOABWAAAxUAAAC
    242445 00031500.0003.0056 AABHPOABWAAAxUAAAD
    242447 00031500.0004.0056 AABHPOABWAAAxUAAAE

 

Other useful function is dbms_rowid.rowid_verify, which verifies the validity of restricted rowid to make sure that it can be converted to extended rowid. It returns 0, if conversion is possible otherwise returns 1.

Three other important functions are to obtain the block number, row number and absolute file number from the rowid. Following is the sql followed by data.

SELECT dbms_rowid.rowid_block_number(ROWID) Block_No,
       dbms_rowid.rowid_row_number(ROWID) Row_No,
       dbms_rowid.rowid_to_absolute_fno(ROWID,’SCOTT’,'TEST’) Abs_File_No
  FROM TEST;

  BLOCK_NO     ROW_NO ABS_FILE_NO
---------- ---------- -----------
    201984          0          86
    201984          1          86
    201984          2          86
    201984          3          86
    201984          4          86

 

Similarly we can get the relative file number as well using dbms_rowid.rowid_relative_fno function. It takes two arguments rowid and tablespace_type it belongs to (bigfile or smallfile).  ROWID_INFO is the only procedure of dbms_rowid package, which gives all of the above information in single call.

Posted in Oracle | No Comments »

Back to the basics: NULL values and Aggregate functions

Posted by decipherinfosys on August 18, 2008

One common mistake for people new to SQL is remembering how the NULL values get treated when using aggregate functions.  Here is an example that helps illustrate the point:

declare @t table (col1 int)
insert into @t values (null)
insert into @t values (2)
insert into @t values (4)
select
AVG(col1) as Col_Avg,
COUNT(col1) as Col_Count,
COUNT(*) as Total_Count,
MAX(col1) as MAX_value,
MIN (col1) as MIN_value,
SUM(col1) as Total
from @t

Col_Avg     Col_Count   Total_Count MAX_value   MIN_value   Total
----------- ----------- ----------- ----------- ----------- -----------
3           2           3           4           2           6

Warning: Null value is eliminated by an aggregate or other SET operation.

As you can see from the Warning, the engine immediately tells you that the Null value was eliminated by an aggregate operation.  If you see the first column, you will see that the average is 3 and not 2 since it eliminated the row with the NULL value in it.  Likewise, when we did a COUNT(col1) vs a COUNT(*), there is a difference in the count - this again is because the NULL value has been eliminated by the aggregate function.

These are very important things to remember else you can end up writing code which will give out wrong results.  One of the ways to not fall into such issues is to address these issues right at the design time and if you have a numeric column, then define that as a not null column with a default value of 0.  That way, in the absence of the data, you can treat that as a 0 value and then the average would return the correct/intended data set - likewise for the count(col1) when the col1 is not null.

Posted in DB2 LUW, Oracle, SQL Server | No Comments »

Back to the Basics: Using variable in a SELECT statement

Posted by decipherinfosys on August 12, 2008

Last week at a client site, one of the junior team members had asked whether we can use a variable in the SELECT statement or not. Requirement was to insert value in the new table from an existing table but certain columns in the new table will be populated with the variable. We will show a simple example to demonstrate this. First let us create a  table. Connect to SQL*Plus using proper credentials and create the following table.

CREATE TABLE TEST
(
ID                  NUMBER(9),
OBJ_TYPE         VARCHAR(5),
OBJ_NAME        VARCHAR(30),
CREATE_DATE     DATE
);

In the following anonymous PL/SQL block, we will use variable in INSERT INTO SELECT statement. We are aware that we can use pseudo columns like rownum, sysdate etc. in the SELECT statement. Similarly we can use variable declared in the PL/SQL block or any program unit in SELECT statement.  Execute following anonymous PL/SQL block from SQL*Plus session.

SET SERVEROUTPUT ON

DECLARE
v_objType VARCHAR2(10) := ‘TABLE’;
v_curDate DATE := SYSDATE;

BEGIN

INSERT INTO TEST(id,obj_Type,obj_name,create_Date)
SELECT object_id,v_objType,object_name,v_curdate
FROM user_objects
WHERE object_Type = ‘TABLE’
AND ROWNUM <= 10;

dbms_output.put_line(’Rows Inserted = ‘ || SQL%ROWCOUNT);
COMMIT;

EXCEPTION
WHEN OTHERS THEN
dbms_output.put_line(SQLERRM);
ROLLBACK;
END;
/

Output indicates that variable usage in SELECT statement work correctly.

Rows Inserted = 10

PL/SQL procedure successfully completed.

Now let us see what happens, when we declare variable with the same name as of the column name from the table. We are using user_objects table in select statement and object_type is one of the column of user_objects table. In our PL/SQL block, we will use object_type as a local variable. Please execute following modified anonymous block and check the final results. No way, we are recommending that you should use declare variable similar to column name in the table and it is absolutely not the desirable coding practice.

To identify different data set, this time we are selecting ‘VIEW’ objects rather than the ‘TABLE’ objects.

DECLARE
object_Type VARCHAR2(19) := ‘TABLE’;
v_curDate DATE := SYSDATE;

BEGIN

INSERT INTO TEST(id,obj_Type,obj_name,create_Date)
SELECT object_id,object_type,object_name,v_curdate
FROM user_objects
WHERE object_Type = ‘VIEW’
AND ROWNUM <= 10;

dbms_output.put_line(’Rows Inserted = ‘ || SQL%ROWCOUNT);
COMMIT;

EXCEPTION
WHEN OTHERS THEN
dbms_output.put_line(SQLERRM);
ROLLBACK;
END;
/

PL/SQL will also complete this time without any errors but let us check the result set. Result indicates that value from the table took precedence over the variable declared with the same name.

This type of declaration has also other consequences. In where clause, if we try to get the ‘PROCEDURE’ instead of ‘VIEW’ it will give following error. This is one more reason to avoid such declaration.

ORA-12899: value too large for column “CNFGPKMS”.”TEST”.”OBJ_TYPE” (actual: 9, maximum: 5)

Posted in Oracle | No Comments »

Oracle VM Templates

Posted by decipherinfosys on August 11, 2008

Saw a press release from Oracle on their web-site about how they have released Oracle VM Templates to help with server consolidation. Pre-configured and pre-installed images of the software makes it easy to roll out development and qa environments. You can read more on it at Oracle’s press release here.

Posted in Oracle | No Comments »

Back to the Basics: A Simple Join Issue

Posted by decipherinfosys on August 11, 2008

We have covered join basics before and one of the questions that all beginners face is that if they are joining TableA with TableB in an outer join condition BUT they need to join TableC with TableB in an inner join condition, then they lose the outer join effect since for records in TableB that do not match TableA, it will result into NULL records after the join and since NULL != NULL in SQL (not considering the ANSI_NULL setting in SQL Server for this post), then the inner join with TableC will get rid of those records. Let’s take this up with an example (using SQL Server Syntax though the same applies to Oracle of DB2 LUW as well):

create table TableA (col1 int primary key, col2 nvarchar(10));
create table TableB (col3 int primary key, col1 int, col4 nvarchar(10), constraint fk_b FOREIGN KEY (col1) references TableA(col1));
create table TableC (col5 int primary key, col3 int, col6 nvarchar(10), constraint fk_c FOREIGN KEY (col3) references TableB(col3));

set nocount on
go

insert into TableA values (1, ‘A-1 value’);
insert into TableA values (2, ‘A-2 value’);

insert into TableB values (1, 1, ‘B-1 value’);

insert into TableC values (1, 1, ‘C-1 value’);
insert into TableC values (2, 1, ‘C-2 value’);
go

Now, if we want to get all records in A but only those records in B where there is a match, the query is a simple left outer join between A and B:

select *
from dbo.TableA as A
Left Outer Join dbo.TableB as B
on A.col1 = B.col1

And the result will be:

col1        col2       col3        col1        col4
----------- ---------- ----------- ----------- ----------
1           A-1 value  1           1           B-1 value
2           A-2 value  NULL        NULL        NULL

And now, if we want to add an inner join condition between TableB and TableC to restrict getting only those records from TableB where there is a match in TableC, one would be tempted to do just this:

select *
from dbo.TableA as A
Left Outer Join dbo.TableB as B
on A.col1 = B.col1
Inner join dbo.TableC as C
on B.col3 = C.col3

But as you can see from the result:

col1        col2       col3        col1        col4       col5        col3        col6
----------- ---------- ----------- ----------- ---------- ----------- ----------- ----------
1           A-1 value  1           1           B-1 value  1           1           C-1 value
1           A-1 value  1           1           B-1 value  2           1           C-2 value

The second record from TableA is not present in the result anymore since the inner join negated that Null value. In order to get the desired result, one can make use of what are called as derived tables (SQL Server lingo) or inline views (Oracle lingo) and re-write the query as:

select *
from dbo.TableA as A
Left Outer Join (dbo.TableB as B
Inner join dbo.TableC as C
on B.col3 = C.col3)
on A.col1 = B.col1

or

select *
from dbo.TableA as A
Left Outer Join (select B.col1, B.col3, B.col4, C.col5, C.col6 from dbo.TableB as B
Inner join dbo.TableC as C
on B.col3 = C.col3) as X
on A.col1 = X.col1
and the result will be:

col1        col2       col1        col3        col4       col5        col6
----------- ---------- ----------- ----------- ---------- ----------- ----------
1           A-1 value  1           1           B-1 value  1           C-1 value
1           A-1 value  1           1           B-1 value  2           C-2 value
2           A-2 value  NULL        NULL        NULL       NULL        NULL

And in this particular case, one can even do this:

select *
from dbo.TableB as b
inner join dbo.tableC as c
on b.col3 = c.col3
right outer join dbo.tableA as a
on b.col1 = a.col1

Posted in DB2 LUW, Oracle, SQL Server | No Comments »

Foreign Key Issue and resolutions

Posted by decipherinfosys on August 8, 2008

In one of the applications at the client site, the application had a design which allowed for bad data being entered into the system. The issue was fixed by fixing and putting a check in the application which writes to the database but at the database tier itself no such checks were present and as a result the data that was coming through the feeds also ended up corrupting the data in the end.

The situation was like this: There were two tables: COMPANY and COMPANY_SECURITY. Both had a surrogate key defined as an auto-incremental ID value. COMPANY table had an alternate key defined on CUSIP - in the Wall Street terms, a CUSIP is unique to every company so this was a NOT NULL UNIQUE column. The child table (COMPANY_SECURITY) had the foreign key established using the COMPANY_ID surrogate key. The alternate key of this table was SECURITY_CUSIP. A company whose stock is traded at the different exchanges like NASDAQ, NYSE etc. can have more than one security offering and the securities are unique. The COMPANY.CUSIP is 6 characters long and the COMPANY_SECURITY.SECURITY_CUSIP is 9 characters long with the first 6 digits being the same as the company cusip and the last 3 characters defining the security (2 defining it and the last being a checksum).

So, based on that understanding, say if the COMPANY.CUSIP for a company like Microsoft is 123456, then it’s securities (COMPANY_SECURITY.SECURITY_CUSIP) should be 123456xyz, 123456abc etc.. But there are no checks at the database level to make sure that the first 6 digits match between the security_cusip and the parent table’s cusip since the FK is established based on the surrogate key.

There are a couple of ways to fix this at the database tier:

1) Creation of a new object: Use a trigger to prevent the data entry into the child table in the event of bad data getting entered. This can create performance issues in the event of a highly transactional system and is a poor way of enforcing this kind of a constraint.

2) Change the table structure design: Change the Primary Key of the parent table to be on COMPANY.CUSIP i.e. this would be a scenario of making use of the natural key since CUSIP does have a business meaning - it never changes once it has been established unless there are M&A (Merger and Acquisition) activities in which case the older record is marked for deletion and a new one gets created to represent the merger/acquisition. And a lot of searches are done based on this column itself which is always of the same length. That way, the child table will also inherit the CUSIP column since we would be using it for the foreign key enforcement and then store only the three characters for the security_cusip instead of storing all 9. So, the child table will end up with CUSIP (6 characters) and SECURITY_CUSIP (3 characters) and that combination will make it unique. For the reads, there can be either of the two strategies:

a) Based on the concatenation of the values from the same table - do understand though that when operations are done on indexed columns, the index seek operations do not happen. One can use an indexed view though to do that and do the reads off of that.

b) Split the input parameter into two - one of 6 characters and the other of 3 and then by joining the parent and the child table, put the right parameter in the right where clause condition. This approach has the benefit of using the indexes since no operations are being done on the indexed columns.

3) Introduce new column: Same as #2 but keep the Primary key as such and introduce redundancy and put the CUSIP column in the child table as well and introduce the FK between that column and the parent’s CUSIP column. A FK can defined on a UNIQUE NOT NULL column as well - does not necessarily have to to be a Primary Key.

4) Add a new Check Constraint with a UDF: Keep the same schema and use a CHECK constraint to define the constraint across the two tables. Please do note that the ANSI SQL Standards do allow the user of sub-queries in check constraints but not all RDBMS follow it. What you can do though is that you can define a UDF (User Defined Function) that performs an existence check against the parent table and returns a 1 or a 0 and define the check constraint based on that. For example:

/*********************

Table Definitions

**********************/

create table company
(
company_id int identity(1,1) not null,
company_name nvarchar(20) not null,
ticker nvarchar(10) not null,
cusip nvarchar(6) not null
, constraint pk_company primary key (company_id));

create unique index company_ind_1 on dbo.company (cusip);

create table company_security
(
company_security_id int identity(1,1) not null,
company_id int not null,
security_cusip nvarchar(9) not null,
company_cusip as substring(security_cusip, 1, 6)
, constraint pk_company_security primary key (company_security_id)
, constraint fk_company foreign key (company_id) references dbo.company(company_id));

create unique index company_security_ind_1 on dbo.company_security (security_cusip);

/********************

Define the UDF

*********************/
CREATE FUNCTION dbo.udf_check_cusip
(@company_id int, @security_cusip nvarchar(9))
RETURNS bit
AS
BEGIN
IF EXISTS(SELECT 1
FROM dbo.company
WHERE company_id = @company_id and cusip = substring(@security_cusip, 1, 6))
RETURN 1
RETURN 0
END
/************************************************************

Now, add the CHECK constraint using the above UDF

*************************************************************/

alter table dbo.company_security with check add constraint ck_cusip check (dbo.udf_check_cusip (company_id, security_cusip) = 1);

Let’s test it out now:

declare @i int
insert into dbo.company (company_name, ticker, cusip) values (N’Microsoft’, N’MSFT’, N’123456′);
select @i = SCOPE_IDENTITY()
select @i
insert into dbo.company_security (company_id, security_cusip) values (@i, N’123456123′);

This will go through fine.

Now, let’s enter data that violates the check constraint:

insert into dbo.company_security (company_id, security_cusip) values (1, N’100006123′);

And we will get this error:

Msg 547, Level 16, State 0, Line 1
The INSERT statement conflicted with the CHECK constraint “ck_cusip”. The conflict occurred in database “AdventureWorks”, table “dbo.company_security”.
The statement has been terminated.

5) Change the table structure: Same as option #2 but in this case we do not change the PK. We keep the surrogate keys and in the child table, only use the three digit security_cusip. The unique index will then be the combination of the company_id and the three digits will make it unique. Kind of like having the fixed length field 6 digit CUSIP + 3 digit SECURITY_CUSIP as the Primary Key (i.e. a natural key selection like option #2 above). This would be fine for writes … when you do the reads - even for the ones which only need information from the company_security table (though such scenarios will be rare), you will need to join the company and the company_security table and present the data for security cusips by concatenating the company’s cusip and the security cusip. Same read scenarios as mentioned in #2 above apply.

So, as you can see from above, there are a lot of ways of addressing this issue and depending upon the state of the project and the impact to the application, you can chose your solution. Option #2 or 5 are the right design choices.

Note: This option was also tried by the client: Computed column. You cannot use a computed column to enforce the FK since computed columns cannot be used for defining FKs i.e. you cannot just introduce a computed column based on substring(security_cusip, 1, 6) and then try to establish a FK between it and the parent table’s CUSIP column.

Posted in DB2 LUW, Data Model, Oracle, SQL Server | No Comments »

Self Joins

Posted by decipherinfosys on August 7, 2008

One of the recent questions that was asked by a reader for her particular problem needed to include a self join. So, what is a self join? A self join is a join that is done between the same table by using aliases. Such joins are useful for displaying data when there is a relationship between different columns of the data in the same table OR when you wish to retrieve data in a single row based on comparisons between different rows in the same table. They are pretty useful in certain scenarios. We will cover some of those scenarios in this post to give you an idea on where and how you can make use of that information.

A classic example that is always given whenever one talks about self joins is that of an employee table in which there is a recursive 1:N relationship between the employee and their manager. Here is an example:

set nocount on
go
DECLARE @EMP_MASTER TABLE
(
EMP_NBR INT NOT NULL PRIMARY KEY,
EMP_NAME NVARCHAR(20),
MGR_NBR INT NULL
);

INSERT INTO @EMP_MASTER VALUES (1, ‘DON’, 5);
INSERT INTO @EMP_MASTER VALUES (2, ‘HARI’, 3);
INSERT INTO @EMP_MASTER VALUES (3, ‘RAMESH’, 5);
INSERT INTO @EMP_MASTER VALUES (4, ‘JOE’, 5);
INSERT INTO @EMP_MASTER VALUES (5, ‘DENNIS’, NULL);
INSERT INTO @EMP_MASTER VALUES (6, ‘NIMISH’, 7);
INSERT INTO @EMP_MASTER VALUES (7, ‘JESSIE’, 4);
INSERT INTO @EMP_MASTER VALUES (8, ‘KEN’, 2);
INSERT INTO @EMP_MASTER VALUES (9, ‘AMBER’, 5);
INSERT INTO @EMP_MASTER VALUES (10, ‘JIM’, 5);

select empl.EMP_NAME as employee_name, mgr.EMP_NAME as manager_name
from @EMP_MASTER as empl
inner join @EMP_MASTER as mgr
on empl.MGR_NBR = mgr.EMP_NBR

employee_name        manager_name
-------------------- --------------------
DON                  DENNIS
HARI                 RAMESH
RAMESH               DENNIS
JOE                  DENNIS
NIMISH               JESSIE
JESSIE               JOE
KEN                  HARI
AMBER                DENNIS
JIM                  DENNIS

Another example is that of a product line where you might want to build a report to see which products have the same name but differ in their packaging or aisle location where they are placed in the warehouse or a shop. Example:

declare @product table (product_name nvarchar(30), aisle nvarchar(10));
insert into @product values (’Starbucks-Coffee’, ‘A’);
insert into @product values (’Starbucks-Coffee’, ‘B’);
insert into @product values (’Earl Grey’, ‘A’);

select a.product_name, a.aisle as first_aisle, b.aisle as second_aisle
from @product as a
inner join @product as b
on a.product_name = b.product_name
where a.aisle < b.aisle

product_name                   first_aisle second_aisle
------------------------------ ----------- ------------
Starbucks-Coffee               A           B

The same logic can be applied in a multitude of situations - some of the situations that come to mind (besides the ones mentioned above are):

a) Schedule log - like a train station or an airline departure/arrival running log comparisons,

b) A Web server hit stats,

c) Geological data,

d) Stock values for NASDAQ, NYSE etc.

In all these, relative comparisons based on time are typically made when one does a self join.

Posted in DB2 LUW, Oracle, SQL Server | 1 Comment »

Ordering of the data

Posted by decipherinfosys on August 4, 2008

This might seem like an obvious thing to a majority of our readers but since this question has come up many times in e-mails from some of our readers as well as at client sites, we thought it needs to be mentioned in here.  At a client site, one of the end users of the system had put in a bug report that when configuring data for an application, the same data was sometimes getting repeated on different pages.  For the application in question, the query for the page was made up of a single configuration table which had close to 1000 rows and using a paging mechanism, 25 records were being displayed per page from that table.

The query that was being used for that page did not have an order by clause.  Also, data was updated as well as new data was inserted into this table as more stores were added to this retail application.  As a result of this, when paging was done and since the SQL did not have an order by clause, at times, the same data appeared across the pages.  This was a web based stateless application.  When paging was done, each page fired off the same SQL but showed different sets of the data (first 25 records on the first page, 26-50 on the next and so on).  Since there was no ORDER BY clause in the SQL, you are not guaranteed to get the records in “exactly” the same order each and every time.  The developer of the page had made an assumption that since the query is based off a single table, the data will be presented in the same order as that of the primary key index.  That is not always true.  As updates are made to the data records or as deletes happen, the internal rowids can change and one cannot rely on the data being returned in the order of the primary key index.  Also, if there is a filter condition that is used by the end user, depending upon the SQL that gets formed, the execution plan generated could be by-passing the PK index and returning all the data from a separate index itself (a covered index).

Moreover, if tomorrow, there is a need to add more tables in this SQL query, then also this assumption will fail.  So, the bottom line is that if you want the records to be returned in a particular order, always use an ORDER BY clause.  In the case of the above application since the application was using an ORM (Object Relational Mapper) to build up the SQL dynamically and was using a configuration table to specify the default order by condition (in case none was specified by the end user), fixing it was a simple data entry change by putting the primary key column in the list.

Posted in DB2 LUW, Oracle, SQL Server | 1 Comment »

Having vs Where Clause

Posted by decipherinfosys on July 29, 2008

Got another question from one of the readers asking about: “whether it is ok to substitute the WHERE clause with the HAVING clause since in the absence of the WHERE clause and a grouping condition, HAVING essentially acts as a WHERE clause onlY? What is the difference in using these two clauses?”

One thing to understand first is the basic function of the WHERE and the HAVING clauses - WHERE is used to apply filter conditions on the table columns and HAVING is used to apply the filters typically after the aggregations are done.  It is also important to understand the progression of the evaluation of the execution plan.  In the presence of both the WHERE and the HAVING clause, the WHERE condition will get evaluated prior to the HAVING condition.  Let’s take an example (using AdventureWorks sample user database in SQL Server 2005):

select Color, SUM(StandardCost) as SUM_STD_COST

from Production.Product

where Color IN (’Blue’, ‘Black’)

group by Color

And one can also re-write this to use HAVING instead of the WHERE clause:

select Color, SUM(StandardCost) as SUM_STD_COST

from Production.Product

group by Color

having Color IN (’Blue’, ‘Black’)

The second one evaluates the filter after the grouping and the first one applies the filter and then does the grouping and is more effective.  For this simple SQL statement, if you look at the execution plan, you will see that the optimizer is smart to change the second SQL’s execution plan to do a filter before the grouping since in the absence of the where clause,

From the first SQL:

————————————————————————-
|–Stream Aggregate(GROUP BY:([AdventureWorks].[Production].[Product].[Color]) DEFINE:([Expr1003]=SUM([AdventureWorks].[Production].[Product].[StandardCost])))
|–Sort(ORDER BY:([AdventureWorks].[Production].[Product].[Color] ASC))
|–Clustered Index Scan(OBJECT:([AdventureWorks].[Production].[Product].[PK_Product_ProductID]), WHERE:([AdventureWorks].[Production].[Product].[Color]=N’Black’ OR [AdventureWorks].[Production].[Product].[Color]=N’Blue’))

And for the second SQL (after flushing the cache):

————————————————————————————-
|–Stream Aggregate(GROUP BY:([AdventureWorks].[Production].[Product].[Color]) DEFINE:([Expr1003]=SUM([AdventureWorks].[Production].[Product].[StandardCost])))
|–Sort(ORDER BY:([AdventureWorks].[Production].[Product].[Color] ASC))
|–Clustered Index Scan(OBJECT:([AdventureWorks].[Production].[Product].[PK_Product_ProductID]), WHERE:([AdventureWorks].[Production].[Product].[Color]=N’Black’ OR [AdventureWorks].[Production].[Product].[Color]=N’Blue’))

So, the optimizer does the right thing in this case…however, in the case of complex queries, it would be an issue.  So, use the WHERE clause to filter off the records using indexed columns for good performance and use Having only for applying filters at the top of the aggregations.

Posted in DB2 LUW, Oracle, SQL Server | No Comments »

A reporting query

Posted by decipherinfosys on July 24, 2008

This was one of the questions that we got from a reader:

I am working on a HR application and need to prepare a report in which I need to show the data in this format:

DEPT_NBR MAX_HIRE_DATE    MIN(DT)    HIRE_DATE
-------- -------------    ---------  --------
0312     01-JUN-07        12-DEC-02  12-DEC-02
                                     31-MAR-04
                                     30-APR-05
                                     31-MAY-06
                                     01-JUN-07
0313     15-MAY-08        01-NOV-06  12-NOV-06
                                     31-MAR-07
                                     01-NOV-06
                                     31-MAY-07
                                     15-MAY-08

i.e. I want to show per Department, the max hire date, the min hire date and then the hire dates and other employee related information. Problem is that when I use aggregate functions, I cannot have other pieces of information that I need and if I use sub-queries (sub-selects), then the performance is very bad. Is there any way to do this in SQL? I know that this should probably be done on the reporting side rather than the database side since in reporting (SSRS, Crystal or others), one can just put a function to do the max/min over a group but that would also have an overhead so if I can do this in SQL itself, that will be good. Also, I need to be able to have the same SQL for both Oracle and SQL Server since we are a vendor company and our product is used by clients who could be running on either of these platforms.

The answer to this question is - Use Analytics and you can easily do this in SQL - and it can be done for both Oracle and SQL Server (version 2005 and above though). Here is the solution - let’s create the data first (Note to the readers - whenever possible, please do provide the scripts to re-create your scenarios - it helps us save time and get back to you faster with a solution):

SET NOCOUNT ON;
GO
CREATE TABLE dbo.EMP_MASTER (DEPT_NBR NVARCHAR(10), HIRE_DATE DATETIME, FIRST_NAME NVARCHAR(30), LAST_NAME NVARCHAR(30));
INSERT INTO dbo.EMP_MASTER VALUES (’0312′, ‘12-DEC-02′, ‘Joe’, ‘Snyder’);
INSERT INTO dbo.EMP_MASTER VALUES (’0312′, ‘31-MAR-04′, ‘David’, ‘Gilo’);
INSERT INTO dbo.EMP_MASTER VALUES (’0312′, ‘30-APR-05′, ‘Smitha’, ‘Reilly’);
INSERT INTO dbo.EMP_MASTER VALUES (’0312′, ‘31-MAY-06′, ‘Naveen’, ‘Andrews’);
INSERT INTO dbo.EMP_MASTER VALUES (’0312′, ‘01-JUN-07′, ‘Matthew’, ‘Winkle’);
INSERT INTO dbo.EMP_MASTER VALUES (’0313′, ‘12-NOV-06′, ‘Kathy’, ‘Konnor’);
INSERT INTO dbo.EMP_MASTER VALUES (’0313′, ‘31-MAR-07′, ‘Vikesh’, ‘Gupta’);
INSERT INTO dbo.EMP_MASTER VALUES (’0313′, ‘01-NOV-06′, ‘Martha’, ‘Stewart’);
INSERT INTO dbo.EMP_MASTER VALUES (’0313′, ‘31-MAY-07′, ‘Jim’, ‘Diego’);
INSERT INTO dbo.EMP_MASTER VALUES (’0313′, ‘15-MAY-08′, ‘Arthur’, ‘Doyle’);

And here is the code to get the data in the required format:

select
row_number() over (partition by e.dept_nbr order by e.hire_date) as RN,
case when row_number() over (partition by e.dept_nbr order by e.hire_date) = 1 then e.dept_nbr else ” end as dept_nbr,
case when row_number() over (partition by e.dept_nbr order by e.hire_date) = 1 then convert(nvarchar(10), min(e.hire_date) over (partition by e.dept_nbr), 101)
else ”
end as min_hire_date,
case when row_number() over (partition by e.dept_nbr order by e.hire_date) = 1 then convert(nvarchar(10), max(e.hire_date) over (partition by e.dept_nbr), 101)
else ”
end as max_hire_date,
e.hire_date
from dbo.emp_master as e
order by e.dept_nbr, e.hire_date;

RN                   dept_nbr   min_hire_date max_hire_date hire_date
-------------------- ---------- ------------- ------------- -----------------------
1                    0312       12/12/2002    06/01/2007    2002-12-12 00:00:00.000
2                                                           2004-03-31 00:00:00.000
3                                                           2005-04-30 00:00:00.000
4                                                           2006-05-31 00:00:00.000
5                                                           2007-06-01 00:00:00.000
1                    0313       11/01/2006    05/15/2008    2006-11-01 00:00:00.000
2                                                           2006-11-12 00:00:00.000
3                                                           2007-03-31 00:00:00.000
4                                                           2007-05-31 00:00:00.000
5                                                           2008-05-15 00:00:00.000

If you look at the code above, you will see that we are using analytic functions and we partition the data by department first and then order it by the hiring date within that department - we just look for the very first record and then do a min or a max in order to retrieve our values from that group. There are a couple of other ways to achieve the same as well. Above just demonstrates the usage of the analytic functions in making this pretty simple.

Posted in Oracle, SQL Server | No Comments »