Systems Engineering and RDBMS

Archive for July 24th, 2008

A reporting query

Posted by decipherinfosys on July 24, 2008

This was one of the questions that we got from a reader:

I am working on a HR application and need to prepare a report in which I need to show the data in this format:

DEPT_NBR MAX_HIRE_DATE    MIN(DT)    HIRE_DATE
-------- -------------    ---------  --------
0312     01-JUN-07        12-DEC-02  12-DEC-02
                                     31-MAR-04
                                     30-APR-05
                                     31-MAY-06
                                     01-JUN-07
0313     15-MAY-08        01-NOV-06  12-NOV-06
                                     31-MAR-07
                                     01-NOV-06
                                     31-MAY-07
                                     15-MAY-08

i.e. I want to show per Department, the max hire date, the min hire date and then the hire dates and other employee related information. Problem is that when I use aggregate functions, I cannot have other pieces of information that I need and if I use sub-queries (sub-selects), then the performance is very bad. Is there any way to do this in SQL? I know that this should probably be done on the reporting side rather than the database side since in reporting (SSRS, Crystal or others), one can just put a function to do the max/min over a group but that would also have an overhead so if I can do this in SQL itself, that will be good. Also, I need to be able to have the same SQL for both Oracle and SQL Server since we are a vendor company and our product is used by clients who could be running on either of these platforms.

The answer to this question is – Use Analytics and you can easily do this in SQL – and it can be done for both Oracle and SQL Server (version 2005 and above though). Here is the solution – let’s create the data first (Note to the readers – whenever possible, please do provide the scripts to re-create your scenarios – it helps us save time and get back to you faster with a solution):

SET NOCOUNT ON;
GO
CREATE TABLE dbo.EMP_MASTER (DEPT_NBR NVARCHAR(10), HIRE_DATE DATETIME, FIRST_NAME NVARCHAR(30), LAST_NAME NVARCHAR(30));
INSERT INTO dbo.EMP_MASTER VALUES (‘0312′, ’12-DEC-02’, ‘Joe’, ‘Snyder’);
INSERT INTO dbo.EMP_MASTER VALUES (‘0312′, ’31-MAR-04’, ‘David’, ‘Gilo’);
INSERT INTO dbo.EMP_MASTER VALUES (‘0312′, ’30-APR-05’, ‘Smitha’, ‘Reilly’);
INSERT INTO dbo.EMP_MASTER VALUES (‘0312′, ’31-MAY-06’, ‘Naveen’, ‘Andrews’);
INSERT INTO dbo.EMP_MASTER VALUES (‘0312′, ’01-JUN-07’, ‘Matthew’, ‘Winkle’);
INSERT INTO dbo.EMP_MASTER VALUES (‘0313′, ’12-NOV-06’, ‘Kathy’, ‘Konnor’);
INSERT INTO dbo.EMP_MASTER VALUES (‘0313′, ’31-MAR-07’, ‘Vikesh’, ‘Gupta’);
INSERT INTO dbo.EMP_MASTER VALUES (‘0313′, ’01-NOV-06’, ‘Martha’, ‘Stewart’);
INSERT INTO dbo.EMP_MASTER VALUES (‘0313′, ’31-MAY-07’, ‘Jim’, ‘Diego’);
INSERT INTO dbo.EMP_MASTER VALUES (‘0313′, ’15-MAY-08’, ‘Arthur’, ‘Doyle’);

And here is the code to get the data in the required format:

select
row_number() over (partition by e.dept_nbr order by e.hire_date) as RN,
case when row_number() over (partition by e.dept_nbr order by e.hire_date) = 1 then e.dept_nbr else ” end as dept_nbr,
case when row_number() over (partition by e.dept_nbr order by e.hire_date) = 1 then convert(nvarchar(10), min(e.hire_date) over (partition by e.dept_nbr), 101)
else ”
end as min_hire_date,
case when row_number() over (partition by e.dept_nbr order by e.hire_date) = 1 then convert(nvarchar(10), max(e.hire_date) over (partition by e.dept_nbr), 101)
else ”
end as max_hire_date,
e.hire_date
from dbo.emp_master as e
order by e.dept_nbr, e.hire_date;

RN                   dept_nbr   min_hire_date max_hire_date hire_date
-------------------- ---------- ------------- ------------- -----------------------
1                    0312       12/12/2002    06/01/2007    2002-12-12 00:00:00.000
2                                                           2004-03-31 00:00:00.000
3                                                           2005-04-30 00:00:00.000
4                                                           2006-05-31 00:00:00.000
5                                                           2007-06-01 00:00:00.000
1                    0313       11/01/2006    05/15/2008    2006-11-01 00:00:00.000
2                                                           2006-11-12 00:00:00.000
3                                                           2007-03-31 00:00:00.000
4                                                           2007-05-31 00:00:00.000
5                                                           2008-05-15 00:00:00.000

If you look at the code above, you will see that we are using analytic functions and we partition the data by department first and then order it by the hiring date within that department – we just look for the very first record and then do a min or a max in order to retrieve our values from that group. There are a couple of other ways to achieve the same as well. Above just demonstrates the usage of the analytic functions in making this pretty simple.

Posted in Oracle, SQL Server | Leave a Comment »

Converting Physical Servers to Virtual Machines

Posted by decipherinfosys on July 24, 2008

In almost any virtualization project, you will run into the need of converting physical machines to virtual machines. Both VMWare and Microsoft provide solutions for that. Here is the link that gives the details of the vmware converter. And here is the link for the Microsoft Virtual Server Migration Toolkit and another one that details VMM (Virtual Machine Manager) over here.

We have used the vmware one quite a few times and it is pretty easy to use and very robust as well. It does not require any downtime either. Haven’t had any experience with the MSFT toolkit so will post on it once we play with it in our labs.

Posted in Virtual Server, VMWare | Leave a Comment »

Transparent Data Encryption in SQL Server 2008

Posted by decipherinfosys on July 24, 2008

Transparent Data Encryption (TDE) is another new feature in SQL Server 2008 which implements database level encryption. SQL Server 2005 already had the column level encryption and with TDE we can now prevent scenarios where the backups or the physical media (tape drive) which contains sensitive data is stolen and then someone reads it by restoring the backups or re-attaching the database files. This encryption is done at the page level. The data that is read off the disk is decrypted when reading the data and encrypted when writing it to the disk. The encryption is done using a database encryption key also called as DEK which is protected using a certificate that is stored on the master database of the instance.

One thing to note (and something that we had not realized when initially working with TDE) is that if any user database on an instance uses TDE, then the tempdb system database will also be encrypted. We have not done benchmarks yet to see the performance penalty of this but wanted to point this out so that you are aware of it. The column level encryption which was introduced in SQL Server 2005 offers more granularity but TDE can be useful as well since not only does it protect your back-ups, it allows the ability to search encrypted data.

Let’s create a new database and then a table in it that has credit card information, SSN information etc.:

create database decipher_test
go
use decipher_test
go
create table dbo.test_encryption
(
first_name nvarchar(10),
last_name nvarchar(10),
ssn nvarchar(10),
credit_card_nbr nvarchar(16)
);

set nocount on
go
insert into dbo.test_encryption values (‘Joe’, ‘Snyder’, ‘234423456’, ‘1234567890123456’);

Now, let’s take a backup of this database. After taking the back-up, if we open it up using Notepad, we can easily read this data – see the image below:

Please do note that we searched for “J o e” since this is unicode. The whole point of doing this exercise was to show that the native backups are readable. One can also easily restore this database on another instance and take all the SSN and Credit Card related information. There are third party tools out there that allow for the encryption as well for the backups but that is a discussion for another post. When this functionality is already built in SQL Server 2008, why should we invest in another third party tool.

So, how do we go about using TDE? We first need to create a database master key:

USE MASTER
GO
CREATE MASTER KEY ENCRYPTION BY PASSWORD = ‘supersecret!Nahhhh’;
GO

Next, we will create a certificate to protect the DEK:

CREATE CERTIFICATE DCTEST WITH SUBJECT = ‘DEK DCTEST CERTIFICATE’;
GO

And now, we can set the encryption for the DECIPHER_TEST database by creating the DEK:

USE DECIPHER_TEST
GO
CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_128
ENCRYPTION BY SERVER CERTIFICATE DCTEST;
GO

We used 128 bit AES as our encryption algorithm but you can choose from others as well. Those are listed in BOL and you can also access them in the online version over here.

The last step is to bring it on! (Sorry – turn it on!):

ALTER DATABASE DECIPHER_TEST SET ENCRYPTION ON;

And now, if you try to take a backup and open it up with notepad, you will see that the data is not readable. This time, let us try to restore this encrypted database. If I try to restore this on another instance on which the certificate does not exist, it will fail with error message:

“Cannot find server certificate with thumbprint…”

In order to be able to restore that backup on another instance, we will need to backup the certificate as well:

USE MASTER
GO
BACKUP CERTIFICATE DCTEST
TO FILE = ‘E:\DCTEST.cer’
WITH PRIVATE KEY (FILE = ‘E:\DCTEST_Key.pvk’ ,
ENCRYPTION BY PASSWORD = ‘supersecret!Nahhhh’ )
GO

The next step is to then copy the certificate to the other instance, create the master encryption key just like we did above (with a different password):

USE MASTER
GO
CREATE MASTER KEY ENCRYPTION BY PASSWORD = ‘gotit!Nahhhh’;
GO

And now, we can simply import the certificate that we copied over:

USE MASTER
GO
CREATE CERTIFICATE DCTEST
FROM FILE = ‘E:\DCTEST.cer’
WITH PRIVATE KEY (FILE = ‘E:\DCTEST_Key.pvk’ ,
DECRYPTION BY PASSWORD = ‘supersecret!Nahhhh’ );

Instead of “TO FILE”, we used “FROM FILE” and instead of “ENCRYPTION”, we used “DECRYPTION” in order for the import to go through. After the certificate has been created on the other instance, the restore will go through fine and we can look at the data the same was as on the source system.

There is one more feature along the same lines that needs to be mentioned here. Besides using a certificate in the master database to protect the DEK, we can also use what is called as the EKM (Extensible Key Management) feature. EKM allows us to store the keys used to encrypt the data separately as compared to the data that we are protecting. This is made possible by exposing the encryption functionality to the hardware vendors that address EKM using Hardware Security Modules (HSM). More and more acronyms, eh! We will discuss EKM and HSM in a future post.

Posted in SQL Server | 1 Comment »