Systems Engineering and RDBMS

A fun exercise - building a lottery number generator

Posted by decipherinfosys on July 18, 2008

We were planning a get together with some of our friends and other community members in our sub-division and were thinking about some games to keep everyone interested and occupied and one of the suggestions was to do a lottery at the end of it and award prizes - so, tickets were issued and we wrote our own random number generator for the lottery system. If you look into our blog posts, we have posted similar solutions in the past by using a CTE and generating a table of sequential numbers as well as random numbers and have covered the generation of random numbers quite extensively in some of the posts. This solution was very much similar to those with a combination from both those posts. Here is how the SQL portion of that code looked like (used a SQL Server Solution for this one). In this post, we use the same type of CTE solution as we had used in another post here.

with
N1 (n) AS (SELECT 1 UNION ALL SELECT 1),
N2 (n) AS (SELECT 1 FROM N1 AS X, N1 AS Y),
N3 (n) AS (SELECT 1 FROM N2 AS X, N2 AS Y),
N4 (n) AS (SELECT 1 FROM N3 AS X, N3 AS Y),
Number AS (SELECT TOP (1000) ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS X FROM N4 order by X),
Option_Val AS (SELECT TOP (5) X FROM NUMBER ORDER by NEWID())
SELECT stuff((select ‘,’ + CAST(X AS VARCHAR(10)) AS [text()]
from option_val
order by x
for xml path(”)
), 1, 1, ”) as Lottery_Numbers;

And if you run this SQL code, you will see separate set of 5 numbers which will be pulled each time. This was used using an application front end to provide a good look and feel and a “real” application interface to the end users - it was a fun activity. We made it a bit simpler - if anyone had 3 of those numbers, they got the first prize, if they had two of those numbers, that was the second prize. Had a couple of draws and then gave that application to the kids who had fun pressing the button and magically seeing different numbers popping up on the screen :-)

Posted in SQL Server | No Comments »

Benchmarks and clearing the cache

Posted by decipherinfosys on July 17, 2008

This is one topic on which I hold a slightly different opinion from my colleagues. I have seen in every shop that I have been to that they clear out the cache prior to doing another benchmark run - whether they are using Oracle or SQL Server.  In SQL Server, one could use the DBCC commands to flush out the cache - DBCC FREEPROCCACHE would do that for you.  It is granular enough to allow you to even remove a specific plan if you provide it with a plan_handle or a sql_handle.  What this command does is that it clears out the plan cache and say you have a stored procedure that you are benchmarking, it would cause it to recompile since there is nothing in the cache.  Some people while doing benchmarking do this for every iteration of the benchmark in order to get “real world” numbers.  Ask yourself, do you clear out the cache in your production system as well every day?  Plan re-use and changing of the plans as one provides different input values and as one puts on the load on the system and increases up the data volume is what (IMHO) needs to be tracked rather than trying to clear out the cache each time to make a VP happy :-)

Other forms of such DBCC commands are DBCC FREESESSIONCACHE and DBCC FREESYSTEMCACHE.   You can look up more information on the syntax and their usage in BOL.

In Oracle, prior to Oracle 10g, in Oracle 9i, one could use the “alter tablespace offline/online” command to flush the buffer cache of the blocks that pertain to that tablespace.  In Oracle 10g and 11g, there is an alter system command to flush out the cache:

alter system flush buffer_cache;

Benchmarks should reflect reality and unless your benchmark consists of a single query where you want to not account for what is in the cache, I don’t see any reason for using these commands in a benchmark.  In a real world, the buffer cache is never 100% empty.  So, where are these commands useful then?  They would be useful for a developer/DBA who have their own instance and are doing performance testing on some code.

Posted in Oracle, SQL Server | No Comments »

UDF limitations in SQL Server

Posted by decipherinfosys on July 16, 2008

UDFs (User Defined Functions) are a great feature in every RDBMS. We have covered their usefulness before in our blog posts. Based on some of the questions that we have been getting, we thought about writing a blog post that just talks about the limitations of the UDFs in SQL Server, so here goes:

1) The TRY…CATCH block cannot be used in a UDF. Infact, other error handling modes: @@ERROR or RAISERROR are not supported either.

2) Non deterministic functions like getdate() cannot be used with a UDF. Though there are workarounds for it.  And it is allowed in SQL Server 2008.

3) A UDF (any of the three variations - scalar, inline or multi-statement) cannot be used to return multiple result sets.

4) UDFs cannot call stored procedures. They can, however, call extended stored procedures.

5) UDFs cannot make use of dynamic SQL or temporary tables within the code. Table variables are allowed though.

6) Any type of SET commands are not allowed with a UDF.

Again, do remember that UDFs have a lot of use in T-SQL. The above points are just to summarize the limitations since many questions like why can’t I use TRY-CATCH in a UDF or why doesn’t usage of a temp table (though through a table variable, that is hardly a limitation) work in a UDF get addressed by the above post. All this is very well documented in BOL as well.

Posted in SQL Server | No Comments »

Outsourcing to an Off-shore company - finding the right vendor

Posted by decipherinfosys on July 16, 2008

This post is non technical in nature. We are posting it here because recently we helped one of our clients to go through a selection process of choosing an off-shore company for their development and QA activities. This was our third such engagement in which we have helped our clients get the most value from a consulting engagement. Outsourcing to an off shore company has a lot of advantages as well as things to be vary of. A lot of material is available on the internet in that regard so instead of writing a dissertation on it, we will point you to the IAOP (International Association of Outsourcing Professionals) site which has a lot of good whitepapers in this regard. You can access it here.

The process that we took the client through consisted of these steps:

  1. Interviewing the key stakeholders to learn about the needs of the company - short term as well as a long term vision.
  2. Forming the goals statement and matching those goals against a list of outsourcing companies from off-shore locations. A mix of Tier-1, 2 and 3 companies were chosen as well as non-Tiered companies were chosen as well to give the client an opportunity to consider more than one outsourcing off-shore vendor (this has benefits and challenges that we will cover later on in this post).
  3. Having discussions with the client to discuss the different companies and narrowing it down to a list of top 5.
  4. Contacted those 5 companies and established the communication between the client and the representatives from these companies. All of them also had a presence in the US.
  5. Prepared a preliminary agenda for the first conference call between the client and the 5 companies. This was to go over an Agenda that was inline with the client’s needs and to make sure that both sides understood what the goals were.
  6. The next step was to prepare a RFI (Request for Information) document while working with the client (different departments) and send it to the 5 companies.
  7. Collected the RFI responses and had discussions with the clients to narrow it down to the top 3. A rating system was developed in three categories: Technical, Operational and Commercial & Legal. All three had sub-sections against which each reviewer (including us and members from the client team) rated the vendors on a scale of 1-5. Weight measures (how important each category was for the success of the company) was also put in on a scale of 0-5 (0 because some sub-sections in a particular category might not make sense for the goals of the employee - there was only 1 such sub-section in this case which related to hosting. The client wanted to host the systems here (an ASP model is followed by this client).
  8. Next step was to prepare an RFP (Request for Proposal) and to send it to the top 3 vendors. We co-ordinated the questions and helped answer the questions on both sides related to the process and having worked with the client’s IT department, we were aware of the business needs as well.
  9. Upon receipt and review of the completed RFP from the vendors, we sent follow-up questions that were addressed during the face-to-face meetings with each. An agenda with proper targeted questions was prepared for these meetings.
  10. The ratings were done again for the 3 vendors against the RFP responses.
  11. Weight measures were averaged out and where there were significant differences, they were discussed among the team members and normalized.
  12. References were checked to ensure a recommendation was made to go with two outsourcing vendor companies.

This whole process was completed in 2 months time frame and was a very pleasant experience for both the client as well as the outsourcing vendor companies. As far as going with multiple outsourcing vendors is considered, it is a complex process and has risks associated with it but if managed properly, it does provide a lot of benefit as well to the company. Here are the benefits and disadvantages of such an approach:

Benefits:

  1. The big bang outsourcing with a single vendor can lead to issues for the company since if the decision turns out to be not working for the company, down the line it is difficult to recover from it.
  2. By choosing to work with 2 outsourcing off shore vendors, the company can cut costs and also foster a healthy competition between the vendors while taking advantage of vendor specialization and technical expertise. In this client’s case, one of the vendors (a Tier-1 company) was pretty expensive in terms of rates (nearly 3 times the cost prices of the second vendor (a Tier-3 company)) but they did have more business knowledge of the domain.
  3. The second vendor did not have related business domain experience but did have a rich pool of technical resources and was ranked very well as compared to the Tier-1 vendor in the evaluation process.
  4. It mitigates the risk for our client since down the line, they could switch from one vendor to the other if things do not work out with a vendor. Since the vendors would have been already engaged on a T&M basis, that transition for the team would be seamless.
  5. It also gives them an opportunity to engage the Tier-1 vendor with probably a fixed cost project for one of the projects that requires a good amount of domain expertise in order to be successful in a short span of time.

NOTE: As per the Gartner group, multi-sourcing will remain the dominant sourcing model. Organizations like GM, Kodak, P&G have done this quite successfully by managing the relationships better. As per a survey from CIO magazine (from Feb. 2007), here was the break up:

  • 3 or more outsourcing vendors: 42%
  • 2 outsourcing vendors: 36%
  • 1 outsourcing vendor: 22%

Disadvantages/Challenges:

  1. Managing 2 outsourcing off shore vendors is a time consuming and a complex process. It also adds to some cost upfront since the client now needs to have 2 separate resources for managing these 2 relationships. If appropriate governance is in place, then this risk is mitigated.
  2. If not properly managed and explicitly stated in the contracts, this can lead to a blame game. We can mitigate this by explicitly defining the contract terms.
  3. This typically requires some extra legal documentation work.

If you have a need for finding an outsourcing partner company at an off-shore location, please feel free to contact us at: info@decipherinfosys.com

Posted in Outsourcing | No Comments »

Even Oracle launches a BI application for iPhone

Posted by decipherinfosys on July 15, 2008

Oracle released an iPhone application that enables the end users to access the analytics using OBIEE (Oracle Business Intelligence Enterprise Edition). You can read more on this here.

Posted in Oracle | No Comments »

Hyper-V is available now

Posted by decipherinfosys on July 15, 2008

As you know, a beta version of Hyper-V was included with the Windows 2008 release. It’s RTM (Released to Manufacturing) version was been released and it is ready for deployments. You can read more over here as well as download it from that link. And if you are new to Hyper-V, this MSDN post will be a good point to start from.

Posted in Windows | No Comments »

More on Index design

Posted by decipherinfosys on July 14, 2008

We have blogged a couple of times over the factors that should be considered when designing indexes. In one of such posts, we had also covered indexes over multiple columns. In all of those posts, you will see one common thing that we try to emphasize - do not try to come up with a “rules of thumb“.  Here is a simple example (from one of the e-mails to a question to a reader);

Question: I am trying to see how to tune/optimize a query and I saw that an index on the where condition columns is missing.  Should I just go ahead and create an index and add all the columns that are used in the where clause into that index?

The answer is of course: NO.  First, we requested the reader to go ahead and read the post from above where we have discussed the column order in covered indexes and then also gave a simple example to help illustrate the point:

Say, you have this query:

select col1, col2, col3

from dbo.big_table

where col1 >= @x and col1 <= @y

and col3 = @z

For this, we might want to create a covered index (in this order): col3, col2, col1.  This is so that we can first of all process the data based on the equality operator on col3, then scan through and narrow down the data based on the >= and <= conditions on col1 and then get the value for col2 from the index itself rather than going through the table.  Now, as we have always stated, understanding your data and it’s characteristics is very very important.  Suppose that col3 is unique in this table!  In that case, we would just create a single index on col3 and be done with it.  Why?  It will qualify for a single record and thus we will just do the row ID lookup on the table for that one single record.

So, bottom line is that please spend some time to understand the data characteristics - understand how indexes work, understand how the query is accessing the data and the data distribution - density and selectivity of the column data values as well as how often that particular index is going to be used - if an index is being created to support a report that runs only once every 6 months, is it worth to have it in the schema all year round or should we just create it prior to generating that report?  Once you have that understanding, then only decide how you want to approach the design.

Posted in DB2 LUW, Oracle, SQL Server | No Comments »

Sparse Columns in SQL Server 2008

Posted by decipherinfosys on July 13, 2008

Sparse columns are a new feature in SQL Server 2008. These columns are normal columns in a table that have an optimized storage format for NULL values. So, they basically reduce the space requirements when they contain NULL values (they do not take up any space when they have NULL values) but there is an overhead associated with the retrieval of those columns for not null values. BOL states that one should consider them when the savings in terms of space are at least 20-40%. First let’s see an example and then we can go into the advantages and the limitations of sparse columns in SQL Server 2008. Both Create Table and Alter Table commands support the creation/alteration of sparse columns/column sets (we will discuss column sets shortly). In the example below, we are creating a de-normalized table with the employee information and their address in the same table:

CREATE TABLE dbo.SPARSE_TEST
(
ID            INT        IDENTITY(1,1)    NOT NULL,
FIRST_NAME    NVARCHAR(50)            NOT NULL,
LAST_NAME        NVARCHAR(50)            NOT NULL,
ADDR_LINE_1    NVARCHAR(20)            SPARSE NULL,
ADDR_LINE_2    NVARCHAR(20)            SPARSE NULL,
CITY            NVARCHAR(20)            SPARSE NULL,
STATE_NAME    NVARCHAR(2)            SPARSE NULL,
COUNTRY        NVARCHAR(10)            SPARSE NULL,
ZIP_CODE        NVARCHAR(20)            SPARSE NULL,
ADDRESS_SET   XML COLUMN_SET FOR ALL_SPARSE_COLUMNS,
CONSTRAINT PK_SPARSE_TEST PRIMARY KEY (ID)
)
GO
CREATE UNIQUE INDEX SPARSE_TEST_IND_1 ON DBO.SPARSE_TEST (FIRST_NAME, LAST_NAME)
GO

For examples sake, we have put all address fields as sparse fields since an address or part of those attributes might not be known. And then you can see in the end that we have an ADDRESS_SET which is of data type XML and is defined as a column set for all the sparse columns in the table - per table, there can be only one column set. A column set is essentially an untyped XML representation that combines all the sparse columns in a table into a more structured output. You can read more on when to use column sets and when not to use them in the BOL online version here. There are quite a few major benefits of this - like using it for a content management system like MS Sharepoint or even for product catalogs where every column does not apply to each and every product. Having the column set in an XML format also means that the application can even select/update/insert this data set. One thing to caution for is that if you are using this column for updating the values, be specific in giving values to all the sparse columns else you will end up putting the rest of the sparse columns to be NULL.

Let’s populate it with some data now:

set nocount on
go
insert into dbo.sparse_test (first_name, last_name, addr_line_1, addr_line_2, city, state_name, country, zip_code)
values ('Jack', 'Black', '123 hill street', null, 'Seattle', 'WA', 'USA', NULL);
insert into dbo.sparse_test (first_name, last_name, addr_line_1, addr_line_2, city, state_name, country, zip_code)
values ('Steve', 'Smith', '2300 one point place', 'more address info.', 'Los Angeles', 'CA', 'USA', NULL);
insert into dbo.sparse_test (first_name, last_name, addr_line_1, addr_line_2, city, state_name, country, zip_code)
values ('Dennis', 'Baitis', '15 Main Street', null, 'Atlanta', 'GA', 'USA', '30339');

And now, let’s look at the data:

If we just do a “Select * from dbo.sparse_test”, we will not even get the sparse columns in the output - infact, we will get only the column_set column. Output is too big to post here - so, here is the output for the first record’s column set:

<ADDR_LINE_1>123 hill street</ADDR_LINE_1><CITY>Seattle</CITY><STATE_NAME>WA</STATE_NAME><COUNTRY>USA</COUNTRY>

As you can see, you only see those columns in here that had values provided for them. Sparse columns also work very well with filtered indexes. So, to sum up, the advantages are that:

a) when the sparse columns contain NULL values, they do not take up any space at all. We will do some benchmarks in the coming days to see how much savings that translates into.

b) column behavior remains the same as other non-sparse columns so application wise, there is no change.

c) column set is present which behaves like a normal column and can be very useful in scenarios where not all the attributes apply to all the items in the tables like a product catalog (though through a normalized design, it is less of a problem) - these are also very useful though for any content management system like MS Sharepoint since it will help in reducing the storage needs in places where all the attributes do not apply to all the items.  Same is true for any contact management system where a lot of information gets applied to a single individual.  Since a lot of that information applies to only a sub-set of the users, that wastes up a lot of space in current systems.

d) CDC (Change Data Capture) and transactional replication work with sparse columns (column sets does not work).

The limitations are also quite a few:

a) Such columns cannot be a part of the clustered index.

b) Data Compression and Merge replication do not work with these columns.

c) Even though BOL states that there is an overhead to retrieve the values from such columns (the not null values), we haven’t done any benchmark to give you any numbers in that regard…but would assume that
is true and hope that hit is pretty minimal. Believe the hit is because for the not null values, it takes 4 more bytes than the normal column. So, for an int, it will take 8 bytes when there is a value in the column. That is why one has to evaluate when “exactly” is this feature useful - read the BOL link in the start that we had given which mentions that use it only when the savings in terms of space are atleast 20-40%.

d) There is a restriction on the data types i.e. not all columns with every data type can be declared as SPARSE. Text, ntext, image, filestream, other spatial data types cannot be used.

So, bottom line is that before jumping on making use of this new feature, read and understand it’s usage scenarios as well as it’s limitations and then only use it in your applications. This is not for every application but in some types of applications, it can be really very advantageous. This will hopefully also help avoid the design pattern that is called EAV (Entity Attribute Value).

Posted in SQL Server | No Comments »

SSRS 2008 vs SSRS 2005

Posted by decipherinfosys on July 12, 2008

SQL CAT (Customer Advisory Team) from Microsoft released a new technical note on the comparison between Reporting Services 2008 vs 2005 version from a scalability perspective. The new memory management architecture of SSRS 2008 enabled it to scale very easily as compared to it’s predecessor. You can read the technical note from the SQL CAT team over here.

Posted in SQL Server | No Comments »

Virtualization Wars

Posted by decipherinfosys on July 11, 2008

We have covered virtualization before at our blog but mostly VMWare and a couple of posts on Virtual Server as well. With Windows Server 2008’s Hyper-V, it looks like it will challenge the market even more for ESX Server 3.5. With hardware changes coming in way faster than the software changes, the servers age out faster due to hardware issues rather than the OS. With virtualization, a physical server can easily be migrated into a virtual environment with the same applications and then moved on to a new physical hardware. We have clients who are using virtualization now even in their production environments let alone the development and QA environments.

If you are not already using virtualization in your environments, now would be the time to start getting serious about it. With Hyper-V, the technology moves the virtualization layer directly against the hardware. Hyper-V allows virtualization of both 32 bit as well as 64 bit architectures. MSFT has extended their virtualization platform offering by providing Microsoft App. virtualization (this was known as SoftGrid before) and desktop virtualization. not only that, one can use the System Center Virtual Machine Manager to do centralized management of a virtual set up.

Not only are these choices cost effective by providing us with consolidation and DR/HA (Disaster Recovery and High Availability) choices, it also eases up maintenance work for System Engineers as well as DBAs. An added benefit of consolidation is reduced electricity costs :-)

Both ESX Server 3.5 and Hyper-V are based on hypervisor based architectures which are better than the previous hosted virtualization technologies. In the case of the hosted virtualization products, one had to run the virtualization software on top of the OS of the machine and that adds a lot of overhead and not only that, it has a longer code execution path for the VMs. Hypervisor based architectures run the hypervisor directly on the hardware which means that there is no OS between the hypervisor and the system hardware. Even though both ESX Server 3.5 and Hyper-V as both based on hypervisor base architectures, there are a good bit of differences between them. We are currently doing some benchmarks with ESX Server 3.5 as well as Hyper-V and will post the results of the tests at our blog as and when they become available and we consolidate them. Till then, you can read more on these over here:

ESX Server 3.5

Hyper-V and the FAQs

Posted in VMWare, Virtual Server, Windows | 1 Comment »