Systems Engineering and RDBMS

Archive for the 'SQL Server' Category


A fun exercise - building a lottery number generator

Posted by decipherinfosys on July 18, 2008

We were planning a get together with some of our friends and other community members in our sub-division and were thinking about some games to keep everyone interested and occupied and one of the suggestions was to do a lottery at the end of it and award prizes - so, tickets were issued and we wrote our own random number generator for the lottery system. If you look into our blog posts, we have posted similar solutions in the past by using a CTE and generating a table of sequential numbers as well as random numbers and have covered the generation of random numbers quite extensively in some of the posts. This solution was very much similar to those with a combination from both those posts. Here is how the SQL portion of that code looked like (used a SQL Server Solution for this one). In this post, we use the same type of CTE solution as we had used in another post here.

with
N1 (n) AS (SELECT 1 UNION ALL SELECT 1),
N2 (n) AS (SELECT 1 FROM N1 AS X, N1 AS Y),
N3 (n) AS (SELECT 1 FROM N2 AS X, N2 AS Y),
N4 (n) AS (SELECT 1 FROM N3 AS X, N3 AS Y),
Number AS (SELECT TOP (1000) ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS X FROM N4 order by X),
Option_Val AS (SELECT TOP (5) X FROM NUMBER ORDER by NEWID())
SELECT stuff((select ‘,’ + CAST(X AS VARCHAR(10)) AS [text()]
from option_val
order by x
for xml path(”)
), 1, 1, ”) as Lottery_Numbers;

And if you run this SQL code, you will see separate set of 5 numbers which will be pulled each time. This was used using an application front end to provide a good look and feel and a “real” application interface to the end users - it was a fun activity. We made it a bit simpler - if anyone had 3 of those numbers, they got the first prize, if they had two of those numbers, that was the second prize. Had a couple of draws and then gave that application to the kids who had fun pressing the button and magically seeing different numbers popping up on the screen :-)

Posted in SQL Server | No Comments »

Benchmarks and clearing the cache

Posted by decipherinfosys on July 17, 2008

This is one topic on which I hold a slightly different opinion from my colleagues. I have seen in every shop that I have been to that they clear out the cache prior to doing another benchmark run - whether they are using Oracle or SQL Server.  In SQL Server, one could use the DBCC commands to flush out the cache - DBCC FREEPROCCACHE would do that for you.  It is granular enough to allow you to even remove a specific plan if you provide it with a plan_handle or a sql_handle.  What this command does is that it clears out the plan cache and say you have a stored procedure that you are benchmarking, it would cause it to recompile since there is nothing in the cache.  Some people while doing benchmarking do this for every iteration of the benchmark in order to get “real world” numbers.  Ask yourself, do you clear out the cache in your production system as well every day?  Plan re-use and changing of the plans as one provides different input values and as one puts on the load on the system and increases up the data volume is what (IMHO) needs to be tracked rather than trying to clear out the cache each time to make a VP happy :-)

Other forms of such DBCC commands are DBCC FREESESSIONCACHE and DBCC FREESYSTEMCACHE.   You can look up more information on the syntax and their usage in BOL.

In Oracle, prior to Oracle 10g, in Oracle 9i, one could use the “alter tablespace offline/online” command to flush the buffer cache of the blocks that pertain to that tablespace.  In Oracle 10g and 11g, there is an alter system command to flush out the cache:

alter system flush buffer_cache;

Benchmarks should reflect reality and unless your benchmark consists of a single query where you want to not account for what is in the cache, I don’t see any reason for using these commands in a benchmark.  In a real world, the buffer cache is never 100% empty.  So, where are these commands useful then?  They would be useful for a developer/DBA who have their own instance and are doing performance testing on some code.

Posted in Oracle, SQL Server | No Comments »

UDF limitations in SQL Server

Posted by decipherinfosys on July 16, 2008

UDFs (User Defined Functions) are a great feature in every RDBMS. We have covered their usefulness before in our blog posts. Based on some of the questions that we have been getting, we thought about writing a blog post that just talks about the limitations of the UDFs in SQL Server, so here goes:

1) The TRY…CATCH block cannot be used in a UDF. Infact, other error handling modes: @@ERROR or RAISERROR are not supported either.

2) Non deterministic functions like getdate() cannot be used with a UDF. Though there are workarounds for it.  And it is allowed in SQL Server 2008.

3) A UDF (any of the three variations - scalar, inline or multi-statement) cannot be used to return multiple result sets.

4) UDFs cannot call stored procedures. They can, however, call extended stored procedures.

5) UDFs cannot make use of dynamic SQL or temporary tables within the code. Table variables are allowed though.

6) Any type of SET commands are not allowed with a UDF.

Again, do remember that UDFs have a lot of use in T-SQL. The above points are just to summarize the limitations since many questions like why can’t I use TRY-CATCH in a UDF or why doesn’t usage of a temp table (though through a table variable, that is hardly a limitation) work in a UDF get addressed by the above post. All this is very well documented in BOL as well.

Posted in SQL Server | No Comments »

More on Index design

Posted by decipherinfosys on July 14, 2008

We have blogged a couple of times over the factors that should be considered when designing indexes. In one of such posts, we had also covered indexes over multiple columns. In all of those posts, you will see one common thing that we try to emphasize - do not try to come up with a “rules of thumb“.  Here is a simple example (from one of the e-mails to a question to a reader);

Question: I am trying to see how to tune/optimize a query and I saw that an index on the where condition columns is missing.  Should I just go ahead and create an index and add all the columns that are used in the where clause into that index?

The answer is of course: NO.  First, we requested the reader to go ahead and read the post from above where we have discussed the column order in covered indexes and then also gave a simple example to help illustrate the point:

Say, you have this query:

select col1, col2, col3

from dbo.big_table

where col1 >= @x and col1 <= @y

and col3 = @z

For this, we might want to create a covered index (in this order): col3, col2, col1.  This is so that we can first of all process the data based on the equality operator on col3, then scan through and narrow down the data based on the >= and <= conditions on col1 and then get the value for col2 from the index itself rather than going through the table.  Now, as we have always stated, understanding your data and it’s characteristics is very very important.  Suppose that col3 is unique in this table!  In that case, we would just create a single index on col3 and be done with it.  Why?  It will qualify for a single record and thus we will just do the row ID lookup on the table for that one single record.

So, bottom line is that please spend some time to understand the data characteristics - understand how indexes work, understand how the query is accessing the data and the data distribution - density and selectivity of the column data values as well as how often that particular index is going to be used - if an index is being created to support a report that runs only once every 6 months, is it worth to have it in the schema all year round or should we just create it prior to generating that report?  Once you have that understanding, then only decide how you want to approach the design.

Posted in DB2 LUW, Oracle, SQL Server | No Comments »

Sparse Columns in SQL Server 2008

Posted by decipherinfosys on July 13, 2008

Sparse columns are a new feature in SQL Server 2008. These columns are normal columns in a table that have an optimized storage format for NULL values. So, they basically reduce the space requirements when they contain NULL values (they do not take up any space when they have NULL values) but there is an overhead associated with the retrieval of those columns for not null values. BOL states that one should consider them when the savings in terms of space are at least 20-40%. First let’s see an example and then we can go into the advantages and the limitations of sparse columns in SQL Server 2008. Both Create Table and Alter Table commands support the creation/alteration of sparse columns/column sets (we will discuss column sets shortly). In the example below, we are creating a de-normalized table with the employee information and their address in the same table:

CREATE TABLE dbo.SPARSE_TEST
(
ID            INT        IDENTITY(1,1)    NOT NULL,
FIRST_NAME    NVARCHAR(50)            NOT NULL,
LAST_NAME        NVARCHAR(50)            NOT NULL,
ADDR_LINE_1    NVARCHAR(20)            SPARSE NULL,
ADDR_LINE_2    NVARCHAR(20)            SPARSE NULL,
CITY            NVARCHAR(20)            SPARSE NULL,
STATE_NAME    NVARCHAR(2)            SPARSE NULL,
COUNTRY        NVARCHAR(10)            SPARSE NULL,
ZIP_CODE        NVARCHAR(20)            SPARSE NULL,
ADDRESS_SET   XML COLUMN_SET FOR ALL_SPARSE_COLUMNS,
CONSTRAINT PK_SPARSE_TEST PRIMARY KEY (ID)
)
GO
CREATE UNIQUE INDEX SPARSE_TEST_IND_1 ON DBO.SPARSE_TEST (FIRST_NAME, LAST_NAME)
GO

For examples sake, we have put all address fields as sparse fields since an address or part of those attributes might not be known. And then you can see in the end that we have an ADDRESS_SET which is of data type XML and is defined as a column set for all the sparse columns in the table - per table, there can be only one column set. A column set is essentially an untyped XML representation that combines all the sparse columns in a table into a more structured output. You can read more on when to use column sets and when not to use them in the BOL online version here. There are quite a few major benefits of this - like using it for a content management system like MS Sharepoint or even for product catalogs where every column does not apply to each and every product. Having the column set in an XML format also means that the application can even select/update/insert this data set. One thing to caution for is that if you are using this column for updating the values, be specific in giving values to all the sparse columns else you will end up putting the rest of the sparse columns to be NULL.

Let’s populate it with some data now:

set nocount on
go
insert into dbo.sparse_test (first_name, last_name, addr_line_1, addr_line_2, city, state_name, country, zip_code)
values ('Jack', 'Black', '123 hill street', null, 'Seattle', 'WA', 'USA', NULL);
insert into dbo.sparse_test (first_name, last_name, addr_line_1, addr_line_2, city, state_name, country, zip_code)
values ('Steve', 'Smith', '2300 one point place', 'more address info.', 'Los Angeles', 'CA', 'USA', NULL);
insert into dbo.sparse_test (first_name, last_name, addr_line_1, addr_line_2, city, state_name, country, zip_code)
values ('Dennis', 'Baitis', '15 Main Street', null, 'Atlanta', 'GA', 'USA', '30339');

And now, let’s look at the data:

If we just do a “Select * from dbo.sparse_test”, we will not even get the sparse columns in the output - infact, we will get only the column_set column. Output is too big to post here - so, here is the output for the first record’s column set:

<ADDR_LINE_1>123 hill street</ADDR_LINE_1><CITY>Seattle</CITY><STATE_NAME>WA</STATE_NAME><COUNTRY>USA</COUNTRY>

As you can see, you only see those columns in here that had values provided for them. Sparse columns also work very well with filtered indexes. So, to sum up, the advantages are that:

a) when the sparse columns contain NULL values, they do not take up any space at all. We will do some benchmarks in the coming days to see how much savings that translates into.

b) column behavior remains the same as other non-sparse columns so application wise, there is no change.

c) column set is present which behaves like a normal column and can be very useful in scenarios where not all the attributes apply to all the items in the tables like a product catalog (though through a normalized design, it is less of a problem) - these are also very useful though for any content management system like MS Sharepoint since it will help in reducing the storage needs in places where all the attributes do not apply to all the items.  Same is true for any contact management system where a lot of information gets applied to a single individual.  Since a lot of that information applies to only a sub-set of the users, that wastes up a lot of space in current systems.

d) CDC (Change Data Capture) and transactional replication work with sparse columns (column sets does not work).

The limitations are also quite a few:

a) Such columns cannot be a part of the clustered index.

b) Data Compression and Merge replication do not work with these columns.

c) Even though BOL states that there is an overhead to retrieve the values from such columns (the not null values), we haven’t done any benchmark to give you any numbers in that regard…but would assume that
is true and hope that hit is pretty minimal. Believe the hit is because for the not null values, it takes 4 more bytes than the normal column. So, for an int, it will take 8 bytes when there is a value in the column. That is why one has to evaluate when “exactly” is this feature useful - read the BOL link in the start that we had given which mentions that use it only when the savings in terms of space are atleast 20-40%.

d) There is a restriction on the data types i.e. not all columns with every data type can be declared as SPARSE. Text, ntext, image, filestream, other spatial data types cannot be used.

So, bottom line is that before jumping on making use of this new feature, read and understand it’s usage scenarios as well as it’s limitations and then only use it in your applications. This is not for every application but in some types of applications, it can be really very advantageous. This will hopefully also help avoid the design pattern that is called EAV (Entity Attribute Value).

Posted in SQL Server | No Comments »

SSRS 2008 vs SSRS 2005

Posted by decipherinfosys on July 12, 2008

SQL CAT (Customer Advisory Team) from Microsoft released a new technical note on the comparison between Reporting Services 2008 vs 2005 version from a scalability perspective. The new memory management architecture of SSRS 2008 enabled it to scale very easily as compared to it’s predecessor. You can read the technical note from the SQL CAT team over here.

Posted in SQL Server | No Comments »

SP_HELPDB and the Insert Error

Posted by decipherinfosys on July 10, 2008

A reader asked yesterday how to resovle this error that he was getting when trying to run sp_helpdb on one of the instances:

Server: Msg 515, Level 16, State 2, Procedure sp_helpdb, Line 53
Cannot insert the value NULL into column ‘owner’, table ‘tempdb.dbo.#spdbdesc_
The statement has been terminated.

As you might already know, sp_helpdb returns information about all of your databases on a given instance and can be very useful for different reasons - one can be to monitor the size of the databases - we had blogged that before here. For the error above, that the end user received - the issue resolution is pretty straight forward. As the error message states, the database needs to have an owner else it will try to insert a NULL value into the “Owner” column of the temporary table and it will fail to provide the output when the system stored procedure sp_helpdb gets executed.

And the resolution is pretty straight forward as well - find the databases which do not have an owner assigned and assign an owner to them. We can use sysdatabases or sys.databases (in SQL Server 2005 and 2008):

For SQL Server 2000:

select name as database_name, SUSER_SNAME(sid) as owner_name, crdate as create_date_time
from master.dbo.sysdatabases

For SQL Server 2005:

select name as database_name, SUSER_SNAME(owner_sid) as owner_name, create_date as create_date_time
from master.sys.databases

And then see which databases do not have an owner assigned to them. The ones that do not have an owner assigned to them can be assigned an owner using this command:

Use <put the database name here>

go

exec sp_changedbowner ‘put the owner name here’

Once that is done, execution of sp_helpdb will work perfectly fine.

Posted in SQL Server | No Comments »

SQL Server 2008 due in August?

Posted by decipherinfosys on July 10, 2008

As per this eWeek article, the next version is pretty close to the RTM. This information was released at the Microsoft Worldwide Partner Conference so am pretty sure that it is accurate. The nightly scrapper script also pointed out a post was made at the Data Platform Technologies blog as well in this regard. We have been working with SQL Server 2008 since it’s CTP1 stages and are very impressed with the new feature sets. You can search for SQL Server 2008 on this blog and get the articles and the whitepapers on it.

Posted in SQL Server | No Comments »

Computed Column based on XML Data Type

Posted by decipherinfosys on July 10, 2008

We have discussed computed columns quite a bit in our posts at this blog. You can search for “computed column” and get to all those posts - the basic ones are here and here. A few days ago, a reader had asked whether we can create a computed column based on an XML Data Type column and if so, how can we go ahead and do it. In this post, we will present such a solution along with a deterministic function that we will use to extract out the value for making the computed column.

We will use the same example for the XML that we had used in our post here to demonstrate the nodes() method. The function is pretty simple and is based on the same XML as was demonstrated in the post above with some changes:

create function dbo.fn_xml_computed
(
@x xml,
@va_lkup varchar(10)
)
returns bigint
as
begin
declare @shid bigint
set @shid = (select
T.VA.value(’Key[1]‘, ‘bigint’) as Share_Holder_ID
from @x.nodes(’//Details’) as T(VA)
where T.VA.exist(’VoteAction/text()[.=sql:variable("@va_lkup")]‘) = 1)
return @shid
end
go

Here, we will be taking in two parameters - one would be the actual vote XML and the second one will be the vote action performed and we will use the function to extract out the information on the Shareholder_ID value. So, our table definition would look something like this:

create table dbo.testcase
(
vote_xml xml,
vote_action varchar(10),
shareholder_id AS dbo.fn_xml_computed(vote_xml, vote_action)
)

And now let’s do the insert into the table now:

insert into dbo.testcase (vote_xml, vote_action) values
(
'<VoteAgent>
                        <InstructAck>
                                          <MtgNtfcnId>444</MtgNtfcnId>
                                          <Details>
                                                <Key>0987654321</Key>
                                                <Shares>100</Shares>
                                                <VoteAction>FOR</VoteAction>
                                          </Details>
                        </InstructAck>
         </VoteAgent>',
'FOR'
);

Let’s check the results:

select * from dbo.testcase

vote_xml    vote_action shareholder_id
----------- ----------- --------------------
<....>      FOR         987654321

As you can see from above, the shareholder_id value got extracted and populated in the column automatically. So, the computed column concept applies to the XML data type columns as well.

Posted in SQL Server | No Comments »

SQL Server on Windows 2008

Posted by decipherinfosys on July 9, 2008

Here is an excellent post from the folks at Microsoft on what needs to be done in order to install SQL Server (2005 or 2008 version) on Windows 2008…an excellent read:

http://blogs.msdn.com/sqlsecurity/archive/2008/07/01/sql-server-and-the-windows-server-2008-firewall.aspx

Posted in SQL Server, Windows | No Comments »