Systems Engineering and RDBMS

Archive for July, 2008

SOX – Tracking properties for the SQL Server Login Accounts

Posted by decipherinfosys on July 21, 2008

Here is a simple SQL query to track information on the login accounts (Windows as well as SQL Server accounts):

select
    name as login_name,
    type_desc as login_type,
    LOGINPROPERTY(name, 'PasswordLastSetTime') as Pswd_last_Set,
    LOGINPROPERTY(name, 'IsLocked') as Locked_Account,
    LOGINPROPERTY(name, 'IsExpired') as Expired_Login,
    LOGINPROPERTY(name, 'IsMustChange') as Must_Change_On_Login,
    LOGINPROPERTY(name, 'HistoryLength') as Account_Tracked_For,
    LOGINPROPERTY(name, 'LockoutTime') as Date_Account_Locked_Out
from sys.server_principals
where type in ('S', 'U')
order by type

It uses the LOGINPROPERTY() function and lists out the different login names, their type, and the selective properties that you are interested in tracking.  For SOX compliance as well as SAS 70 requirements, it is good to run this SQL periodically in order to collect the information.  Alerts could also be set up based on the returned data set to notify the admin about expired/locked accounts, etc.
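
As an example, a minimal sketch of such an alert query (built from the same catalog view and properties as above) would simply restrict the result set to the problem accounts; it could then be scheduled, say, as a SQL Server Agent job that notifies the admin whenever rows come back:

-- hedged sketch: return only the locked or expired logins
select name as login_name,
       LOGINPROPERTY(name, 'LockoutTime') as locked_out_at
from sys.server_principals
where type in ('S', 'U')
  and (convert(int, LOGINPROPERTY(name, 'IsLocked')) = 1
   or  convert(int, LOGINPROPERTY(name, 'IsExpired')) = 1);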

Posted in SQL Server | Leave a Comment »

Looking at all the SQL Server Instances in the network

Posted by decipherinfosys on July 21, 2008

In SQL Server, one way to look for the instances that one can connect to is to simply run:

osql -L or sqlcmd -L

which will list out all the SQL Server instances on the network that are visible to you. You can also look at this through the GUI, as shown below:

However, when doing this I noticed that one of the servers that I was looking for was not available in this list. I knew the IP address of the box and it was a default instance, so I was able to connect fine using the credentials, but it set me thinking: why would that particular instance not be available in the list? I then terminal-served into that box and noticed, using SQL Server Configuration Manager, that we can choose not to expose the name of the instance if we do not want to. In the image below:

You can see that the "Hide Instance" property was set to Yes. Once that was changed to "No", I was able to see the instance in the browse list as well as via osql -L. This also tells us that, if we want to, hiding the instance is yet another way to secure a SQL Server instance on the network.
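
If you prefer to check this setting from T-SQL rather than through the Configuration Manager GUI, one option is to read the registry value that backs it. This is a hedged sketch only – xp_instance_regread is an undocumented extended stored procedure, and the path below is the usual SuperSocketNetLib location, so verify it for your build:

-- returns the HideInstance flag for this instance (1 = hidden, 0 = visible)
EXEC master.dbo.xp_instance_regread
     N'HKEY_LOCAL_MACHINE',
     N'SOFTWARE\Microsoft\Microsoft SQL Server\MSSQLServer\SuperSocketNetLib',
     N'HideInstance';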

Posted in SQL Server | Leave a Comment »

Benchmarks and clearing the cache

Posted by decipherinfosys on July 17, 2008

This is one topic on which I hold a slightly different opinion from my colleagues. In every shop that I have been to, I have seen the cache being cleared out prior to every benchmark run – whether they are using Oracle or SQL Server.  In SQL Server, one could use the DBCC commands to flush out the cache – DBCC FREEPROCCACHE would do that for you.  It is granular enough to even let you remove a specific plan if you provide it with a plan_handle or a sql_handle.  What this command does is clear out the plan cache; so if you have a stored procedure that you are benchmarking, it would cause it to recompile since there is nothing in the cache.  Some people do this for every iteration of the benchmark in order to get "real world" numbers.  Ask yourself: do you clear out the cache in your production system every day as well?  Plan re-use, and how the plans change as one provides different input values, puts load on the system and increases the data volume, is what (IMHO) needs to be tracked, rather than clearing out the cache each time to make a VP happy :-)
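
As a hedged illustration of that granular option (SQL Server 2008 syntax; the procedure name below is hypothetical and the handle is just a placeholder to be replaced with the value returned by the first query):

-- find the cached plan for one particular procedure
SELECT cp.plan_handle, st.text
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE st.text LIKE '%usp_my_benchmark_proc%';

-- evict just that plan instead of flushing the entire cache
-- (placeholder handle shown; paste in the plan_handle returned above)
DBCC FREEPROCCACHE (0x060006001ECA270EC0215D05000000000000000000000000);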

Other forms of such DBCC commands are DBCC FREESESSIONCACHE and DBCC FREESYSTEMCACHE.   You can look up more information on the syntax and their usage in BOL.
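
For quick reference, these are the basic forms as I read them in BOL (check BOL for the complete option lists):

DBCC FREESESSIONCACHE;          -- flushes the distributed query connection cache
DBCC FREESYSTEMCACHE ('ALL');   -- releases unused entries from all of the caches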

In Oracle, prior to Oracle 10g (i.e. in Oracle 9i), one could use the "alter tablespace offline/online" command to flush from the buffer cache the blocks that pertain to that tablespace.  In Oracle 10g and 11g, there is an alter system command to flush out the cache:

alter system flush buffer_cache;

Benchmarks should reflect reality, and unless your benchmark consists of a single query where you deliberately do not want to account for what is in the cache, I don't see any reason for using these commands in a benchmark.  In the real world, the buffer cache is never 100% empty.  So, where are these commands useful then?  They are useful for a developer/DBA who has their own instance and is doing performance testing on some code.

Posted in Oracle, SQL Server | 1 Comment »

UDF limitations in SQL Server

Posted by decipherinfosys on July 16, 2008

UDFs (User Defined Functions) are a great feature in every RDBMS. We have covered their usefulness before in our blog posts. Based on some of the questions that we have been getting, we thought we would write a blog post that talks only about the limitations of UDFs in SQL Server, so here goes:

1) The TRY…CATCH block cannot be used in a UDF. In fact, the other error-handling mechanisms, @@ERROR and RAISERROR, are not supported either.

2) Non-deterministic functions like GETDATE() cannot be used within a UDF, though there are workarounds for it (a sketch of the usual one appears after this list), and it is allowed in SQL Server 2008.

3) A UDF (any of the three variations – scalar, inline or multi-statement) cannot be used to return multiple result sets.

4) UDFs cannot call stored procedures. They can, however, call extended stored procedures.

5) UDFs cannot make use of dynamic SQL or temporary tables within the code. Table variables are allowed though.

6) SET commands are not allowed within a UDF.
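
Regarding point 2, a minimal sketch of the workaround that is usually suggested (all object names here are hypothetical): pass the non-deterministic value in as a parameter so that the function itself never calls GETDATE():

CREATE FUNCTION dbo.fn_days_since (@some_date DATETIME, @as_of DATETIME)
RETURNS INT
AS
BEGIN
    -- the caller supplies the current date/time; the UDF stays deterministic
    RETURN DATEDIFF(DAY, @some_date, @as_of);
END
GO

-- usage: supply GETDATE() at invocation time (dbo.orders is a hypothetical table)
SELECT dbo.fn_days_since(order_date, GETDATE()) AS days_old
FROM dbo.orders;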

Again, do remember that UDFs have a lot of uses in T-SQL. The above points just summarize the limitations, since many questions – like why can't I use TRY…CATCH in a UDF, or why doesn't a temp table work in a UDF (hardly a limitation, given table variables) – get addressed by the above post. All of this is very well documented in BOL as well.

Posted in SQL Server | 1 Comment »

Outsourcing to an Off-shore company – finding the right vendor

Posted by decipherinfosys on July 16, 2008

This post is non-technical in nature. We are posting it here because we recently helped one of our clients go through the selection process of choosing an off-shore company for their development and QA activities. This was our third such engagement helping a client get the most value out of outsourcing. Outsourcing to an off-shore company has a lot of advantages as well as things to be wary of. A lot of material is available on the internet in that regard, so instead of writing a dissertation on it, we will point you to the IAOP (International Association of Outsourcing Professionals) site, which has a lot of good whitepapers in this regard. You can access it here.

The process that we took the client through consisted of these steps:

  1. Interviewing the key stakeholders to learn about the needs of the company – short term as well as a long term vision.
  2. Forming the goals statement and matching those goals against a list of outsourcing companies from off-shore locations. A mix of Tier-1, 2 and 3 companies as well as non-tiered companies was chosen to give the client an opportunity to consider more than one outsourcing off-shore vendor (this has benefits and challenges that we will cover later on in this post).
  3. Having discussions with the client to discuss the different companies and narrowing it down to a list of top 5.
  4. Contacted those 5 companies and established the communication between the client and the representatives from these companies. All of them also had a presence in the US.
  5. Prepared a preliminary agenda for the first conference call between the client and the 5 companies. The goal was to go over an agenda that was in line with the client's needs and to make sure that both sides understood what the goals were.
  6. The next step was to prepare a RFI (Request for Information) document while working with the client (different departments) and send it to the 5 companies.
  7. Collected the RFI responses and had discussions with the client to narrow it down to the top 3. A rating system was developed in three categories: Technical, Operational, and Commercial & Legal. All three had sub-sections against which each reviewer (including us and members from the client team) rated the vendors on a scale of 1-5. Weight measures (how important each category was for the success of the company) were also assigned on a scale of 0-5 (0 because some sub-sections in a particular category might not make sense for the client's goals – there was only 1 such sub-section in this case, which related to hosting; the client wanted to host the systems themselves, since an ASP model is followed by this client).
  8. The next step was to prepare an RFP (Request for Proposal) and send it to the top 3 vendors. We co-ordinated the questions and helped answer questions on both sides related to the process; having worked with the client's IT department, we were aware of the business needs as well.
  9. Upon receipt and review of the completed RFPs from the vendors, we sent follow-up questions that were addressed during face-to-face meetings with each. An agenda with properly targeted questions was prepared for these meetings.
  10. The ratings were done again for the 3 vendors against the RFP responses.
  11. Weight measures were averaged out and where there were significant differences, they were discussed among the team members and normalized.
  12. References were checked, and a recommendation was then made to go with two outsourcing vendor companies.

This whole process was completed in a two-month time frame and was a very pleasant experience for both the client and the outsourcing vendor companies. As far as going with multiple outsourcing vendors is concerned, it is a complex process and has risks associated with it, but if managed properly, it provides a lot of benefit to the company as well. Here are the benefits and disadvantages of such an approach:

Benefits:

  1. Big-bang outsourcing with a single vendor can lead to issues for the company, since if the decision turns out not to be working, it is difficult to recover from it down the line.
  2. By choosing to work with 2 outsourcing off-shore vendors, the company can cut costs and also foster a healthy competition between the vendors while taking advantage of vendor specialization and technical expertise. In this client's case, one of the vendors (a Tier-1 company) was pretty expensive – nearly 3 times the rates of the second vendor (a Tier-3 company) – but they did have more business knowledge of the domain.
  3. The second vendor did not have related business domain experience but did have a rich pool of technical resources and was ranked very well as compared to the Tier-1 vendor in the evaluation process.
  4. It mitigates the risk for our client since, down the line, they could switch from one vendor to the other if things do not work out with one of them. Since both vendors would already be engaged on a T&M basis, that transition for the team would be seamless.
  5. It also gives them an opportunity to engage the Tier-1 vendor, probably on a fixed-cost basis, for one of the projects that requires a good amount of domain expertise in order to be successful in a short span of time.

NOTE: As per the Gartner group, multi-sourcing will remain the dominant sourcing model. Organizations like GM, Kodak and P&G have done this quite successfully by managing the relationships well. As per a survey from CIO magazine (Feb. 2007), here was the break-up:

  • 3 or more outsourcing vendors: 42%
  • 2 outsourcing vendors: 36%
  • 1 outsourcing vendor: 22%

Disadvantages/Challenges:

  1. Managing 2 outsourcing off-shore vendors is a time-consuming and complex process. It also adds some cost upfront, since the client now needs 2 separate resources to manage these 2 relationships. If appropriate governance is in place, this risk is mitigated.
  2. If not properly managed and explicitly stated in the contracts, this can lead to a blame game. This can be mitigated by explicitly defining the contract terms.
  3. This typically requires some extra legal documentation work.

If you have a need for finding an outsourcing partner company at an off-shore location, please feel free to contact us at: info@decipherinfosys.com

Posted in Outsourcing | 4 Comments »

Even Oracle launches a BI application for iPhone

Posted by decipherinfosys on July 15, 2008

Oracle released an iPhone application that enables the end users to access the analytics using OBIEE (Oracle Business Intelligence Enterprise Edition). You can read more on this here.

Posted in Oracle | Leave a Comment »

Hyper-V is available now

Posted by decipherinfosys on July 15, 2008

As you know, a beta version of Hyper-V was included with the Windows Server 2008 release. Its RTM (Released To Manufacturing) version has now been released and it is ready for deployments. You can read more over here and download it from that link as well. And if you are new to Hyper-V, this MSDN post will be a good point to start from.

Posted in Windows | Leave a Comment »

More on Index design

Posted by decipherinfosys on July 14, 2008

We have blogged a couple of times about the factors that should be considered when designing indexes. In one of those posts, we also covered indexes over multiple columns. In all of those posts, you will see one common thing that we try to emphasize – do not try to come up with a "rule of thumb".  Here is a simple example (from an e-mail reply to a reader's question):

Question: I am trying to tune/optimize a query and I saw that an index on the where-clause columns is missing.  Should I just go ahead and create an index and add all the columns that are used in the where clause to that index?

The answer is, of course: NO.  First, we asked the reader to go and read the post above, where we discussed the column order in covering indexes, and then we also gave a simple example to help illustrate the point:

Say, you have this query:

select col1, col2, col3
from dbo.big_table
where col1 >= @x and col1 <= @y
and col3 = @z

For this, we might want to create a covering index in this order: col3, col1, col2.  This is so that we can first narrow the data down with the equality predicate on col3, then use the >= and <= range conditions on col1 to narrow it further within that range, and finally get the value for col2 from the index itself rather than going back to the table.  Now, as we have always stated, understanding your data and its characteristics is very, very important.  Suppose that col3 is unique in this table!  In that case, we would just create a single-column index on col3 and be done with it.  Why?  The equality predicate will qualify a single record, and thus we will just do the row ID lookup on the table for that one record.
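
A minimal sketch of what that covering index could look like (the table and column names are the hypothetical ones from the query above; the INCLUDE variant assumes SQL Server 2005 or later):

-- key on the equality column first, then the range column, and carry col2 along
CREATE INDEX big_table_ind_1 ON dbo.big_table (col3, col1, col2);

-- or keep the key narrow and simply cover col2 at the leaf level
CREATE INDEX big_table_ind_2 ON dbo.big_table (col3, col1) INCLUDE (col2);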

So, the bottom line is: please spend some time understanding the data characteristics – understand how indexes work, how the query is accessing the data, and the data distribution (density and selectivity of the column data values), as well as how often that particular index is going to be used.  If an index is being created to support a report that runs only once every 6 months, is it worth having it in the schema all year round, or should we just create it prior to generating that report?  Only once you have that understanding should you decide how you want to approach the design.

Posted in DB2 LUW, Oracle, SQL Server | Leave a Comment »

Sparse Columns in SQL Server 2008

Posted by decipherinfosys on July 13, 2008

Sparse columns are a new feature in SQL Server 2008. These are normal columns in a table that have an optimized storage format for NULL values. So, they basically reduce the space requirements when they contain NULL values (a NULL in a sparse column takes up no space at all), but there is an overhead associated with retrieving the non-NULL values from those columns. BOL states that one should consider them when the savings in terms of space are at least 20-40%. First let's see an example and then we can go into the advantages and the limitations of sparse columns in SQL Server 2008. Both the CREATE TABLE and ALTER TABLE commands support the creation/alteration of sparse columns and column sets (we will discuss column sets shortly). In the example below, we are creating a de-normalized table with the employee information and their address in the same table:

CREATE TABLE dbo.SPARSE_TEST
(
ID            INT            IDENTITY(1,1)   NOT NULL,
FIRST_NAME    NVARCHAR(50)                   NOT NULL,
LAST_NAME     NVARCHAR(50)                   NOT NULL,
ADDR_LINE_1   NVARCHAR(20)                   SPARSE NULL,
ADDR_LINE_2   NVARCHAR(20)                   SPARSE NULL,
CITY          NVARCHAR(20)                   SPARSE NULL,
STATE_NAME    NVARCHAR(2)                    SPARSE NULL,
COUNTRY       NVARCHAR(10)                   SPARSE NULL,
ZIP_CODE      NVARCHAR(20)                   SPARSE NULL,
ADDRESS_SET   XML COLUMN_SET FOR ALL_SPARSE_COLUMNS, -- only one column set is allowed per table
CONSTRAINT PK_SPARSE_TEST PRIMARY KEY (ID)
)
GO
CREATE UNIQUE INDEX SPARSE_TEST_IND_1 ON DBO.SPARSE_TEST (FIRST_NAME, LAST_NAME)
GO

For the example's sake, we have made all the address fields sparse, since an address or part of those attributes might not be known. And then you can see at the end that we have an ADDRESS_SET column, which is of the XML data type and is defined as a column set for all the sparse columns in the table – per table, there can be only one column set. A column set is essentially an untyped XML representation that combines all the sparse columns in a table into a more structured output. You can read more on when to use column sets and when not to use them in the BOL online version here. There are quite a few major benefits to this – like using it for a content management system such as MS SharePoint, or for product catalogs where every column does not apply to each and every product. Having the column set in an XML format also means that the application can select/update/insert this data set directly. One thing to be cautious about: if you use this column for updating values, be sure to supply values for all the sparse columns, otherwise the ones you omit will be set to NULL.
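
A hedged sketch of that last point, purely to illustrate the syntax and the NULL-ing behaviour (it uses the table created above): the UPDATE below assigns new XML to the column set, so any sparse column not named in the XML – ADDR_LINE_1, ADDR_LINE_2 and ZIP_CODE here – ends up NULL for that row:

-- updating through the column set: the omitted sparse columns become NULL
UPDATE dbo.SPARSE_TEST
SET ADDRESS_SET = '<CITY>Chicago</CITY><STATE_NAME>IL</STATE_NAME><COUNTRY>USA</COUNTRY>'
WHERE ID = 1;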

Let’s populate it with some data now:

set nocount on
go
insert into dbo.sparse_test (first_name, last_name, addr_line_1, addr_line_2, city, state_name, country, zip_code)
values ('Jack', 'Black', '123 hill street', null, 'Seattle', 'WA', 'USA', NULL);
insert into dbo.sparse_test (first_name, last_name, addr_line_1, addr_line_2, city, state_name, country, zip_code)
values ('Steve', 'Smith', '2300 one point place', 'more address info.', 'Los Angeles', 'CA', 'USA', NULL);
insert into dbo.sparse_test (first_name, last_name, addr_line_1, addr_line_2, city, state_name, country, zip_code)
values ('Dennis', 'Baitis', '15 Main Street', null, 'Atlanta', 'GA', 'USA', '30339');

And now, let’s look at the data:

If we just do a "select * from dbo.sparse_test", we will not even get the sparse columns in the output – in fact, we will get only the column_set column. The output is too big to post here, so here is the output for the first record's column set:

<ADDR_LINE_1>123 hill street</ADDR_LINE_1><CITY>Seattle</CITY><STATE_NAME>WA</STATE_NAME><COUNTRY>USA</COUNTRY>

As you can see, only those columns that had values provided for them show up in the XML. Sparse columns also work very well with filtered indexes.
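
As a quick sketch of that combination (assuming the SPARSE_TEST table from above), a filtered index can cover just the rows where a sparse column actually has a value:

-- indexes only the rows that have a ZIP_CODE, which pairs nicely with a sparse column
CREATE INDEX SPARSE_TEST_IND_2 ON dbo.SPARSE_TEST (ZIP_CODE)
WHERE ZIP_CODE IS NOT NULL;

So, to sum up, the advantages are: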

a) when the sparse columns contain NULL values, they do not take up any space at all. We will do some benchmarks in the coming days to see how much savings that translates into.

b) column behavior remains the same as other non-sparse columns so application wise, there is no change.

c) the column set behaves like a normal column and can be very useful in scenarios where not all the attributes apply to all the items in a table, like a product catalog (though with a normalized design, that is less of a problem). It is also very useful for any content management system like MS SharePoint, since it helps reduce the storage needs where all the attributes do not apply to all the items.  The same is true for any contact management system where a lot of information can be recorded against a single individual – since much of that information applies to only a sub-set of the users, it wastes a lot of space in current systems.

d) CDC (Change Data Capture) and transactional replication work with sparse columns (column sets do not).

The limitations are also quite a few:

a) Such columns cannot be a part of the clustered index.

b) Data Compression and Merge replication do not work with these columns.

c) Even though BOL states that there is an overhead to retrieving the values from such columns (the non-NULL values), we haven't done any benchmarks to give you numbers in that regard, but we would assume it is true and hope that the hit is pretty minimal. We believe the hit exists because a non-NULL value in a sparse column takes 4 more bytes than it would in a normal column – so an int, for instance, takes 8 bytes when there is a value in the column. That is why one has to evaluate when "exactly" this feature is useful – read the BOL link given at the start, which mentions that you should use it only when the savings in terms of space are at least 20-40%.

d) There is a restriction on the data types, i.e. not every data type can be declared as SPARSE – text, ntext, image, filestream and the spatial data types (geometry, geography) cannot be used.

So, the bottom line is that before jumping on this new feature, read and understand its usage scenarios as well as its limitations, and only then use it in your applications. This is not for every application, but in some types of applications it can be really very advantageous. It will hopefully also help avoid the design pattern called EAV (Entity Attribute Value).

Posted in SQL Server | Leave a Comment »

SSRS 2008 vs SSRS 2005

Posted by decipherinfosys on July 12, 2008

SQL CAT (the SQL Server Customer Advisory Team) from Microsoft released a new technical note comparing Reporting Services 2008 with the 2005 version from a scalability perspective. The new memory management architecture of SSRS 2008 enables it to scale very easily as compared to its predecessor. You can read the technical note from the SQL CAT team over here.

Posted in SQL Server | Leave a Comment »

 