Systems Engineering and RDBMS

sql_variant in SQL Server and ANYDATA Type in Oracle

Posted by decipherinfosys on March 6, 2009

There are many different data types available across the various RDBMS products, and we have covered many of them in our blog posts before.  One that rarely gets used (and rightfully so) is sql_variant in SQL Server and its equivalent, the ANYDATA type, in Oracle.

In this post, we will cover those data types and see under which scenarios one can make use of them.  To be honest, we don’t think they should ever be used, since SQL is a strongly typed language.  However, they are there for a reason: the vendors incorporated them into their RDBMS products because they saw a need for them.  And in real live applications, we have seen two scenarios where folks have made use of such data types.  We are not saying that those were the right design decisions – just stating that there are live applications out there which make use of these data types.

SQL Server:

If you have used the variant data type in VB, then sql_variant is its equivalent on the SQL Server side.  It has been available since SQL Server 2000.  This data type can be used for storing values of any data type other than text, ntext, image and timestamp (and sql_variant itself).  The rules of comparison, the maximum length, and the restrictions on sql_variant support in ODBC are covered in the BOL article here.  Any time that you make use of the sql_variant data type, it records two items:

a) the actual data value, and
b) the metadata that describes the data value (the data’s data type and its properties, such as precision, scale and maximum size).

There is also a built-in function, SQL_VARIANT_PROPERTY(), that can be used to get the metadata information for any sql_variant column or variable.  Let’s see this with an example:

declare @test table (col1 sql_variant primary key)
insert into @test values (1); /*an int*/
insert into @test values ('test'); /*a string*/
insert into @test values (getdate()); /*a datetime*/
insert into @test values (newid()); /*a uniqueidentifier*/
insert into @test values (12345.123); /*a decimal*/

select
sql_variant_property(col1, 'basetype') as Base_Data_Type
, sql_variant_property(col1, 'precision') as Precision_of_Data_Type
, sql_variant_property(col1, 'scale') as Scale_of_Data_Type
, sql_variant_property(col1, 'maxlength') as MaxLength_of_Data_Type
, col1 from @test

Base_Data_Type       Precision_of_Data_Type Scale_of_Data_Type   MaxLength_of_Data_Type col1
-------------------- ---------------------- -------------------- ---------------------- -------------------------------------
int                  10                     0                    4                      1
varchar              0                      0                    4                      test
datetime             23                     3                    8                      2009-03-05 08:10:03.843
uniqueidentifier     0                      0                    16                     CF929804-8F80-4C1E-9292-DDD307E5747B
numeric              8                      3                    5                      12345.123

Do note above that the max length is in bytes.  So, for an int it shows up as 4, for a datetime as 8, and so on.

A couple of things that immediately come to mind are:

a) Since we are storing values of any data type in this one single column, how will functions like SUM(), MAX() and CONVERT() work?
b) Another thing to think about is the possibility of running into implicit conversions when this data type is used for a column that the application filters on (in the WHERE clause).  That would be bad for performance: even if we have an index on that column, an implicit conversion applied as a function on the indexed column would mean that the index won’t get used when the execution plan is generated.

Let’s look at both the points above using an example.  And then we will mention the one place where we have seen this data type to be used effectively.

Our first thought for point (a), in terms of making use of the SUM() function, was that if we filter on the sql_variant_property() function and include only the values whose base types are numeric or in the integer data type family, we should be able to get the SUM().  But when we do that:

select sum(col1) from @test where sql_variant_property(col1, 'basetype') in ('int', 'smallint', 'tinyint', 'decimal', 'numeric')

We get an error:

Msg 8117, Level 16, State 1, Line 16
Operand data type sql_variant is invalid for sum operator.

But casting that column to a numeric data type works fine, since the limitation is just on applying the aggregate function directly to the sql_variant data type:

select sum(cast(col1 as numeric(18,8))) from @test where sql_variant_property(col1, 'basetype') in ('int', 'smallint', 'tinyint', 'decimal', 'numeric')

—————————————
12346.12300000

Now, what about MAX() or MIN() and other functions?  In that case, the rules of data type precedence will apply. We can check that by running a query:

select max(col1) from @test

And the output that we get is the datetime value: 2009-03-05 11:12:13.727

Now, let’s look at point (b) above.  We will create an actual table (not a table variable) with an index on the sql_variant column, to see whether a filter on it leads to an implicit conversion on the indexed column or whether the conversion is done on the variable instead:
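The original post does not show the DDL for this table, but based on the plan output further below, it would have looked something like this (the table name mytest and the constraint name PK_mytest match the names in that plan):

create table mytest (col1 sql_variant not null)
alter table mytest add constraint PK_mytest primary key clustered (col1)

insert into mytest values (1)
insert into mytest values ('test')
insert into mytest values (getdate())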

We created a table with the exact same structure as shown above for the table variable, with a clustered index on col1.  Now, let’s query it:

declare @i int
set @i = 1
select * from mytest where col1 = @i

The execution plan shows a clustered index seek operation since the optimizer applied the conversion on the variable and not the indexed column:

|–Clustered Index Seek(OBJECT:([DecipherTest].[dbo].[mytest].[PK_mytest]), SEEK:([DecipherTest].[dbo].[mytest].[col1]=CONVERT_IMPLICIT(sql_variant,[@i],0)) ORDERED FORWARD)

The same holds true if the variable is of any other data type: the implicit conversion is done on the variable, and the comparison is then made against the indexed column.  So, performance-wise, it is fine.  We haven’t done tests on joining a sql_variant column with a column of a different data type in another table – that will always lead to an implicit conversion on one of the joined columns – but since joining columns of different data types is not a good design practice anyway, we will skip it for this post.

Now, having discussed the sql_variant data type – where could it possibly be used?  Obviously, the only time one would ever make use of this data type is when one does not know the data type of the values that are going to be stored in a particular column.  Now, do you see anything wrong with that statement? 🙂 Or rather, do you see anything correct in it?  That in itself seems to be a design violation – how can you not know the data type of a column before doing the logical/physical design?

The one place where we have seen it being used in a real live application is for interfaces.  Many times, when an interface table is designed by a vendor, they need to provide the flexibility of generic columns that can store data values from source systems for columns/attributes related to customization changes for the interfaces.  One design philosophy that we have seen is that folks create x number of string columns and x number of numeric columns to provide that flexibility.  Under such a scenario, creating x number of sql_variant columns instead provides the same flexibility without bloating up the space requirements.  We are not saying that is the best way to do it – just that it is a real-life usage of this data type in canned vendor applications serving interface needs.  Another point to note is that under such scenarios, using a sql_variant data type rather than a string data type (to store values of any type) ensures that the data type of each value is preserved – so if we put in a datetime value, it remains a datetime and we can perform the date functions on it, rather than first casting a string to a datetime and then performing those operations on it.
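As a rough sketch (the table and column names here are ours, purely for illustration), such an interface table might look like:

create table interface_staging
(
interface_id int identity(1,1) primary key,
source_system varchar(30) not null,
/* generic columns - each row's value keeps its original data type */
flex_attribute_1 sql_variant null,
flex_attribute_2 sql_variant null,
flex_attribute_3 sql_variant null
)

The win over generic varchar columns is that sql_variant_property() can tell you each value’s original type, and a stored datetime can be cast back to datetime without any string parsing.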

The only other place where we have seen this being used is what is called EAV (Entity Attribute Value) modeling, which is again a design anomaly in the relational world.  You can read more on EAVs in our previous post here.  This is also mentioned by Microsoft MVP Jonathan in his post, which we have referenced below in the resources section.
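In that pattern, a typical EAV table (hypothetical names, shown only to illustrate where sql_variant ends up) is just:

create table entity_attribute_value
(
entity_id int not null,
attribute_name varchar(50) not null,
attribute_value sql_variant null,
constraint PK_eav primary key (entity_id, attribute_name)
)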

Oracle:

Now, having discussed the sql_variant data type, the next question that comes to mind for DB developers and DBAs who have worked on both Oracle and SQL Server is whether Oracle supports something similar.  It does: Oracle 9i introduced a new data type called ANYDATA. 🙂

Let’s create a table and see this in action:

SQL> Create table mytest (col1 sys.anyData);

SQL> Insert into mytest values (sys.anyData.convertDate(sysdate));

SQL> Insert into mytest values (sys.anyData.convertVarchar2('test'));

And just like the sql_variant_property() function in SQL Server, we can use the getTypeName() method of the ANYDATA type to see what data type values are stored in that column:

SQL> select t.col1.getTypeName() as DT_Value from mytest t;

DT_Value
————–
SYS.DATE
SYS.VARCHAR2

In order to get the actual value out of that column, we need to write our own function that checks for the appropriate data type and calls the matching accessor (getDate(), getVarchar2(), etc.) to extract the value.  So the application needs to have the intelligence to determine the right data type and then get the data value.
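A minimal sketch of such a function follows (the function name is ours; only the two types inserted above are handled; note that the AnyData get* accessors return a PLS_INTEGER result code, with the value coming back through an OUT parameter):

create or replace function anydata_to_string (p_data in sys.anyData)
return varchar2
is
  l_date     date;
  l_varchar2 varchar2(4000);
  l_rc       pls_integer;
begin
  case p_data.getTypeName()
    when 'SYS.DATE' then
      l_rc := p_data.getDate(l_date);
      return to_char(l_date, 'YYYY-MM-DD HH24:MI:SS');
    when 'SYS.VARCHAR2' then
      l_rc := p_data.getVarchar2(l_varchar2);
      return l_varchar2;
    else
      return null; /* extend with getNumber() etc. as needed */
  end case;
end;
/

SQL> select anydata_to_string(col1) from mytest;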

A word of caution: use these data types only when you absolutely need to.  Other than the scenarios mentioned above, we have neither seen nor had a need to use them.  It is always good design practice to use the right data types.  Generic solutions always sound “cool”, but almost every time they will come back to haunt you.

Resources:

SQL Server:

  • BOL Articles: here.
  • Microsoft MVP Jonathan’s SQLBlog post – here.
  • Madhivanan’s post on the behavior changes between SQL 2000 and SQL 2005 – here.
  • Microsoft MVP Bill’s post – here.

Oracle:

  • Oracle Documentation on ANYDATA Type – here.
  • OraFAQ post by Kevin Meade – here. This is an excellent post.
  • Oracle ACE Steve Karam’s post – here.
