Systems Engineering and RDBMS

Null vs Empty (Zero Length) string

Posted by decipherinfosys on March 27, 2009

Not many end users understand the difference between an empty string of zero length (”) vs Null. We have seen that at times, the end users put in an empty string in a field and there are no checks in the system to prevent them from being able to get that data into the database.  An empty string of zero length is not the same as a Null value in SQL Server. A string of zero length is still a string with no characters – a Null on the other hand can mean more than just denoting an absence of data.  Maybe the data is not applicable or the data is missing or it is just not present yet.  A classic analogy given in this case is that of a blank CD with nothing on it vs no CD.  A lot of debate is there about the usefulness of Null and the design patterns but that is not what we want to cover here today.  We will provide some links in the Resources section which cover that topic as well.  What we want to mention in this post today are the differences between a Null value vs a zero length string and how the behavior is also different between different RDBMS.  We will cover this for a couple of different data types besides the string data type.

We have blogged about the side effects of using a zero length string for an integer  data type column before – here.  Today, we saw a similar issue at a client site – this time with a datetime data type column.  Here is an example (using SQL Server here):

declare @table table (col1 datetime)

insert into @table values ('');

select * from @table

———————–
1900-01-01 00:00:00.000

The empty string gets converted to the default datetime value of 1900-01-01 00:00:00.000 as shown above.  Not really what the end user was expecting.  And why do we get that particular date?  Because datetime is internally stored as two integers (hence the 8 bytes per datetime value) – the first integer stores the number for computing dates before or after the base date of 1900-01-01 and the second integer has the number of clock ticks post midnight with each tick being 1/300th of a second.

So, what could be some other issues that this can result into besides silently corrupting the data?

a) Your SQL code that would otherwise expect to work like the IS NULL or IS NOT NULL checks won’t work anymore,
b) In case you were doing this for a number column, the aggregate functions like AVG() which otherwise would be ignoring the Null value will now count the record,
c) The SQL code which uses functions like ISNULL(), COALESCE() etc. will not function as expected since these fields will not have a Null value.
d) The sorts won’t work as expected since Null and the empty string (and the subsequent default value that actually gets inserted) are not the same thing.
e) If you have this column as part of the foreign key, you will get an error at the time of the insert itself since instead of a Null value, the code will try to insert another default value in and it will violate the FK constraint.
f) Any concatenation operations or MAX(), MIN() functions can lead to un-desired results.

As always, there is no substitute for good design and good code.  A good design can put checks in place both at the application level (validations – either at the client side or application layer) as well as the DB level (check constraint) to prevent such scenarios from happening.

Do note that if you enter in an empty string of more than zero length, in SQL Server, the result would be the same.  ANSI_PADDING does effect the storage but the comparison rules remain the same.  You can read more on that in one of our previous blog posts here.

And if we were to use a varchar or a char column, the behavior is the same with the difference of course being that it is string of length 1 in the case of a CHAR data-type column.

declare @table table (col1 char(1), col2 varchar(1))

insert into @table values ('', '');

select col1, datalength(col1) char_str_length, col2, datalength(col2) varchar_str_length from @table where col1 = ''

col1 char_str_length col2 varchar_str_length
---- --------------- ---- ------------------
     1                    0

Also, if you are a developer who has worked in both SQL Server and Oracle, then you would know that the behavior in Oracle is a bit different.  It treats Null and an empty string of zero length, the same way.  And it treats an empty string of more than a zero length different than a Null value.  This is different than the ANSI (and SQL Server behavior).  So, in Oracle:

a) A zero length variable length string (varchar2 data type) is treated as a NULL.
b) A string of more than zero length is not treated as a NULL.
c) A zero length fixed length string (char data type) is not treated as a NULL since CHAR data types are blank padded strings.

Here is an example:

SQL> select 1 as col1,length('') LEN  from dual where '' is null

col1        LEN
———- ———-
1 NULL

SQL> select 1 as col1,length('') LEN  from dual where to_char('') is null

col1        LEN
———- ———-
1 NULL

This shows that an empty string is treated as a null in Oracle and the default data type of an empty string is varchar2 since if it were char, then automatically its length would have been one as stated above.

It is always better to check for such issues and in order to follow the same guidelines across RDBMS, check for empty strings via client side logic or application layer logic or check constraints at the DB level and ensure that you are going to put in NULL if that is what your design intention was rather than having such side effects of the empty string.

Resources:

  • The design of software – good discussion on null vs empty string – here.
  • SQL Batman’s post on SQL Server and Null values – here and then the post by Aaron Alton – here. Very good articles.
  • Empty strings and default values – here. We covered above the integer, datetime and string fields, this post from Madhivanan covers other data types as well.
  • Oracle vs SQL Server discussion on this topic @ Stackoverflow – here and another discussion’s insight into why Oracle’s behavior is not like ANSI standards – here.

2 Responses to “Null vs Empty (Zero Length) string”

  1. […] Null vs Empty (Zero Length) string […]

  2. […] Posted by decipherinfosys on June 9, 2009 While working on an integration project, client’s developer had asked about the “T” separator between date and time value in the datetime data type fields and also about the default date of 1900-01-01 which we have covered in our blog posts before – here.  Also, note that an empty string is different than NULL value – the default value for an empty datetime string is converted to 1900-01-01 00:00:00.000 but for a NULL value, unless you are using XSINIL or coalescing that field to be an empty string, that element will not be present in the output. You can read more on the difference between an empty string and NULL over here. […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: