Systems Engineering and RDBMS

BitMask Fields

Posted by decipherinfosys on January 4, 2008

This blog post is in response to one of the questions that we got from our reader. Here is the question:

“I’d like to hear your opinions on bitmask fields. I have found a lot of flames on using bitmask fields into RDBMS and most of them look a lot like the cons of EAV models, which I do agree (I just learned it the hard way, after using this design in a system which later became real hell), but what are the cons in bitmask fields when we just need an “open” set of boolean flags? Apart from indexing (which I believe would be impossible to be done) would it be better to add a new bit field for every new flag to your existing databases than using a bitmask field?”

This is a very good question and something that we have seen being used in many designs that we have encountered. There are two scenarios that we will point out in this regard where folks typically make use of this kind of a scheme.

Scenario # 1:

Using a bitmask field and thus de-normalizing and getting rid of an intersection table. Let’s first formulate this scenario using an example.

Suppose that we have a USER logical entity and an Access logical entity in our system and these have a M:N relationship between them i.e. a many-to-many relationship. A USER can have 1 or more than one access permissions and an access permission can belong to 1 or more than one users. The table structures given below are abbreviated just to provide an explanation for this blog post.

Let us first see how this will look like in a properly normalized scheme when we move from the logical model to the physical implementation model. When we do that, we will have three tables: USER_MASTER, ACCESS_MASTER and USER_ACCESS_XREF with the third table acting as the cross-reference table or the intermediate table. Here are the definitions and some sample data:

USE TEMPDB
GO
SET NOCOUNT ON
GO

CREATE TABLE USER_MASTER
(
USER_MASTER_ID INT IDENTITY NOT NULL,
FIRST_NAME NVARCHAR(30) NOT NULL,
MIDDLE_NAME NVARCHAR(30) NULL,
LAST_NAME NVARCHAR(30) NOT NULL,
CONSTRAINT PK_USER_MASTER PRIMARY KEY (USER_MASTER_ID)
)
GO

CREATE TABLE ACCESS_MASTER
(
ACCESS_MASTER_ID INT IDENTITY NOT NULL,
ACCESS_NAME NVARCHAR(500) NOT NULL,
CONSTRAINT PK_ACCESS_MASTER PRIMARY KEY (ACCESS_MASTER_ID)
)
GO

CREATE TABLE USER_ACCESS_XREF
(
USER_MASTER_ID INT,
ACCESS_MASTER_ID INT,
CONSTRAINT PK_USER_ACCESS_XREF PRIMARY KEY (USER_MASTER_ID, ACCESS_MASTER_ID)
)
GO

INSERT INTO USER_MASTER (FIRST_NAME, LAST_NAME) VALUES (‘JOE’, ‘SMITH’);
INSERT INTO USER_MASTER (FIRST_NAME, LAST_NAME) VALUES (‘DAN’, ‘ROGERS’);
INSERT INTO USER_MASTER (FIRST_NAME, LAST_NAME) VALUES (‘ROBERT’, ‘BAITIS’);

INSERT INTO ACCESS_MASTER (ACCESS_NAME) VALUES (‘connect’);
INSERT INTO ACCESS_MASTER (ACCESS_NAME) VALUES (‘read’);
INSERT INTO ACCESS_MASTER (ACCESS_NAME) VALUES (‘write’);
INSERT INTO ACCESS_MASTER (ACCESS_NAME) VALUES (‘alter’);
INSERT INTO ACCESS_MASTER (ACCESS_NAME) VALUES (‘view definition’);

INSERT INTO USER_ACCESS_XREF VALUES (1, 1);
INSERT INTO USER_ACCESS_XREF VALUES (2, 1);
INSERT INTO USER_ACCESS_XREF VALUES (2, 2);
INSERT INTO USER_ACCESS_XREF VALUES (3, 1);
INSERT INTO USER_ACCESS_XREF VALUES (3, 2);
INSERT INTO USER_ACCESS_XREF VALUES (3, 3);
INSERT INTO USER_ACCESS_XREF VALUES (3, 4);
INSERT INTO USER_ACCESS_XREF VALUES (3, 5);

Now, in order to see the users and their access rights, we can simply write up a SQL statement to do the joins between those 3 tables. Example:

SELECT UM.FIRST_NAME + ‘ ‘ + COALESCE(UM.MIDDLE_NAME, ”) + ‘ ‘ + UM.LAST_NAME AS USERNAME, AM.ACCESS_NAME
FROM dbo.USER_MASTER AS UM
INNER JOIN dbo.USER_ACCESS_XREF AS UAX
ON UM.USER_MASTER_ID = UAX.USER_MASTER_ID
INNER JOIN dbo.ACCESS_MASTER AS AM
ON UAX.ACCESS_MASTER_ID = AM.ACCESS_MASTER_ID
ORDER BY UM.USER_MASTER_ID
GO

USERNAME         ACCESS_NAME

---------------- ----------------

JOE  SMITH       connect

DAN  ROGERS      connect

DAN  ROGERS      read

ROBERT  BAITIS   connect

ROBERT  BAITIS   read

ROBERT  BAITIS   write

ROBERT  BAITIS   alter

ROBERT  BAITIS   view definition

When one wants to remove a permission for a given user, one can just do a delete from the cross-reference table and if one wants to add a permission, one can simply insert into this XREF table another record. If one wants to flatten the above structure out so that one gets only a single row per user, that pivoting can be done pretty easily. You can read more on pivoting on our blog site – search for pivot.

In a de-normalized scheme, we have seen people add a column to the ACCESS_MASTER table – something like ACCESS_BIT. And then, store the ACCESS_BIT_MASK column in the USER_MASTER table and get rid of the USER_ACCESS_XREF table all-together. Here is the code:

USE TEMPDB
GO
SET NOCOUNT ON
GO

CREATE TABLE ACCESS_MASTER
(
ACCESS_MASTER_ID INT IDENTITY NOT NULL,
ACCESS_NAME NVARCHAR(500) NOT NULL,
ACCESS_BIT INT NOT NULL,
CONSTRAINT PK_ACCESS_MASTER PRIMARY KEY (ACCESS_MASTER_ID)
)
GO

CREATE TABLE USER_MASTER
(
USER_MASTER_ID INT IDENTITY NOT NULL,
FIRST_NAME NVARCHAR(30) NOT NULL,
MIDDLE_NAME NVARCHAR(30) NULL,
LAST_NAME NVARCHAR(30) NOT NULL,
ACCESS_BIT_MASK INT NOT NULL,
CONSTRAINT PK_USER_MASTER PRIMARY KEY (USER_MASTER_ID)
)
GO

INSERT INTO ACCESS_MASTER (ACCESS_NAME, ACCESS_BIT) VALUES (‘connect’, 1);
INSERT INTO ACCESS_MASTER (ACCESS_NAME, ACCESS_BIT) VALUES (‘read’, 2);
INSERT INTO ACCESS_MASTER (ACCESS_NAME, ACCESS_BIT) VALUES (‘write’, 4);
INSERT INTO ACCESS_MASTER (ACCESS_NAME, ACCESS_BIT) VALUES (‘alter’, 8);
INSERT INTO ACCESS_MASTER (ACCESS_NAME, ACCESS_BIT) VALUES (‘view definition’, 16);

INSERT INTO USER_MASTER (FIRST_NAME, LAST_NAME, ACCESS_BIT_MASK) VALUES (‘JOE’, ‘SMITH’, 1);
INSERT INTO USER_MASTER (FIRST_NAME, LAST_NAME, ACCESS_BIT_MASK) VALUES (‘DAN’, ‘ROGERS’, 3);
INSERT INTO USER_MASTER (FIRST_NAME, LAST_NAME, ACCESS_BIT_MASK) VALUES (‘ROBERT’, ‘BAITIS’, 31);

So, what do those ACCESS_BIT_MASK values mean? Since Joe Smith had only connect privileges, he has a bit mask value of 1, since Dan Rogers had both connect and read access privileges, he has a bit mask value of 3 (1 + 2) and since Robert Baitis had all the privileges, he has a bit mask value of 31 ( 1 + 2 + 4 + 8 + 16).

Now, in order to find out the rights of the users one does not need to join anywhere and only needs to use the bitwise AND comparison. Example:

SELECT
UM.FIRST_NAME + ‘ ‘ + COALESCE(UM.MIDDLE_NAME, ”) + ‘ ‘ + UM.LAST_NAME AS USERNAME,
CASE WHEN UM.ACCESS_BIT_MASK & 1 <> 0 THEN ‘YES’ ELSE ‘NO’ END AS CONNECT_ACCESS,
CASE WHEN UM.ACCESS_BIT_MASK & 2 <> 0 THEN ‘YES’ ELSE ‘NO’ END AS READ_ACCESS,
CASE WHEN UM.ACCESS_BIT_MASK & 4 <> 0 THEN ‘YES’ ELSE ‘NO’ END AS WRITE_ACCESS,
CASE WHEN UM.ACCESS_BIT_MASK & 8 <> 0 THEN ‘YES’ ELSE ‘NO’ END AS ALTER_ACCESS,
CASE WHEN UM.ACCESS_BIT_MASK & 16 <> 0 THEN ‘YES’ ELSE ‘NO’ END AS VIEW_DEF_ACCESS
FROM dbo.USER_MASTER AS UM
ORDER BY UM.USER_MASTER_ID
GO

USERNAME             CONNECT_ACCESS READ_ACCESS WRITE_ACCESS ALTER_ACCESS VIEW_DEF_ACCESS

-------------------- -------------- ----------- ------------ ------------ ---------------

JOE  SMITH           YES            NO          NO           NO           NO

DAN  ROGERS          YES            YES         NO           NO           NO

ROBERT  BAITIS       YES            YES         YES          YES          YES

The argument made by proponents of this design is that those access rights will never change and their bit values will never change either. And doing this helps them getting rid of a potentially very large intersection table and also of doing joins at the times of the reads. One would just add or subtract the ACCESS_BIT value from the ACCESS_BIT_MASK as the access rights get added or removed respectively.

Well, here are the dis-advantages of this approach besides the fact that this is a de-normalized design:

a) Such a design does not allow for referential integrity. In order to enforce referential integrity in this case, one would need to find non-standard ways of enforcing the referential integrity – one such example is the usage of a trigger in order to enforce this. Another one is to make use of application code to enforce it instead of the database doing it. Both are highly un-desirable approaches both because of performance and maintenance.

b) Since there is no FK constraint, the DRI is out of the door i.e Declarative Referential Integrity i.e. ON DELETE CASCADE etc.

c) If there is a change in the value, the DDL change needs to be made.

d) Indexes cannot be used for searches: In order to see which users have say the Read Access, one will need to use the bitwise AND operator and that means that one would need to apply the condition to the WHERE clause and thus the index will not get used (unless using FBI in Oracle or computed columns in SQL Server).

e) If you have too many bit values, it will reduce the number of records that the table could hold and will require a design change. Example: The bit values go from 1, 2, 4, 8, 16, 64, …for record 25, you have a value of 16777216. Suppose that one had say 1000 records, the value will be too large and you will end up storing numbers in string data-type column and splitting (or dividing it with values) and casting/converting to integer/big integer data-types in order to do bitwise operation or would need to create more columns both of which are bad design choices.

There is a way to handle very large bitmasks (by making use of an auxiliary table that is a table of numbers) but we would not go into those details here.

Scenario #2:

When there are a lot of flag options for a given entity i.e. when there are a lot of boolean flag options for a given entity then also we have seen end users making use of such a scheme. Say, we have a table called WHSE_MASTER which stands for a Warehouse data set in a schema. Depending upon the configuration for a given implementation, one can then have several configuration options for each Warehouse. And each of these options are essentially just flag options i.e. toggle switches that indicate a Yes or a No. So, for the sake of simplicity, let us assume that we have SUPPORT_INBOUND_FLAG, SUPPORT_OUTBOUND_FLAG, SUPPORT_CROSS_DOCK_FLAG etc.

If one follows the properly designed scheme, then one would create a bit flag column for every option. The advantage of this is to have properly named columns which actually signify something to the end user rather than storing that meaning elsewhere in a look-up table.

In addition, if one needs to index these columns, it is pretty easy to do so – one can also index a single bitmask column that contains all these values however, one would then need to apply functions on it to get the filter condition – and as we know, application of functions on an indexed column negates the usage of the index (unless you are using FBI – function based indexes in Oracle or Computed Columns in the case of SQL Server).

Another problem is the same issue as point # (e) that was mentioned in Scenario #1 though having that many bit flag columns is unlikely for an entity. One would also need to maintain a look-up table with those bit values to be able to do addition and subtraction of values.

In such a scheme, we have also seen folks deciding to create a single column with a string data-type and using values like 100110 (or YNNYYN) in order to represent those flag options. The index issue, the another look-up table issue still remains and now there is another added complexity – one now needs to do substring operations in order to do the addition and removal of options (positional updates).

And another approach that we have seen people take is also to use the VARBINARY(N) data-type to store the bitmasks values like: “0x0E0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000”

and then doing operations on this to ascertain the different values.  It suffers from the same sets of issues as discussed before for other approaches.

Sorry, the comment form is closed at this time.

 
%d bloggers like this: