Systems Engineering and RDBMS

Archive for August 8th, 2008

Foreign Key Issue and resolutions

Posted by decipherinfosys on August 8, 2008

In one of the applications at the client site, the application had a design which allowed for bad data being entered into the system. The issue was fixed by fixing and putting a check in the application which writes to the database but at the database tier itself no such checks were present and as a result the data that was coming through the feeds also ended up corrupting the data in the end.

The situation was like this: There were two tables: COMPANY and COMPANY_SECURITY. Both had a surrogate key defined as an auto-incremental ID value. COMPANY table had an alternate key defined on CUSIP – in the Wall Street terms, a CUSIP is unique to every company so this was a NOT NULL UNIQUE column. The child table (COMPANY_SECURITY) had the foreign key established using the COMPANY_ID surrogate key. The alternate key of this table was SECURITY_CUSIP. A company whose stock is traded at the different exchanges like NASDAQ, NYSE etc. can have more than one security offering and the securities are unique. The COMPANY.CUSIP is 6 characters long and the COMPANY_SECURITY.SECURITY_CUSIP is 9 characters long with the first 6 digits being the same as the company cusip and the last 3 characters defining the security (2 defining it and the last being a checksum).

So, based on that understanding, say if the COMPANY.CUSIP for a company like Microsoft is 123456, then it’s securities (COMPANY_SECURITY.SECURITY_CUSIP) should be 123456xyz, 123456abc etc.. But there are no checks at the database level to make sure that the first 6 digits match between the security_cusip and the parent table’s cusip since the FK is established based on the surrogate key.

There are a couple of ways to fix this at the database tier:

1) Creation of a new object: Use a trigger to prevent the data entry into the child table in the event of bad data getting entered. This can create performance issues in the event of a highly transactional system and is a poor way of enforcing this kind of a constraint.

2) Change the table structure design: Change the Primary Key of the parent table to be on COMPANY.CUSIP i.e. this would be a scenario of making use of the natural key since CUSIP does have a business meaning – it never changes once it has been established unless there are M&A (Merger and Acquisition) activities in which case the older record is marked for deletion and a new one gets created to represent the merger/acquisition. And a lot of searches are done based on this column itself which is always of the same length. That way, the child table will also inherit the CUSIP column since we would be using it for the foreign key enforcement and then store only the three characters for the security_cusip instead of storing all 9. So, the child table will end up with CUSIP (6 characters) and SECURITY_CUSIP (3 characters) and that combination will make it unique. For the reads, there can be either of the two strategies:

a) Based on the concatenation of the values from the same table – do understand though that when operations are done on indexed columns, the index seek operations do not happen. One can use an indexed view though to do that and do the reads off of that.

b) Split the input parameter into two – one of 6 characters and the other of 3 and then by joining the parent and the child table, put the right parameter in the right where clause condition. This approach has the benefit of using the indexes since no operations are being done on the indexed columns.

3) Introduce new column: Same as #2 but keep the Primary key as such and introduce redundancy and put the CUSIP column in the child table as well and introduce the FK between that column and the parent’s CUSIP column. A FK can defined on a UNIQUE NOT NULL column as well – does not necessarily have to to be a Primary Key.

4) Add a new Check Constraint with a UDF: Keep the same schema and use a CHECK constraint to define the constraint across the two tables. Please do note that the ANSI SQL Standards do allow the user of sub-queries in check constraints but not all RDBMS follow it. What you can do though is that you can define a UDF (User Defined Function) that performs an existence check against the parent table and returns a 1 or a 0 and define the check constraint based on that. For example:


Table Definitions


create table company
company_id int identity(1,1) not null,
company_name nvarchar(20) not null,
ticker nvarchar(10) not null,
cusip nvarchar(6) not null
, constraint pk_company primary key (company_id));

create unique index company_ind_1 on (cusip);

create table company_security
company_security_id int identity(1,1) not null,
company_id int not null,
security_cusip nvarchar(9) not null,
company_cusip as substring(security_cusip, 1, 6)
, constraint pk_company_security primary key (company_security_id)
, constraint fk_company foreign key (company_id) references;

create unique index company_security_ind_1 on dbo.company_security (security_cusip);


Define the UDF

CREATE FUNCTION dbo.udf_check_cusip
(@company_id int, @security_cusip nvarchar(9))
WHERE company_id = @company_id and cusip = substring(@security_cusip, 1, 6))

Now, add the CHECK constraint using the above UDF


alter table dbo.company_security with check add constraint ck_cusip check (dbo.udf_check_cusip (company_id, security_cusip) = 1);

Let’s test it out now:

declare @i int
insert into (company_name, ticker, cusip) values (N’Microsoft’, N’MSFT’, N’123456′);
select @i = SCOPE_IDENTITY()
select @i
insert into dbo.company_security (company_id, security_cusip) values (@i, N’123456123′);

This will go through fine.

Now, let’s enter data that violates the check constraint:

insert into dbo.company_security (company_id, security_cusip) values (1, N’100006123′);

And we will get this error:

Msg 547, Level 16, State 0, Line 1
The INSERT statement conflicted with the CHECK constraint “ck_cusip”. The conflict occurred in database “AdventureWorks”, table “dbo.company_security”.
The statement has been terminated.

5) Change the table structure: Same as option #2 but in this case we do not change the PK. We keep the surrogate keys and in the child table, only use the three digit security_cusip. The unique index will then be the combination of the company_id and the three digits will make it unique. Kind of like having the fixed length field 6 digit CUSIP + 3 digit SECURITY_CUSIP as the Primary Key (i.e. a natural key selection like option #2 above). This would be fine for writes … when you do the reads – even for the ones which only need information from the company_security table (though such scenarios will be rare), you will need to join the company and the company_security table and present the data for security cusips by concatenating the company’s cusip and the security cusip. Same read scenarios as mentioned in #2 above apply.

So, as you can see from above, there are a lot of ways of addressing this issue and depending upon the state of the project and the impact to the application, you can chose your solution. Option #2 or 5 are the right design choices.

Note: This option was also tried by the client: Computed column. You cannot use a computed column to enforce the FK since computed columns cannot be used for defining FKs i.e. you cannot just introduce a computed column based on substring(security_cusip, 1, 6) and then try to establish a FK between it and the parent table’s CUSIP column.

Posted in Data Model, DB2 LUW, Oracle, SQL Server | Leave a Comment »