Systems Engineering and RDBMS

Archive for the 'Business Intelligence' Category


Performance Point Server (PPS)

Posted by decipherinfosys on May 2, 2008

While working on PPS at a client site yesterday, I got stuck and looked up two blogs that I refer for PPS work(in addition to MSDN, TechNet) - thought that I would post the links here so that you can also benefit from them:

Microsoft PPS Team Blog

Norm’s PPS Blog

Posted in Business Intelligence, SQL Server | No Comments »

Factless Fact Tables

Posted by decipherinfosys on April 7, 2008

There is a term that is used in dimensional modeling which is called Factless Fact Tables. Sounds like an oxymoron since what use is a fact table if it is not recording any facts? Factless fact tables are those that fact tables that consist of only keys i.e. they do not have any measures in them but only keys pointing to the different dimensions. These tables essentially capture the M:N (many to many) relationship between the different dimensions and do not contain any numeric or text facts in them. We have seen two practical scenarios for the usage of factless fact tables:

1) A common use of such tables is to track events. Taking the example of the medical industry (the same one that we did in our previous example of dimensional modeling), when tracking insurance related events, we had to create a factless fact table in order to track the different events which covered the M:N relationships between the different dimensions in the system.

2) The second scenario is called a coverage table.  Let’s consider a retail store for this example.  If we are tracking the sales of different products in the store, we will more than likely have these dimensions:

Date: To track the dates for the sale.

Store: To track which store and location the sale was made at.

Product: To track which product the sale was for.

Promotion: Any promotions that are put into place to enhance the sales.

And then in the middle of it, we will have our SALES_FACT table which will record the dollar amount, the units sold etc. in addition to the keys to our dimension tables.  So, for each sale that is made, all this information will be recorded and will be made available for analysis.  But what about the products that were put on a promotion but did not sell?

There are two ways of handling this: a) create records in the SALES_FACT table with 0 as the sales for those products under that time duration for a store during that promotion.  Though this is logically valid, this will bloat the table data my many folds since we will be creating entries into the fact table for all the products for that day in a given store during the promotion scheme.  A coverage factless fact table comes to the rescue under such a situation.  One can create such a fact table with only the 4 keys that point to the dimensions mentioned above.  Having such a table allows us to record each product in each store that is on a promotion in a given promotion time frame.   This complex M:N relationship is captured by making use of the factless fact table.  Now, going back to the question: “But what about the products that were put on a promotion but did not sell?“, we can get that answer by querying the coverage factless fact table to get the products that were on promotion on a given day in that store and then remove the products that were recorded in the SALES_FACT table for that store for that day since it recorded the products that did sell.

Posted in Business Intelligence | No Comments »

Dimensional Modeling

Posted by decipherinfosys on March 31, 2008

Anyone who has designed and implemented a business intelligence system knows that one of the most important thing when doing dimensional modeling is the declaration of the grain of the fact table.  What that means is that the grain should be something that represents exactly what a record within the fact table is supposed to represent.  If this is not done properly, you will not be able to roll up a lot of information and your analysis will suffer due to the bad design.  Let’s take an example to illustrate this:

We will pick an example from the Medical industry since that was one of our most recent projects.  When working on implementing a BI solution for the medical practices and small hospitals, one of the items in the billing/coding departments was the line items in the bill for the services rendered by the practice.  These were the potential dimensions for such a line item:

  1. Provider (which particular doctor or a nurse practitioner who did the work),
  2. Location (since the practice can have multiple locations),
  3. Date (of examination),
  4. Patient,
  5. Diagnosis,
  6. Procedure,
  7. Billing Organization (the practice),
  8. Responsible Party (either parent/guardian or the patients themselves),
  9. Payer (though this can be further divided into Primary Payer and Secondary Payer)

One key thing to note is that the grain for this fact table for which we have listed the potential dimensions above is a line item on the bill that is received from the medical practice.  So, the grain always needs to be very atomic - the smaller and more atomic the measures are for the fact table, the more things you can infer from such a design since it will give rise to a lot of dimensions across which you can do your analysis.  Now suppose that instead of choosing the grain to be the line item on the bill, we choose the grain to be more coarse i.e. aggregations of the atomic data.  Such an example would be to choose a grain to be the “Total monetary amount charged for a given procedure based on a certain diagnosis done per month”.  If we choose such a grain, then we will end up with very less dimensions.  In the example above, we will potentially have only these dimensions:

  1. Provider (the doctor or the nurse practitioner who did the work),
  2. Procedure,
  3. Diagnosis,
  4. Month

We have taken the extreme case to make the point of choosing the grain judiciously.  As you can see from above, the number of dimensions have reduced in number and a lot of atomic information (and thus those dimensions) are lost.  It would have made no sense to include all of those 9 dimensions that we had listed before since even if we included them, we would have ended up with very little aggregation since the fact table’s grain is not atomic enough to be able to make sense for those dimensions that represent the atomic data.

Understand that aggregations shrink the dimensions since they have less dimensional detail.  As long as one designs the fact tables with a grain that is very atomic (cannot be further divided up into smaller section), it is always possible to roll up and present the aggregated information with a lot more control over how to slice and dice the data to present other interesting facts.  In the example above, one such measured fact would be the amount that was billed for that one particular procedural diagnosis.  One can then analyze this data by rolling it up by patient, by procedural diagnosis, by provider or by combinations like amount billed per month by procedure by provider by location.  As long as one has the atomic data, none of the information gets lost.

Posted in Business Intelligence, Data Model | 1 Comment »