Posted by decipherinfosys on March 31, 2008
Anyone who has designed and implemented a business intelligence system knows that one of the most important things when doing dimensional modeling is the declaration of the grain of the fact table. The grain states exactly what a single record in the fact table represents. If this is not done properly, you will not be able to roll up a lot of information and your analysis will suffer due to the bad design. Let’s take an example to illustrate this:
We will pick an example from the medical industry since that was one of our most recent projects. While implementing a BI solution for medical practices and small hospitals, one of the items we dealt with in the billing/coding departments was the line item on the bill for the services rendered by the practice. These were the potential dimensions for such a line item:
- Provider (the doctor or nurse practitioner who did the work),
- Location (since the practice can have multiple locations),
- Date (of examination),
- Billing Organization (the practice),
- Responsible Party (either parent/guardian or the patients themselves),
- Payer (though this can be further divided into Primary Payer and Secondary Payer)
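To make the line-item grain concrete, here is a minimal sketch in Python of what one fact row might carry. The class name, the field names, the integer surrogate keys and the `billed_amount` measure are all illustrative assumptions that mirror the list above; they are not the schema from the actual project.

```python
from dataclasses import dataclass, fields
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class BillLineItemFact:
    """One row per line item on a bill (the declared grain)."""
    # Dimension keys (hypothetical names mirroring the list above)
    provider_key: int
    location_key: int
    exam_date: date
    billing_org_key: int
    responsible_party_key: int
    primary_payer_key: int
    secondary_payer_key: Optional[int]  # not every bill has a secondary payer
    # Measure (assumed): the amount billed for this one line item
    billed_amount: Decimal


# A made-up row, purely for illustration.
row = BillLineItemFact(
    provider_key=17,
    location_key=2,
    exam_date=date(2008, 3, 31),
    billing_org_key=1,
    responsible_party_key=501,
    primary_payer_key=9,
    secondary_payer_key=None,
    billed_amount=Decimal("125.00"),
)

# Every field except the measure is a dimension we can analyze by.
dimension_columns = [f.name for f in fields(row) if f.name != "billed_amount"]
print(dimension_columns)
```

Each dimension key on the row is one more axis along which the billed amount can later be grouped.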
One key thing to note is that the grain for this fact table, for which we have listed the potential dimensions above, is a line item on the bill that is received from the medical practice. The grain should be as atomic as possible: the smaller and more atomic the measures in the fact table, the more you can infer from the design, since an atomic grain supports many dimensions across which you can do your analysis. Now suppose that instead of choosing the line item on the bill as the grain, we choose a coarser grain, i.e. an aggregation of the atomic data. An example would be “Total monetary amount charged for a given procedure based on a certain diagnosis done per month”. If we choose such a grain, we end up with far fewer dimensions. In the example above, we will potentially have only these dimensions:
- Provider (the doctor or the nurse practitioner who did the work),
We have taken an extreme case to make the point that the grain must be chosen judiciously. As you can see from above, the number of dimensions has dropped, and a lot of atomic information (and with it those dimensions) is lost. It would have made no sense to include all of the dimensions that we had listed before: even if we had included them, very little would actually have been aggregated, since the fact table’s grain is not atomic enough for dimensions that describe atomic data to make sense.
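The loss of detail can be sketched in a few lines of Python. The sample line items, the provider and location names, and the choice of (provider, procedure, month) as the coarse grain are made-up assumptions for illustration only.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical atomic line items: (provider, location, month, procedure, amount)
line_items = [
    ("Dr. A", "North", "2008-01", "X-ray", Decimal("100")),
    ("Dr. A", "South", "2008-01", "X-ray", Decimal("150")),
    ("Dr. B", "North", "2008-01", "X-ray", Decimal("200")),
]

# Coarse grain: one row per (provider, procedure, month).
# Location is discarded at load time and never reaches the fact table.
monthly_totals = defaultdict(Decimal)
for provider, _location, month, procedure, amount in line_items:
    monthly_totals[(provider, procedure, month)] += amount

for key, total in monthly_totals.items():
    print(key, total)
```

Once only `monthly_totals` is stored, nothing in it says how Dr. A's total splits between the two locations, so a question such as "amount billed per location" can no longer be answered.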
Understand that aggregations shrink the dimensions, since aggregated facts carry less dimensional detail. As long as one designs the fact tables with a grain that is very atomic (one that cannot be divided into smaller pieces), it is always possible to roll up and present aggregated information, with a lot more control over how to slice and dice the data to surface other interesting facts. In the example above, one such measured fact would be the amount billed for one particular procedural diagnosis. One can then analyze this data by rolling it up by patient, by procedural diagnosis, by provider, or by combinations such as amount billed per month by procedure by provider by location. As long as one has the atomic data, none of the information gets lost.
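As a rough sketch of such roll-ups, the Python below groups atomic rows by any combination of dimensions. The `roll_up` helper, the column names and the sample rows are hypothetical; in a real BI system this work would normally be done by the database or the OLAP layer.

```python
from collections import defaultdict
from decimal import Decimal


def roll_up(rows, by, measure="billed_amount"):
    """Sum `measure` over `rows`, grouped by the dimension names in `by`."""
    totals = defaultdict(Decimal)
    for row in rows:
        key = tuple(row[dim] for dim in by)
        totals[key] += row[measure]
    return dict(totals)


# Hypothetical atomic rows, one per bill line item.
rows = [
    {"patient": "P1", "provider": "Dr. A", "location": "North",
     "month": "2008-01", "procedure": "X-ray", "billed_amount": Decimal("100")},
    {"patient": "P2", "provider": "Dr. A", "location": "South",
     "month": "2008-01", "procedure": "X-ray", "billed_amount": Decimal("150")},
    {"patient": "P1", "provider": "Dr. B", "location": "North",
     "month": "2008-02", "procedure": "Exam", "billed_amount": Decimal("80")},
]

# The same atomic rows answer very different questions.
by_patient = roll_up(rows, by=["patient"])
by_provider = roll_up(rows, by=["provider"])
by_month_procedure_provider_location = roll_up(
    rows, by=["month", "procedure", "provider", "location"]
)
```

Because the rows are kept at the line-item grain, adding a new roll-up is just a different `by` list; no reload of the fact table is needed.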