Systems Engineering and RDBMS

Archive for the ‘Business Intelligence’ Category

Back to the Basics: Restoring an Analysis Services Database

Posted by decipherinfosys on February 24, 2011

Not every DBA or Database Developer has had experience with Analysis Services, so it did not come as a surprise when a DBA at a client site approached us for quick help in restoring an Analysis Services database onto the development environment.  In our back-to-the-basics series of posts, we will be covering this topic today.

Let’s use SSMS to connect to the Analysis Services instance first.  In case you have never done that before: when you connect, select “Analysis Services” from the Server Type drop-down, enter the proper Server Name (your instance) and the authentication details, and you will be connected to the Analysis Services instance into which we will restore the backup of the Analysis Services database:

Once connected, right click on “Databases” and select “Restore”:

You will be presented with a very simple “Restore Database” window – most of the items are self-explanatory – browse to select the location of your backup file (we selected the production backup file in the example below):

Once that is done, you need to give a name to the restored database (in our example, we are calling it PRODCOPY), select a storage location using the browse button, allow the database to be overwritten if it already exists on the instance, choose whether to overwrite the security information and, if the backup was encrypted, provide the password so that you can move ahead with the restore process:

Once that is done, you will be able to restore the database on that instance and then make configuration changes, data source changes, etc.

Another way of doing the restore: in case you are not a GUI person and prefer to use scripts, you can also use XMLA.  You can read up on the command here.  You can invoke XMLA by right-clicking on the Analysis Services instance and choosing “New Query” and then “XMLA”.  The command parameters are the same as the ones you saw in the GUI option above, so add the values appropriately and then execute the command to complete the restore process.
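For reference, here is a minimal sketch of what that XMLA Restore command could look like.  The file path and password below are illustrative placeholders (the database name follows the PRODCOPY example above), and the Security and Password elements are only needed if you want to restore the security information and if the backup was encrypted:

    <Restore xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
      <!-- Backup file to restore from (illustrative path) -->
      <File>\\FileShare\Backups\PROD.abf</File>
      <!-- Name to give the restored database -->
      <DatabaseName>PRODCOPY</DatabaseName>
      <!-- Overwrite the database if it already exists on the instance -->
      <AllowOverwrite>true</AllowOverwrite>
      <!-- Restore the security (role membership) information from the backup -->
      <Security>CopyAll</Security>
      <!-- Required only if the backup file was encrypted -->
      <Password>YourBackupPassword</Password>
    </Restore>

Paste this into the XMLA query window mentioned above and execute it against the target instance.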

Resources:

  • XMLA – here.
  • Backup and Restore Analysis Services Databases – article on SQLServerPerformance.com by Ashish Kumar Mehta – here.
  • MSDN post on backing up and restoring Analysis Services Database – here.
  • Database Journal Article on restoring Analysis Services Database using Windows Powershell and SQL Server 2008 AMO – here.

Posted in Business Intelligence, SQL Server | Leave a Comment »

Data Scientist

Posted by decipherinfosys on December 22, 2010

Data Mining has fast moved on from being just a buzzword to something that most organizations are now quickly adopting to make informed decisions.  A client of ours recently inquired whether we have someone on staff who is a “Data Scientist”.  If you are not familiar with the term, here are some links to get you started:

  • Recent post on GigaOm – here.
  • Flowing Data post from 2009 – here.

And there are many other good posts that you can read by googling/binging the term “Data Scientist”.  In short, “data scientist” is the term used for a person who is well versed in gleaning meaningful information from data coming from different sources.  The expertise involves several key areas:

  1. Data acquisition, i.e. procuring the data (through feeds, web crawlers, internal data sources, social media, etc.),
  2. Scrubbing and managing the data using proper ETL, queries, key-value pairs, etc.,
  3. Modeling and interpreting the data using analytics, which could involve different techniques from multivariate statistics, NLP, machine learning, etc.,
  4. Visualizing/presenting the data, and
  5. Translating it into meaningful information that answers the business questions.

Essentially, it is mining the data to glean meaningful information and presenting it to the end user in a way that lets them interact with the data, look at the effect of different parameters and get predictive analysis.  KDD (Knowledge Discovery in Databases) is another acronym used for it.

A data scientist may not be an expert in all the areas listed above but will possess depth in certain areas and be familiar enough with the others to quickly perform basic tasks in them.  Typically, it will be a team of data scientists working together on such a project.

So, what kind of business problems can be solved by the approaches listed above?  Forecasting is just one of them.  Other areas are: risk management (for example, used by insurance companies or the banking industry), churn analysis (for example, many telecommunications companies use these algorithms to help them retain their customer base), marketing to the right segment (for example, used by companies like Amazon to customize their presentation to cater to your needs), and detecting anomalies (for example, money laundering or credit card fraud).

Needless to say, there are a lot of tools and techniques out there that can help you build such solutions.  In our posts, we will be using SQL Server 2008 R2’s Data Mining features to demonstrate how the MSFT tools can be used to build such a solution.

Posted in Business Intelligence, SQL Server | Leave a Comment »

Cognos or MSFT BI

Posted by decipherinfosys on February 25, 2010

Got a call from a friend yesterday evening.  His company is looking to move away from Cognos and to the MSFT BI solution – primarily due to cost (licensing as well as consultant costs).  Having never worked with Cognos, we cannot really do justice to a comparison between the two, but we did some research on the net to see what other folks have done when faced with a similar situation.  We are posting some of the links here which might be of interest to you in case you are ever faced with the same scenario:

BI Monkey had a very good post on this topic:  http://www.bimonkey.com/2010/02/migrating-from-cognos-to-microsoft-bi/

Other articles from the net:

http://capstonebiblog.blogspot.com/2009/12/microsoft-case-studies-performancepoint.html

http://www.sqlmag.com/Article/ArticleID/99286/sql_server_99286.html

http://dashboards.tv/enterprise-reporting/170/microsoft-reporting-services-2008-demo

Dundas data visualization technology is now fully integrated into SSRS 2008 (MSFT acquired the technology for integration into SQL Server 2008) – Dundas is a premier company in data visualization and dashboarding solutions:

http://www.dundas.com/Components/Products/gauge/rs/Why/index.aspx

If you have had experience moving from one of these platforms to the other, please share your experiences.

Posted in Business Intelligence, SQL Server | Leave a Comment »

PowerPivot

Posted by decipherinfosys on December 23, 2009

If you have dealt with a business intelligence project for the financial industry, chances are that you have met business/application users who know Excel inside and out and can do pretty nice business analytics using Excel itself.  MSFT has now introduced PowerPivot for Excel 2010, which is essentially a data analysis add-in.  It was previously known as Project Gemini, for those of you who have been tracking it.  Pretty cool and very powerful indeed.  If you are new to it, there is an excellent introduction to PowerPivot by Rob Collie – here.

Here are some links that you can use for exploring it further:

Main Site: http://powerpivot.com/

MSDN Blog site: http://blogs.msdn.com/powerpivot/

Posted in Business Intelligence | Leave a Comment »

FreeStyle, anyone?

Posted by decipherinfosys on June 12, 2009

No, we are not talking about freestyle wrestling here 🙂  We are talking about Coke’s new RFID-based Freestyle dispensers.  The new dispensers will not only provide consumers with more than 100 beverage choices – the real benefit is the business intelligence data that Coke will be able to gather based on the consumption of those beverages.  There are many more data points of interest as well – you can read more in the InformationWeek article here.

Posted in Business Intelligence, News, Technology | Leave a Comment »

SQL Server 2008 Analysis Services Performance Guide

Posted by decipherinfosys on February 9, 2009

MSFT has released a very good whitepaper on Analysis Services performance.  It covers query performance, processing performance and tuning of the server resources.  You can download it from here.

Posted in Business Intelligence, SQL Server | Leave a Comment »

End of the road for Performance Point Server

Posted by decipherinfosys on January 28, 2009

It’s curtains for the Performance Point Server product.  Its functionality will now be merged into SharePoint Server instead.  MSFT announced it last week – you can read more on this over here.  This BI product was cumbersome to use, and many of our customers were happy just using the feature functionality present within SQL Server, SharePoint and, for the accounting folks, Excel.  Having its functionality tightly integrated within SharePoint Server will be a much better choice and hopefully will lead to wider adoption.

Posted in Business Intelligence | 2 Comments »

Performance Point Server (PPS)

Posted by decipherinfosys on May 2, 2008

While working on PPS at a client site yesterday, I got stuck and looked up the two blogs that I refer to for PPS work (in addition to MSDN and TechNet) – thought that I would post the links here so that you can also benefit from them:

Microsoft PPS Team Blog

Norm’s PPS Blog

Posted in Business Intelligence, SQL Server | Leave a Comment »

Factless Fact Tables

Posted by decipherinfosys on April 7, 2008

There is a term used in dimensional modeling called the factless fact table.  It sounds like an oxymoron – what use is a fact table if it is not recording any facts?  Factless fact tables are fact tables that consist only of keys, i.e. they do not have any measures in them, only keys pointing to the different dimensions.  These tables essentially capture the M:N (many-to-many) relationships between the different dimensions and do not contain any numeric or text facts.  We have seen two practical scenarios for the usage of factless fact tables:

1) A common use of such tables is to track events.  Taking the example of the medical industry (the same one that we used in our previous post on dimensional modeling), when tracking insurance-related events, we had to create a factless fact table in order to track the different events, which covered the M:N relationships between the different dimensions in the system.

2) The second scenario is called a coverage table.  Let’s consider a retail store for this example.  If we are tracking the sales of different products in the store, we will more than likely have these dimensions:

Date: To track the dates for the sale.

Store: To track which store and location the sale was made at.

Product: To track which product the sale was for.

Promotion: Any promotions that are put into place to enhance the sales.

And then in the middle of it all, we will have our SALES_FACT table, which will record the dollar amount, the units sold, etc., in addition to the keys to our dimension tables.  So, for each sale that is made, all this information will be recorded and will be made available for analysis.  But what about the products that were put on a promotion but did not sell?

There are two ways of handling this.  The first is to create records in the SALES_FACT table with 0 as the sales for those products for that time duration, store and promotion.  Though this is logically valid, it will bloat the table data many times over, since we will be creating fact table entries for every product in a given store for each day of the promotion scheme.  A coverage factless fact table comes to the rescue in such a situation.  One can create such a fact table with only the 4 keys that point to the dimensions mentioned above.  Having such a table allows us to record each product in each store that is on a promotion in a given promotion time frame.  This complex M:N relationship is captured by making use of the factless fact table.  Now, going back to the question – “But what about the products that were put on a promotion but did not sell?” – we can get that answer by querying the coverage factless fact table to get the products that were on promotion on a given day in that store and then removing the products that were recorded in the SALES_FACT table for that store for that day, since it records the products that did sell.
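To make this concrete, here is a minimal T-SQL sketch of a coverage factless fact table and of the “promoted but did not sell” query.  The table and column names (PROMOTION_COVERAGE_FACT, the *_KEY columns, @DateKey, @StoreKey) are illustrative rather than from an actual implementation, and the SALES_FACT table and the dimension tables are assumed to already exist:

    -- Coverage factless fact table: one row per product on promotion in a
    -- store on a given date.  Keys only (pointing to the Date, Store,
    -- Product and Promotion dimensions) - no measures.
    CREATE TABLE PROMOTION_COVERAGE_FACT
    (
        DATE_KEY      INT NOT NULL,
        STORE_KEY     INT NOT NULL,
        PRODUCT_KEY   INT NOT NULL,
        PROMOTION_KEY INT NOT NULL,
        CONSTRAINT PK_PROMOTION_COVERAGE_FACT
            PRIMARY KEY (DATE_KEY, STORE_KEY, PRODUCT_KEY, PROMOTION_KEY)
    );

    -- Products on promotion in a store on a given day that did not sell:
    -- everything in the coverage table minus what shows up in SALES_FACT.
    DECLARE @DateKey INT, @StoreKey INT;
    SET @DateKey  = 20080407;  -- surrogate key for the day (illustrative value)
    SET @StoreKey = 1;         -- surrogate key for the store (illustrative value)

    SELECT c.PRODUCT_KEY
      FROM PROMOTION_COVERAGE_FACT c
     WHERE c.DATE_KEY  = @DateKey
       AND c.STORE_KEY = @StoreKey
    EXCEPT
    SELECT s.PRODUCT_KEY
      FROM SALES_FACT s
     WHERE s.DATE_KEY  = @DateKey
       AND s.STORE_KEY = @StoreKey;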

Posted in Business Intelligence | Leave a Comment »

Dimensional Modeling

Posted by decipherinfosys on March 31, 2008

Anyone who has designed and implemented a business intelligence system knows that one of the most important things when doing dimensional modeling is the declaration of the grain of the fact table.  The grain defines exactly what a single record in the fact table represents.  If this is not declared properly, you will not be able to roll up a lot of the information and your analysis will suffer due to the bad design.  Let’s take an example to illustrate this:

We will pick an example from the medical industry since that was one of our most recent projects.  When working on implementing a BI solution for medical practices and small hospitals, one of the items of interest to the billing/coding departments was the line item on the bill for the services rendered by the practice.  These were the potential dimensions for such a line item:

  1. Provider (which particular doctor or a nurse practitioner who did the work),
  2. Location (since the practice can have multiple locations),
  3. Date (of examination),
  4. Patient,
  5. Diagnosis,
  6. Procedure,
  7. Billing Organization (the practice),
  8. Responsible Party (either a parent/guardian or the patient themselves),
  9. Payer (though this can be further divided into Primary Payer and Secondary Payer)

One key thing to note is that the grain of this fact table, for which we have listed the potential dimensions above, is a line item on the bill received from the medical practice.  So, the grain always needs to be very atomic – the smaller and more atomic the measures in the fact table, the more you can infer from the design, since it gives rise to a lot of dimensions across which you can do your analysis.  Now suppose that instead of choosing the grain to be the line item on the bill, we choose a coarser grain, i.e. aggregations of the atomic data.  An example would be to choose the grain to be the “total monetary amount charged for a given procedure based on a certain diagnosis, per month”.  If we choose such a grain, then we will end up with far fewer dimensions.  In the example above, we will potentially have only these dimensions:

  1. Provider (the doctor or the nurse practitioner who did the work),
  2. Procedure,
  3. Diagnosis,
  4. Month

We have taken an extreme case to make the point of choosing the grain judiciously.  As you can see from above, the dimensions have been reduced in number and a lot of atomic information (and thus those dimensions) is lost.  It would have made no sense to include all 9 of the dimensions we had listed before – even if we included them, the fact table’s grain would not be atomic enough for the dimensions that represent the atomic data to be meaningful.

Understand that aggregations shrink the dimensions since they carry less dimensional detail.  As long as one designs the fact tables with a grain that is very atomic (cannot be divided further into smaller sections), it is always possible to roll up and present the aggregated information, with a lot more control over how to slice and dice the data to present other interesting facts.  In the example above, one such measured fact would be the amount that was billed for that one particular procedural diagnosis.  One can then analyze this data by rolling it up by patient, by procedural diagnosis, by provider, or by combinations like amount billed per month by procedure by provider by location.  As long as one has the atomic data, none of the information gets lost.
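As a rough T-SQL sketch of what the atomic, line-item grain looks like (the table and column names below are illustrative, not from the actual project), the fact table carries one surrogate key per dimension listed above plus the measure recorded at that grain, and any coarser view is simply a roll-up query:

    -- Fact table at the grain of one line item on the bill: one surrogate key
    -- per dimension listed above, plus the measure captured at that grain.
    CREATE TABLE BILL_LINE_ITEM_FACT
    (
        PROVIDER_KEY          INT NOT NULL,
        LOCATION_KEY          INT NOT NULL,
        DATE_KEY              INT NOT NULL,
        PATIENT_KEY           INT NOT NULL,
        DIAGNOSIS_KEY         INT NOT NULL,
        PROCEDURE_KEY         INT NOT NULL,
        BILLING_ORG_KEY       INT NOT NULL,
        RESPONSIBLE_PARTY_KEY INT NOT NULL,
        PAYER_KEY             INT NOT NULL,
        BILLED_AMOUNT         DECIMAL(12,2) NOT NULL  -- measure: amount billed for this line item
    );

    -- Because the grain is atomic, the coarser view ("amount billed per month
    -- by procedure by provider by location") is just a roll-up, assuming a
    -- date dimension (here called DATE_DIM) with a CALENDAR_MONTH attribute:
    SELECT d.CALENDAR_MONTH, f.PROCEDURE_KEY, f.PROVIDER_KEY, f.LOCATION_KEY,
           SUM(f.BILLED_AMOUNT) AS TOTAL_BILLED
      FROM BILL_LINE_ITEM_FACT f
      JOIN DATE_DIM d ON d.DATE_KEY = f.DATE_KEY
     GROUP BY d.CALENDAR_MONTH, f.PROCEDURE_KEY, f.PROVIDER_KEY, f.LOCATION_KEY;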

Posted in Business Intelligence, Data Model | 1 Comment »