Systems Engineering and RDBMS

Sqoop graduates from Apache Incubator to a Top Level Project

Posted by decipherinfosys on April 5, 2012

Sqoop – the Big Data Tool has moved out of the Apache Incubator to a Top Level Project (TLP).  In case you are not aware of Sqoop, it is the key data tool to transfer volumes of data between Hadoop and structured data stores like RDBMS (Relational Database Management Systems).  This project provides connectors for many popular RDBMS – Oracle, SQL Server, MySQL, DB2 and PostgreSQL.  This is a significant step towards the adoption of Hadoop in the enterprise solutions.

You can read more on this in the eweek article here. And here is the link for learning more on sqoop: http://sqoop.apache.org/

Posted in Big Data, Technology | Tagged: , | Leave a Comment »

Bug: Scope_Identity() and @@Identity can return wrong values during parallel execution plans

Posted by decipherinfosys on April 4, 2012

Ran into a weird issue today at a client site and upon further researching it, found out that it is a known issue in the SQL Server version that was being used at the client site.  The issue was that at times, we were getting wrong values returned by the Scope_Identity() function.  Upon further diagnosis, we found that the query in question was using a parallel execution plan and when that happens, there is a known issue in some versions of SQL Server because of which you can get wrong Identity value returned.  The KB article from MSFT that talks about this is available here:

http://support.microsoft.com/default.aspx?scid=kb;en-US;2019779

It states that the issue is fixed as part of the Cumulative Update Package 5 of SQL Server 2008 R2 SP1 and any new releases post it.  However, this was a production system and getting any such kind of patch update/upgrade would have taken time so we adopted the workaround of using the OUTPUT clause.   Example of the OUTPUT clause usage is in that KB article and you can also read up more on it in our blog post here and here.  You can also opt for Method 2 mentioned in the workaround in the article.  Method 1 and Method 3 would be sub-optimal for your scenario if the parallel plan was desired to begin with with Method 3 being applied at the Server level and hence effecting all queries.

Posted in SQL Server | Leave a Comment »

Online Index Operations involving certain large data types are allowed in SQL Server 2012

Posted by decipherinfosys on April 3, 2012

Prior to SQL Server 2012, when you did online index operations (available in Enterprise Edition), if the index contained certain large data types (specifically varchar(max), nvarchar(max), varbinary(max)), then those online operations were not allowed on that index.  That restriction does not exist anymore in SQL Server 2012 so there is no need to take such indexes out of your online operations.

Example:  Let’s create a sample table which contains two columns – an ID column of type INT and the second column of data type NVARCHAR(MAX) and then let’s try to create an index on it with ONLINE option set to ON.

CREATE TABLE tblTestOnlineIdx
(
ID INT IDENTITY NOT NULL
, COL1 NVARCHAR(MAX) NOT NULL
, COL2 XML NOT NULL
)
GO
CREATE INDEX tblTestOnlineIdx_IND_1 on tblTestOnlineIdx (ID) INCLUDE (COL1) WITH (ONLINE = ON)
GO

We will see that we will get an error in SQL Server 2008 telling us that:

Msg 2725, Level 16, State 2, Line 1
An online operation cannot be performed for index ‘tblTestOnlineIdx_IND_1′ because the index contains column ‘COL1′ of data type text, ntext, image, varchar(max), nvarchar(max), varbinary(max), xml, or large CLR type. For a non-clustered index, the column could be an include column of the index. For a clustered index, the column could be any column of the table. If DROP_EXISTING is used, the column could be part of a new or old index. The operation must be performed offline.

And this same code will run fine on SQL Server 2012 because this limitation has been removed.  However, even though the BOL link above states that the same is true for indexes containing a XML data type column, that is not the case.  ONLINE=ON is not a valid option for the XML indexes and it will still give an error.  That is reflected in the BOL link here which covers the creation of XML indexes.

Posted in SQL Server | Tagged: , | Leave a Comment »

Editions in SQL Server 2012 and Licensing

Posted by decipherinfosys on March 27, 2012

SQL Server 2012 is available in these editions:

  • Enterprise Edition
  • Standard Edition
  • Business Intelligence Edition
  • Developer Edition
  • Web Edition, and
  • Express Edition

Out of these, the Enterprise Edition is of course the one that has all the high end features for meeting the needs of a Tier-1 large scale deployment.  You would recall that previously there was a DataCenter Edition available – that is no longer available now and all of it’s capabilities are now included in the Enterprise Edition itself.

The Business Intelligence Edition is a new Edition that is offered now.   This is available to offer clients the full suite of powerful BI features in SQL Server 2012 like Power View, PowerPivot, enhanced reporting and analytics.  Any company specifically looking at SQL Server from the perspective of BI and not interested in the full OLTP features that are found in the Enterprise Edition would opt for this one.

The Standard Edition is aimed towards customers who do not need the high end features that are available in the Enterprise or the Business Intelligence Editions.

Out of the remaining three specialized Editions, the Developer Edition (like in previous versions) has all the features of the Enterprise Edition but is meant only to be used in a Dev/QA environment and does allow for a direct upgrade to the Enterprise Edition.  The Web Edition is essentially intended for the hosting service providers.  It is scaled down version in terms of support for processors (max: 4) and the feature sets but is good enough to be used if you are looking for an edition to host web-sites and small scale web applications.  It supports up-to 64GB of memory.  The Express Edition is a free edition and has restrictions on the DB size (max: 10GB), 1 processor and 1GB of memory.

Here are some links comparing the different feature sets across the different editions available in SQL Server 2012:

SQL Server 2012 Editions: http://www.microsoft.com/sqlserver/en/us/sql-2012-editions.aspx

Article on MSDN with the Feature Comparison between the different Editions: http://msdn.microsoft.com/en-us/library/cc645993

And here is the information from MSFT in regards to licensing: http://www.microsoft.com/sqlserver/en/us/get-sql-server/licensing.aspx

Posted in SQL Server | Tagged: , , | Leave a Comment »

Virtual Labs for SQL Server 2012

Posted by decipherinfosys on March 23, 2012

MSFT has released a bunch of virtual labs to easily come up to speed with the new features of SQL Server 2012.  So, if you are restrained due to hardware/software and are finding it difficult to convince your boss to provide environments to play around with the new feature sets, here is a great place to get your hands dirty and start learning more about SQL Server 2012 – all these virtual labs run off the MSFT servers in the cloud.

http://www.microsoft.com/sqlserver/en/us/learning-center/virtual-labs.aspx

Posted in SQL Server | Tagged: , | Leave a Comment »

Sample Databases for SQL Server 2012

Posted by decipherinfosys on March 22, 2012

In case you are looking for the AdventureWorks Sample Databases for SQL Server 2012 version, you can download them from codeplex – here is the link for all the downloads (Data files, Script files etc.):

http://msftdbprodsamples.codeplex.com/releases/view/55330

Once you download the “AdventureWorks2012_Data” data file, you can simply attach it using T-SQL or through the GUI.  If you use T-SQL, you can do simply:

USE MASTER
GO
CREATE DATABASE AdventureWorks2012
ON (FILENAME = 'C:\AdventureWorks2012_Data.mdf')
FOR ATTACH_REBUILD_LOG

It will automatically create the log file in the same folder.  And if you want to do it from the GUI, you can right click on databases in Management Studio and Select “Attach” which will bring up the GUI in which you can select the MDF file location.  You will notice that under the details section, it will complain about not finding the Log file.  If you proceed as is, you will get an error since it is expecting the log file to be present.  Simply highlight the log entry line in the details section and click on Remove and then proceed and it will create the log file for you and attach the database:

SQL Server 2012 - Attach Database

Posted in SQL Server | Tagged: , , | Leave a Comment »

Excellent comparison of Pig vs Hive

Posted by decipherinfosys on March 22, 2012

We had recently blogged about Hadoop and the different sources for learning Hadoop and getting up to speed on it.  One of the points that we missed out on was a mention of Pig and Hive.  Hive and Pig were Hadoop sub-projects before but are now open source volunteer projects under the Apache Software Foundation.

Pig is essentially a platform for creating MapReduce programs with Hadoop.  The platform consists of a high level language for data analysis programs and an infrastructure for evaluating those programs.  Since they are amenable to substantial parallel operations, it enables them to handle very large data sets.

Hive is a data warehouse system built for Hadoop that allows easy data aggregation, ad-hoc queries and analysis of large data sets stored in Hadoop compatible file systems.  HiveQL is a SQL “like” language that can be used to interact with the data and it also allows developers to put in their own custom mappers/reducers.

Here is a link that provides an excellent comparison between Pig and Hive by Lars George:

http://www.larsgeorge.com/2009/10/hive-vs-pig.html

Be sure to read the comments as well.

And the getting started guides on Hive and Pig:

Hive: https://cwiki.apache.org/confluence/display/Hive/GettingStarted

Pig: http://pig.apache.org/docs/r0.9.2/start.html

Posted in Big Data | Tagged: , , , | Leave a Comment »

New way of paging in SQL Server 2012 – Offset clause

Posted by decipherinfosys on March 21, 2012

For any web application, paging of records is a necessary requirement.  We have written several posts in the past on how to achieve paging in Oracle, SQL Server and DB2 LUW.  Those posts showed how paging can be achieved using CTE/custom procedural code/row_number() function (SQL Server), using rownum (Oracle) or rownumber()/fetch first clause (DB2).  Here are some of those posts to familiarize yourself with the different ways of paging in different RDBMS:

Whitepaper on paging in Oracle, SQL Server and DB2 LUW (bit dated – wrote it in 2007): http://decipherinfosys.wordpress.com/2007/01/18/paging-data-in-oracle-ms-sql-server-and-db2-luw-using-database-code/

Paging of Record Sets and COUNT(STOPKEY) Optimization in Oracle: http://decipherinfosys.wordpress.com/2007/08/09/paging-and-countstopkey-optimization/

Paging of record sets in SQL Server 2005 or SQL Server 2008: http://decipherinfosys.wordpress.com/2007/08/10/paging-of-record-sets-in-sql-server-2005/

Dynamic SQL, Dynamic Ordering and Paging: http://decipherinfosys.wordpress.com/2008/04/22/dynamic-sql-dynamic-ordering-and-paging/

In today’s post, we will introduce a new clause that is now available in SQL Server 2012 which makes writing the paging logic a lot more simpler to read and implement.  Here is an example:


DECLARE
@NbrofRecords INT = 15,
@PageNbr INT = 3,
@OrgTypeID INT = 23,
@OrgName NVARCHAR(100) = NULL;


SELECT OM.ORG_MASTER_ID, OMTX.ORG_TYPE_LKUP_ID, OTL.ORG_TYPE_NAME
, COALESCE (MFFM.FUND_FAMILY_NAME, OM.ORG_NAME) AS ORG_NAME, MFFM.MUTUAL_FUND_FAMILY_MASTER_ID
, COALESCE (CM.FIRST_NAME, ' ') + COALESCE (CM.MIDDLE_NAME, ' ') + COALESCE (CM.LAST_NAME, ' ') AS CONTACT_NAME
, CM.CONTACT_MASTER_ID FROM dbo.ORG_MASTER AS OM
INNER JOIN dbo.ORG_MASTER_TYPE_XREF AS OMTX
ON OM.ORG_MASTER_ID = OMTX.ORG_MASTER_ID
INNER JOIN dbo.ORG_TYPE_LKUP AS OTL
ON OMTX.ORG_TYPE_LKUP_ID = OTL.ORG_TYPE_LKUP_ID
LEFT OUTER JOIN dbo.ORG_TYPE_CONTACT_XREF AS OTCX
ON OMTX.ORG_MASTER_ID = OTCX.ORG_MASTER_ID
AND OMTX.ORG_TYPE_LKUP_ID = OTCX.ORG_TYPE_LKUP_ID
LEFT OUTER JOIN dbo.CONTACT_MASTER AS CM
ON OTCX.CONTACT_MASTER_ID = CM.CONTACT_MASTER_ID
LEFT OUTER JOIN dbo.MUTUAL_FUND_FAMILY_MASTER AS MFFM
ON OM.ORG_MASTER_ID = MFFM.ORG_MASTER_ID
WHERE OM.MARK_FOR_DELETION = 0
AND OTL.ORG_TYPE_LKUP_ID = @OrgTypeID
AND (OM.ORG_NAME LIKE @OrgName OR MFFM.FUND_FAMILY_NAME LIKE @OrgName OR @OrgName IS NULL)
ORDER BY OM.ORG_NAME ASC
OFFSET (@NbrofRecords * (@PageNbr - 1)) ROWS
FETCH NEXT @NbrofRecords ROWS ONLY;
GO

As you can see from above, the only change was to append the OFFSET clause and the FETCH NEXT to your actual query and use two parameters – one to denote the Page Number that you want and the other to denote how many records do you want per page.  So, say your query returns 1000 records and your per page display of records requirement is 15 records per page and you want to display the data from the third page – in that scenario, @PageNbr will have a value of 3 and @NbrofRecords will have a value of 15 so the offset would be by 15 * (3-1) = 30 i.e. you are going to start from the 31st record using the OFFSET clause and then FETCH the NEXT 15 records.  Pretty easy to write, understand and implement, isn’t it?  Do remember that these clauses go along with the ORDER BY clause i.e. you cannot use these on unordered data (will get an error) – if you want data ordered in a particular way, you should be using order by anyways which is always the case for paging for web applications.

Will need to play around a bit to ascertain the performance impact in the case of complex queries with different columns (from different tables) in the order by clause or conditional ordering to see what the impact is between the old vs the new approach.  Will post the results on the blog when we do those tests.

Posted in DB2 LUW, Oracle, SQL Server | Leave a Comment »

Learning Hadoop

Posted by decipherinfosys on March 20, 2012

In a recently concluded project, we had the opportunity to work on Hadoop.  There was a learning curve since none of us had worked in Hadoop before.  Here are some URLs to help you get started with your learning process in this regard:

Basics of Hadoop:

The article on gigaom or the series of articles on cloudera’s site will get your started:

http://gigaom.com/cloud/what-it-really-means-when-someone-says-hadoop/

http://www.cloudera.com/what-is-hadoop/

Sign up with Cloudera and you will have access to a lot of very good learning material on Hadoop, example:

http://www.cloudera.com/resource/introduction-to-apache-mapreduce-and-hdfs/  is a good starter’s video on MapReduce and HDFS.

or this one: http://www.cloudera.com/resource/apache-hadoop-ecosystem/ for understanding the Hadoop ecosystem.

And this whitepaper from Gartner on Hadoop and MapReduce for Big Data Analytics:

http://info.cloudera.com/GartnerReportHadoopJanuary2011.html

If you like to have text available for your learning purposes rather than video tutorials, here is a good chapter on HDFS: http://www.aosabook.org/en/hdfs.html

Setting up Hadoop cluster:

And once you are ready to jump in, there are some excellent tutorials by Michael G. Noll to guide you:

To set up your first Hadoop node: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

And then multiple node cluster: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/

And here are some additional good tutorial references: http://www.delicious.com/jhofman/tutorials+hadoop

Microsoft and BigData

Recently, MSFT also announced their support for Apache Hadoop.  You can read more on MSFT’s big data solution from here:

http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/big-data-solution.aspx

and the work done by HortonWorks for extending Apache Hadoop to Windows:

http://hortonworks.com/blog/extending-apache-hadoop-to-millions-of-new-microsoft-users/

Posted in Big Data, Linux, SQL Server, Unix, Windows | 1 Comment »

Installing/Enabling .Net 3.5 SP1 on Windows 2008 R2 for SQL Server 2008 R2/SQL Server 2012 RC0 Install

Posted by decipherinfosys on February 24, 2012

A junior team member ran into this issue so thought it was worth a post.  While installing SQL Server 2012 RC0 on a new VM of Windows Server 2008 R2, it complained about installing or enabling .Net Framework 3.5 SP1 which is a pre-requisite for the SQL Server install.  Here are the steps to validate whether it is installed and how to enable it and if it is not installed, how to go about installing it.

  1. Under Administrative Tools, Select Server Manager and click on Features.
  2. All the installed Features are displayed in the right side pane where you can validate if .Net Framework 3.5 SP1 (3.5.1) is installed or not.
  3. If it is not installed, then click on “Add Features” and expand “.Net Framework 3.5.1 Features”.
  4. Check the check box for “.Net Framework 3.5.1 Features” and click on Next and then Install.

5.  Once the installation is complete, click on close and you are done.

Now, you can move on with your SQL Server 2012 RC0 install on Windows Server 2008 R2.  Have fun.

Posted in .Net Development, SQL Server, Windows | Leave a Comment »

 
Follow

Get every new post delivered to your Inbox.

Join 39 other followers