Systems Engineering and RDBMS

Excellent comparison of Pig vs Hive

Posted by decipherinfosys on March 22, 2012

We had recently blogged about Hadoop and the different sources for learning Hadoop and getting up to speed on it.  One of the points that we missed out on was a mention of Pig and Hive.  Hive and Pig were Hadoop sub-projects before but are now open source volunteer projects under the Apache Software Foundation.

Pig is essentially a platform for creating MapReduce programs with Hadoop.  The platform consists of a high level language for data analysis programs and an infrastructure for evaluating those programs.  Since they are amenable to substantial parallel operations, it enables them to handle very large data sets.

Hive is a data warehouse system built for Hadoop that allows easy data aggregation, ad-hoc queries and analysis of large data sets stored in Hadoop compatible file systems.  HiveQL is a SQL “like” language that can be used to interact with the data and it also allows developers to put in their own custom mappers/reducers.

Here is a link that provides an excellent comparison between Pig and Hive by Lars George:

http://www.larsgeorge.com/2009/10/hive-vs-pig.html

Be sure to read the comments as well.

And the getting started guides on Hive and Pig:

Hive: https://cwiki.apache.org/confluence/display/Hive/GettingStarted

Pig: http://pig.apache.org/docs/r0.9.2/start.html

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: