Excellent comparison of Pig vs Hive
Posted by decipherinfosys on March 22, 2012
We recently blogged about Hadoop and the different sources for learning it and getting up to speed. One point we missed was a mention of Pig and Hive. Both started out as Hadoop sub-projects and are now open source, volunteer-run projects in their own right under the Apache Software Foundation.
Pig is essentially a platform for creating MapReduce programs that run on Hadoop. The platform consists of a high-level language (Pig Latin) for expressing data analysis programs and an infrastructure for evaluating those programs. Because the structure of Pig programs is amenable to substantial parallelization, they can handle very large data sets.
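To make "creating MapReduce programs" a bit more concrete, here is a minimal sketch in plain Python of the map, shuffle and reduce steps behind a simple "group by page and count" job. This is not Pig code and not how Pig is implemented; it only simulates in memory the kind of work a short Pig script using GROUP and COUNT saves you from writing by hand. The record layout (user, page) and all function names are assumptions made up for this illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, int]]:
    """Emit a (page, 1) pair for every hypothetical (user, page) visit record."""
    for _user, page in records:
        yield page, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_visits(records: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Run the three steps in sequence on an in-memory data set."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    visits = [("alice", "/home"), ("bob", "/home"), ("alice", "/about")]
    print(count_visits(visits))
```

On a real cluster the map and reduce steps run in parallel across many machines, which is where the ability to handle very large data sets comes from.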
Hive is a data warehouse system built for Hadoop that allows easy data aggregation, ad-hoc queries and analysis of large data sets stored in Hadoop-compatible file systems. HiveQL is a SQL-like language used to interact with the data, and it also allows developers to plug in their own custom mappers and reducers.
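As a rough illustration of the custom mapper idea, here is a small Python sketch of a script that could be plugged into a HiveQL query through the TRANSFORM clause, which streams rows to an external program as tab-separated lines on standard input and reads rows back from its standard output. The column layout (user_id, comment), the function name and the file name `word_mapper.py` are assumptions for this example only.

```python
from typing import Iterable, Iterator


def map_rows(lines: Iterable[str]) -> Iterator[str]:
    """Turn each tab-separated (user_id, comment) row into one 'word<TAB>1' row per word."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip malformed rows instead of failing the whole job
        comment = fields[1]
        for word in comment.lower().split():
            yield f"{word}\t1"


if __name__ == "__main__":
    # In-memory demo rows; when run as a streaming script under Hive,
    # iterate over sys.stdin instead of this sample list.
    sample = ["u1\tHive and Pig", "u2\tPig Latin"]
    for row in map_rows(sample):
        print(row)
```

Assuming a hypothetical table named `comments`, the script would be registered with `ADD FILE word_mapper.py;` and then called with something along the lines of `SELECT TRANSFORM (user_id, comment) USING 'python word_mapper.py' AS (word, cnt) FROM comments;`.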
Here is a link that provides an excellent comparison between Pig and Hive by Lars George:
Be sure to read the comments as well.
And the getting started guides on Hive and Pig: