Sqoop – the Big Data tool – has moved out of the Apache Incubator to become a Top Level Project (TLP). In case you are not aware of Sqoop, it is the key tool for transferring large volumes of data between Hadoop and structured data stores like RDBMS (Relational Database Management Systems). The project provides connectors for many popular RDBMS products – Oracle, SQL Server, MySQL, DB2 and PostgreSQL. This is a significant step toward the adoption of Hadoop in enterprise solutions.
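As a rough sketch of what such a transfer looks like, a typical Sqoop import of a single RDBMS table into HDFS runs from the command line as below. The host, database, table and directory names here are hypothetical, and this assumes a Hadoop cluster with Sqoop and the matching JDBC driver installed:

```shell
# Import the (hypothetical) "orders" table from a MySQL database into HDFS,
# splitting the work across 4 parallel map tasks.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username dbuser \
  -P \
  --table orders \
  --target-dir /data/sales/orders \
  --num-mappers 4
```

The `-P` flag prompts for the database password on the console instead of putting it on the command line.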
Posts Tagged ‘Hadoop’
Posted by decipherinfosys on April 5, 2012
Posted by decipherinfosys on March 22, 2012
We had recently blogged about Hadoop and the different sources for learning Hadoop and getting up to speed on it. One of the points that we missed was a mention of Pig and Hive. Hive and Pig were Hadoop sub-projects before, but are now open source volunteer projects under the Apache Software Foundation in their own right.
Pig is essentially a platform for creating MapReduce programs with Hadoop. The platform consists of a high-level language (Pig Latin) for writing data analysis programs and an infrastructure for evaluating those programs. Because Pig programs are amenable to substantial parallelization, they can handle very large data sets.
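The MapReduce model that Pig programs ultimately compile down to can be sketched in plain Python – a toy illustration of the paradigm, not Pig itself: a map step emits key/value pairs, a shuffle groups them by key, and a reduce step aggregates each group.

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in the input.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the values for each key (here, a simple sum).
    return {key: sum(values) for key, values in groups.items()}

lines = ["hadoop pig hive", "pig pig hadoop"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'hadoop': 2, 'pig': 3, 'hive': 1}
```

On a real cluster the map and reduce phases run in parallel across many machines, which is what makes the model suitable for very large data sets.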
Hive is a data warehouse system built on Hadoop that allows easy data aggregation, ad-hoc querying and analysis of large data sets stored in Hadoop-compatible file systems. HiveQL is a SQL-like language that can be used to interact with the data, and it also allows developers to plug in their own custom mappers and reducers.
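To show the kind of ad-hoc aggregation HiveQL expresses – roughly `SELECT dept, COUNT(*) FROM employees GROUP BY dept` in Hive – here is a toy Python sketch of a SQL-style GROUP BY count; the table and column names are made up for illustration:

```python
from collections import Counter

# Hypothetical rows, standing in for records Hive would read from files in HDFS.
employees = [
    {"name": "alice", "dept": "eng"},
    {"name": "bob",   "dept": "sales"},
    {"name": "carol", "dept": "eng"},
]

# Equivalent of: SELECT dept, COUNT(*) FROM employees GROUP BY dept
dept_counts = Counter(row["dept"] for row in employees)
print(dict(dept_counts))  # {'eng': 2, 'sales': 1}
```

Hive translates queries like this into MapReduce jobs behind the scenes, so analysts can work at the SQL level without writing mappers and reducers by hand.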
Here is a link to an excellent comparison of Pig and Hive by Lars George:
Be sure to read the comments as well.
And here are the getting started guides for Hive and Pig: