In a recently concluded project, we had the opportunity to work with Hadoop. There was a learning curve, since none of us had used Hadoop before. Here are some links to help you get started:
Basics of Hadoop:
The article on GigaOM or the series of articles on Cloudera’s site will get you started:
http://gigaom.com/cloud/what-it-really-means-when-someone-says-hadoop/
http://www.cloudera.com/what-is-hadoop/
Sign up with Cloudera and you will have access to a lot of very good learning material on Hadoop. For example:
http://www.cloudera.com/resource/introduction-to-apache-mapreduce-and-hdfs/ is a good introductory video on MapReduce and HDFS.
or this one: http://www.cloudera.com/resource/apache-hadoop-ecosystem/ for understanding the Hadoop ecosystem.
There is also this whitepaper from Gartner on Hadoop and MapReduce for Big Data Analytics:
http://info.cloudera.com/GartnerReportHadoopJanuary2011.html
If you prefer text to video tutorials, here is a good chapter on HDFS: http://www.aosabook.org/en/hdfs.html
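To make the MapReduce idea from the resources above a little more concrete, here is a minimal sketch in plain Python. It is only an in-memory simulation of the map, shuffle and reduce steps using the classic word-count example, not Hadoop's actual API. The function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    In a real cluster the framework does this between map and reduce;
    here it is simulated with an in-memory dictionary.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["Hadoop stores data in HDFS", "MapReduce processes data in Hadoop"]
    print(word_count(sample))
```

On a real cluster the map and reduce functions run in parallel on many machines and the input comes from HDFS, but the shape of the computation is the same.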
Setting up a Hadoop cluster:
And once you are ready to jump in, there are some excellent tutorials by Michael G. Noll to guide you:
To set up your first Hadoop node: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
And then a multi-node cluster: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
And here are some additional good tutorial references: http://www.delicious.com/jhofman/tutorials+hadoop
Microsoft and Big Data:
Recently, Microsoft also announced its support for Apache Hadoop. You can read more about Microsoft’s big data solution here:
http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/big-data-solution.aspx
and about the work done by Hortonworks to extend Apache Hadoop to Windows here:
http://hortonworks.com/blog/extending-apache-hadoop-to-millions-of-new-microsoft-users/