Introduction to Hadoop and MapReduce Notes

What is big data? It is a subjective term, but it mostly means data that is difficult to process on a single small machine; it is not necessarily a large amount of data.
The challenges are that data arrives very quickly and from many different sources.
The three V's: Volume, Variety, Velocity.

When to use HBase and when to use Hive - Stack Overflow
Apache Flume – Architecture of Flume NG | Cloudera Developer Blog
CDH - distribution of Apache Hadoop and related projects.
Hadoop Streaming
Hadoop storage formats:
Introducing Parquet: Efficient Columnar Storage for Apache Hadoop | Cloudera Developer Blog
hadoop - Storage format in HDFS - Stack Overflow
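
As a rough illustration of the Hadoop Streaming entry in the list above: Streaming lets the mapper and reducer be any programs that read lines from standard input and write tab-separated key/value lines to standard output. The sketch below simulates the three phases (map, shuffle and sort, reduce) for a word count inside a single Python process. The function names and the sample input are hypothetical and not from the course material. A real job would run the mapper and reducer as separate scripts under Hadoop, which performs the shuffle and sort itself.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_and_sort(pairs):
    """Shuffle and sort phase: order pairs by key so equal keys are adjacent."""
    return sorted(pairs, key=itemgetter(0))


def reducer(sorted_pairs):
    """Reduce phase: sum the counts for each run of identical keys."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Run the three phases in order and return the totals as a dict."""
    return dict(reducer(shuffle_and_sort(mapper(lines))))


if __name__ == "__main__":
    sample = ["big data is big", "data comes in fast"]  # hypothetical input
    for word, total in sorted(word_count(sample).items()):
        # Tab-separated key/value lines, the default Streaming convention.
        print(f"{word}\t{total}")
```

The reducer relies on its input being sorted by key, which is why the sort step sits between the two phases; `groupby` only merges adjacent equal keys.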

NameNode MapReduce
Shuffle and Sort
Apache Sqoop
Apache Nutch

The Final
Much of the information below is in a Google doc that was somewhat hidden in the course wiki but not provided in the final's instructions. The doc can also be found in the forums for the class, but rather than simply reference the class I wa…

SunyIT's Project BITS Documentation

The following is a collection of documents that I created solely for myself and colleagues in order to meet standards for implementing a Hadoop cloud service. That said, much of the information is specific to the systems we used and is customized to work only for those who were a part of the project. The objective for Project "Bits" can be found here in this link. All IP addresses have been marked with x's and URLs generalized in order to protect the SunyIT network system. I continue to study the systems used here, and I am releasing the documents in the hope that others might take up the project and implement it at their own university or college.

Back-Bone of Bits Project
This is the server BitsGW, which features a VPN connection across multiple colleges. It creates VMs of BitsHP (Hadoop machines) so that new projects can scale, and it also provides an LDAP connection service.
BitsGW has the following users: afassett, admin
BitsHP: PXE server for the Hadoop node machines behind it. Whi…