Hive Setup

Setting up Hive is quite straightforward and takes only a few simple steps. Below is a brief description of how to set up Hive.

Hadoop HDFS Java API

Hadoop’s org.apache.hadoop.fs.FileSystem is a generic class for accessing and managing HDFS files and directories in a distributed environment. A file’s content is stored on DataNodes as a sequence of large, fixed-size blocks (e.g. 64 MB), while the NameNode keeps track of those blocks along with the file’s metadata. FileSystem reads a file as a stream by accessing its blocks in order: it first gets the block information from the NameNode, then opens, reads, and closes the blocks one by one, opening the next block only after it has finished with the current one. HDFS replicates each block for higher reliability and scalability. If the client is itself one of the DataNodes, it tries to read the block locally, and if that fails it moves on to a replica on another DataNode in the cluster.
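The read path described above can be sketched in plain Java. This is a simplified, hypothetical model rather than Hadoop code: it does not call org.apache.hadoop.fs.FileSystem, and the names HdfsReadSketch, Block, blockBoundaries, and chooseReplica are illustrative only. It shows how a file maps onto fixed-size blocks (the last one may be shorter) and how a client might choose a replica for each block, preferring a local copy and falling back to another DataNode on failure. Real HDFS ranks replicas by network distance, not by list order as assumed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, Hadoop-free model of an HDFS client's block-by-block read path.
public class HdfsReadSketch {

    // One block of a file: its byte offset and the DataNodes holding a replica (assumed names).
    static final class Block {
        final long offset;
        final List<String> replicas;

        Block(long offset, List<String> replicas) {
            this.offset = offset;
            this.replicas = replicas;
        }
    }

    // Splits a file of fileSize bytes into {offset, length} pairs; the last block may be shorter.
    static List<long[]> blockBoundaries(long fileSize, long blockSize) {
        List<long[]> result = new ArrayList<>();
        for (long offset = 0; offset < fileSize; offset += blockSize) {
            result.add(new long[] {offset, Math.min(blockSize, fileSize - offset)});
        }
        return result;
    }

    // Prefers the local node if it holds a healthy replica; otherwise the first healthy remote one.
    static String chooseReplica(Block block, String localNode, Set<String> failedNodes) {
        if (block.replicas.contains(localNode) && !failedNodes.contains(localNode)) {
            return localNode;
        }
        for (String node : block.replicas) {
            if (!failedNodes.contains(node)) {
                return node;
            }
        }
        throw new IllegalStateException("No live replica for block at offset " + block.offset);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 200 MB file with 64 MB blocks is stored as 64 + 64 + 64 + 8 MB.
        for (long[] b : blockBoundaries(200 * mb, 64 * mb)) {
            System.out.println("offset=" + b[0] / mb + "MB length=" + b[1] / mb + "MB");
        }
        Block block = new Block(0, List.of("dn1", "dn2", "dn3"));
        System.out.println(chooseReplica(block, "dn2", Set.of()));      // dn2: local replica
        System.out.println(chooseReplica(block, "dn2", Set.of("dn2"))); // dn1: fallback to remote
    }
}
```

With the real API, this sequencing stays hidden behind FileSystem.open(), which returns an FSDataInputStream that moves from one block to the next as the caller reads.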

Hadoop MapReduce

Introduction
HDFS stores a file as multiple large, fixed-size blocks (e.g. 64 MB or 128 MB), and the MapReduce framework accesses and processes these files in a distributed environment.
The MapReduce framework works on key-value pairs and has two key parts: the Mapper and the Reducer. The framework reads the input file, splits it, and passes each split to a Mapper. The Mapper turns its input into key-value pairs and passes them on as intermediate output to be sorted and shuffled. The Reducer then takes each key with its list of values, processes them, and writes the result to disk.
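The Mapper, sort/shuffle, and Reducer flow can be illustrated with a word count written in plain Java. This is a single-process sketch under simplifying assumptions: it does not use Hadoop's Mapper and Reducer classes, and the names MapReduceSketch, map, shuffle, reduce, and run are hypothetical. A TreeMap stands in for the framework's sort-and-group step, so each reduce call receives one key together with the full list of its values.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process word count mirroring map -> shuffle/sort -> reduce.
public class MapReduceSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) { // split can yield "" for leading whitespace
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Shuffle and sort: group values by key; TreeMap keeps the keys in sorted order.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: combine the list of values for one key (here, by summing).
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs all three phases over the input lines and collects the reducer output.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : lines) {
            intermediate.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(intermediate).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("the quick fox", "the lazy dog", "The end");
        System.out.println(run(input)); // {dog=1, end=1, fox=1, lazy=1, quick=1, the=3}
    }
}
```

In a real job the three phases run on different machines and the intermediate pairs are partitioned across Reducers by key, but the contract each Reducer sees (one key, the list of its values) is the same.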