Hadoop 2.x is based on the YARN architecture, which uses a ResourceManager and per-application ApplicationMasters. The ResourceManager manages resources across the cluster, while each ApplicationMaster manages the lifecycle of its job.
Installing Hadoop is quite simple: all we need to do is untar the Hadoop tarball on the cluster nodes.
The master node runs the NameNode and ResourceManager, whereas the slave nodes run the DataNode and NodeManager. The NameNode and ResourceManager can also be placed on different nodes.
The steps below explain how to set up Hadoop 2.x on a single-node cluster.
Prerequisites:
• Java 6 or later installed
• Dedicated user for Hadoop
• SSH configured (passwordless SSH to localhost; see the sketch below)
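A minimal sketch for checking Java and setting up passwordless SSH to localhost (assuming OpenSSH is installed and, on macOS, Remote Login is enabled):
$ java -version
$ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
$ ssh localhost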
Platform:
We are using macOS; the same steps also work on Linux. To install on Windows, we have to install Cygwin to provide a shell environment.
Download
• Download the tarball from http://hadoop.apache.org/releases.html
• Extract it into /Application/hadoop-2.3.0 (example commands below)
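As a sketch (the mirror URL below is only an example and may differ for your region or release), the tarball can be fetched and extracted from the command line:
$ curl -O http://archive.apache.org/dist/hadoop/common/hadoop-2.3.0/hadoop-2.3.0.tar.gz
$ sudo mkdir -p /Application
$ tar -xzf hadoop-2.3.0.tar.gz -C /Application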
Setup Environment
$ export HADOOP_HOME=/Application/hadoop-2.3.0
$ export PATH=$HADOOP_HOME/bin:$PATH
$ export PATH=$HADOOP_HOME/sbin:$PATH
Note: We can also add the above exports to the shell profile to avoid repeating these steps in every session.
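For example, assuming a bash login shell that reads ~/.bash_profile (the default on macOS):
$ cat >> ~/.bash_profile <<'EOF'
export HADOOP_HOME=/Application/hadoop-2.3.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
EOF
$ source ~/.bash_profile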
Create directories
Create the NameNode and DataNode directories as shown below:
$ mkdir -p $HADOOP_HOME/data/hdfs/namenode
$ mkdir -p $HADOOP_HOME/data/hdfs/datanode
Change in yarn-site.xml
Change in /Application/hadoop-2.3.0/etc/hadoop/yarn-site.xml as below
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
Change in core-site.xml
Change in /Application/hadoop-2.3.0/etc/hadoop/core-site.xml
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000/</value>
  </property>
</configuration>
Change in hdfs-site.xml
Change in /Application/hadoop-2.3.0/etc/hadoop/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/Application/hadoop-2.3.0/data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/Application/hadoop-2.3.0/data/hdfs/datanode</value>
  </property>
</configuration>
Change in mapred-site.xml
Change in /Application/hadoop-2.3.0/etc/hadoop/mapred-site.xml. If it is not available, create one.
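One common way to create it is to copy the bundled template (assuming your distribution ships mapred-site.xml.template, as the Apache tarball does), then add the property shown below:
$ cp /Application/hadoop-2.3.0/etc/hadoop/mapred-site.xml.template /Application/hadoop-2.3.0/etc/hadoop/mapred-site.xml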
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Logging
Update the etc/hadoop/log4j.properties file to customize the Hadoop logging configuration.
Hadoop uses Apache log4j via the Apache Commons Logging framework.
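As an illustration (the logger name below is just an example), raising HDFS logging to DEBUG can be done by adding a standard log4j logger line to etc/hadoop/log4j.properties:
log4j.logger.org.apache.hadoop.hdfs=DEBUG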
Format namenode
$ hadoop namenode -format
You will get a message like the one below:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at username.local/xx.yyy.zz.aa
************************************************************/
Start HDFS server
Run the jps command:
$ jps
912 Jps
This means the HDFS NameNode has not been started yet.
Start namenode
$ sh hadoop-daemon.sh start namenode
Start datanode
$ sh hadoop-daemon.sh start datanode
$ jps
1305 Jps
1238 DataNode
1201 NameNode
Start ResourceManager
$ sh yarn-daemon.sh start resourcemanager
Start NodeManager
$ sh yarn-daemon.sh start nodemanager
Start Job History Server
$ sh mr-jobhistory-daemon.sh start historyserver
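If everything started correctly, jps should now list all five daemons (the PIDs below are illustrative):
$ jps
1201 NameNode
1238 DataNode
1361 ResourceManager
1425 NodeManager
1488 JobHistoryServer
1530 Jps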
Web interface
Browse HDFS and check its health at http://localhost:50070 in the browser.
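The YARN ResourceManager web UI is typically available at http://localhost:8088, and the MapReduce Job History Server UI at http://localhost:19888.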
Reference
Hadoop Essence: The Beginner's Guide to Hadoop & Hive