
Hadoop Environment Setup

 

Step 1: Configure passwordless SSH login and the Java environment for Hadoop (a sketch follows this list)

Step 2: Download the Hadoop package and extract it to the target folder hadoop-2.6.0

Step 3: Create a user to operate hadoop-2.6.0 (this post uses the user spark)

Step 4: Configure the Hadoop environment
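
A minimal sketch of step 1 (passwordless SSH plus a JDK check), assuming the user spark exists on both machines and HADOOP16 is the slave's host name; adjust user and host names to your environment:

[spark@HADOOP14 ~]$ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa    # generate a key pair with no passphrase
[spark@HADOOP14 ~]$ ssh-copy-id spark@HADOOP16                  # append the public key to the slave's authorized_keys
[spark@HADOOP14 ~]$ ssh spark@HADOOP16 hostname                 # should print HADOOP16 with no password prompt
[spark@HADOOP14 ~]$ java -version                               # confirm the JDK is installed and on PATH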

 

1. Extract: tar -xzvf hadoop-2.6.0.tar.gz
2. Move it to the target directory: [spark@HADOOP14 software]$ mv hadoop-2.6.0 ~/opt/
3. Enter the Hadoop directory: [spark@HADOOP14 opt]$ cd hadoop-2.6.0/
[spark@HADOOP14 hadoop-2.6.0]$ ls ## list everything in the Hadoop directory
bin  dfs  etc  include  input  lib  libexec  LICENSE.txt  logs  NOTICE.txt  README.txt  sbin  share  tmp
Before configuring, first create the tmp, dfs/name, and dfs/data folders on the local file system (a sketch of creating them follows the list below; the paths must match the values used in core-site.xml and hdfs-site.xml). Seven configuration files are involved, all under the etc/hadoop folder of the Hadoop installation; they can be edited with vim.

  1. ~/hadoop/etc/hadoop/hadoop-env.sh
  2. ~/hadoop/etc/hadoop/yarn-env.sh
  3. ~/hadoop/etc/hadoop/slaves
  4. ~/hadoop/etc/hadoop/core-site.xml
  5. ~/hadoop/etc/hadoop/hdfs-site.xml
  6. ~/hadoop/etc/hadoop/mapred-site.xml
  7. ~/hadoop/etc/hadoop/yarn-site.xml
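
A minimal sketch of creating the local directories mentioned above, assuming the install path used throughout this post (~/opt/hadoop-2.6.0); the exact locations only need to match the property values set below:

[spark@HADOOP14 ~]$ mkdir -p ~/opt/hadoop-2.6.0/tmp        # hadoop.tmp.dir in core-site.xml
[spark@HADOOP14 ~]$ mkdir -p ~/opt/hadoop-2.6.0/dfs/name   # dfs.namenode.name.dir in hdfs-site.xml
[spark@HADOOP14 ~]$ mkdir -p ~/opt/hadoop-2.6.0/dfs/data   # dfs.datanode.data.dir in hdfs-site.xml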


4. Enter the Hadoop configuration file directory

  [spark@HADOOP14 hadoop-2.6.0]$ cd etc/hadoop/
  [spark@HADOOP14 hadoop]$ ls
  capacity-scheduler.xml  hadoop-env.sh               httpfs-env.sh            kms-env.sh            mapred-env.sh               ssl-client.xml.example
  configuration.xsl       hadoop-metrics2.properties  httpfs-log4j.properties  kms-log4j.properties  mapred-queues.xml.template  ssl-server.xml.example
  container-executor.cfg  hadoop-metrics.properties   httpfs-signature.secret  kms-site.xml          mapred-site.xml             yarn-env.cmd
  core-site.xml           hadoop-policy.xml           httpfs-site.xml          log4j.properties      mapred-site.xml.template    yarn-env.sh
  hadoop-env.cmd          hdfs-site.xml               kms-acls.xml             mapred-env.cmd        slaves                      yarn-site.xml

 


4.1 Configure the hadoop-env.sh file --> set JAVA_HOME

  # The java implementation to use.
  export JAVA_HOME=/home/spark/opt/java/jdk1.6.0_37
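
If the JDK location on a machine is unknown, something like the following can find it (assuming java is on the PATH); the jdk1.6.0_37 path above is simply where this post installed the JDK:

[spark@HADOOP14 ~]$ readlink -f $(which java)   # prints .../bin/java; JAVA_HOME is the directory above bin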

 


4.2 Configure the yarn-env.sh file --> set JAVA_HOME

  # some Java parameters
  export JAVA_HOME=/home/spark/opt/java/jdk1.6.0_37

 


4.3 Configure the slaves file --> add the slave node

  HADOOP16   # the host name of the other machine, as configured in /etc/hosts

 


4.4 Configure the core-site.xml file --> add the Hadoop core configuration (the HDFS port is 9000, the temporary directory is file:/home/spark/opt/hadoop-2.6.0/tmp, and HADOOP14 is this machine's host name; it does not need to be changed, it only needs to be mapped in the hosts file)

<configuration>
 <property>
  <name>fs.defaultFS</name>
  <value>hdfs://HADOOP14:9000</value>
 </property>

 <property>
  <name>io.file.buffer.size</name>
  <value>131072</value>
 </property>
 <property>
  <name>hadoop.tmp.dir</name>
  <value>file:/home/spark/opt/hadoop-2.6.0/tmp</value>
  <description>A base for other temporary directories.</description>
 </property>
 <property>
  <name>hadoop.proxyuser.spark.hosts</name>
  <value>*</value>
 </property>
<property>
  <name>hadoop.proxyuser.spark.groups</name>
  <value>*</value>
 </property>
</configuration>

 


4.5 Configure the hdfs-site.xml file --> add the HDFS configuration (NameNode and DataNode ports and directory locations)

<configuration>
 <property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>HADOOP14:9001</value>
 </property>

  <property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/home/spark/opt/hadoop-2.6.0/dfs/name</value>
 </property>

 <property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/home/spark/opt/hadoop-2.6.0/dfs/data</value>
  </property>

 <property>
  <name>dfs.replication</name>
  <value>3</value>
 </property>

 <property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
 </property>

</configuration> 


4.6 Configure the mapred-site.xml file --> add the MapReduce configuration (use the YARN framework, and set the JobHistory address and its web address). If this file does not exist and only mapred-site.xml.template is present, copy the template to mapred-site.xml in the same directory and edit the copy (a sketch follows the config below).

<configuration>
  <property>
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
 </property>
 <property>
  <name>mapreduce.jobhistory.address</name>
  <value>HADOOP14:10020</value>
 </property>
 <property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>HADOOP14:19888</value>
 </property>
</configuration> 
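
A minimal sketch of creating mapred-site.xml from the template, assuming the install path used in this post:

[spark@HADOOP14 hadoop-2.6.0]$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
[spark@HADOOP14 hadoop-2.6.0]$ vim etc/hadoop/mapred-site.xml   # add the properties shown above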


4.7 Configure the yarn-site.xml file --> enable YARN

 <configuration>
  <property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
  </property>
  <property>
   <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
   <name>yarn.resourcemanager.address</name>
   <value>HADOOP14:8032</value>
  </property>
  <property>
   <name>yarn.resourcemanager.scheduler.address</name>
   <value>HADOOP14:8030</value>
  </property>
  <property>
   <name>yarn.resourcemanager.resource-tracker.address</name>
   <value>HADOOP14:8035</value>
  </property>
  <property>
   <name>yarn.resourcemanager.admin.address</name>
   <value>HADOOP14:8033</value>
  </property>
  <property>
   <name>yarn.resourcemanager.webapp.address</name>
   <value>HADOOP14:8088</value>
  </property>

</configuration>


5. Copy the configured Hadoop directory to the slave machine

[spark@HADOOP14 opt]$ scp -r hadoop-2.6.0/  spark@10.126.34.43:~/opt/  # transfer the configured Hadoop directory from the master to the slave node over the network

The Hadoop installation on the slave node needs no further modification (but its hosts file mapping does need to be updated).

 

6. Edit the hosts file

[spark@HADOOP14 opt]$ vim /etc/hosts   # editing this file usually requires root privileges

127.0.0.1       localhost.localdomain localhost4 localhost4.localdomain4

192.168.1.4     vm1 HADOOP14

192.168.1.6     vm3 HADOOP16
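
A quick check that the host-name mapping and passwordless SSH work, assuming the names and addresses above:

[spark@HADOOP14 opt]$ ping -c 1 HADOOP16            # the name should resolve to 192.168.1.6
[spark@HADOOP14 opt]$ ssh spark@HADOOP16 hostname   # should print HADOOP16 with no password prompt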

 


Verification

1. Format the NameNode:

  [spark@HADOOP14 opt]$ cd hadoop-2.6.0/
  [spark@HADOOP14 hadoop-2.6.0]$ ls
  bin  dfs  etc  include  input  lib  libexec  LICENSE.txt  logs  NOTICE.txt  README.txt  sbin  share  tmp
  [spark@HADOOP14 hadoop-2.6.0]$ ./bin/hdfs namenode -format
  [spark@HADOOP16 .ssh]$ cd ~/opt/hadoop-2.6.0
  [spark@HADOOP16 hadoop-2.6.0]$ ./bin/hdfs namenode -format

 


2. Start HDFS:

  [spark@HADOOP14 hadoop-2.6.0]$ ./sbin/start-dfs.sh
  15/01/05 16:41:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  Starting namenodes on [HADOOP14]
  HADOOP14: starting namenode, logging to /home/spark/opt/hadoop-2.6.0/logs/hadoop-spark-namenode-HADOOP14.out
  HADOOP16: starting datanode, logging to /home/spark/opt/hadoop-2.6.0/logs/hadoop-spark-datanode-HADOOP16.out
  Starting secondary namenodes [HADOOP14]
  HADOOP14: starting secondarynamenode, logging to /home/spark/opt/hadoop-2.6.0/logs/hadoop-spark-secondarynamenode-HADOOP14.out
  15/01/05 16:41:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

 

  [spark@HADOOP14 hadoop-2.6.0]$ jps
  22230 Master
  30889 Jps
  22478 Worker
  30498 NameNode
  30733 SecondaryNameNode
  19781 ResourceManager

 


3. Stop HDFS:

  [spark@HADOOP14 hadoop-2.6.0]$ ./sbin/stop-dfs.sh
  15/01/05 16:40:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  Stopping namenodes on [HADOOP14]
  HADOOP14: stopping namenode
  HADOOP16: stopping datanode
  Stopping secondary namenodes [HADOOP14]
  HADOOP14: stopping secondarynamenode
  15/01/05 16:40:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

 

  [spark@HADOOP14 hadoop-2.6.0]$ jps
  30336 Jps
  22230 Master
  22478 Worker
  19781 ResourceManager

 


4. Start YARN:

  [spark@HADOOP14 hadoop-2.6.0]$ ./sbin/start-yarn.sh
  starting yarn daemons
  starting resourcemanager, logging to /home/spark/opt/hadoop-2.6.0/logs/yarn-spark-resourcemanager-HADOOP14.out
  HADOOP16: starting nodemanager, logging to /home/spark/opt/hadoop-2.6.0/logs/yarn-spark-nodemanager-HADOOP16.out

 

  [spark@HADOOP14 hadoop-2.6.0]$ jps
  31233 ResourceManager
  22230 Master
  22478 Worker
  30498 NameNode
  30733 SecondaryNameNode
  31503 Jps

 


5. Stop YARN:

  [spark@HADOOP14 hadoop-2.6.0]$ ./sbin/stop-yarn.sh
  stopping yarn daemons
  stopping resourcemanager
  HADOOP16: stopping nodemanager
  no proxyserver to stop

 

  [spark@HADOOP14 hadoop-2.6.0]$ jps
  31167 Jps
  22230 Master
  22478 Worker
  30498 NameNode
  30733 SecondaryNameNode

 


6. Check the cluster status:

  [spark@HADOOP14 hadoop-2.6.0]$ ./bin/hdfs dfsadmin -report
  15/01/05 16:44:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  Configured Capacity: 52101857280 (48.52 GB)
  Present Capacity: 45749510144 (42.61 GB)
  DFS Remaining: 45748686848 (42.61 GB)
  DFS Used: 823296 (804 KB)
  DFS Used%: 0.00%
  Under replicated blocks: 10
  Blocks with corrupt replicas: 0
  Missing blocks: 0
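
The cluster can also be checked in a browser, assuming the addresses configured above: the NameNode web UI listens on port 50070 by default in Hadoop 2.x, and the ResourceManager web UI on the port set in yarn-site.xml (8088 here). A quick reachability check from the command line:

[spark@HADOOP14 hadoop-2.6.0]$ curl -s -o /dev/null -w "%{http_code}\n" http://HADOOP14:50070   # NameNode web UI
[spark@HADOOP14 hadoop-2.6.0]$ curl -s -o /dev/null -w "%{http_code}\n" http://HADOOP14:8088    # ResourceManager web UI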

 
