
Hadoop Cluster Installation and Deployment

1. Introduction

(1) Architecture model

One master host runs the HDFS NameNode, and three slave hosts run DataNodes; all four machines sit on the same subnet (192.168.40.0/24 in this guide).

(2) Tools used

  1. VMware
  2. CentOS 7
  3. Xshell
  4. Xftp
  5. jdk-8u91-linux-x64.rpm
  6. hadoop-2.7.3.tar.gz

2. Installation Steps

(1) Deploy the master

  1. Create a virtual machine

  2. Configure the IP address

    For this step, see: CentOS 7 network connection in a VMware virtual machine

  3. Use Xftp to transfer the JDK and Hadoop packages

    Put both packages under /usr/local.

  4. Install the JDK

    rpm -ivh jdk-8u91-linux-x64.rpm
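
    The rpm unpacks the JDK under /usr/java and, for Oracle JDK packages like this one, also creates the /usr/java/default symlink that hadoop-env.sh points at below. A quick sanity check:

    java -version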

  5. Install Hadoop

    tar zxvf hadoop-2.7.3.tar.gz
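
    tar leaves the files in /usr/local/hadoop-2.7.3, while the paths used later in this guide (hadoop-env.sh, the PATH export in /etc/profile) assume /usr/local/hadoop, so rename the directory (a symlink works too):

    mv /usr/local/hadoop-2.7.3 /usr/local/hadoop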

  6. Configure environment variables

    • Set JAVA_HOME in /usr/local/hadoop/etc/hadoop/hadoop-env.sh (the full file follows; JAVA_HOME is the only line that needs to change)

      # Licensed to the Apache Software Foundation (ASF) under one
      # or more contributor license agreements.  See the NOTICE file
      # distributed with this work for additional information
      # regarding copyright ownership.  The ASF licenses this file
      # to you under the Apache License, Version 2.0 (the
      # "License"); you may not use this file except in compliance
      # with the License.  You may obtain a copy of the License at
      #
      #     http://www.apache.org/licenses/LICENSE-2.0
      #
      # Unless required by applicable law or agreed to in writing, software
      # distributed under the License is distributed on an "AS IS" BASIS,
      # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      # See the License for the specific language governing permissions and
      # limitations under the License.
      
      # Set Hadoop-specific environment variables here.
      
      # The only required environment variable is JAVA_HOME.  All others are
      # optional.  When running a distributed configuration it is best to
      # set JAVA_HOME in this file, so that it is correctly defined on
      # remote nodes.
      
      # The java implementation to use.
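      # NOTE: the line below is the only change from the stock template (which
      # ships with export JAVA_HOME=${JAVA_HOME}); the JDK rpm installed
      # earlier provides the /usr/java/default symlink.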
      export JAVA_HOME=/usr/java/default
      
      # The jsvc implementation to use. Jsvc is required to run secure datanodes
      # that bind to privileged ports to provide authentication of data transfer
      # protocol.  Jsvc is not required if SASL is configured for authentication of
      # data transfer protocol using non-privileged ports.
      #export JSVC_HOME=${JSVC_HOME}
      
      export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}
      
      # Extra Java CLASSPATH elements.  Automatically insert capacity-scheduler.
      for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
        if [ "$HADOOP_CLASSPATH" ]; then
          export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
        else
          export HADOOP_CLASSPATH=$f
        fi
      done
      
      # The maximum amount of heap to use, in MB. Default is 1000.
      #export HADOOP_HEAPSIZE=
      #export HADOOP_NAMENODE_INIT_HEAPSIZE=""
      
      # Extra Java runtime options.  Empty by default.
      export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
      
      # Command specific options appended to HADOOP_OPTS when specified
      export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
      export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"
      
      export HADOOP_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"
      
      export HADOOP_NFS3_OPTS="$HADOOP_NFS3_OPTS"
      export HADOOP_PORTMAP_OPTS="-Xmx512m $HADOOP_PORTMAP_OPTS"
      
      # The following applies to multiple commands (fs, dfs, fsck, distcp etc)
      export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"
      #HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData $HADOOP_JAVA_PLATFORM_OPTS"
      
      # On secure datanodes, user to run the datanode as after dropping privileges.
      # This **MUST** be uncommented to enable secure HDFS if using privileged ports
      # to provide authentication of data transfer protocol.  This **MUST NOT** be
      # defined if SASL is configured for authentication of data transfer protocol
      # using non-privileged ports.
      export HADOOP_SECURE_DN_USER=${HADOOP_SECURE_DN_USER}
      
      # Where log files are stored.  $HADOOP_HOME/logs by default.
      #export HADOOP_LOG_DIR=${HADOOP_LOG_DIR}/$USER
      
      # Where log files are stored in the secure data environment.
      export HADOOP_SECURE_DN_LOG_DIR=${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER}
      
      ###
      # HDFS Mover specific parameters
      ###
      # Specify the JVM options to be used when starting the HDFS Mover.
      # These options will be appended to the options specified as HADOOP_OPTS
      # and therefore may override any similar flags set in HADOOP_OPTS
      #
      # export HADOOP_MOVER_OPTS=""
      
      ###
      # Advanced Users Only!
      ###
      
      # The directory where pid files are stored. /tmp by default.
      # NOTE: this should be set to a directory that can only be written to by 
      #       the user that will run the hadoop daemons.  Otherwise there is the
      #       potential for a symlink attack.
      export HADOOP_PID_DIR=${HADOOP_PID_DIR}
      export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}
      
      # A string representing this instance of hadoop. $USER by default.
      export HADOOP_IDENT_STRING=$USER
      
    • Configure Hadoop's global environment variables

      1. Edit /etc/profile (only the final export line is added; the rest is the stock CentOS file)

        vi /etc/profile

        # /etc/profile
        
        # System wide environment and startup programs, for login setup
        # Functions and aliases go in /etc/bashrc
        
        # It's NOT a good idea to change this file unless you know what you
        # are doing. It's much better to create a custom.sh shell script in
        # /etc/profile.d/ to make custom changes to your environment, as this
        # will prevent the need for merging in future updates.
        
        pathmunge () {
            case ":${PATH}:" in
                *:"$1":*)
                    ;;
                *)
                    if [ "$2" = "after" ] ; then
                        PATH=$PATH:$1
                    else
                        PATH=$1:$PATH
                    fi
            esac
        }
        
        
        if [ -x /usr/bin/id ]; then
            if [ -z "$EUID" ]; then
                # ksh workaround
                EUID=`/usr/bin/id -u`
                UID=`/usr/bin/id -ru`
            fi
            USER="`/usr/bin/id -un`"
            LOGNAME=$USER
            MAIL="/var/spool/mail/$USER"
        fi
        
        # Path manipulation
        if [ "$EUID" = "0" ]; then
            pathmunge /usr/sbin
            pathmunge /usr/local/sbin
        else
            pathmunge /usr/local/sbin after
            pathmunge /usr/sbin after
        fi
        
        HOSTNAME=`/usr/bin/hostname 2>/dev/null`
        HISTSIZE=1000
        if [ "$HISTCONTROL" = "ignorespace" ] ; then
            export HISTCONTROL=ignoreboth
        else
            export HISTCONTROL=ignoredups
        fi
        
        export PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL
        
        # By default, we want umask to get set. This sets it for login shell
        # Current threshold for system reserved uid/gids is 200
        # You could check uidgid reservation validity in
        # /usr/share/doc/setup-*/uidgid file
        if [ $UID -gt 199 ] && [ "`/usr/bin/id -gn`" = "`/usr/bin/id -un`" ]; then
            umask 002
        else
            umask 022
        fi
        
        for i in /etc/profile.d/*.sh /etc/profile.d/sh.local ; do
            if [ -r "$i" ]; then
                if [ "${-#*i}" != "$-" ]; then 
                    . "$i"
                else
                    . "$i" >/dev/null
                fi
            fi
        done
        
        unset i
        unset -f pathmunge
        
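        # Added for Hadoop: put the Hadoop bin and sbin scripts on PATH.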
        export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin
        
      2. Make /etc/profile take effect

        source /etc/profile
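
        With PATH updated, the Hadoop commands should now resolve; a quick check:

        hadoop version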

(2) Deploy the slaves

  1. Clone three hosts from the master: slave1, slave2, slave3
  2. Change each clone's IP address (this guide uses 192.168.40.101-103; see the hosts table below)

(3) Unified configuration

Run the following steps on all four machines; Xshell can send keyboard input to every open session at once, which makes this much simpler.

  1. Test the network

    ping 192.168.40.100

    ping 192.168.40.101

    ping 192.168.40.102

    ping 192.168.40.103

  2. Turn off the firewall

    systemctl stop firewalld ------ stop the firewall now

    systemctl disable firewalld ------ keep it off after the next reboot

  3. Edit the hosts file

    vi /etc/hosts

    192.168.40.100 master

    192.168.40.101 slave1

    192.168.40.102 slave2

    192.168.40.103 slave3

  4. Configure core-site.xml (a minimal sketch follows)
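
    Every node needs fs.defaultFS pointing at the NameNode so that the DataNodes started below know where to register. A minimal sketch of /usr/local/hadoop/etc/hadoop/core-site.xml, assuming the commonly used port 9000:

    <configuration>
        <!-- NameNode RPC address; port 9000 is an assumption, adjust if yours differs -->
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://master:9000</value>
        </property>
    </configuration>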

(4) Start the master

Switch to the master host:

  1. Format the NameNode

    hdfs namenode -format

  2. Start the NameNode

    hadoop-daemon.sh start namenode

  3. Check that the NameNode started

    jps

    If a NameNode process is listed, the start succeeded.
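
    The output should look roughly like this (the PIDs will differ):

    1445 NameNode
    1523 Jps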

(5) Start the slaves

Switch to the slave1, slave2, and slave3 hosts:

  1. Start the DataNode

    hadoop-daemon.sh start datanode

  2. Check that the DataNode started

    jps

    If a DataNode process is listed, the start succeeded.

(6) View the DataNodes registered with the NameNode

hadoop dfsadmin -report
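
In Hadoop 2.x this form still works but prints a deprecation warning; the equivalent current command is:

hdfs dfsadmin -report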

Result: the report prints the cluster's capacity and a list of live DataNodes; with all three slaves up, you should see three entries, one per slave.
