Got a new laptop, so time for some setup notes. After all, today is Programmer's Day (1024), which of course isn't a holiday. emm..
I. Software Preparation (grab what you need)
II. Passwordless SSH Login
(Even a single-machine, local-to-local setup needs SSH; otherwise formatting the Hadoop storage fails with a permission error like:
localhost: Permission denied (publickey,password).)
Commands to generate the public key (run on the client in order; just press Enter at every prompt):
(1) $ ssh-keygen -t dsa -f ~/.ssh/id_dsa
(2)$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
[Note: ~/.ssh/id_dsa.pub is the public key. Copy it into the server's ~/.ssh/ directory and run cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys there. Permissions must be restricted to the owner only, otherwise the connection will fail.]
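Spelled out as commands, the permission tightening looks like this (a minimal sketch; adjust if your key lives elsewhere):
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys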
Try connecting (the first time may still ask for a password):
$ ssh localhost
III. Installing Java and Scala
1. Unpack Java and Scala into the directories where you want to keep them
2. Configure the environment variables
raini@biyuzhe:~$ gedit .bashrc (append at the end)
## java
export JAVA_HOME=/home/raini/app/jdk
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
export PATH=${JAVA_HOME}/bin:$JRE_HOME/bin:$PATH
## scala
export SCALA_HOME=/home/raini/app/scala
export PATH=${SCALA_HOME}/bin:$PATH
3. Run $ source .bashrc (to apply the changes)
4. Verify
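For example (both should print the versions unpacked above):
$ java -version
$ scala -version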
IV. Installing Hadoop
1. Unpack: tar -zxvf hadoop-3.1.1.tar.gz
2. raini@biyuzhe:~$ gedit .bashrc (append to the file)
## hadoop-3.x
export HADOOP_HOME=/home/raini/app/hadoop
export CLASSPATH=".:$JAVA_HOME/lib:$CLASSPATH"
#
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
#
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_PREFIX=$HADOOP_HOME
export HADOOP_LIBEXEC_DIR=$HADOOP_HOME/libexec
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
#
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
#
#export HDFS_DATANODE_USER=root
#export HDFS_DATANODE_SECURE_USER=root
#export HDFS_SECONDARYNAMENODE_USER=root
#export HDFS_NAMENODE_USER=root
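After saving, apply the changes and sanity-check the install (assumes the paths above):
$ source ~/.bashrc
$ hadoop version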
3. vi etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/raini/app/hadoop/tmp/tmp</value>
</property>
</configuration>
4. vi etc/hadoop/hdfs-site.xml
<!-- Replication factor and the paths where data is stored -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/raini/app/hadoop/tmp/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/raini/app/hadoop/tmp/hdfs/data</value>
</property>
</configuration>
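The directories named in core-site.xml and hdfs-site.xml above are worth creating up front (formatting and startup will usually create them, but pre-creating avoids permission surprises; paths as configured above):
$ mkdir -p /home/raini/app/hadoop/tmp/tmp
$ mkdir -p /home/raini/app/hadoop/tmp/hdfs/{name,data}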
5. vi etc/hadoop/hadoop-env.sh
export JAVA_HOME=/home/raini/app/jdk
export HADOOP_HOME=/home/raini/app/hadoop
6. Format the HDFS filesystem
$ bin/hdfs namenode -format
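If it works, the output should end with a line roughly like this (exact wording may vary by version):
... Storage directory /home/raini/app/hadoop/tmp/hdfs/name has been successfully formatted.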
7. Start the NameNode and DataNode daemons
$ sbin/start-dfs.sh
8. Check the running processes with jps
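A healthy single-node setup should show something like this (PIDs will differ):
$ jps
12241 NameNode
12386 DataNode
12602 SecondaryNameNode
12750 Jps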
9. Visit the NameNode web UI
http://localhost:9870/ and check the state of Hadoop
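A quick HDFS smoke test (the directory name is just an example; $HADOOP_HOME/bin is on PATH from the .bashrc above):
$ hdfs dfs -mkdir -p /user/raini
$ hdfs dfs -ls /user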
V. Installing Spark
1. Unpack: tar -zxvf spark-2.3.2-bin-hadoop2.7.tgz
2. raini@biyuzhe:~$ gedit .bashrc (append to the file)
Configure pyspark while we're at it:
## spark
export SPARK_HOME=/home/raini/app/spark
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH
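Apply and quickly check that pyspark is importable (assumes the python on PATH is a version Spark 2.3 supports):
$ source ~/.bashrc
$ python -c 'import pyspark; print(pyspark.__version__)'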
3. Edit the configuration (the files below live in $SPARK_HOME/conf)
1. cp slaves.template slaves
2. vim slaves (append your own hostname)
##localhost
biyuzhe
3. cp spark-env.sh.template spark-env.sh
4. vim spark-env.sh
Append:
export JAVA_HOME=/home/raini/app/jdk
export SCALA_HOME=/home/raini/app/scala
export SPARK_WORKER_MEMORY=1G
export HADOOP_HOME=/home/raini/app/hadoop
export HADOOP_CONF_DIR=/home/raini/app/hadoop/etc/hadoop
export SPARK_MASTER_HOST=biyuzhe
5. vim spark-defaults.conf
spark.master spark://biyuzhe:7077
spark.eventLog.enabled true
spark.local.dir /home/raini/app/spark/data/spark_shuffle
spark.eventLog.dir hdfs://biyuzhe:9000/eventLog
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.driver.memory 1g
# spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
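spark.eventLog.dir is not created automatically, and spark.local.dir is worth pre-creating too (paths as configured above; the HDFS URI must point at the NameNode address from core-site.xml):
$ hdfs dfs -mkdir -p /eventLog
$ mkdir -p /home/raini/app/spark/data/spark_shuffle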
4. Start Spark
$SPARK_HOME/sbin/start-all.sh
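Check that the daemons came up:
$ jps    # should now additionally list the Master and Worker processes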
5. Web monitoring
http://biyuzhe:8080/
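Finally, an end-to-end smoke test with the bundled SparkPi example (the jar name may differ slightly depending on the exact build):
$ $SPARK_HOME/bin/spark-submit --master spark://biyuzhe:7077 \
    --class org.apache.spark.examples.SparkPi \
    $SPARK_HOME/examples/jars/spark-examples_2.11-2.3.2.jar 10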
(Done)