[Spark] Spark-shell example: reading a file stored on HDFS in standalone mode

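Contents

Try it first in local mode

Steps

1. Upload the test data to HDFS
2. Write the Scala code

Reading the file on HDFS in standalone mode

Steps

1. Exit local mode and restart spark-shell against the standalone master
2. Run the same Scala code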

Try it first in local mode

Steps

1. Upload the test data to HDFS

cd /export/servers/sparkdatas
hdfs dfs -mkdir -p /sparkwordcount
hdfs dfs -put wordcount.txt  /sparkwordcount

2. Write the Scala code

sc.textFile("hdfs://node01:8020/sparkwordcount/wordcount.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_ + _).collect
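To make each stage of this one-liner concrete, here is a plain-Scala sketch that needs no Spark or HDFS. It mimics the RDD calls on an in-memory list of lines. The object name `WordCountSketch` and the sample lines are hypothetical and do not come from wordcount.txt. Because plain collections have no `reduceByKey`, `groupBy` followed by a per-key sum stands in for it. Spark's `reduceByKey` additionally pre-combines values within each partition before the shuffle.

```scala
// Hypothetical plain-Scala sketch of the word-count pipeline (no Spark involved).
// An in-memory Seq of lines stands in for sc.textFile(...).
object WordCountSketch {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))   // like RDD.flatMap: one element per word
      .map((_, 1))             // like RDD.map: pair every word with the count 1
      .groupBy(_._1)           // gather pairs by word (Spark does this via a shuffle)
      .map { case (word, pairs) => word -> pairs.map(_._2).sum } // like reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    // Made-up sample data, for illustration only
    val sample = Seq("hello world", "hello spark", "hello hadoop")
    println(wordCount(sample)) // Map iteration order is not guaranteed
  }
}
```

As with `split(" ")` in the Spark version, consecutive spaces produce empty-string "words". Splitting on `"\\s+"` instead would avoid that.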

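If you don't need to view the results and want to store them as text files instead, replace `.collect` with `.saveAsTextFile(<output path>)`. The output path is written as a directory of part files, and it must not already exist, or the job fails.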

Reading the file on HDFS in standalone mode

Steps

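1. Exit local mode and restart spark-shell against the standalone master. Quit the current shell (`:quit`), then run the following from the Spark installation directory: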

bin/spark-shell --master spark://node01:7077 \
 --executor-memory 1g \
 --total-executor-cores 2

2. Run the same Scala code (only the master has changed)

sc.textFile("hdfs://node01:8020/sparkwordcount/wordcount.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_ + _).collect
