【Posted】: 2016-11-16 12:29:53
【Question】:
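API call made to submit the job. The response status says it is RUNNING.
On the cluster UI -
Worker (slave) - worker-20160712083825-172.31.17.189-59433 is ALIVE
1 of 2 cores used
1 GB of 6 GB memory used
Running Applications
app-20160713130056-0020 - WAITING for the past 5 hours
Cores - unlimited
Job description of the application
Active Stage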
reduceByKey at /root/wordcount.py:23
Pending Stage
takeOrdered at /root/wordcount.py:26
Running Drivers -
stderr log page for driver-20160713130051-0025
WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
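According to "Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources", the slaves have not been started, hence there are no resources.
However, in my case slave 1 is working.
According to "Unable to Execute More than a spark Job 'Initial job has not accepted any resources'", I am using deploy-mode = cluster (not client), since I have 1 master and 1 slave, and the submit API is called via Postman / anywhere.
The cluster also has cores, RAM, and memory available, yet the job still throws the error reported in the UI.
According to "TaskSchedulerImpl: Initial job has not accepted any resources;", I assigned the following in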
~/spark-1.5.0/conf/spark-env.sh
Spark environment variables
SPARK_WORKER_INSTANCES=1
SPARK_WORKER_MEMORY=1000m
SPARK_WORKER_CORES=2
Replicated those on the slaves
sudo /root/spark-ec2/copy-dir /root/spark/conf/spark-env.sh
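One way to sanity-check settings like the ones above is to compare what an executor would request against what a worker has free. The following is only a minimal sketch under stated assumptions: the helper names `parse_memory_mb` and `executor_fits` are hypothetical, and the 1g default for executor memory is assumed from Spark's documented default for `spark.executor.memory`. It does not reproduce the scheduler's actual logic.

```python
import re


def parse_memory_mb(value):
    """Convert a memory string such as '1000m' or '6g' to megabytes.

    Hypothetical helper for this sketch; a unit suffix is required so that
    bare numbers are never silently misread.
    """
    match = re.fullmatch(r"(\d+)\s*([mgt])b?", value.strip().lower())
    if match is None:
        raise ValueError("unrecognized memory string: %r" % value)
    factors = {"m": 1, "g": 1024, "t": 1024 * 1024}
    return int(match.group(1)) * factors[match.group(2)]


def executor_fits(worker_memory, worker_cores, used_memory, used_cores,
                  executor_memory="1g", executor_cores=1):
    """Return True if one executor request fits in the worker's free resources.

    Assumption: executor_memory defaults to 1g, mirroring the documented
    default of spark.executor.memory.
    """
    free_mb = parse_memory_mb(worker_memory) - parse_memory_mb(used_memory)
    free_cores = worker_cores - used_cores
    return (parse_memory_mb(executor_memory) <= free_mb
            and executor_cores <= free_cores)


# A worker capped at 1000m cannot host a 1g (1024 MB) executor.
small_worker_ok = executor_fits("1000m", 2, "0m", 0)
# The worker as shown in the cluster UI: 6 GB / 2 cores, 1 GB / 1 core in use.
ui_worker_ok = executor_fits("6g", 2, "1g", 1)
```

Under these assumptions, a worker limited to 1000m would reject a default-sized executor, while the worker as displayed in the UI would still have room for one.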
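All of the cases in the answers to the questions above applied, but I still have not found a solution. Since I am working with the API and Apache Spark, maybe some other kind of help is required.
Edited July 18, 2016
Wordcount.py - my PySpark application code -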
from pyspark import SparkContext, SparkConf
logFile = "/user/root/In/a.txt"
conf = (SparkConf().set("num-executors", "1"))
sc = SparkContext(master = "spark://ec2-54-209-108-127.compute-1.amazonaws.com:7077", appName = "MyApp", conf = conf)
print("in here")
lines = sc.textFile(logFile)
print("text read")
c = lines.count()
print("lines counted")
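For comparison, here is a hedged sketch of how resource limits are usually expressed as Spark properties. As far as I know, `num-executors` corresponds to a `spark-submit` command-line flag rather than a property key, so setting it through `SparkConf.set` may have no effect. The helper names and the values `512m` and `1` below are illustrative assumptions, not settings taken from the post.

```python
def resource_settings(executor_memory="512m", max_cores=1):
    """Return (key, value) pairs using Spark property names.

    The default values are hypothetical and chosen only for illustration.
    """
    return [
        ("spark.executor.memory", executor_memory),
        ("spark.cores.max", str(max_cores)),
    ]


def make_context(master_url, app_name="MyApp"):
    """Build a SparkContext with the settings above (requires PySpark)."""
    # Imported lazily so resource_settings() is usable without PySpark.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster(master_url).setAppName(app_name)
    conf.setAll(resource_settings())
    return SparkContext(conf=conf)


settings = dict(resource_settings())
```

Calling `make_context("spark://<master-host>:7077")` would then request executors sized by these properties; whether that resolves the warning depends on what the worker actually has free.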
Error
Starting job: count at /root/wordcount.py:11
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Got job 0 (count at /root/wordcount.py:11) with 2 output partitions
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (count at /root/wordcount.py:11)
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Missing parents: List()
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (PythonRDD[2] at count at /root/wordcount.py:11), which has no missing parents
16/07/18 07:46:39 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 5.6 KB, free 56.2 KB)
16/07/18 07:46:39 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.4 KB, free 59.7 KB)
16/07/18 07:46:39 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.31.17.189:43684 (size: 3.4 KB, free: 511.5 MB)
16/07/18 07:46:39 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (PythonRDD[2] at count at /root/wordcount.py:11)
16/07/18 07:46:39 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/07/18 07:46:54 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
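When scanning longer driver logs, it can help to locate this warning programmatically and see how long after task submission it appeared. The sketch below assumes the `yy/MM/dd HH:mm:ss LEVEL logger: message` layout seen in the log above; the function names are hypothetical.

```python
import re
from datetime import datetime

# Matches lines such as:
# 16/07/18 07:46:39 INFO scheduler.TaskSchedulerImpl: Adding task set ...
LOG_LINE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (\w+) (\S+): (.*)$")


def parse_line(line):
    """Return (timestamp, level, logger, message), or None if no match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    timestamp = datetime.strptime(match.group(1), "%y/%m/%d %H:%M:%S")
    return timestamp, match.group(2), match.group(3), match.group(4)


def seconds_until_resource_warning(lines):
    """Seconds between 'Adding task set' and the first resource warning."""
    submitted = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines without a timestamp prefix
        timestamp, level, _logger, message = parsed
        if message.startswith("Adding task set"):
            submitted = timestamp
        elif (level == "WARN" and submitted is not None
              and "has not accepted any resources" in message):
            return (timestamp - submitted).total_seconds()
    return None


sample = [
    "Starting job: count at /root/wordcount.py:11",
    "16/07/18 07:46:39 INFO scheduler.TaskSchedulerImpl: "
    "Adding task set 0.0 with 2 tasks",
    "16/07/18 07:46:54 WARN scheduler.TaskSchedulerImpl: "
    "Initial job has not accepted any resources; check your cluster UI "
    "to ensure that workers are registered and have sufficient resources",
]
delay = seconds_until_resource_warning(sample)
```

For the two timestamps in the log above (07:46:39 and 07:46:54), the gap is 15 seconds.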
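According to "Spark UI showing 0 cores even when setting cores in App",
the Spark WebUI states that zero cores are used and the application waits indefinitely with no tasks running. The application also uses no memory or cores at all during run time, and it goes into the WAITING state immediately on startup.
Spark version 1.6.1, Ubuntu, Amazon EC2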
【Comments】:
-
Tried running another piece of code, a simple Python application, and the error persists
from pyspark import SparkContext, SparkConf
logFile = "/user/root/In/a.txt"
conf = (SparkConf().set("num-executors", "1"))
sc = SparkContext(master = "spark://ec2-54-209-108-127.compute-1.amazonaws.com:7077", appName = "MyApp", conf = conf)
textFile = sc.textFile(logFile)
wordCounts = textFile.flatMap(lambda line: line.split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a+b)
wordCounts.saveAsTextFile("/user/root/In/output.txt")
-
Can you run it with spark-submit?
-
Try reducing the per-node memory settings in spark-submit or in the API call
-
I am not able to find those settings as far as the API call is concerned
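If the submission goes through the standalone master's REST endpoint, the JSON request body normally carries a `sparkProperties` object, and memory or core limits could be placed there. The sketch below is an assumption about the request shape, not the asker's actual payload: the helper name `with_resource_limits` and the values `512m` and `1` are hypothetical, and the other fields a real request needs are deliberately omitted.

```python
import copy
import json


def with_resource_limits(request_body, executor_memory, max_cores):
    """Return a copy of the request body with resource limits added.

    Assumes the body follows the CreateSubmissionRequest shape, where Spark
    properties live under the 'sparkProperties' key.
    """
    updated = copy.deepcopy(request_body)
    properties = updated.setdefault("sparkProperties", {})
    properties["spark.executor.memory"] = executor_memory
    properties["spark.cores.max"] = str(max_cores)
    return updated


# Partial body for illustration only; a real request needs further fields.
body = {
    "action": "CreateSubmissionRequest",
    "sparkProperties": {
        "spark.app.name": "MyApp",
        "spark.master":
            "spark://ec2-54-209-108-127.compute-1.amazonaws.com:7077",
    },
}

limited = with_resource_limits(body, "512m", 1)
print(json.dumps(limited, indent=2, sort_keys=True))
```

The original body is left unchanged, so the same base request can be reused with different limits while experimenting.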
-
The master's environment variables are set in /root/spark/conf/spark-env.conf -
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_CORES=2
export SPARK_WORKER_MEMORY=1000
export HADOOP_HOME="/root/ephemeral-hdfs"
export SPARK_MASTER_IP=ec2-wxyz.compute-1.amazonaws.com
export MASTER=`cat /root/spark-ec2/cluster-url`
Tags: api apache-spark amazon-ec2