【Title】: pyspark flatMap error: TypeError: 'int' object is not iterable
【Posted】: 2018-08-17 18:00:03
【Description】:

Here is the sample code from my book:

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("spark://chetan-ThinkPad-E470:7077").setAppName("FlatMap")
sc = SparkContext(conf=conf)

numbersRDD = sc.parallelize([1, 2, 3, 4])
actionRDD = numbersRDD.flatMap(lambda x: x + x).collect()
for values in actionRDD:
    print(values)

I get this error: TypeError: 'int' object is not iterable

    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
    at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    ... 1 more

【Comments】:

    标签: apache-spark pyspark python-3.5 flatmap


    【Solution 1】:

    You cannot use flatMap on an Int object.

    flatMap expects the function you pass in to return a collection (e.g. an array or a list), whose elements are then flattened into the result.

    Since your lambda returns a plain integer, use map instead on the RDD[Integer]:

    numbersRDD = sc.parallelize([1, 2, 3, 4])
    actionRDD = numbersRDD.map(lambda x: x + x)
    
    def printing(x):
        print(x)
    
    actionRDD.foreach(printing)
    

    which should print:

    2
    4
    6
    8
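
    To see why the original code fails: the function passed to flatMap must return an iterable, and Spark then flattens the results; `lambda x: x + x` returns an int, which cannot be iterated. Here is a plain-Python sketch of flatMap's semantics (no Spark cluster needed; `flat_map` is an illustrative helper, not a PySpark API):

```python
def flat_map(f, xs):
    """Apply f to each element and flatten the resulting iterables,
    mirroring what RDD.flatMap does."""
    return [y for x in xs for y in f(x)]

data = [1, 2, 3, 4]

# f returns an iterable (a two-element list per input), so flattening works.
# In PySpark this would be: numbersRDD.flatMap(lambda x: [x, x]).collect()
print(flat_map(lambda x: [x, x], data))  # [1, 1, 2, 2, 3, 3, 4, 4]

# lambda x: x + x returns an int (e.g. 2 for x=1); trying to iterate over
# it is exactly what raises TypeError: 'int' object is not iterable.
```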
    

    【Comments】:
