【Posted】: 2021-02-10 01:19:24
【Question】:
I am trying to convert an RDD back into a Spark DataFrame with the following code:
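from pyspark.sql.types import ArrayType, DoubleType, StringType, StructField, StructType

schema = StructType(
    [StructField("msn", StringType(), True),
     StructField("Input_Tensor", ArrayType(DoubleType()), True)]
)
# `spark` is the active SparkSession; `rdd` is the RDD described in the edit further down
DF = spark.createDataFrame(rdd, schema=schema)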
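The dataset has only two columns:
- msn: contains a single string.
- Input_Tensor: a two-dimensional array of floats.
But I keep getting this error, and I am not sure where it comes from: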
File "/tmp/conda-d3f87356-6008-4349-9075-f488e0870d02/real/envs/conda-env/lib/python3.6/site-packages/myproject/datasets/train.py", line 51, in EMA_detector
DF = spark.createDataFrame(rdd, schema=schema)
File "/tmp/conda-d3f87356-6008-4349-9075-f488e0870d02/real/envs/conda-env/lib/python3.6/site-packages/pyspark/sql/session.py", line 790, in createDataFrame
jrdd = self._jvm.SerDeUtil.toJavaArray(rdd._to_java_object_rdd())
File "/tmp/conda-d3f87356-6008-4349-9075-f488e0870d02/real/envs/conda-env/lib/python3.6/site-packages/pyspark/rdd.py", line 2364, in _to_java_object_rdd
return self.ctx._jvm.SerDeUtil.pythonToJava(rdd._jrdd, True)
File "/tmp/conda-d3f87356-6008-4349-9075-f488e0870d02/real/envs/conda-env/lib/python3.6/site-packages/pyspark/rdd.py", line 2599, in _jrdd
self._jrdd_deserializer, profiler)
File "/tmp/conda-d3f87356-6008-4349-9075-f488e0870d02/real/envs/conda-env/lib/python3.6/site-packages/pyspark/rdd.py", line 2500, in _wrap_function
pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
File "/tmp/conda-d3f87356-6008-4349-9075-f488e0870d02/real/envs/conda-env/lib/python3.6/site-packages/pyspark/rdd.py", line 2486, in _prepare_for_python_RDD
pickled_command = ser.dumps(command)
File "/tmp/conda-d3f87356-6008-4349-9075-f488e0870d02/real/envs/conda-env/lib/python3.6/site-packages/pyspark/serializers.py", line 694, in dumps
raise pickle.PicklingError(msg)
_pickle.PicklingError: Could not serialize object: AttributeError: 'NoneType' object has no attribute 'items'
Edit:
My RDD comes from this:
rdd = test_data.mapPartitions(lambda part: vectorizer.transform(part))
The dataset test_data is itself an RDD, but somehow after mapPartitions it becomes a PipelinedRDD, which seems to be why it fails.
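A note on reading the traceback: in PySpark, mapPartitions normally returns a PipelinedRDD, which is a subclass of RDD, so that type alone is not an error. The failing call is ser.dumps(command), where Spark serializes the function passed to mapPartitions together with whatever it captures, here the lambda and the vectorizer object. One plausible reading, not confirmed by the post, is that vectorizer holds state that cannot be pickled. The sketch below uses only the standard pickle module as an approximation (Spark itself uses a bundled cloudpickle), and the Vectorizer class, is_picklable, and transform_partition are hypothetical names introduced for illustration:

```python
import pickle
import threading


class Vectorizer:
    """Hypothetical stand-in for the question's `vectorizer` object."""

    def __init__(self):
        # A lock is one example of state that pickle cannot serialize.
        self._lock = threading.Lock()

    def transform(self, rows):
        # Placeholder transformation: pair each key with a small 2D float list.
        with self._lock:
            return [(key, [[0.0, 1.0], [2.0, 3.0]]) for key in rows]


def is_picklable(obj):
    """Return True if `obj` survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


def transform_partition(part):
    # Build the helper inside the function, so the function does not
    # capture an unpicklable object from the driver in its closure.
    local_vectorizer = Vectorizer()
    return iter(local_vectorizer.transform(part))
```

Checking is_picklable(vectorizer) on the driver is a quick way to test this hypothesis. If it returns False, one option is to pass a function such as transform_partition to test_data.mapPartitions, so that the object is constructed on each executor rather than shipped from the driver.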
【Comments】:
Tags: apache-spark pyspark