【Posted at】: 2018-06-25 10:50:58
【Problem description】:
I'm running into a problem with the new pyspark.ml.image functionality in Spark 2.3.
Using ImageSchema.toNDArray() in a "local computation" works fine, but using it inside rdd.map() raises an error:
AttributeError: 'NoneType' object has no attribute '_jvm'
You can try the code below in pyspark, with some pictures prepared in a "jpg" folder. For example, I put this single picture in it.
The "local computation" works:
>>> from pyspark.ml.image import ImageSchema
>>> df = ImageSchema.readImages("jpg")
>>> row = df.collect()[0] # collect() to a "local" list and take the first
>>> ImageSchema.toNDArray(row.image) # so this toNDArray() is a "local computation"
array([[[228, 141, 97],
[229, 142, 98],
[229, 142, 98],
...,
[239, 157, 110],
[239, 157, 110],
[239, 157, 109]],
...
...
[[ 66, 38, 21],
[ 66, 38, 21],
[ 66, 38, 21],
...,
[ 91, 55, 37],
[ 94, 57, 37],
[ 94, 57, 37]]], dtype=uint8)
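(For context, my understanding is that readImages() returns a DataFrame with a single image struct column whose fields include origin, height, width, nChannels, mode and data, and that toNDArray() simply reshapes the raw bytes using those fields; the field accesses below are based on that assumption:)
>>> df.printSchema()   # shows the image struct and its fields
>>> row.image.height, row.image.width, row.image.nChannels   # plain Python values on the driver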
But if I put it inside rdd.map(), it raises
AttributeError: 'NoneType' object has no attribute '_jvm'
>>> from pyspark.ml.image import ImageSchema
>>> df = ImageSchema.readImages("jpg")
>>> df.rdd.map(lambda row: ImageSchema.toNDArray(row.image)).take(1)
...
...
File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera2-1.cdh5.13.3.p0.316101/lib/spark2/python/lib/pyspark.zip/pyspark/ml/image.py", line 123, in toNDArray
if any(not hasattr(image, f) for f in self.imageFields):
File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera2-1.cdh5.13.3.p0.316101/lib/spark2/python/lib/pyspark.zip/pyspark/ml/image.py", line 90, in imageFields
if self._imageFields is None:
ctx = SparkContext._active_spark_context
self._imageFields = list(ctx._jvm.org.apache.spark.ml.image.ImageSchema.imageFields())
AttributeError: 'NoneType' object has no attribute '_jvm'
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:298)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:438)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:421)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:252)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
...
...
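From the last Python frames of the traceback, my guess is that imageFields is resolved lazily through SparkContext._active_spark_context._jvm, and inside the executor processes there is no active SparkContext, so ctx is None. A minimal workaround sketch, assuming the image struct fields mentioned above and 8-bit OpenCV-style pixel data, would be to rebuild the array by hand inside the map instead of calling toNDArray():
>>> import numpy as np
>>> def to_array(image):
...     # rebuild the array from the struct fields only, without touching the JVM
...     return np.frombuffer(image.data, dtype=np.uint8).reshape(
...         image.height, image.width, image.nChannels)
...
>>> df.rdd.map(lambda row: to_array(row.image)).take(1)
But I would still prefer to use the built-in toNDArray() if possible.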
This case has been tested and is reproducible with:
Spark 2.3.0 provided by Cloudera parcel
Spark 2.3.0 on Hortonworks
Spark 2.3.0 on Windows with WinUtils
Spark 2.3.1 on Windows with WinUtils
What is going wrong?
How can I fix it?
【Comments】:
Tags: pyspark apache-spark-mllib