【Posted】: 2021-01-09 18:18:07
【Problem description】:
I am working with PySpark (Python). I have included my code below and run into a problem, and I was wondering whether anyone knows what is going wrong here.
My sample dataset:
data = [('12er', None), ('4xcf', None), ('6hjk', None),
        ('45fh', 2000), ('56gh', 2000), ('hj45', None)]
columns = ["Id", "Amount"]
pivotDF = spark.createDataFrame(data=data, schema=columns)
pivotDF.show(truncate=False)
Snippet of the original dataset. I am using fill to replace the nulls with zero:
pivotDF.na.fill(0).show(n=2)
While this works on the sample dataset, on my actual PySpark dataframe I get this error:
Fail to execute line 1: pivotDF.na.fill(0).show(n=2)
Traceback (most recent call last):
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o244.fill.
: org.apache.spark.sql.AnalysisException: Reference 'Cancel' is ambiguous, could be: Cancel, Cancel.;
at org.apache.spark.sql.catalyst.expressions.package$AttributeSeq.resolve(package.scala:259)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:126)
at org.apache.spark.sql.Dataset.resolve(Dataset.scala:222)
at org.apache.spark.sql.Dataset.col(Dataset.scala:1269)
at org.apache.spark.sql.DataFrameNaFunctions.org$apache$spark$sql$DataFrameNaFunctions$$fillCol(DataFrameNaFunctions.scala:443)
at org.apache.spark.sql.DataFrameNaFunctions$$anonfun$7.apply(DataFrameNaFunctions.scala:502)
at org.apache.spark.sql.DataFrameNaFunctions$$anonfun$7.apply(DataFrameNaFunctions.scala:492)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at org.apache.spark.sql.DataFrameNaFunctions.fillValue(DataFrameNaFunctions.scala:492)
at org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:179)
at org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:163)
at org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:140)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tmp/zeppelin_pyspark-3253703896.py", line 473, in <module>
exec(code, _zcUserQueryNameSpace)
File "<stdin>", line 1, in <module>
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 2241, in fill
return self.df.fillna(value=value, subset=subset)
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 1665, in fillna
return DataFrame(self._jdf.na().fill(value), self.sql_ctx)
File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: "Reference 'Cancel' is ambiguous, could be: Cancel, Cancel.;"
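The AnalysisException reports that the name Cancel resolves to two different columns ("could be: Cancel, Cancel"), i.e. the real dataframe carries duplicate column names, and na.fill resolves every column by name, so it trips over the duplicate. Below is a minimal sketch of how that situation can arise and one way around it; the join and the column names are only an assumption about how the real dataframe was built, not taken from the question.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical reproduction: two columns end up sharing the name 'Cancel',
# for example after a join (or a pivot). The names here are illustrative only.
left = spark.createDataFrame([('12er', None), ('45fh', 2000)], ["Id", "Cancel"])
right = spark.createDataFrame([('12er', 1), ('45fh', None)], ["Id", "Cancel"])
joined = left.join(right, on="Id")   # 'joined' now has two columns named Cancel

# joined.na.fill(0)                  # fails: Reference 'Cancel' is ambiguous

# One workaround: give every column a unique name before filling
deduped = joined.toDF("Id", "Cancel_left", "Cancel_right")
deduped.na.fill(0).show()
A quick check is len(pivotDF.columns) != len(set(pivotDF.columns)); if that is True, the frame has duplicate column names and fillna will not be able to resolve them.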
【Discussion】:
- Are you using data.na.fill().show()? As far as I can tell, the data variable is a Python list.
- Sorry, that was a typo; it is a dataframe. I have added my schema at the bottom.
- So on the face of it, na.fill does not support None values; see stackoverflow.com/questions/50992713/… for a way around this (a sketch of that idea follows these comments).
- It does not work.
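For reference, a minimal sketch of the workaround the linked answer describes, assuming the issue were a column inferred as NullType (a column built from nothing but None), which fillna silently skips; the Amount name is reused from the sample data purely for illustration:
from pyspark.sql import functions as F
from pyspark.sql.types import IntegerType

# A column created only from None is a NullType column; na.fill(0) quietly
# ignores it. Casting it to a concrete numeric type first fixes that.
df = spark.createDataFrame([('12er',), ('4xcf',)], ["Id"]) \
          .withColumn("Amount", F.lit(None))
df = df.withColumn("Amount", df["Amount"].cast(IntegerType()))
df.na.fill(0).show()   # Amount now shows 0 instead of null
Note that the sample data above is not affected by this, since Amount there also contains 2000 and is inferred as a numeric type; that is consistent with the fill working on the sample but failing on the real dataframe for a different reason.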
Tags: python-3.x pyspark