Why am I getting this?

【Question title】: Why am I getting this?
【Posted】: 2023-01-12 19:13:07
【Question description】:

df = spark.createDataFrame([(1, "a", None), (2, None, 3.0), (3, "c", 4.0), (None, "c", None)], ["id", "x", "y"])

df.na.drop(subset=["x"]).show()
df.na.drop(subset=["y"]).show()
When I drop on x, why does it show values in the other rows as null, and why does c appear where a null was?

【Comments】:

Most (all?) of the images contain only text. Why not copy and paste the text instead of using images?
Welcome to SO. Please read How do I ask a good question? and How to create a Minimal, Reproducible Example.

Tags: pyspark

【Solution 1】:

This is your DataFrame:

+----+----+----+
|  id|   x|   y|
+----+----+----+
|   1|   a|null|  <== Line with y null
|   2|null| 3.0|  <== Line with x null
|   3|   c| 4.0|
|null|   c|null|  <== Line with y null
+----+----+----+

And this is the output of your code:

df.na.drop(subset=["x"]).show()

+----+---+----+
|  id|  x|   y|
+----+---+----+
|   1|  a|null|
|   3|  c| 4.0|
|null|  c|null|
+----+---+----+
# Line with id=2 is dropped

and

df.na.drop(subset=["y"]).show()

+---+----+---+
| id|   x|  y|
+---+----+---+
|  2|null|3.0|
|  3|   c|4.0|
+---+----+---+
# Lines with id=1 and id=null are dropped

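Nothing is wrong, and nothing was generated. It does exactly what you asked: drop(subset=["x"]) removes only the rows where x is null, and nulls in columns outside the subset are left untouched. The c in the last row was already in your data.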
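To make the rule concrete without needing a Spark session, here is a minimal plain-Python sketch of the filter that na.drop(subset=...) applies with its default how="any". The helper drop_nulls is a hypothetical name used only for illustration. It is not part of PySpark, and it only models the behavior shown in the outputs above.

```python
# Plain-Python model of DataFrame.na.drop(subset=...) with the default how="any".
# `drop_nulls` is a hypothetical helper for illustration, not a PySpark function.
columns = ["id", "x", "y"]
rows = [(1, "a", None), (2, None, 3.0), (3, "c", 4.0), (None, "c", None)]


def drop_nulls(rows, columns, subset=None):
    """Keep a row only if none of the checked columns is None."""
    # With no subset, every column is checked.
    checked = columns if subset is None else subset
    idx = [columns.index(name) for name in checked]
    return [row for row in rows if all(row[i] is not None for i in idx)]


print(drop_nulls(rows, columns, subset=["x"]))
# [(1, 'a', None), (3, 'c', 4.0), (None, 'c', None)]
print(drop_nulls(rows, columns, subset=["y"]))
# [(2, None, 3.0), (3, 'c', 4.0)]
print(drop_nulls(rows, columns))
# [(3, 'c', 4.0)]
```

If the goal was to remove every row that has a null anywhere, then in PySpark df.na.drop() with no subset, or df.na.drop(subset=["x", "y"]), would leave only the id=3 row for this data.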

【Comments】: