【Posted】: 2020-05-27 12:03:34
【Question】:
from pyspark.sql.functions import when,col
from pyspark.sql.functions import udf
#Your code here to create a new variable df_kmeans_new with a new column Position_Group,..
from pyspark.sql.types import *
#Your code to complete
DEF= ["LB","LWB","RB","LCB","RCB","CB","RWB"]
FWD= ["RF","LF","LW","RS","RW","LS","CF","ST"]
MID= ["LCM","LM","RDM","CAM","RAM","RCM","CM","CDM","RM","LAM","LDM"]
df = spark.createDataFrame(
    [(1, "LB", "4"),
     (2, "LM", "0"),
     (3, "LCB", "4"),
     (4, "RS", "4")],
    ("id", "Position", "Position_x"))
def check_in_def(cell_val):
    if cell_val in DEF:
        return "DEF"
    elif cell_val in FWD:
        return "FWD"
    elif cell_val in MID:
        return "MID"
    else:
        return "NA"
df = df.withColumn("Position_Group",when(check_in_def(df.Position)=="DEF","DEF").when(check_in_def(df.Position)=="FWD","FWD").otherwise(0)).show()
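I want to create a new column in df that holds one of the three array names (DEF, FWD, or MID), depending on which array contains the value of the Position column.
But the code doesn't work. Could someone please help?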
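A likely cause, for reference: check_in_def is an ordinary Python function, so check_in_def(df.Position) is called once with a Column object instead of once per row. The test `cell_val in DEF` then makes Python convert a Column to a bool, which Spark refuses to do. Separately, assigning the result of .show() back to df leaves df set to None, because show() only prints. Below is a minimal sketch of one way around both problems, using native Column expressions (isin inside when). The helper name add_position_group is hypothetical, and the "NA" fallback is assumed from check_in_def rather than from the otherwise(0) in the original chain.

```python
DEF = ["LB", "LWB", "RB", "LCB", "RCB", "CB", "RWB"]
FWD = ["RF", "LF", "LW", "RS", "RW", "LS", "CF", "ST"]
MID = ["LCM", "LM", "RDM", "CAM", "RAM", "RCM", "CM", "CDM", "RM", "LAM", "LDM"]


def check_in_def(cell_val):
    """Plain-Python classifier: works on a single string, not on a Spark Column."""
    if cell_val in DEF:
        return "DEF"
    if cell_val in FWD:
        return "FWD"
    if cell_val in MID:
        return "MID"
    return "NA"


def add_position_group(df):
    """Hypothetical helper: return df with a Position_Group column added."""
    # Imported lazily so check_in_def stays usable without Spark installed.
    from pyspark.sql.functions import col, when

    position = col("Position")
    # Each isin() yields a boolean Column that Spark evaluates per row.
    return df.withColumn(
        "Position_Group",
        when(position.isin(DEF), "DEF")
        .when(position.isin(FWD), "FWD")
        .when(position.isin(MID), "MID")
        .otherwise("NA"),  # assumed fallback, mirroring check_in_def
    )
```

With the df from the question, this could be used as df_kmeans_new = add_position_group(df), followed by df_kmeans_new.show() on its own line so the DataFrame is not overwritten with None. Wrapping the original function as udf(check_in_def, StringType()) and applying it to df.Position should also work, though a Python UDF is generally slower than built-in column expressions.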
【Comments】:
Tags: python apache-spark pyspark data-science