Posted: 2022-01-15 22:09:36
Question:
I need to add two new columns to my existing PySpark DataFrame.
Below is my sample data:
Section   Grade   Promotion_grade   Section_team
Admin     C
Account   B
IT        B
Conditions:
If Section = Admin, then Promotion_grade = B
If Section = Account, then Promotion_grade = A
If Section = IT, then:
    If Grade = C, then Promotion_grade = B and Section_team = team1
    If Grade = D, then Promotion_grade = C and Section_team = team2
    If Grade = A, then Promotion_grade = A+ and Section_team = team3
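Before translating these conditions into Spark expressions, it may help to pin down the expected result for every combination in plain Python. This is only a reference sketch, not Spark code. The name `promote` and the fallback values are assumptions: rows that no condition covers get 'Not applicable' (borrowing the `.otherwise` value used in the code in this question) and no team (`None`). Note that the sample IT row has Grade B, which none of the three IT conditions covers, so under this assumption it falls through to the default.

```python
from typing import Optional, Tuple

# Assumed fallback for rows no condition covers (mirrors .otherwise('Not applicable')).
DEFAULT_GRADE = "Not applicable"

# IT sub-conditions: Grade -> (Promotion_grade, Section_team)
IT_RULES = {
    "C": ("B", "team1"),
    "D": ("C", "team2"),
    "A": ("A+", "team3"),
}


def promote(section: str, grade: str) -> Tuple[str, Optional[str]]:
    """Return (Promotion_grade, Section_team) for one row.

    Section_team is None wherever the conditions do not assign a team
    (an assumption; the conditions only name teams for IT rows).
    """
    if section == "Admin":
        return "B", None
    if section == "Account":
        return "A", None
    if section == "IT" and grade in IT_RULES:
        return IT_RULES[grade]
    # Any other Section, or an IT Grade without a rule (e.g. B)
    return DEFAULT_GRADE, None


# Expected results for the three sample rows
for section, grade in [("Admin", "C"), ("Account", "B"), ("IT", "B")]:
    print(section, grade, promote(section, grade))
```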
I can add a column for the first two conditions, but I don't know how to handle the rest.
from pyspark.sql import functions as F

def addCols(data):
    # Covers only the first two conditions so far
    data = data.withColumn(
        'Promotion_grade',
        F.when(data.Section == 'Admin', 'B')
         .when(data.Section == 'Account', 'A')
         .otherwise('Not applicable'))
    return data
Can someone please help me? Maybe the way I'm doing it is wrong. Thanks.
Tags: dataframe pyspark apache-spark-sql