【Question title】: How to add multiple new columns with when conditions in a PySpark DataFrame?
【Posted】: 2022-01-15 22:09:36
【Question description】:
I need to add two new columns to my existing pyspark dataframe.
Below is my sample data:

Section   Grade     Promotion_grade Section_team
Admin       C       
Account     B       
IT          B   

Conditions:

If Section = Admin then Promotion_grade = B
If Section = Account then Promotion_grade = A
If Section = IT then
             If Grade = C then Promotion_grade = B  & Section_team = team1
             If Grade = D then Promotion_grade = C  & Section_team = team2
             If Grade = A then Promotion_grade = A+ & Section_team = team3
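The conditions above amount to a small decision table. The plain-Python sketch below encodes them for one row at a time, which can be handy for checking expected values before writing the Spark version. It assumes that any combination the rules do not cover (IT with a grade other than C, D or A, or a Section_team for Admin and Account) maps to "Not applicable", matching the asker's own otherwise('Not applicable'); the names IT_RULES and promotion_rule are hypothetical.

```python
NOT_APPLICABLE = "Not applicable"

# Hypothetical encoding of the IT rules: Grade -> (Promotion_grade, Section_team)
IT_RULES = {"C": ("B", "team1"), "D": ("C", "team2"), "A": ("A+", "team3")}


def promotion_rule(section, grade):
    """Return (Promotion_grade, Section_team) for one row under the stated conditions."""
    if section == "Admin":
        return ("B", NOT_APPLICABLE)
    if section == "Account":
        return ("A", NOT_APPLICABLE)
    if section == "IT":
        # Assumption: grades not listed for IT (e.g. "B") fall through to the default.
        return IT_RULES.get(grade, (NOT_APPLICABLE, NOT_APPLICABLE))
    # Assumption: unknown sections also get the default for both columns.
    return (NOT_APPLICABLE, NOT_APPLICABLE)


for section, grade in [("Admin", "C"), ("Account", "B"), ("IT", "B"), ("IT", "C")]:
    print(section, grade, promotion_rule(section, grade))
```

The outer if/elif on Section with an inner lookup on Grade is exactly the nesting that the Spark expression needs to reproduce.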

I can add one column for the first two conditions, but I don't know how to handle the remaining ones.

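from pyspark.sql import functions as F

def addCols(data):
    # Only covers the Admin and Account rules; the IT rules and Section_team are still missing
    data = (data.withColumn('Promotion_grade', F.when(data.Section == 'Admin', 'B')
                                                .when(data.Section == 'Account', 'A')
                                                .otherwise('Not applicable')))
    return data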

Could someone please help me? Maybe the way I'm going about this is wrong. Thanks.

【Question discussion】:

    Tags: dataframe pyspark apache-spark-sql


    【Solution 1】:

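    You can nest when expressions to handle the nested conditions: the value returned for the IT branch of the outer when can itself be a when(...) chain on Grade.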

    Working example

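    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    
    # In the pyspark shell or a notebook, `spark` usually already exists
    spark = SparkSession.builder.getOrCreate()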
    
    data = [("Admin", "C", ), 
            ("Account", "B", ), 
            ("IT", "B", ),
            ("IT", "C", ),
            ("IT", "D", ),
            ("IT", "A", ),]
    
    df = spark.createDataFrame(data, ("Section", "Grade", ))
    
    # Define Promotion Grade conditions for IT Section
    it_promotion_grade = (F.when(F.col("Grade") == "C", "B")
                           .when(F.col("Grade") == "D", "C")
                           .when(F.col("Grade") == "A", "A+")
                           .otherwise("Not applicable"))
    
    # Define Section Team conditions for IT Section
    it_section_team = (F.when(F.col("Grade") == "C", "team1")
                        .when(F.col("Grade") == "D", "team2")
                        .when(F.col("Grade") == "A", "team3")
                        .otherwise("Not applicable"))
    
    (df.withColumn("Promotion_grade", F.when(F.col("Section") == "Admin", "B")
                                      .when(F.col("Section") == "Account", "A")
                                      .when(F.col("Section") == "IT", it_promotion_grade)
                                      .otherwise("Not applicable"))
        .withColumn("Section_team", F.when(F.col("Section") == "IT", it_section_team)
                                     .otherwise("Not applicable"))
        .show())
    

    Output

    +-------+-----+---------------+--------------+
    |Section|Grade|Promotion_grade|  Section_team|
    +-------+-----+---------------+--------------+
    |  Admin|    C|              B|Not applicable|
    |Account|    B|              A|Not applicable|
    |     IT|    B| Not applicable|Not applicable|
    |     IT|    C|              B|         team1|
    |     IT|    D|              C|         team2|
    |     IT|    A|             A+|         team3|
    +-------+-----+---------------+--------------+
    

    【Discussion】:

    • Wow, that's very clear!! Thanks a lot.