【Question title】: Adding dummy columns to the original dataframe
【Posted】: 2025-12-22 16:00:11
【Question】:

I have a dataframe that looks like this:

            JOINED_CO GENDER    EXEC_FULLNAME  GVKEY  YEAR    CONAME  BECAMECEO  REJOIN   LEFTOFC    LEFTCO  RELEFT    REASON  PAGE
CO_PER_ROL
5622              NaN   MALE   Ira A. Eichner   1004  1992  AAR CORP   19550101     NaN  19961001  19990531     NaN  RESIGNED    79
5622              NaN   MALE   Ira A. Eichner   1004  1993  AAR CORP   19550101     NaN  19961001  19990531     NaN  RESIGNED    79
5622              NaN   MALE   Ira A. Eichner   1004  1994  AAR CORP   19550101     NaN  19961001  19990531     NaN  RESIGNED    79
5622              NaN   MALE   Ira A. Eichner   1004  1995  AAR CORP   19550101     NaN  19961001  19990531     NaN  RESIGNED    79
5622              NaN   MALE   Ira A. Eichner   1004  1996  AAR CORP   19550101     NaN  19961001  19990531     NaN  RESIGNED    79
5622              NaN   MALE   Ira A. Eichner   1004  1997  AAR CORP   19550101     NaN  19961001  19990531     NaN  RESIGNED    79
5622              NaN   MALE   Ira A. Eichner   1004  1998  AAR CORP   19550101     NaN  19961001  19990531     NaN  RESIGNED    79
5623              NaN   MALE  David P. Storch   1004  1992  AAR CORP   19961009     NaN       NaN       NaN     NaN       NaN    57
5623              NaN   MALE  David P. Storch   1004  1993  AAR CORP   19961009     NaN       NaN       NaN     NaN       NaN    57
5623              NaN   MALE  David P. Storch   1004  1994  AAR CORP   19961009     NaN       NaN       NaN     NaN       NaN    57
5623              NaN   MALE  David P. Storch   1004  1995  AAR CORP   19961009     NaN       NaN       NaN     NaN       NaN    57
5623              NaN   MALE  David P. Storch   1004  1996  AAR CORP   19961009     NaN       NaN       NaN     NaN       NaN    57

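For the YEAR values, I would like to add year columns (1993,1994...,2009) to the original dataframe. If the value in YEAR is 1992, then the value in the 1992 column should be 1, otherwise 0.

I used a very naive for loop, but it seems to run forever because I have a large dataset. Can anyone help me out? Thanks a lot!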

【Question discussion】:

    Tags: python pandas dataframe one-hot-encoding


    【Solution 1】:
    In [77]: df = pd.concat([df, pd.get_dummies(df['YEAR'])], axis=1); df
    Out[77]: 
          JOINED_CO GENDER    EXEC_FULLNAME  GVKEY  YEAR    CONAME  BECAMECEO  \
    5622        NaN   MALE   Ira A. Eichner   1004  1992  AAR CORP   19550101   
    5622        NaN   MALE   Ira A. Eichner   1004  1993  AAR CORP   19550101   
    5622        NaN   MALE   Ira A. Eichner   1004  1994  AAR CORP   19550101   
    5622        NaN   MALE   Ira A. Eichner   1004  1995  AAR CORP   19550101   
    5622        NaN   MALE   Ira A. Eichner   1004  1996  AAR CORP   19550101   
    5622        NaN   MALE   Ira A. Eichner   1004  1997  AAR CORP   19550101   
    5622        NaN   MALE   Ira A. Eichner   1004  1998  AAR CORP   19550101   
    5623        NaN   MALE  David P. Storch   1004  1992  AAR CORP   19961009   
    5623        NaN   MALE  David P. Storch   1004  1993  AAR CORP   19961009   
    5623        NaN   MALE  David P. Storch   1004  1994  AAR CORP   19961009   
    5623        NaN   MALE  David P. Storch   1004  1995  AAR CORP   19961009   
    5623        NaN   MALE  David P. Storch   1004  1996  AAR CORP   19961009   
    
          REJOIN   LEFTOFC    LEFTCO  RELEFT    REASON  PAGE  1992  1993  1994  \
    5622     NaN  19961001  19990531     NaN  RESIGNED    79     1     0     0   
    5622     NaN  19961001  19990531     NaN  RESIGNED    79     0     1     0   
    5622     NaN  19961001  19990531     NaN  RESIGNED    79     0     0     1   
    5622     NaN  19961001  19990531     NaN  RESIGNED    79     0     0     0   
    5622     NaN  19961001  19990531     NaN  RESIGNED    79     0     0     0   
    5622     NaN  19961001  19990531     NaN  RESIGNED    79     0     0     0   
    5622     NaN  19961001  19990531     NaN  RESIGNED    79     0     0     0   
    5623     NaN       NaN       NaN     NaN       NaN    57     1     0     0   
    5623     NaN       NaN       NaN     NaN       NaN    57     0     1     0   
    5623     NaN       NaN       NaN     NaN       NaN    57     0     0     1   
    5623     NaN       NaN       NaN     NaN       NaN    57     0     0     0   
    5623     NaN       NaN       NaN     NaN       NaN    57     0     0     0   
    
          1995  1996  1997  1998  
    5622     0     0     0     0  
    5622     0     0     0     0  
    5622     0     0     0     0  
    5622     1     0     0     0  
    5622     0     1     0     0  
    5622     0     0     1     0  
    5622     0     0     0     1  
    5623     0     0     0     0  
    5623     0     0     0     0  
    5623     0     0     0     0  
    5623     1     0     0     0  
    5623     0     1     0     0  
    

    If you want to drop the YEAR column, you can follow up with del df['YEAR']. Alternatively, drop the YEAR column from df before calling concat:

    df = pd.concat([df.drop('YEAR', axis=1), pd.get_dummies(df['YEAR'])], axis=1)
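
The question asks for columns up to 2009, while the sample rows only reach 1998, and get_dummies only creates columns for values that actually occur. The following is a minimal sketch of one way to cover the whole range, not part of the original answer: the small frame is a hypothetical stand-in for the real data, and the 1992 to 2009 range is an assumption taken from the question text. Passing dtype=int requests 0/1 values, since pandas 2.0 and later return booleans by default.

```python
import pandas as pd

# Hypothetical miniature of the question's frame; only YEAR matters here.
df = pd.DataFrame(
    {
        "EXEC_FULLNAME": ["Ira A. Eichner", "Ira A. Eichner", "David P. Storch"],
        "YEAR": [1992, 1993, 1992],
    },
    index=[5622, 5622, 5623],
)

# One indicator column per distinct YEAR value, stored as 0/1 integers.
dummies = pd.get_dummies(df["YEAR"], dtype=int)

# Assumed range from the question: add all-zero columns for years that
# never occur in the data, so every year 1992..2009 has a column.
all_years = range(1992, 2010)
dummies = dummies.reindex(columns=all_years, fill_value=0)

# Attach the indicator columns to the original frame.
result = pd.concat([df, dummies], axis=1)
print(result.iloc[:, :5])
```

The reindex step is optional; without it the result only has columns for the years present in YEAR, as in the output above.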
    

    【Discussion】:

    • What does In [77] mean?
    • @guo: That is IPython's interactive shell prompt. It numbers the inputs.
    • Why does this code block double my original frame? Any guesses? @unutbu
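
The last comment does not show its data, so the cause can only be guessed at. One possibility, offered purely as an assumption, is that the dummy frame no longer shares the original frame's index: concat with axis=1 aligns rows by index label, so mismatched labels yield the union of both indexes, padded with NaN. The sketch below uses made-up index values to illustrate that effect.

```python
import pandas as pd

# Hypothetical frame with a non-default index.
df = pd.DataFrame({"YEAR": [1992, 1993]}, index=[10, 11])

# Dummies built from a re-indexed copy get the labels 0 and 1 instead of 10 and 11.
misaligned = pd.get_dummies(df["YEAR"].reset_index(drop=True), dtype=int)

# Rows are matched by label, so the result holds the union: 10, 11, 0, 1.
doubled = pd.concat([df, misaligned], axis=1)
print(doubled.shape)

# Building the dummies directly from df keeps the labels identical.
aligned = pd.concat([df, pd.get_dummies(df["YEAR"], dtype=int)], axis=1)
print(aligned.shape)
```

If the row count grows after the concat, comparing df.index with the index of the dummy frame is a quick first check.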