【Question title】: Assign column values to unique row in pandas dataframe [duplicate]
【Posted】: 2020-04-13 11:05:00
【Question】:

I have a dataframe:

AA          AB          AC                  AD              Col_1       Col_2     Col_3    
Northeast   Argentina   Northeast Argentina South America   Corrientes  Misiones        
Northern    Argentina   Northern  Argentina South America   Chaco       Formosa   Santiago Del 

I want to convert it to:

AA          AB          AC                  AD              Col
Northeast   Argentina   Northeast Argentina South America   Corrientes
Northeast   Argentina   Northeast Argentina South America   Misiones        
Northern    Argentina   Northern  Argentina South America   Chaco
Northern    Argentina   Northern  Argentina South America   Formosa
Northern    Argentina   Northern  Argentina South America   Santiago Del 

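That is, I want to keep the first 4 columns but move each of the remaining column values onto its own row. Is there a way to do this without using a for loop?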

【Comments】:

    Tags: python pandas


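    【Solution 1】:

    You can try this:

    # Unpivot Col_1..Col_3 into rows, keeping the four identifier columns
    df = df.melt(id_vars=['AA', 'AB', 'AC', 'AD'])
    # Drop the rows that came from empty cells
    df.dropna(inplace=True)
    # 'variable' holds the old column names, which are no longer needed
    df.drop(columns='variable', inplace=True)
    df = df.sort_values('AA').reset_index(drop=True)
    df.rename(columns={'value': 'Col'}, inplace=True)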
    
              AA         AB                   AC             AD           Col
    0  Northeast  Argentina  Northeast Argentina  South America    Corrientes
    1  Northeast  Argentina  Northeast Argentina  South America      Misiones
    2   Northern  Argentina   Northern Argentina  South America         Chaco
    3   Northern  Argentina   Northern Argentina  South America       Formosa
    4   Northern  Argentina   Northern Argentina  South America  Santiago Del
    
    

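    【Discussion】:

    • You could improve the answer by not using inplace=True - check this
    • No, I'm not a pandas dev, but I have seen this opinion often and agree with it. I think it is more of a recommendation, check this
    • Thanks, this helped me. Thank you for taking the time to help me get better at pandas.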
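
Following the comments above, here is a sketch of the same melt-based reshaping written as one method chain with no inplace=True. The sample frame is rebuilt from the question on the assumption that its empty cells are NaN. The value_name argument and the mergesort kind are choices made for this sketch and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; empty cells are assumed to be NaN.
df = pd.DataFrame({
    'AA': ['Northeast', 'Northern'],
    'AB': ['Argentina', 'Argentina'],
    'AC': ['Northeast Argentina', 'Northern Argentina'],
    'AD': ['South America', 'South America'],
    'Col_1': ['Corrientes', 'Chaco'],
    'Col_2': ['Misiones', 'Formosa'],
    'Col_3': [np.nan, 'Santiago Del'],
})

id_cols = ['AA', 'AB', 'AC', 'AD']

result = (
    df.melt(id_vars=id_cols, value_name='Col')  # name the value column up front
      .dropna(subset=['Col'])                   # discard rows that came from empty cells
      .drop(columns='variable')                 # the old column names are not needed
      .sort_values('AA', kind='mergesort')      # stable sort keeps the Col_1, Col_2, Col_3 order
      .reset_index(drop=True)
)
print(result)
```

Each step returns a new frame, so the original df is left unchanged and nothing depends on mutation order.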
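    【Solution 2】:

    Try using:

    import pandas as pd
    # Pack the three value columns into one list per row
    df['Col'] = df[['Col_1', 'Col_2', 'Col_3']].values.tolist()
    # Expand each list, stack it into rows ('level_7' is the unnamed stacked level), then drop the helper columns
    df = df.set_index(df.columns.drop('Col').tolist())['Col'].apply(pd.Series).stack().reset_index().rename(columns={0: 'Col'}).drop(['level_7', 'Col_1', 'Col_2', 'Col_3'], axis=1)
    print(df)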
    

    Output:

              AA         AB                   AC             AD           Col
    0  Northeast  Argentina  Northeast Argentina  South America    Corrientes
    1  Northeast  Argentina  Northeast Argentina  South America      Misiones
    2   Northern  Argentina   Northern Argentina  South America         Chaco
    3   Northern  Argentina   Northern Argentina  South America       Formosa
    4   Northern  Argentina   Northern Argentina  South America  Santiago Del
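
For comparison, the list-packing idea in this answer can also be written with DataFrame.explode (available since pandas 0.25), swapped in here for the apply(pd.Series).stack() step. This is a minimal sketch, not the answerer's method, and it again assumes that the empty cells in the question's data are NaN.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; empty cells are assumed to be NaN.
df = pd.DataFrame({
    'AA': ['Northeast', 'Northern'],
    'AB': ['Argentina', 'Argentina'],
    'AC': ['Northeast Argentina', 'Northern Argentina'],
    'AD': ['South America', 'South America'],
    'Col_1': ['Corrientes', 'Chaco'],
    'Col_2': ['Misiones', 'Formosa'],
    'Col_3': [np.nan, 'Santiago Del'],
})

value_cols = ['Col_1', 'Col_2', 'Col_3']

result = (
    df.assign(Col=df[value_cols].values.tolist())  # one list of values per row
      .drop(columns=value_cols)                    # the wide columns are now redundant
      .explode('Col')                              # one output row per list element
      .dropna(subset=['Col'])                      # discard entries from empty cells
      .reset_index(drop=True)
)
print(result)
```

Because explode walks each list in order, the rows stay grouped by their source row and no sort is needed.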
    

    【Discussion】:
