【Question title】: DataFrame transformation in Python Pandas
【Posted】: 2016-08-11 22:46:34
【Question】:

I'm trying to transform a Pandas DataFrame into a new DataFrame in which each item in a particular column gets its own row. For example:

Before:

   ID             Name        Date   Location
0   0       John, Dave  01/01/1992     Mexico
1   1              Tim  07/07/1997  Australia
2   2       Mike, John  12/24/2012     Zambia
3   3  Bob, Rick, Tony  05/17/2007       Cuba
4   4            Roger  04/05/2000    Iceland
5   5           Carlos  05/24/1995       Guam
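
For anyone who wants to reproduce this, the table above can be rebuilt roughly as follows. This is only a sketch: it assumes Date is stored as plain strings and that the index is the default integer range, neither of which the question states explicitly.

```python
import pandas as pd

# Sample data copied from the "Before" table.
# Assumption: Date values are plain strings, not datetime objects.
df = pd.DataFrame({
    'ID': [0, 1, 2, 3, 4, 5],
    'Name': ['John, Dave', 'Tim', 'Mike, John',
             'Bob, Rick, Tony', 'Roger', 'Carlos'],
    'Date': ['01/01/1992', '07/07/1997', '12/24/2012',
             '05/17/2007', '04/05/2000', '05/24/1995'],
    'Location': ['Mexico', 'Australia', 'Zambia',
                 'Cuba', 'Iceland', 'Guam'],
})
```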

Current solution:

import pandas as pd

pieces = []  # one single-row DataFrame per name
for index, row in df.iterrows():
    new_row = pd.DataFrame(df.loc[index]).transpose()
    # A cell with no comma splits into a one-item list, so no separate branch is needed.
    for item in df.loc[index, 'Name'].split(','):
        new_row.loc[index, 'Name'] = item.strip()  # drop the space left after each comma
        pieces.append(new_row.copy())
# Concatenate once; DataFrame.append was removed in pandas 2.0.
new_df = pd.concat(pieces)

After:

  ID    Name        Date   Location
0  0    John  01/01/1992     Mexico
0  0    Dave  01/01/1992     Mexico
1  1     Tim  07/07/1997  Australia
2  2    Mike  12/24/2012     Zambia
2  2    John  12/24/2012     Zambia
3  3     Bob  05/17/2007       Cuba
3  3    Rick  05/17/2007       Cuba
3  3    Tony  05/17/2007       Cuba
4  4   Roger  04/05/2000    Iceland
5  5  Carlos  05/24/1995       Guam

Surely there is a more elegant way to do this?

【Question comments】:

    Tags: python pandas dataframe transform


    【Solution 1】:

    You can do it like this:

    nm = df.Name.str.split(r',\s*', expand=True)
    cols = df.columns.difference(['Name']).tolist()
    
    pd.melt(df[cols].join(nm),
            id_vars=cols,
            value_vars=nm.columns.tolist(),
            value_name='Name') \
      .dropna() \
      .drop(['variable'], axis=1) \
      .sort_values('ID')
    

    Step by step:

    In [128]: nm = df.Name.str.split(r',\s*', expand=True)
    
    In [129]: nm
    Out[129]:
            0     1     2
    0    John  Dave  None
    1     Tim  None  None
    2    Mike  John  None
    3     Bob  Rick  Tony
    4   Roger  None  None
    5  Carlos  None  None
    
    In [130]: cols = df.columns.difference(['Name']).tolist()
    
    In [131]: cols
    Out[131]: ['Date', 'ID', 'Location']
    
    In [133]: pd.melt(df[cols].join(nm),
       .....:         id_vars=cols,
       .....:         value_vars=nm.columns.tolist(),
       .....:         value_name='Name') \
       .....:   .dropna() \
       .....:   .drop(['variable'], axis=1) \
       .....:   .sort_values('ID')
    Out[133]:
              Date  ID   Location    Name
    0   01/01/1992   0     Mexico    John
    6   01/01/1992   0     Mexico    Dave
    1   07/07/1997   1  Australia     Tim
    2   12/24/2012   2     Zambia    Mike
    8   12/24/2012   2     Zambia    John
    3   05/17/2007   3       Cuba     Bob
    9   05/17/2007   3       Cuba    Rick
    15  05/17/2007   3       Cuba    Tony
    4   04/05/2000   4    Iceland   Roger
    5   05/24/1995   5       Guam  Carlos
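
As a side note that is not part of the answer above: pandas 0.25 and later provide `DataFrame.explode`, which may cover the same case more directly. The sketch below rebuilds the question's sample data (Date as plain strings is an assumption) and uses `result` as an illustrative name.

```python
import pandas as pd

# Sample data from the question; Date kept as strings (an assumption).
df = pd.DataFrame({
    'ID': [0, 1, 2, 3, 4, 5],
    'Name': ['John, Dave', 'Tim', 'Mike, John',
             'Bob, Rick, Tony', 'Roger', 'Carlos'],
    'Date': ['01/01/1992', '07/07/1997', '12/24/2012',
             '05/17/2007', '04/05/2000', '05/24/1995'],
    'Location': ['Mexico', 'Australia', 'Zambia',
                 'Cuba', 'Iceland', 'Guam'],
})

# Turn each Name cell into a list of names, then give every list
# element its own row. The other columns are repeated as needed.
result = df.assign(Name=df['Name'].str.split(r',\s*')).explode('Name')
```

With this approach the columns stay in their original order and the index labels repeat (0, 0, 1, 2, 2, ...), as in the "After" table of the question.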
    

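    【Comments】:

      【Solution 2】:

      You can get the split names as a Series, drop the existing Name column, and then join the split names back onto the DataFrame.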

      # Split the 'Name' column as a Series, setting the appropriate name and index.
      split_names = df['Name'].str.split(', ', expand=True).stack()
      split_names.name = 'Name'
      split_names.index = split_names.index.get_level_values(0)
      
      # Drop the existing 'Name' column, and join the split names.
      df.drop('Name', axis=1, inplace=True)
      df = df.join(split_names)
      

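      The resulting output matches your example, except that the Name column comes last. If you want the original order, you can reorder the columns.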

         ID        Date   Location    Name
      0   0  01/01/1992     Mexico    John
      0   0  01/01/1992     Mexico    Dave
      1   1  07/07/1997  Australia     Tim
      2   2  12/24/2012     Zambia    Mike
      2   2  12/24/2012     Zambia    John
      3   3  05/17/2007       Cuba     Bob
      3   3  05/17/2007       Cuba    Rick
      3   3  05/17/2007       Cuba    Tony
      4   4  04/05/2000    Iceland   Roger
      5   5  05/24/1995       Guam  Carlos
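
Reordering the columns, as mentioned above, could look something like the sketch below. The small `df` here is only a hypothetical stand-in for the joined result (first three rows), built so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for the joined result, with Name as the last column.
df = pd.DataFrame({
    'ID': [0, 0, 1],
    'Date': ['01/01/1992', '01/01/1992', '07/07/1997'],
    'Location': ['Mexico', 'Mexico', 'Australia'],
    'Name': ['John', 'Dave', 'Tim'],
}, index=[0, 0, 1])

# Select the columns in the order used by the original table.
df = df[['ID', 'Name', 'Date', 'Location']]
```

Selecting with a list of column names returns a new DataFrame, so the rows and the repeated index labels are left as they were.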
      

      【Comments】:
