【Question title】:Splitting Column Headers and Duplicating Row Values in Pandas Dataframe
【Posted】:2018-08-17 12:59:03
【Question description】:

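In the example df below, I am trying to find a way to split the column headers ('1;2', '4', '5;6') wherever a ';' is present, and to duplicate the row values into the resulting split columns. (My actual df comes from an imported csv file, so I typically have around 50-80 column headers that need splitting.)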

Below is my code, followed by its output:

    import pandas as pd
    import numpy as np

    # The first row holds the column headers; the remaining rows are data.
    # Note: np.array stores mixed types as strings, so every value here is a string.
    data = np.array([['Market','Product Code','1;2','4','5;6'],
                     ['Total Customers',123,1,500,400],
                     ['Total Customers',123,2,400,320],
                     ['Major Customer 1',123,1,100,220],
                     ['Major Customer 1',123,2,230,230],
                     ['Major Customer 2',123,1,130,30],
                     ['Major Customer 2',123,2,20,10],
                     ['Total Customers',456,1,500,400],
                     ['Total Customers',456,2,400,320],
                     ['Major Customer 1',456,1,100,220],
                     ['Major Customer 1',456,2,230,230],
                     ['Major Customer 2',456,1,130,30],
                     ['Major Customer 2',456,2,20,10]])

    df = pd.DataFrame(data)
    df.columns = df.iloc[0]            # promote the first row to column headers
    df = df.reindex(df.index.drop(0))  # drop the header row from the data
    print(df)
0             Market Product Code 1;2    4  5;6
1    Total Customers          123   1  500  400
2    Total Customers          123   2  400  320
3   Major Customer 1          123   1  100  220
4   Major Customer 1          123   2  230  230
5   Major Customer 2          123   1  130   30
6   Major Customer 2          123   2   20   10
7    Total Customers          456   1  500  400
8    Total Customers          456   2  400  320
9   Major Customer 1          456   1  100  220
10  Major Customer 1          456   2  230  230
11  Major Customer 2          456   1  130   30
12  Major Customer 2          456   2   20   10

Below is my desired output:

0             Market Product Code  1  2    4    5    6
1    Total Customers          123  1  1  500  400  400
2    Total Customers          123  2  2  400  320  320
3   Major Customer 1          123  1  1  100  220  220
4   Major Customer 1          123  2  2  230  230  230
5   Major Customer 2          123  1  1  130   30   30
6   Major Customer 2          123  2  2   20   10   10
7    Total Customers          456  1  1  500  400  400
8    Total Customers          456  2  2  400  320  320
9   Major Customer 1          456  1  1  100  220  220
10  Major Customer 1          456  2  2  230  230  230
11  Major Customer 2          456  1  1  130   30   30
12  Major Customer 2          456  2  2   20   10   10

Ideally, I would like to perform this task at the 'read_csv' level. Any ideas?

【Question discussion】:

    Tags: pandas split co


    【Solution 1】:

    Try reindex with repeat:

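    s=df.columns.str.split(';')                            # list of name parts for each header
    df=df.reindex(columns=df.columns.repeat(s.str.len()))  # repeat each column once per part
    df.columns=sum(s.tolist(),[])                          # flatten the parts into the new headers
    df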
    Out[247]: 
                  Market Product Code  1  2    4    5    6
    1    Total Customers          123  1  1  500  400  400
    2    Total Customers          123  2  2  400  320  320
    3   Major Customer 1          123  1  1  100  220  220
    4   Major Customer 1          123  2  2  230  230  230
    5   Major Customer 2          123  1  1  130   30   30
    6   Major Customer 2          123  2  2   20   10   10
    7    Total Customers          456  1  1  500  400  400
    8    Total Customers          456  2  2  400  320  320
    9   Major Customer 1          456  1  1  100  220  220
    10  Major Customer 1          456  2  2  230  230  230
    11  Major Customer 2          456  1  1  130   30   30
    12  Major Customer 2          456  2  2   20   10   10
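
The three steps above can be sketched in a self-contained form as follows. This is only an illustration on a small made-up frame (the sample values are taken from the question, the variable names `parts`, `counts` and `out` are my own), and it swaps `sum(..., [])` for `itertools.chain.from_iterable`, which flattens the lists without building intermediate copies.

```python
from itertools import chain

import pandas as pd

# Small illustrative frame; the headers '1;2' and '5;6' need splitting.
df = pd.DataFrame({"Market": ["Total Customers", "Major Customer 1"],
                   "1;2": [1, 2],
                   "4": [500, 400],
                   "5;6": [400, 320]})

parts = df.columns.str.split(";")   # each header becomes a list of name parts
counts = parts.str.len()            # how many copies each column needs

# Repeat every original column once per part, then relabel with the flat parts.
out = df.reindex(columns=df.columns.repeat(counts))
out.columns = list(chain.from_iterable(parts))

print(out)
```

With 50-80 headers either flattening approach should be fast enough; the `chain` variant mainly matters if the number of columns grows much larger.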
    

    【Discussion】:

    • Works perfectly, @Wen. Thanks!
    • @jwlon81 yw~ :-) Happy coding
    【Solution 2】:

    You can split the columns on ';' and then rebuild a df:

    pd.DataFrame({c:df[t] for t in df.columns for c in t.split(';')})
    Out[157]: 
        1  2    4    5    6            Market Product Code
    1   1  1  500  400  400   Total Customers          123
    2   2  2  400  320  320   Total Customers          123
    3   1  1  100  220  220  Major Customer 1          123
    4   2  2  230  230  230  Major Customer 1          123
    5   1  1  130   30   30  Major Customer 2          123
    6   2  2   20   10   10  Major Customer 2          123
    7   1  1  500  400  400   Total Customers          456
    8   2  2  400  320  320   Total Customers          456
    9   1  1  100  220  220  Major Customer 1          456
    10  2  2  230  230  230  Major Customer 1          456
    11  1  1  130   30   30  Major Customer 2          456
    12  2  2   20   10   10  Major Customer 2          456
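
A note on the column order shown above: the alphabetically sorted columns reflect older pandas behaviour. Since pandas 0.23 on Python 3.6 or later, a DataFrame built from a dict keeps the dict's insertion order, so the same comprehension should already return the columns in their original order. A minimal sketch on a made-up frame (the names `old`, `new` and `rebuilt` are illustrative):

```python
import pandas as pd

# Small illustrative frame with one header that needs splitting.
df = pd.DataFrame({"Market": ["Total Customers", "Major Customer 1"],
                   "1;2": [1, 2],
                   "4": [500, 400]})

# Map every part of every header to the original column's values.
rebuilt = pd.DataFrame({new: df[old]
                        for old in df.columns
                        for new in old.split(";")})

print(list(rebuilt.columns))  # ['Market', '1', '2', '4']
```

One caveat of the dict form: if two headers split into the same part name, the later one silently overwrites the earlier one, because dict keys are unique.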
    

    Or, if you want to preserve the column order:

    pd.concat([df[t].to_frame(c) for t in df.columns for c in t.split(';')], axis=1)
    Out[167]: 
                  Market Product Code  1  2    4    5    6
    1    Total Customers          123  1  1  500  400  400
    2    Total Customers          123  2  2  400  320  320
    3   Major Customer 1          123  1  1  100  220  220
    4   Major Customer 1          123  2  2  230  230  230
    5   Major Customer 2          123  1  1  130   30   30
    6   Major Customer 2          123  2  2   20   10   10
    7    Total Customers          456  1  1  500  400  400
    8    Total Customers          456  2  2  400  320  320
    9   Major Customer 1          456  1  1  100  220  220
    10  Major Customer 1          456  2  2  230  230  230
    11  Major Customer 2          456  1  1  130   30   30
    12  Major Customer 2          456  2  2   20   10   10
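
The question asks for this to happen at the `read_csv` level. I am not aware of a `read_csv` option that splits headers, so one possible approach is to chain a small post-processing step directly onto the read with `DataFrame.pipe`. The sketch below assumes a comma-separated file whose headers use ';' as the inner separator; `split_headers` is a hypothetical helper name, and the CSV text is inlined through `io.StringIO` only to keep the example self-contained.

```python
import io

import pandas as pd


def split_headers(df, sep=";"):
    """Hypothetical helper: one output column per part of each header."""
    pieces = [df[col].rename(name)            # copy the column under the part's name
              for col in df.columns
              for name in str(col).split(sep)]
    return pd.concat(pieces, axis=1)          # keeps the original left-to-right order


# Stand-in for a real file; with a file on disk, pass its path to read_csv instead.
csv_text = (
    "Market,Product Code,1;2,4,5;6\n"
    "Total Customers,123,1,500,400\n"
    "Major Customer 1,123,2,230,230\n"
)

df = pd.read_csv(io.StringIO(csv_text)).pipe(split_headers)
print(df)
```

Because the split happens after parsing, the duplicated columns keep the numeric dtypes that `read_csv` inferred.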
    

    【Discussion】:
