【Question title】: pandas - pivot_table while preserving order fails
【Posted】: 2018-04-01 14:01:13
【Question】:

I have the following DataFrame, where the weeks are fiscal weeks rather than ISO weeks (1 is the first week of July, 52 is the last week of June):

> df
     domain  week  count
0        A    43      5
1        A    45      1
2        A    50      1
3        A    51      4
4        A     1      3
5        A     3     12
6        B    43      1
7        B    44      1
8        B    45      4
9        B    50     11
10       B     2      3
11       B     3     12
12       C    51      6
13       C     1     14
14       C     5      1

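I want to pivot this table while preserving the week order, to get a new DataFrame like the one below, where the values are the counts and the columns are the domains: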

> new_df
week   A      B     C
43      5     1   NaN
44    NaN     1   NaN
45      1     4   NaN      
50      1    11   NaN
51      4   NaN     6
1       3   NaN    14
2     NaN     3   NaN
3      12    12   NaN
5     NaN   NaN     1

I tried using groupby and unstack, but got this error:

> df = df.groupby(['week'], sort=False)['count'].unstack('domain')
AttributeError: Cannot access callable attribute 'unstack' of 'SeriesGroupBy' objects, try using the 'apply' method
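
The error arises because `groupby(...)['count']` returns a `SeriesGroupBy` object, which has no `unstack` until an aggregation turns it into a Series with a MultiIndex. A minimal sketch of that step, assuming the intent was to sum `count` per (week, domain) pair; the DataFrame is rebuilt from the table above and the name `wide` is only illustrative. This removes the error but does not by itself put the rows in fiscal order.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "domain": list("AAAAAA") + list("BBBBBB") + list("CCC"),
    "week": [43, 45, 50, 51, 1, 3, 43, 44, 45, 50, 2, 3, 51, 1, 5],
    "count": [5, 1, 1, 4, 3, 12, 1, 1, 4, 11, 3, 12, 6, 14, 1],
})

# Group by BOTH keys and aggregate first; the result is a Series indexed by
# (week, domain), and only such a Series can be unstacked into columns.
wide = df.groupby(["week", "domain"])["count"].sum().unstack("domain")
print(wide)
```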

【Comments】:

    Tags: python-2.7 pandas pivot-table pandas-groupby


    【Answer 1】:

    Option 1] You can use a custom-ordered weeks helper index together with .loc

    In [4810]: weeks = pd.Index(list(range(26, 52)) + list(range(26)))
    
    In [4819]: dfp = df.groupby(['week','domain'])['count'].sum().unstack()
    
    In [4820]: dfp.loc[weeks.intersection(dfp.index)]
    Out[4820]:
    domain     A     B     C
    43       5.0   1.0   NaN
    44       NaN   1.0   NaN
    45       1.0   4.0   NaN
    50       1.0  11.0   NaN
    51       4.0   NaN   6.0
    1        3.0   NaN  14.0
    2        NaN   3.0   NaN
    3       12.0  12.0   NaN
    5        NaN   NaN   1.0
    

    Option 2] Alternatively, use pivot

    In [4821]: dfp = df.pivot(index='week', columns='domain', values='count')
    
    In [4822]: dfp.loc[weeks.intersection(dfp.index)]
    Out[4822]:
    domain     A     B     C
    43       5.0   1.0   NaN
    44       NaN   1.0   NaN
    45       1.0   4.0   NaN
    50       1.0  11.0   NaN
    51       4.0   NaN   6.0
    1        3.0   NaN  14.0
    2        NaN   3.0   NaN
    3       12.0  12.0   NaN
    5        NaN   NaN   1.0
    

    Option 3] Alternatively, use reindex instead of .loc

    In [4830]: dfp.reindex(weeks.intersection(dfp.index))
    Out[4830]:
    domain     A     B     C
    43       5.0   1.0   NaN
    44       NaN   1.0   NaN
    45       1.0   4.0   NaN
    50       1.0  11.0   NaN
    51       4.0   NaN   6.0
    1        3.0   NaN  14.0
    2        NaN   3.0   NaN
    3       12.0  12.0   NaN
    5        NaN   NaN   1.0
    

    Details

    In [4826]: weeks
    Out[4826]:
    Int64Index([26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
                43, 44, 45, 46, 47, 48, 49, 50, 51,  0,  1,  2,  3,  4,  5,  6,  7,
                 8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
                25],
               dtype='int64')
    
    In [4827]: weeks.intersection(dfp.index)
    Out[4827]: Int64Index([43, 44, 45, 50, 51, 1, 2, 3, 5], dtype='int64')
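
The three options above share one idea: build the wide table first, then select its rows in fiscal order. A self-contained sketch of that idea, assuming the sample data from the question. It swaps the Index set operation for a plain list comprehension, which does not depend on how a given pandas version treats set operations on an Index; the names `fiscal_order`, `present` and `result` are hypothetical. The order list is copied from this answer and runs 26..51 then 0..25, so a week numbered 52 would have to be added to it.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "domain": list("AAAAAA") + list("BBBBBB") + list("CCC"),
    "week": [43, 45, 50, 51, 1, 3, 43, 44, 45, 50, 2, 3, 51, 1, 5],
    "count": [5, 1, 1, 4, 3, 12, 1, 1, 4, 11, 3, 12, 6, 14, 1],
})

# Custom week order taken from the answer (does not include week 52).
fiscal_order = list(range(26, 52)) + list(range(26))

# Wide table: one row per week, one column per domain.
dfp = df.pivot(index="week", columns="domain", values="count")

# Keep only the weeks that actually occur, in the custom order.
present = [w for w in fiscal_order if w in dfp.index]

# Reorder the rows to follow that order.
result = dfp.reindex(present)
print(result)
```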
    

    【Comments】:

      【Answer 2】:

      You need a custom order for the weeks, so convert week to an ordered categorical with that custom order and omit sort=False

      cats = list(range(26, 52)) + list(range(26))
      print (cats)
      [26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 
       47, 48, 49, 50, 51, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 
       16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
      
      df['week'] = pd.Categorical(df['week'], categories=cats, ordered=True)
      
      df = df.groupby(['week','domain'], observed=True)['count'].sum().unstack()
      print (df)
      domain     A     B     C
      week                    
      43       5.0   1.0   NaN
      44       NaN   1.0   NaN
      45       1.0   4.0   NaN
      50       1.0  11.0   NaN
      51       4.0   NaN   6.0
      1        3.0   NaN  14.0
      2        NaN   3.0   NaN
      3       12.0  12.0   NaN
      5        NaN   NaN   1.0
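
The ordered-categorical idea can also be sketched as follows, assuming the sample data from the question. This variant pivots first and then converts the row index to an ordered `CategoricalIndex`, so that `sort_index()` sorts by position in `cats` rather than by numeric value. It is one way to apply the approach, not the answer's exact code; `cats` is copied from the answer and `wide` is a hypothetical name.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "domain": list("AAAAAA") + list("BBBBBB") + list("CCC"),
    "week": [43, 45, 50, 51, 1, 3, 43, 44, 45, 50, 2, 3, 51, 1, 5],
    "count": [5, 1, 1, 4, 3, 12, 1, 1, 4, 11, 3, 12, 6, 14, 1],
})

# Custom week order taken from the answer (does not include week 52).
cats = list(range(26, 52)) + list(range(26))

# Wide table with a plain integer index, sorted numerically by pandas.
wide = df.pivot(index="week", columns="domain", values="count")

# Re-type the index as an ordered categorical; sorting then follows
# the position of each week in `cats`.
wide.index = pd.CategoricalIndex(
    wide.index, categories=cats, ordered=True, name="week"
)
wide = wide.sort_index()
print(wide)
```

If plain integers are needed afterwards, the index can be converted back with `wide.index.astype(int)`.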
      

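      【Comments】:

      • The problem is that weeks 44 and 2 are in the wrong place. Week 44 should be between 43 and 45, and week 2 should be between 1 and 3.
      • Hmm, so the ordering is [26, 27, ..., 51, 0, 1, ..., 25]?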