【Question title】: Pandas pivot_table preserve order
【Posted】: 2017-07-08 16:14:16
【Question】:

>>> df
   A   B   C      D
0  foo one small  1
1  foo one large  2
2  foo one large  2
3  foo two small  3
4  foo two small  3
5  bar one large  4
6  bar one small  5
7  bar two small  6
8  bar two large  7
>>> table = pivot_table(df, values='D', index=['A', 'B'],
...                     columns=['C'], aggfunc=np.sum)
>>> table
          small  large
foo  one  1      4
     two  6      NaN
bar  one  5      4
     two  6      7

I want the output to look like the table above, but what I get is sorted: bar comes before foo, and so on.

【Comments】:

Tags: python pandas dataframe pivot-table

【Solution 1】:

I don't think pivot_table has an option to turn sorting off, but groupby does:

df.groupby(['A', 'B', 'C'], sort=False)['D'].sum().unstack('C')
Out: 
C        small  large
A   B                
foo one    1.0    4.0
    two    6.0    NaN
bar one    5.0    4.0
    two    6.0    7.0

You pass the grouping columns to groupby, and you call unstack on the column whose values should become the column headers.
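For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The DataFrame is rebuilt from the data in the question; the variable names `summed` and `table` are only illustrative.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    "A": ["foo"] * 5 + ["bar"] * 4,
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small",
          "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

# sort=False keeps the group keys in order of first appearance
# instead of sorting them alphabetically.
summed = df.groupby(["A", "B", "C"], sort=False)["D"].sum()

# Move the C level from the row index into the columns.
table = summed.unstack("C")
print(table)
```

The combination (foo, two, large) never occurs in the data, so that cell comes out as NaN and the whole table is converted to float.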

If you don't want the index names, rename them to None:

df.groupby(['A', 'B', 'C'], sort=False)['D'].sum().rename_axis([None, None, None]).unstack(level=2)
Out: 
         small  large
foo one    1.0    4.0
    two    6.0    NaN
bar one    5.0    4.0
    two    6.0    7.0

【Comments】:

【Solution 2】:

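When pivot_table builds the table, it sorts the index alphabetically. This affects not only foo and bar: you may notice that small and large are sorted as well. To get foo on top you need to sort again after pivoting, and to match the example in the question you need to sort both A and C in descending order. Older pandas versions had sortlevel for this; it has since been removed, and sort_index with a level argument does the same job.

table.sort_index(level=["A","B"], ascending=[False,True], sort_remaining=False, inplace=True)
table.sort_index(axis=1, ascending=False, inplace=True)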
print(table)

Output:

C        small  large
A   B                
foo one  1.0    4.0  
    two  6.0    NaN   
bar one  5.0    4.0  
    two  6.0    7.0
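Sorting in descending order gives the wanted result here only because foo and small happen to sort after bar and large. As a hedged alternative that does not depend on alphabetical order, the sketch below reindexes the pivoted table to the order in which the keys first appear in the data. It assumes the DataFrame from the question; `row_order`, `col_order` and `ordered` are illustrative names, not part of the original answer.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    "A": ["foo"] * 5 + ["bar"] * 4,
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small",
          "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

# Default pivot_table: rows and columns come back sorted alphabetically.
table = pd.pivot_table(df, values="D", index=["A", "B"],
                       columns="C", aggfunc="sum")

# Keys in order of first appearance in the original frame.
row_order = pd.MultiIndex.from_frame(df[["A", "B"]].drop_duplicates())
col_order = pd.Index(df["C"].unique(), name="C")

# Reorder rows and columns without changing any values.
ordered = table.reindex(index=row_order, columns=col_order)
print(ordered)
```

Because the order is taken from the data itself, this keeps working when the labels are not in reverse alphabetical order.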

Update:

To remove the index names A, B and C:

table.columns.name = None
table.index.names = (None, None)

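【Comments】:

How do I remove C, A and B from the solution above, so that the output reads: small large / foo one 1 4 / two 6 NaN / bar one 5 4 / two 6 7?
The index has multiple levels, A and B, so you need index.names. Take a look at stackoverflow.com/a/30254337/5916727. I see you mentioned you are a beginner, so the best approach is to experiment, for example by checking what table.index and table.columns.name return...
+1 for the (obvious, in hindsight) advice on how to remove the index labels (if you get NaN when using None, just use an empty string instead)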

【Solution 3】:

Since pandas 1.3.0, you can pass sort=False to pd.pivot_table:

>>> import pandas as pd
>>> df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
...                    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
...                    "C": ["small", "large", "large", "small","small", "large", "small", "small", "large"],
...                    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
...                    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
>>> pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'],
...                aggfunc='sum', sort=False)
C        large  small
A   B                
foo one    4.0    1.0
    two    NaN    6.0
bar one    4.0    5.0
    two    7.0    6.0

【Comments】: