Pandas Dataframe: change columns, index and plot (answers)

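【Question title】: Pandas Dataframe: change columns, index and plot
【Posted】: 2021-07-20 02:46:30
【Question description】:

Hello, I generated the table above using Counter from collections to count the combinations of 3 values in a dataframe: Jessica, Mike, and Dog. I got the combinations and their counts. Any help making that table a bit prettier? I would like to rename the index to grp1, grp2, etc., and rename the column to something other than 0. Also, what is the best kind of plot for showing the different groups? Thanks for your help!!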

I used these commands to generate the table:

import numpy as np
import pandas as pd
from collections import Counter

df = np.random.choice(["Mike", "Jessica", "Dog"], size=(20, 3))
Z = pd.DataFrame(df, columns=['a', 'b', 'c'])
LL = Z.apply(Counter, axis="columns").value_counts()
H = pd.DataFrame(LL)
print(H)

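【Question comments】:

This problem is probably best solved before this step, so you should provide a sample of your data that we can use to show you how to aggregate it properly. stackoverflow.com/questions/20109391/… gives examples of how to make sample data.
I generated the table above with the following: df= np.random.choice(["Mike", "Jessica", "Dog"], size=(20, 3)) Z= pd.DataFrame(df,columns=['a', 'b', 'c']) then on a new line import collections from collections import Counter LL= Z.apply (Counter, axis= "columns").value_counts () H= pd.DataFrame(LL) H
Please edit your question to include this code snippet. In its current state it is not portable.

Tags: pandas dataframe matplotlib counter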

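【Solution 1】:

A rather unusual technique....
You can convert the index of dicts (the Counter objects) into a MultiIndex.
After that, plot() with kind="barh" works and the labels make sense.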

import numpy as np
import pandas as pd
from collections import Counter

df = np.random.choice(["Mike", "Jessica", "Dog"], size=(20, 3))
Z = pd.DataFrame(df, columns=['a', 'b', 'c'])
LL = Z.apply(Counter, axis="columns").value_counts()
H = pd.DataFrame(LL)
# Expand each Counter in the index into its own row of per-name counts
I = pd.Series(H.index).apply(pd.Series)
# Use those per-name counts as the levels of a MultiIndex
H = H.set_index(pd.MultiIndex.from_arrays(I.T.values, names=I.columns))
H.plot(kind="barh")

H after setting the MultiIndex:

                  0
Mike Dog Jessica   
2.0  1.0 NaN      5
     NaN 1.0      4
NaN  1.0 2.0      3
1.0  NaN 2.0      3
     1.0 1.0      2
NaN  NaN 3.0      1
     2.0 1.0      1
3.0  NaN NaN      1
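
The question also asked for a named count column and for row labels such as grp1, grp2. Below is a minimal sketch of one way to get both, assuming a counts frame shaped like H above. The three rows are typed in by hand as a stand-in for H, and the column name "count" is only an illustrative choice. In recent pandas versions the column may already be called count rather than 0, in which case the rename simply does nothing.

```python
import pandas as pd

# Hand-typed stand-in for H: one row per combination, counts in a column named 0.
nan = float("nan")
index = pd.MultiIndex.from_tuples(
    [(2.0, 1.0, nan), (2.0, nan, 1.0), (nan, 1.0, 2.0)],
    names=["Mike", "Dog", "Jessica"],
)
H = pd.DataFrame({0: [5, 4, 3]}, index=index)

# Give the count column a descriptive name instead of 0.
H = H.rename(columns={0: "count"})

# Turn the MultiIndex levels into ordinary columns,
# then label the rows grp1, grp2, ... in their current order.
out = H.reset_index()
out.index = [f"grp{i}" for i in range(1, len(out) + 1)]

print(out)
```

Keeping the per-name counts as regular columns means each grpN label can still be traced back to the combination it stands for.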

【Comments】:

【Solution 2】:

Instead of using Counter, you can apply value_counts to each row directly:

import pandas as pd
from matplotlib import pyplot as plt

# Hard Coded For Reproducibility
df = pd.DataFrame({'a': {0: 'Dog', 1: 'Jessica', 2: 'Mike',
                         3: 'Dog', 4: 'Dog', 5: 'Dog',
                         6: 'Jessica', 7: 'Jessica',
                         8: 'Dog', 9: 'Dog', 10: 'Jessica',
                         11: 'Mike', 12: 'Dog',
                         13: 'Jessica', 14: 'Mike',
                         15: 'Mike',
                         16: 'Mike', 17: 'Dog',
                         18: 'Jessica', 19: 'Mike'},
                   'b': {0: 'Mike', 1: 'Mike', 2: 'Jessica',
                         3: 'Jessica', 4: 'Dog', 5: 'Jessica',
                         6: 'Mike', 7: 'Dog', 8: 'Mike',
                         9: 'Dog', 10: 'Dog', 11: 'Dog',
                         12: 'Dog', 13: 'Jessica',
                         14: 'Jessica', 15: 'Dog',
                         16: 'Dog', 17: 'Dog', 18: 'Jessica', 19: 'Jessica'},
                   'c': {0: 'Mike', 1: 'Dog', 2: 'Jessica',
                         3: 'Dog', 4: 'Dog', 5: 'Dog', 6: 'Dog',
                         7: 'Jessica', 8: 'Mike', 9: 'Dog',
                         10: 'Dog', 11: 'Mike', 12: 'Jessica',
                         13: 'Jessica', 14: 'Jessica',
                         15: 'Jessica', 16: 'Jessica',
                         17: 'Dog', 18: 'Mike', 19: 'Dog'}})

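# Apply value_counts across each row
# (pd.Series.value_counts is used because the top-level pd.value_counts is deprecated in newer pandas)
df = df.apply(pd.Series.value_counts, axis=1) \
    .fillna(0)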

# Group By All Columns and
# Get Duplicate Count From Group Size
df = pd.DataFrame(df
                  .groupby(df.columns.values.tolist())
                  .size()
                  .sort_values())

# Plot (DataFrame.plot creates its own figure, so no separate plt.figure() is needed)
df.plot(kind="barh")
plt.tight_layout()
plt.show()

df after groupby, size, and sort:

                  0
Dog Jessica Mike   
0.0 3.0     0.0   1
1.0 2.0     0.0   1
0.0 2.0     1.0   3
1.0 0.0     2.0   3
3.0 0.0     0.0   3
2.0 1.0     0.0   4
1.0 1.0     1.0   5
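
With a MultiIndex, the bar chart labels each bar with a tuple such as (0.0, 3.0, 0.0), which can be hard to read. Here is a hedged sketch of building a plain-text label per combination before plotting. The frame below is a hand-typed subset of the table above, and `describe` is a hypothetical helper introduced only for this illustration.

```python
import pandas as pd

# Hand-typed subset of the grouped frame: counts per (Dog, Jessica, Mike) combination.
index = pd.MultiIndex.from_tuples(
    [(0.0, 3.0, 0.0), (2.0, 1.0, 0.0), (1.0, 1.0, 1.0)],
    names=["Dog", "Jessica", "Mike"],
)
counts = pd.DataFrame({0: [1, 4, 5]}, index=index)


def describe(combo, names):
    """Turn (0.0, 3.0, 0.0) into 'Jessica x3', skipping names that do not occur."""
    return ", ".join(f"{name} x{int(n)}" for name, n in zip(names, combo) if n > 0)


# One readable label per row of the MultiIndex.
labels = [describe(combo, counts.index.names) for combo in counts.index]
series = pd.Series(counts[0].to_numpy(), index=labels, name="count")

# Horizontal bars keep the long labels legible (requires matplotlib).
ax = series.plot(kind="barh")
ax.set_xlabel("number of rows")
# Call plt.show() or ax.figure.savefig(...) to view the result.
```

Because the `n > 0` test is false for NaN as well as for 0, the same helper should also work on the NaN-filled index produced in Solution 1.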

Plot: horizontal bar chart of the count for each combination.

【Comments】: