pandas / matplotlib: faceting bar plots (answers)

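【Question title】: pandas / matplotlib: faceting bar plots
【Posted】: 2013-10-27 00:44:09
【Question description】:

I am making a series of bar plots of data with two categorical variables and one numeric variable. What I have is below, but what I would like to do is facet by one of the categorical variables, as with facet_wrap in ggplot. I have a somewhat working example, but I get the wrong plot type (lines instead of bars), and I subset the data inside a loop, which can't be the best way.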

## first try--plain vanilla
import pandas as pd
import numpy as np
N = 100

## generate toy data
ind = np.random.choice(['a','b','c'], N)
cty = np.random.choice(['x','y','z'], N)
jobs = np.random.randint(low=1,high=250,size=N)

## prep data frame
df_city = pd.DataFrame({'industry':ind,'city':cty,'jobs':jobs})
df_city_grouped = df_city.groupby(['city','industry']).jobs.sum().unstack()
df_city_grouped.plot(kind='bar',stacked=True,figsize=(9, 6))

This gives something like this:

  city industry  jobs
0    z        b   180
1    z        c   121
2    x        a    33
3    z        a   121
4    z        c   236

However, what I would like to see is something like this:

## R code
library(plyr)
library(ggplot2)
df_city<-read.csv('/home/aksel/Downloads/mockcity.csv',sep='\t')

## summarize
df_city_grouped <- ddply(df_city, .(city,industry), summarise, jobstot = sum(jobs))

## plot
ggplot(df_city_grouped, aes(x=industry, y=jobstot)) +
  geom_bar(stat='identity') +
  facet_wrap(~city)

The closest I have gotten with matplotlib is this:

import matplotlib.pyplot as plt

cols = df_city.city.value_counts().shape[0]
fig, axes = plt.subplots(1, cols, figsize=(8, 8))

for x, city in enumerate(df_city.city.value_counts().index.values):
    data = df_city[(df_city['city'] == city)]
    data = data.groupby(['industry']).jobs.sum()
    axes[x].plot(data)

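So, two questions:

Can I draw bar plots using an AxesSubplot object (they draw lines, as shown) and end up with something along the lines of the facet_wrap example from ggplot?
In a loop that generates the charts, as in this attempt, I subset the data on every iteration. I can't imagine that this is the "proper" way to do this kind of faceting?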

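【Question comments】:

Why not use bar in the loop?
@tcaswell, good suggestion. What is the trick to plotting bars? Both arguments seem to need to be numeric. Convert the categorical variable first? Is there a more canonical way?
See matplotlib.org/api/axes_api.html#matplotlib.axes.Axes.bar
Sorry for answering an old question, but I got here via a Google search. You can now use seaborn.FacetGrid; see the documentation at seaborn.pydata.org/generated/seaborn.FacetGrid.html or this tutorial: seaborn.pydata.org/tutorial/axis_grids.html

Tags: python r matplotlib ggplot2 pandas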

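【Solution 1】:

See the second example here: http://pandas-docs.github.io/pandas-docs-travis/visualization.html#bar-plots

In any case, you can always do it by hand, as you did yourself.

Edit: by the way, you can always use rpy2 from Python, so you can do everything the same way as in R.

Also, take a look at this: https://pandas.pydata.org/pandas-docs/version/0.14.1/rplot.html I'm not sure, but it should help with creating plots across multiple panels, though it may require further reading.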
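As a rough sketch of the pandas-only route hinted at above: DataFrame.plot accepts subplots=True, which draws one panel per column. The question's pivot has one column per industry, so this sketch unstacks on city instead, giving one panel per city. The toy data mirrors the question's df_city; the fixed seed and the name totals are illustrative assumptions, not part of the original post.

```python
import numpy as np
import pandas as pd

# Toy data shaped like the question's df_city (the seed is an arbitrary choice)
rng = np.random.RandomState(0)
N = 100
df_city = pd.DataFrame({
    'industry': rng.choice(['a', 'b', 'c'], N),
    'city': rng.choice(['x', 'y', 'z'], N),
    'jobs': rng.randint(low=1, high=250, size=N),
})

# Rows = industry, columns = city, so each city column becomes one panel
totals = df_city.groupby(['industry', 'city']).jobs.sum().unstack(fill_value=0)

# subplots=True draws one bar chart per column; sharey keeps the panels comparable
axes = totals.plot(kind='bar', subplots=True, layout=(1, totals.shape[1]),
                   sharey=True, legend=False, figsize=(9, 4))
```

The call returns an array of Axes, so individual panels can still be tweaked afterwards (for example axes[0, 0].set_ylabel('jobs')). No per-city subsetting is needed because the pivot already holds one column per panel.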

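【Comments】:

I think this is workable, but I would really like them in separate panels, as ggplot does, particularly because that stays flexible as more dimensions are added. And the example certainly doesn't like data frame variables that are non-numeric. Is there a good way around that?
That looks like exactly the out-of-the-box thing I was looking for, with no need to subset the data in each loop iteration. What remains is for me to learn the "proper" way to plot categorical variables in a chart.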

【Solution 2】:

@tcaswell suggested calling bar in the loop. Here is a working, if not elegant, example.

## second try--facet by city
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

N = 100
industry = ['a','b','c']
city = ['x','y','z']
ind = np.random.choice(industry, N)
cty = np.random.choice(city, N)
jobs = np.random.randint(low=1,high=250,size=N)
df_city = pd.DataFrame({'industry':ind,'city':cty,'jobs':jobs})

## how many panels do we need?
cols = df_city.city.value_counts().shape[0]
fig, axes = plt.subplots(1, cols, figsize=(8, 8))

for x, city in enumerate(df_city.city.value_counts().index.values):
    data = df_city[(df_city['city'] == city)]
    data = data.groupby(['industry']).jobs.sum()
    print(data)
    print(type(data.index))
    ## bar() gets numeric x positions; the industry names become tick labels
    left = list(range(len(data)))
    heights = data.values

    axes[x].bar(left, heights, label="%s" % (city))
    axes[x].set_xticks(left, minor=False)
    axes[x].set_xticklabels(data.index.values)

    axes[x].legend(loc='best')
    axes[x].grid(True)

## the figure title only needs to be set once, after the loop
fig.suptitle('Employment By Industry By City', fontsize=20)

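【Comments】:

One subtle difference is that all 3 plots here have different scales, which makes comparison across panels difficult.
That is easily fixed by adding sharey to the subplots call, like this: plt.subplots(1, cols, figsize=(8, 8), sharey=True)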
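A minimal sketch that combines the sharey suggestion with the question's second point (avoiding manual subsetting): iterating over a groupby object yields each city's rows directly, so no boolean mask is needed inside the loop. The seed, the toy data, and the variable names groups, grp, and positions are illustrative assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Toy data shaped like the answer's df_city (the seed is an arbitrary choice)
rng = np.random.RandomState(0)
N = 100
df_city = pd.DataFrame({
    'industry': rng.choice(['a', 'b', 'c'], N),
    'city': rng.choice(['x', 'y', 'z'], N),
    'jobs': rng.randint(low=1, high=250, size=N),
})

groups = df_city.groupby('city')

# One panel per city; sharey=True puts every panel on the same y scale.
# squeeze=False keeps axes 2-D even if there is only one city.
fig, axes = plt.subplots(1, groups.ngroups, figsize=(8, 4),
                         sharey=True, squeeze=False)

# Iterating a groupby yields (key, sub-frame) pairs, so no manual masking
for ax, (city, grp) in zip(axes[0], groups):
    totals = grp.groupby('industry').jobs.sum()
    positions = np.arange(len(totals))   # numeric x positions for bar()
    ax.bar(positions, totals.values)
    ax.set_xticks(positions)
    ax.set_xticklabels(totals.index)     # industry names as tick labels
    ax.set_title(city)
    ax.grid(True)

fig.suptitle('Employment By Industry By City', fontsize=20)
```

Call plt.show() or fig.savefig(...) afterwards to view the result. Using the city as each panel's title stands in for the per-panel legend of the original answer.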

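【Solution 3】:

The Seaborn library, which is built on top of Matplotlib and can be thought of as a superset of it, has flexible and powerful options for faceted plots; it even uses terminology similar to R's. Scroll down this page for multiple examples.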
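For illustration, a hedged sketch of what the seaborn route might look like. It assumes seaborn is installed and uses sns.catplot, which returns a FacetGrid; that choice of function is an assumption here, since the answer does not name a specific call. The data are pre-aggregated the same way as in the question's R code; the seed and the toy data are also illustrative.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Toy data shaped like the question's df_city (the seed is an arbitrary choice)
rng = np.random.RandomState(0)
N = 100
df_city = pd.DataFrame({
    'industry': rng.choice(['a', 'b', 'c'], N),
    'city': rng.choice(['x', 'y', 'z'], N),
    'jobs': rng.randint(low=1, high=250, size=N),
})

# Long-format totals, one row per (city, industry), like ddply in the R code
totals = (df_city.groupby(['city', 'industry'])['jobs']
          .sum()
          .reset_index(name='jobstot'))

# col='city' plays the role of facet_wrap(~city); kind='bar' draws bars
g = sns.catplot(data=totals, x='industry', y='jobstot', col='city',
                kind='bar', order=['a', 'b', 'c'], height=4, aspect=0.8)
```

The returned grid g exposes the underlying Matplotlib axes through g.axes, so the usual Matplotlib calls remain available for fine-tuning.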

【Comments】: