Python: Plot all categorical subset combinations in one figure

【Question title】: Python: Plot all categorical subset combinations in one figure
【Posted】: 2018-12-23 11:58:29
【Question】:

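I want to generate plots that show how the average rate changes over time for different groups and subgroups. I can do this manually by creating each grouping, defining each set of y values, and calling plot once per line. The problem is that doing this for every combination of group and subgroup is impractical, and I am not sure how to generalize the process.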

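My data has a year column, several categorical variables, and a numeric rate. It looks like this, although the real data has more categorical variables: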

df.head()
Out [33]:
   year gender   race state  rate
0  2015      F  White    AL  0.01
1  2013      F  White    NC  0.48
2  2013      F  White    IN  0.07
3  2013      M  White    NJ  0.95
4  2013      F  White    NY  0.09

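I want to overlay the various groups and subgroups in one figure.

Is there a more elegant way to subset the data and generate this plot (or these plots) automatically?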

import pandas as pd
import matplotlib.pyplot as plt

raw_data = {'year' : [2015 , 2013 , 2013 , 2013 , 2013 , 2013 , 2014 , 2013 , 2013 , 2013 , 2017 , 2013 , 2016 , 2017 , 2016 , 2015 , 2014 , 2014 , 2013 , 2013 , 2017 , 2014 , 2013 , 2016 , 2014 , 2016 , 2015 , 2013 , 2013 , 2013 , 2013 , 2013 , 2013 , 2013 , 2013 , 2013 , 2013 , 2013 , 2013 , 2013 , 2013 , 2013 , 2013 , 2017 , 2015 , 2015 , 2013 , 2013 , 2014]

, 'gender' : ['F' , 'F' , 'F' , 'M' , 'F' , 'F' , 'F' , 'M' , 'F' , 'M' , 'F' , 'M' , 'F' , 'M' , 'M' , 'M' , 'M' , 'M' , 'M' , 'M' , 'F' , 'M' , 'F' , 'M' , 'M' , 'M' , 'F' , 'M' , 'F' , 'F' , 'F' , 'M' , 'F' , 'M' , 'F' , 'F' , 'F' , 'F' , 'M' , 'M' , 'M' , 'F' , 'M' , 'M' , 'F' , 'M' , 'F' , 'M' , 'F']

, 'race' : ['White' , 'White' , 'White' , 'White' , 'White' , 'White' , 'White' , 'Black' , 'White' , 'White' , 'White' , 'White' , 'White' , 'White' , 'White' , 'White' , 'Black' , 'White' , 'White' , 'White' , 'White' , 'White' , 'White' , 'White' , 'Hispanic' , 'White' , 'Multiple' , 'White' , 'White' , 'Black' , 'Asian/Hawaii/PI' , 'Asian/Hawaii/PI' , 'Black' , 'Black' , 'Black' , 'Hispanic' , 'Black' , 'Black' , 'Black' , 'Black' , 'White' , 'White' , 'White' , 'White' , 'Black' , 'Multiple' , 'White' , 'White' , 'Black']

, 'state' : ['AL' , 'NC' , 'IN' , 'NJ' , 'NY' , 'NY' , 'NY' , 'ME' , 'MD' , 'NC' , 'NC' , 'NC' , 'AL' , 'IN' , 'MD' , 'MD' , 'ME' , 'IN' , 'AL' , 'NC' , 'IN' , 'NJ' , 'NY' , 'AL' , 'IN' , 'MD' , 'MD' , 'ME' , 'IN' , 'AL' , 'NC' , 'IN' , 'NJ' , 'ME' , 'MD' , 'NC' , 'NC' , 'NC' , 'AL' , 'IN' , 'MD' , 'ME' , 'MD' , 'NC' , 'NC' , 'NC' , 'AL' , 'IN' , 'MD']

, 'rate' : [0.01 , 0.48 , 0.07 , 0.95 , 0.09 , 0.09 , 0.08 , 0.89 , 0.55 , 0.38 , 0.23 , 0.66 , 0.46 , 0.24 , 0.07 , 0.75 , 0.67 , 0.60 , 0.36 , 0.18 , 0.56 , 0.27 , 0.98 , 0.89 , 0.17 , 0.72 , 0.23 , 0.10 , 0.81 , 0.04 , 0.41 , 0.16 , 0.39 , 0.12 , 0.95 , 0.99 , 0.16 , 0.52 , 0.74 , 0.31 , 0.36 , 0.16 , 0.02 , 0.22 , 0.33 , 0.30 , 0.90 , 0.14 , 0.16]}

df = pd.DataFrame(raw_data, columns= ['year', 'gender', 'race', 'state', 'rate'])

gb_overall = df.groupby(['year'])['rate'].mean()
gb_gender = df.groupby(['year', 'gender'])['rate'].mean()
gb_gender_race = df.groupby(['year', 'gender', 'race'])['rate'].mean()

x = gb_overall.index

y_overall = gb_overall.values
y_f = gb_gender.xs('F', level=1)
y_m = gb_gender.xs('M', level=1)
y_f_r = gb_gender_race.xs(('F', 'White'), level=['gender', 'race'])

fig, axes = plt.subplots(figsize=(12, 8))  # subplots returns a (figure, axes) pair
plt.plot(x, y_overall, marker = 'o')
plt.plot(x, y_f, marker = 'o')
plt.plot(x, y_m, marker = 'o')
plt.plot(x, y_f_r, marker = 'o')

axes = plt.gca()
axes.set_xlim(left=2012.5)
axes.set_title('Year vs. Average Rate', fontsize= 24)
axes.set_xlabel('Year', fontsize= 16)
axes.set_ylabel('Average Rate', fontsize= 16)
axes.legend(['Overall', 'F', 'M', 'White F'], fontsize=14, loc= 'best', frameon= True, edgecolor= 'black')

plt.show()

【Comments】:

Tags: python pandas matplotlib automation

【Solution 1】:

I like to build a single DataFrame that is shaped and organized for plotting with pandas.

# One column per line to draw, indexed by year
overall = gb_overall.rename('Overall')
gender = gb_gender.unstack()  # columns 'F' and 'M'
white_f = gb_gender_race.xs(('F', 'White'), level=['gender', 'race']).rename('White F')
# Concatenate in the order the legend should list the lines
df_chart = pd.concat([overall, gender, white_f], axis=1)
axes = df_chart.plot(marker='o')
axes.set_xlim(left=2012.5)
axes.set_title('Year vs. Average Rate', fontsize=24)
axes.set_xlabel('Year', fontsize=16)
axes.set_ylabel('Average Rate', fontsize=16)
# Labels come from the column names, so they always match the plotted lines
axes.legend(fontsize=14, loc='best', frameon=True, edgecolor='black')
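
The single-frame idea can be taken a step further so that the columns are generated rather than written out by hand. The following is a minimal sketch and not part of the original answer: `build_chart_frame`, its `max_depth` parameter, and the small sample `df` are names and data made up for illustration. It computes the yearly mean for every combination of up to `max_depth` categorical columns and unstacks each result into one wide DataFrame with a column per group or subgroup.

```python
from itertools import combinations

import pandas as pd

# Small made-up sample with the same column names as the question's df
# (an assumption for illustration; substitute the real DataFrame).
df = pd.DataFrame({
    'year':   [2013, 2013, 2013, 2014, 2014, 2014, 2015, 2015],
    'gender': ['F', 'M', 'F', 'F', 'M', 'M', 'F', 'M'],
    'race':   ['White', 'White', 'Black', 'White', 'Black', 'White', 'Black', 'White'],
    'rate':   [0.10, 0.30, 0.50, 0.20, 0.40, 0.60, 0.70, 0.90],
})


def build_chart_frame(df, cat_cols, x='year', y='rate', max_depth=2):
    """Return one wide frame indexed by x with a column per group/subgroup."""
    pieces = [df.groupby(x)[y].mean().rename('Overall')]
    for depth in range(1, max_depth + 1):
        for cols in combinations(cat_cols, depth):
            # Mean of y per x value and per combination of category values
            means = df.groupby([x, *cols])[y].mean()
            # Move the category levels into the columns, leaving x as the index
            wide = means.unstack(list(cols))
            if isinstance(wide.columns, pd.MultiIndex):
                # Flatten tuple labels, e.g. ('F', 'White') -> 'F / White'
                wide.columns = [' / '.join(map(str, c)) for c in wide.columns]
            pieces.append(wide)
    return pd.concat(pieces, axis=1)


df_chart = build_chart_frame(df, ['gender', 'race'])
axes = df_chart.plot(marker='o', figsize=(12, 8))
axes.set_xlabel('Year')
axes.set_ylabel('Average Rate')
```

A combination with no rows in a given year shows up as NaN and leaves a gap in that line. The number of columns grows quickly with more categorical variables, so it may be worth lowering `max_depth` or plotting only a subset such as `df_chart[['Overall', 'F / White']]`. Call `plt.show()` (after `import matplotlib.pyplot as plt`) to display the figure in a script.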


【Comments】:

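【Solution 2】:

With a dictionary, you can plot semi-automatically under various conditions. I have left the figure-styling code out to focus on the essentials.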

# overall plot
df.groupby('year').rate.mean().plot(label='Overall', marker='o')

# a dictionary to store various labels(keys) and conditions(values).
# by editing/adding conditions, you can customise your plots.
conds = {}
conds['F'] = (df.gender == 'F')
conds['M'] = (df.gender == 'M')
conds['White F'] = (df.gender == 'F') & (df.race == 'White')

# plot for each condition
for key, value in conds.items():
    df.loc[value].groupby('year').rate.mean().plot(label=key, marker='o')
plt.legend()
plt.show()
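
The dictionary itself can also be generated instead of typed in. The sketch below is an extension of the approach above, not part of the original answer: `make_conditions`, its `max_depth` parameter, and the small sample `df` are hypothetical names and data used only for illustration. It builds one boolean mask for every value combination that actually occurs in the chosen categorical columns, then plots each one with the same loop.

```python
from itertools import combinations

import matplotlib.pyplot as plt
import pandas as pd

# Small made-up sample with the same column names as the question's df
# (an assumption for illustration; substitute the real DataFrame).
df = pd.DataFrame({
    'year':   [2013, 2013, 2013, 2014, 2014, 2014, 2015, 2015],
    'gender': ['F', 'M', 'F', 'F', 'M', 'M', 'F', 'M'],
    'race':   ['White', 'White', 'Black', 'White', 'Black', 'White', 'Black', 'White'],
    'rate':   [0.10, 0.30, 0.50, 0.20, 0.40, 0.60, 0.70, 0.90],
})


def make_conditions(df, cat_cols, max_depth=2):
    """Map a label to a boolean mask for each observed value combination."""
    conds = {}
    for depth in range(1, max_depth + 1):
        for cols in combinations(cat_cols, depth):
            # drop_duplicates keeps only combinations present in the data
            observed = df[list(cols)].drop_duplicates()
            for values in observed.itertuples(index=False):
                mask = pd.Series(True, index=df.index)
                for col, val in zip(cols, values):
                    mask &= df[col] == val
                conds[' '.join(map(str, values))] = mask
    return conds


conds = make_conditions(df, ['gender', 'race'])

fig, ax = plt.subplots(figsize=(12, 8))
df.groupby('year')['rate'].mean().plot(ax=ax, label='Overall', marker='o')
for label, mask in conds.items():
    df.loc[mask].groupby('year')['rate'].mean().plot(ax=ax, label=label, marker='o')
ax.legend()
```

Labels are built by joining the values in column order, so the subgroup from the answer appears as 'F White' rather than 'White F'. Entries can still be deleted from or added to `conds` by hand before the plotting loop, and `plt.show()` displays the result.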

【Comments】: