Using Matplotlib to plot over a subset of data

【Question title】: Using Matplotlib to plot over a subset of data
【Posted】: 2017-01-03 12:34:41
【Question】:

I am using matplotlib to draw bar charts from the data in my DataFrame. I first plot the whole dataset with this construct:

import pandas as pd
from collections import Counter
import matplotlib.pyplot as plt 

Temp_Counts = Counter(weatherDFConcat['TEMPBIN_CONS'])
df = pd.DataFrame.from_dict(Temp_Counts, orient = 'index').sort_index()
df.plot(kind = 'bar', title = '1969-2015 National Temp Bins', legend = False, color = ['r', 'r', 'g', 'g', 'b', 'b', 'r', 'r', 'g', 'g', 'b', 'b', 'r', 'r', 'g', 'g', 'b', 'b', 'r', 'r', 'g', 'g', 'b', 'b','r', 'r', 'g', 'g', 'b', 'b', 'r', 'r', 'g', 'g' ] )

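Now I want to plot the same column, but only over specific subsets of the data. For each region in REGION_NAME, I want to generate a bar chart. Here is a sample of my DataFrame.

My attempted solution was to write: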

if weatherDFConcat['REGION_NAME'].any() == 'South':
    Temp_Counts = Counter(weatherDFConcat['TEMPBIN_CONS'])
    df = pd.DataFrame.from_dict(Temp_Counts, orient = 'index').sort_index()
    df.plot(kind = 'bar', title = '1969-2015 National Temp Bins', legend = False, color = ['r', 'r', 'g', 'g', 'b', 'b', 'r', 'r', 'g', 'g', 'b', 'b', 'r', 'r', 'g', 'g', 'b', 'b', 'r', 'r', 'g', 'g', 'b', 'b','r', 'r', 'g', 'g', 'b', 'b', 'r', 'r', 'g', 'g' ] )
    plt.show()

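When I run this code, strangely, it only works for the 'South' region. For 'South' the plot is generated, but for any other region the code runs without an error message and the plot never appears. Running my code for any region other than South produces this result in the console.

The South region is the first part of my DataFrame, which is 40 million rows long; the other regions come further down. Does the size of the DataFrame I am trying to plot have something to do with this?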

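【Comments】:

Have you tried extracting a single region into a separate dataframe, using a comparison expression with one of the other region names? Does that work?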
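
To illustrate the distinction this comment points at: the `if` test in the question is evaluated once for the whole column, and its body then counts every row of weatherDFConcat regardless of region, so no rows are ever filtered. A row-wise comparison instead yields one True/False per row, which can be used to select rows. The following is a minimal sketch on a toy DataFrame that reuses the question's column names; the data and the variable names (`mask`, `has_northeast`, `subset`) are illustrative assumptions, not the asker's real data.

```python
import pandas as pd

# Toy stand-in for weatherDFConcat (illustrative values only).
df = pd.DataFrame({
    'TEMPBIN_CONS': ['-3 to 0', '-3 to 0', '0 to 3', '-3 to 0', '0 to 3'],
    'REGION_NAME': ['South', 'South', 'Northeast', 'Northeast', 'Northeast'],
})

# Row-wise comparison: a boolean Series with one entry per row.
mask = df['REGION_NAME'] == 'Northeast'

# Reducing the mask with .any() only answers "does at least one row match?".
# Using that in an `if` guards a block of code but does not select any rows.
has_northeast = mask.any()

# Indexing with the mask keeps only the matching rows.
subset = df[mask]

print(mask.tolist())
print(has_northeast)
print(subset)
```

Only the last step produces a smaller DataFrame, which is what the counting and plotting code needs to operate on.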

Tags: python pandas numpy matplotlib plot

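【Solution 1】:

If I understand your question correctly, you are trying to do two things before plotting:

Filter the rows based on REGION_NAME.
In the filtered dataframe, count how many times each value in the TEMPBIN_CONS column occurs.

You can do both of these in pandas: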

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'STATE_NAME': ['Alabama', 'Florida', 'Maine', 'Delaware', 'New Jersey'],
                        'GEOID': [1, 2, 3, 4, 5],
                 'TEMPBIN_CONS': ['-3 to 0', '-3 to 0', '0 to 3', '-3 to 0', '0 to 3'],
                  'REGION_NAME': ['South', 'South', 'Northeast', 'Northeast', 'Northeast']},
                         columns=['STATE_NAME', 'GEOID', 'TEMPBIN_CONS', 'REGION_NAME'])

df_northeast = df[df['REGION_NAME'] == 'Northeast']
northeast_count = df_northeast.groupby('TEMPBIN_CONS').size()

# print() is a function in Python 3
print(df)
print(df_northeast)
print(northeast_count)

northeast_count.plot(kind='bar')
plt.show()

Output:

   STATE_NAME  GEOID TEMPBIN_CONS REGION_NAME
0     Alabama      1      -3 to 0       South
1     Florida      2      -3 to 0       South
2       Maine      3       0 to 3   Northeast
3    Delaware      4      -3 to 0   Northeast
4  New Jersey      5       0 to 3   Northeast

   STATE_NAME  GEOID TEMPBIN_CONS REGION_NAME
2       Maine      3       0 to 3   Northeast
3    Delaware      4      -3 to 0   Northeast
4  New Jersey      5       0 to 3   Northeast

TEMPBIN_CONS
-3 to 0    1
0 to 3     2
dtype: int64
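
Since the question asks for a bar chart for each region, the same filter-then-count idea can be extended to all regions at once by grouping on REGION_NAME. This is a sketch rather than part of the original answer: it assumes the toy DataFrame from the answer above, and the name `region_counts` and the chart titles are made up for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same toy data as in the answer (illustrative values only).
df = pd.DataFrame({
    'STATE_NAME': ['Alabama', 'Florida', 'Maine', 'Delaware', 'New Jersey'],
    'TEMPBIN_CONS': ['-3 to 0', '-3 to 0', '0 to 3', '-3 to 0', '0 to 3'],
    'REGION_NAME': ['South', 'South', 'Northeast', 'Northeast', 'Northeast'],
})

# Group the rows by region, then count each TEMPBIN_CONS value within
# that region. The result is one count Series per region.
region_counts = {
    region: group['TEMPBIN_CONS'].value_counts().sort_index()
    for region, group in df.groupby('REGION_NAME')
}

# Draw one bar chart per region, each on its own figure.
for region, counts in region_counts.items():
    fig, ax = plt.subplots()
    counts.plot(kind='bar', ax=ax, title=f'{region} Temp Bins')
    ax.set_ylabel('Count')

# When running interactively, call plt.show() here to display the figures.
```

Note that the bins are sorted as strings here, so with the real data a different ordering (for example, by the lower bound of each bin) may be preferable.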

【Comments】:

Thank you very much. A simple solution that works perfectly. I have only just started programming, so I really appreciate it.