Plotting sorted weekdays/months from a time-series DataFrame in Python

【Question title】: plot the sorted weekdays/month on timeseries dataframe in python
【Posted】: 2021-10-22 05:58:49
【Question】:

I have one year of traffic data stored in a DataFrame.

study time	volume	month	day	year	weekday	week_of_year
2019-01-01 00:00:00	25	January	Tuesday	2019	1	1
2019-01-01 00:00:15	25	January	Tuesday	2019	1	1
2019-01-01 00:00:30	21	January	Tuesday	2019	1	1
2019-01-02 00:00:00	100	January	Wednesday	2019	2	1
2019-01-02 00:00:15	2	January	Wednesday	2019	2	1
2019-01-02 00:00:30	50	January	Wednesday	2019	2	1

I want to look at the hourly, daily, weekly, and monthly patterns in the volume data. I did it with this script:

import matplotlib.pyplot as plt

# 2x2 grid: hourly, daily, weekly and monthly patterns
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(16,10))

# Hourly: mean per station (address), then mean across stations
plt.axes(ax[0,0])
countData19_gdf.groupby(['hour','address']).mean().groupby(['hour'])['volume'].mean().plot(x='hour',y='volume')
plt.ylabel("Total average counts of the stations")

# Daily (grouped by weekday name)
plt.axes(ax[0,1])
countData19_gdf.groupby(['day','address']).mean().groupby(['day'])['volume'].mean().plot(x='day',y='volume')

# Weekly
plt.axes(ax[1,0])
countData19_gdf.groupby(['week_of_year','address']).mean().groupby(['week_of_year'])['volume'].mean().plot(x='week_of_year',y='volume', rot=90)
plt.ylabel("Total average counts of the stations")

# Monthly (grouped by month name)
plt.axes(ax[1,1])
countData19_gdf.groupby(['month','address']).mean().groupby(['month'])['volume'].mean().plot(x='month',y='volume', rot=90)
plt.ylabel("Total average counts of the stations")

ax[0,0].title.set_text('Hourly')
ax[0,1].title.set_text('Daily')
ax[1,0].title.set_text('Weekly')
ax[1,1].title.set_text('Monthly')

plt.savefig('temporal_global.png')

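In the resulting plot, the weekdays and the months are not sorted.

Could you please help me work out how to sort them? I tried sorting the days as integers, but it did not work.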

【Question discussion】:

It would help to have the data as text rather than only as an image.
Hopefully this helps.
I did try that, but it raises an AttributeError: 'DataFrame' object has no attribute 'sort_value'. For example: months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"] plt.axes(ax[1,1]) df1['month']= pd.Categorical(df1['month'], categories=months, ordered=True) df1.sort_value(by="month") df1.groupby(['month','address']).mean().groupby(['month'])['volume'].mean().plot(x='month',y='volume', rot=90) plt.ylabel("Total averge counts of the stations")
It is a method, and it should be sort_values rather than sort_value.
Oh my! Thank you. I fixed it and it no longer throws an error, but the plot is empty!
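A likely reason for the empty plot in the comment above: pd.Categorical turns every value that is not listed in categories into NaN, so abbreviated month names wipe out a column that holds full month names. The following is only a minimal sketch on made-up data (df1 here is a hypothetical toy frame, not the asker's data) showing the effect and the same idea with full names:

```python
import pandas as pd

# Hypothetical toy frame; the real data has many more rows and columns
df1 = pd.DataFrame({
    "month": ["March", "January", "February", "January"],
    "volume": [30, 10, 20, 14],
})

# Abbreviated categories do not match the full names, so everything becomes NaN
short = ["Jan", "Feb", "Mar"]
as_short = pd.Categorical(df1["month"], categories=short, ordered=True)
print(as_short.isna().all())

# Categories that match the stored strings keep the values and define the order
full = ["January", "February", "March"]
df1["month"] = pd.Categorical(df1["month"], categories=full, ordered=True)

# groupby follows the category order instead of alphabetical order
monthly = df1.groupby("month", observed=False)["volume"].mean()
print(monthly)
```

Note also that sort_values returns a new DataFrame, so calling it without assigning the result leaves df1 unchanged.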

Tags: python pandas matplotlib time-series

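【Solution 1】:

The groupby method sorts the index automatically, but for string values that means alphabetical order (rather than, say, weekday order).

What you can do is use the reindex method to put the index in the order you want. For example: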

countData19_gdf.groupby(['day','address']).mean().groupby(['day'])['volume'].mean().reindex(['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']).plot(x='day',y='volume')
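To see the difference on something small, here is a minimal sketch using a made-up frame in place of countData19_gdf (the column names follow the question; the values are invented). It selects the volume column before averaging, which is an assumption made here so the example does not depend on how other string columns are handled:

```python
import pandas as pd

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical data: two stations, one reading per weekday each
df = pd.DataFrame({
    "day": days * 2,
    "address": ["A"] * 7 + ["B"] * 7,
    "volume": list(range(14)),
})

# Mean per (day, station), then mean across stations for each day
daily = (
    df.groupby(["day", "address"])["volume"].mean()
      .groupby(level="day").mean()
)
print(list(daily.index))  # alphabetical order after groupby

# Put the index back into calendar order before plotting
daily_sorted = daily.reindex(days)
print(daily_sorted)
# daily_sorted.plot() would then draw the days from Monday to Sunday
```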

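Note:

If a value in the index is not in the list passed to reindex, that row is dropped. Conversely, if the list contains a value that does not exist in the index, that new index entry is assigned NaN. So if your countData19_gdf has no rows for some day, e.g. Monday, that day will still appear in the reindexed result, but its value will be NaN.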
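Both effects described in the note can be checked on a tiny Series. This is just an illustrative sketch with invented values:

```python
import pandas as pd

# Hypothetical per-day means; Monday is absent, Friday is present
s = pd.Series({"Friday": 3.0, "Tuesday": 1.0, "Wednesday": 2.0})

out = s.reindex(["Monday", "Tuesday", "Wednesday"])
print(out)
# Monday appears with NaN because it was not in the original index;
# Friday is gone because it was not in the requested list
```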

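Edit:

Since you already have numeric values for the weekdays (you may want to get the same for the months), you can avoid specifying the new index by hand and obtain the sorted string values like this: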

countData19_gdf.sort_values(by = 'weekday')['day'].unique()
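As a rough sketch of how that fits together, the sorted names can be passed straight to reindex. The frame below is invented for illustration; its weekday numbers follow the sample in the question (Tuesday = 1, Wednesday = 2), and volume is selected before averaging as a simplifying assumption:

```python
import pandas as pd

# Hypothetical data with a numeric weekday column next to the day name
df = pd.DataFrame({
    "day": ["Wednesday", "Tuesday", "Friday", "Tuesday", "Wednesday", "Friday"],
    "weekday": [2, 1, 4, 1, 2, 4],
    "address": ["A", "A", "A", "B", "B", "B"],
    "volume": [50, 25, 10, 21, 100, 30],
})

# Day names in the order of their numeric weekday
day_order = df.sort_values(by="weekday")["day"].unique()
print(day_order)

daily = (
    df.groupby(["day", "address"])["volume"].mean()
      .groupby(level="day").mean()
      .reindex(day_order)
)
print(daily)
```

Because the order is derived from the data, only days that actually occur end up in the index, so no NaN rows are introduced.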

Quick example (I changed some of the 'day' values in the given data to show the problem):

df.groupby(['day','address']).mean().groupby(['day'])['volume'].mean().plot(x='day',y='volume')

Output:

df.groupby(['day','address']).mean().groupby(['day'])['volume'].mean().reindex(['Tuesday','Wednesday','Friday']).plot(x='day',y='volume')

Output:

【Discussion】:

If you trust pandas' locale settings, pd.date_range(pd.Timestamp(1970, 1, 5), freq='d', periods=7).strftime('%A') will also give you the sorted list of days. Similarly, (…, freq='m', periods=12).strftime('%B') for months.
@Cimbali That is a good idea.
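A possible variant of that suggestion, sketched here as an assumption rather than as what the commenter wrote: strftime('%A') follows the process locale, whereas DatetimeIndex.day_name() and month_name() return English names by default, which matches the strings stored in the question's data. The start dates and the 'MS' (month start) frequency are choices made for this sketch:

```python
import pandas as pd

# 1970-01-05 was a Monday, so seven daily steps cover Monday to Sunday
week = pd.date_range("1970-01-05", periods=7, freq="D")
weekday_names = week.day_name().tolist()

# Twelve month starts from January give the months in calendar order
months = pd.date_range("2019-01-01", periods=12, freq="MS")
month_names = months.month_name().tolist()

print(weekday_names)
print(month_names)
```

Either list can then be passed to reindex in place of a hand-written one.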
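Thank you for the complete answer and the example. I did try it; it works for the weekdays, but for the months no plot is shown. countData19_gdf.groupby(['month','address']).mean().groupby(['month'])['volume'].mean().reindex(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'June', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']).plot(x='month',y='volume')
@Raha1986 The month values in your DataFrame are full month names ('January', 'February', ...), but you are using shortened names ('Jan', 'Feb', ...). This is what I mentioned about new values leading to NaN. Since none of the shortened names exist in your DataFrame (specifically in the 'month' column), the result contains only missing values.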
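The mismatch is easy to reproduce on a small Series. This is a minimal sketch with invented values, standing in for the grouped monthly means:

```python
import pandas as pd

# Hypothetical monthly means, indexed by full month name (alphabetical, as after groupby)
monthly = pd.Series({"April": 4.0, "February": 2.0, "January": 1.0, "March": 3.0})

short = ["Jan", "Feb", "Mar", "Apr"]
full = ["January", "February", "March", "April"]

# None of the short labels exist in the index, so every value is NaN
print(monthly.reindex(short))

# Labels that match the index exactly keep their values, in calendar order
print(monthly.reindex(full))
```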