How to plot data per hour, grouped by days?

【Question title】: How to plot data per hour, grouped by days?
【Posted】: 2019-10-15 12:41:44
【Question】:

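Background: from a large DataFrame I filtered the entries for year=2013, month=June, days 3 to 9 (Monday to Sunday). I then grouped the data by day, hour and user_type, and pivoted the table to get a DataFrame like this: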

   Day  Hour  Casual  Registered  Casual_percentage
0  3    0     14      19          42.42
1  3    1     8       8           50.00
2  3    2     1       3           25.00
3  3    3     2       1           66.67
4  3    4     1       3           25.00
5  3    5     1       17          5.56
.  .    .     .       .           .

I have 24 hours for every day, so for day 4 (Tuesday) the data starts like this:

.  .    .     .       .           .  
21 3    21    32      88          26.67
22 3    22    26      64          28.89
23 3    23    23      30          43.40
24 4    0     10      11          47.62
25 4    1     1       5           16.67
26 4    2     1       1           50.00
.  .    .     .       .           .

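How can I plot the Casual and Registered variables for each Hour, for each of the 7 Days? Do I need to create 7 different plots and align them in 1 figure?

Current code. I feel like I'm way off. I also tried to create a second x-axis (for Days) by following the documentation.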

def make_patch_spines_invisible(ax):
    ax.set_frame_on(True)
    ax.patch.set_visible(False)
    for sp in ax.spines.values():
        sp.set_visible(False)

fig, ax1 = plt.subplots(figsize=(10, 5))
ax1.set(xlabel='Hours', ylabel='Total # of trips started')

ax1.plot(data.Hour, data.Casual, color='g')
ax1.plot(data.Hour, data.Registered, color='b')


"""This part is trying to create the 2nd x-axis (Days)"""
ax2 = ax1.twinx()
#offset the bottom spine
ax2.spines['bottom'].set_position(('axes', -.5))
make_patch_spines_invisible(ax2)
#show bottom spine
ax2.spines['bottom'].set_visible(True)
ax2.set_xlabel("Days")


plt.show()

Output:

End goal

【Comments】:

Tags: python pandas matplotlib

【Solution 1】:

I think this would be easier if you worked with datetime objects instead of Day and Hour strings.
That way you can use the date tick locators and formatters together with major and minor ticks.

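Even though you didn't mention it, I assume you can use pandas to handle the dataframe.
I created a new dataframe by copying the data you provided several times and cutting out some of it (this is not that important).
Here I rebuilt the dates from the information you provided, but I would advise working with them directly (I guess the original dataframe has some kind of date-like field).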

import pandas as pd
import matplotlib.pyplot as plt 
import matplotlib.dates as mdates

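# Load the pivoted table (columns Day, Hour, Casual, Registered, ...)
df = pd.read_csv("mydataframe.csv")
# Rebuild a string such as "2013-06-03-00" from the Day and Hour columns
df["timestamp"] = "2013-06-" + df["Day"].astype(str).str.zfill(2) + "-" + df["Hour"].astype(str).str.zfill(2)
# Parse the strings into real datetime values
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%Y-%m-%d-%H")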


fig, ax1 = plt.subplots(figsize=(10, 5))
ax1.set(xlabel='', ylabel='Total # of trips started')
ax1.plot(df["timestamp"], df.Casual, color='g')
ax1.plot(df["timestamp"], df.Registered, color='b')

ax1.xaxis.set(
    major_locator=mdates.DayLocator(),
    major_formatter=mdates.DateFormatter("\n\n%A"),
    minor_locator=mdates.HourLocator((0, 12)),
    minor_formatter=mdates.DateFormatter("%H"),
)
plt.show()

Output:

【Comments】:

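Yes, the DataFrame (holding the values for June 3-9, '13) has a datetime variable s_datetime, and I did: data = df.groupby([df['s_timedate'].dt.day, df['s_timedate'].dt.hour, df.user_type]).agg({'hubway_id':'count'}) to get the counts per day and hour. But if I understand correctly, I should compute the counts while keeping the datetime variable, ending up with the columns: date_time, Casual, Registered?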
@Bn.F76 Yes, you should keep the datetime variable. To group by time you can also use pandas resample. See for example this question: stackoverflow.com/questions/49344899/…
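A minimal sketch of what keeping the datetime variable could look like. The trip log below is made up; the column names s_timedate, user_type and hubway_id are taken from the groupby call quoted above, and pd.Grouper is used here as one possible alternative to resample, not necessarily what the linked question does.

```python
import pandas as pd

# Hypothetical raw trip log: one row per trip (values are invented for illustration).
trips = pd.DataFrame({
    "s_timedate": pd.to_datetime([
        "2013-06-03 00:05", "2013-06-03 00:40", "2013-06-03 01:10",
        "2013-06-03 03:20", "2013-06-04 00:15",
    ]),
    "user_type": ["Casual", "Registered", "Registered", "Casual", "Registered"],
    "hubway_id": [1, 2, 3, 4, 5],
})

# Count trips per hourly bin and user type, then pivot user_type into columns.
counts = (
    trips.groupby([pd.Grouper(key="s_timedate", freq="h"), "user_type"])["hubway_id"]
    .count()
    .unstack("user_type", fill_value=0)
)

# Reindex on a complete hourly range so hours without any trip show up as 0.
full_range = pd.date_range(counts.index.min(), counts.index.max(), freq="h")
counts = counts.reindex(full_range, fill_value=0)

print(counts.head())
```

The result has a DatetimeIndex and one column per user type, so it can be passed straight to ax.plot(counts.index, counts["Casual"]).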
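Thanks! One last thing: what does mdates.DateFormatter("\n\n%A") mean? I know %H gives the hour, %Y the year, etc., but I don't understand your expression, and I'd like to play with it so that I get a tick for every hour within a 24-hour window.
@Bn.F76 In "\n\n%A", the \n\n just puts the weekday two lines below the hour ticks, and %A gives the "weekday as locale's full name". See strftime.org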
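To experiment with the format strings, a sketch along these lines may help. It previews the codes with strftime and then puts a minor tick on every hour of a single day; the data is invented, and HourLocator(interval=1) is only one way to get hourly ticks.

```python
from datetime import datetime, timedelta

import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# DateFormatter takes strftime-style codes, so a format can be previewed without plotting.
stamp = datetime(2013, 6, 3, 14)
hour_label = stamp.strftime("%H")     # zero-padded hour on a 24-hour clock
day_label = stamp.strftime("\n\n%A")  # two newlines, then the weekday name (locale-dependent)

# Made-up values for one 24-hour window.
start = datetime(2013, 6, 3)
times = [start + timedelta(hours=h) for h in range(24)]
values = list(range(24))

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(times, values, color="g")
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("\n\n%A"))
ax.xaxis.set_minor_locator(mdates.HourLocator(interval=1))  # one minor tick per hour
ax.xaxis.set_minor_formatter(mdates.DateFormatter("%H"))
# plt.show()  # uncomment when running interactively
```

With a full week on the axis, hourly labels will probably overlap, which is presumably why the answer only labels hours 0 and 12.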

【Solution 2】:

Assuming your data is ordered by index (e.g. 0 - 23 is day 3, 24 - 47 is day 4, etc.), you can plot the index values instead of the hours in your code:

ax1.plot(data.index.values, data.Casual, color='g')
ax1.plot(data.index.values, data.Registered, color='b')

This yields a chart similar to the end product you are looking for (note that I used fake data):
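A rough sketch of this index-based approach, with day labels added on the x-axis. The data is randomly generated, and the code assumes every day has exactly 24 rows, which is the condition this approach depends on.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data: 7 days x 24 hours with no missing hours (an assumption).
rng = np.random.default_rng(0)
days = np.repeat(np.arange(3, 10), 24)
hours = np.tile(np.arange(24), 7)
data = pd.DataFrame({
    "Day": days,
    "Hour": hours,
    "Casual": rng.integers(0, 50, size=days.size),
    "Registered": rng.integers(0, 150, size=days.size),
})

fig, ax1 = plt.subplots(figsize=(10, 5))
ax1.set(xlabel="Days", ylabel="Total # of trips started")
ax1.plot(data.index.values, data.Casual, color="g", label="Casual")
ax1.plot(data.index.values, data.Registered, color="b", label="Registered")

# One tick at the first row of each day, labelled with the day of the month.
day_starts = data.index[data.Hour == 0]
ax1.set_xticks(day_starts.tolist())
ax1.set_xticklabels(data.loc[day_starts, "Day"].astype(str).tolist())
ax1.legend()
# plt.show()  # uncomment when running interactively
```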

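【Comments】:

Unfortunately, there are some weeks with no data for certain hours, so the index approach is inaccurate/unreliable.
Then yes, you need to reformat your data as the other answer mentions. Handling separate columns for hour, day, month, etc. is not the way to go; you should combine them into a single datetime object.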
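A small sketch of that advice, using invented rows in which hour 2 of day 3 is missing. The year and month are hard-coded as an assumption based on the question (June 2013). Passing a DataFrame with year/month/day/hour columns to pd.to_datetime builds the timestamps, and asfreq then exposes the missing hours as NaN instead of silently shifting later rows.

```python
import pandas as pd

# Hypothetical pivoted data; the row for Day 3, Hour 2 is deliberately absent.
data = pd.DataFrame({
    "Day": [3, 3, 3, 4],
    "Hour": [0, 1, 3, 0],
    "Casual": [14, 8, 2, 10],
    "Registered": [19, 8, 1, 11],
})

# Combine the separate columns into one datetime column (year and month assumed).
parts = pd.DataFrame({"year": 2013, "month": 6, "day": data["Day"], "hour": data["Hour"]})
data["timestamp"] = pd.to_datetime(parts)

# Put the data on a regular hourly grid; hours with no row become NaN.
hourly = data.set_index("timestamp")[["Casual", "Registered"]].asfreq("h")

print(hourly.head())
```

When hourly is plotted against its index, matplotlib leaves a gap in the line at the NaN rows, so missing hours stay visible and each point keeps its true position in time.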