Using a pandas DataFrame to group by month and calculate the monthly mean: answers

[Question title]: using pandas dataframe to groupby month calculate the mean monthly value
[Posted]: 2019-08-14 19:48:21
[Question]:

I am trying to calculate the mean monthly error value in my pandas DataFrame:

df

Output:

    timestamp    error
0   2016-01-01  1.402315
1   2016-01-02  2.100072
2   2016-01-03  2.127035
3   2016-01-04  1.371335
..         ...       ...
362 2016-12-28  2.189454
363 2016-12-29  1.411169

I want the mean error for each month, so I tried:

df=df.groupby(df['timestamp'].dt.strftime('%B'))['error'].mean().sort_values()

This gives me the output:

timestamp
March        1.505680
July         1.728256
June         1.760756
August       1.796719
September    1.830922
October      1.922697
February     2.021558
May          2.100005
December     2.143730
April        2.316473
November     2.491638
January      2.551331

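I can't seem to sort these into calendar order (Jan, Feb, Mar, April, and so on), which I need so that the months appear in the correct order on the x-axis of a plot. Any ideas?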

[Comments on the question]:

Tags: python pandas sorting datetime group-by

[Solution 1]:

You can use the following:

# Print small dataframe which was provided
print(df)
      timestamp     error
0    2016-01-01  1.402315
1    2016-01-02  2.100072
2    2016-01-03  2.127035
3    2016-01-04  1.371335
362  2016-12-28  2.189454
363  2016-12-29  1.411169

# Convert each timestamp to its month number (assumes `import pandas as pd`)
df['timestamp'] = pd.to_datetime(df.timestamp).dt.month

df = df.groupby('timestamp')['error'].mean().sort_index().reset_index()
print(df)
   timestamp     error
0          1  1.750189
1         12  1.800312
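
If you still want month names on the plot axis, one option is to group on the month number first and only swap in the labels afterwards. The following is a minimal sketch, not part of the original answer: the data is synthetic (random values for every day of 2016) and the variable names are made up for illustration.

```python
import calendar

import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data (an assumption): one value per day of 2016.
rng = np.random.default_rng(0)
days = pd.date_range("2016-01-01", "2016-12-31", freq="D")
df = pd.DataFrame({"timestamp": days, "error": rng.uniform(1.0, 3.0, len(days))})

# Group on the month number so the result is already in calendar order.
monthly = df.groupby(df["timestamp"].dt.month)["error"].mean()

# Replace the numeric index with month abbreviations only after aggregating.
monthly.index = [calendar.month_abbr[int(m)] for m in monthly.index]
```

Because the labels are assigned after the numeric sort, calling monthly.plot() should draw the months from January to December.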

[Comments]:

[Solution 2]:

pandas has built-in functionality for this that makes it simpler (see also How can I Group By Month from a Date field using Python/Pandas).

You can do this:

df.groupby(df['timestamp'].dt.month)['error'].mean()

This computes the mean for each month and uses the month number as the index, so the results are sorted from January to December.
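
If you would rather keep grouping by month name, as in the question, the name-keyed result can be put back into calendar order with reindex. This is a hedged sketch rather than anything from the answers above: the data is synthetic and month_order is a helper list introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data (an assumption): one value per day of 2016.
rng = np.random.default_rng(1)
days = pd.date_range("2016-01-01", "2016-12-31", freq="D")
df = pd.DataFrame({"timestamp": days, "error": rng.uniform(1.0, 3.0, len(days))})

# English month names in calendar order, January first.
month_order = list(pd.date_range("2016-01-01", periods=12, freq="MS").month_name())

# Group by month name; at this point the index is in alphabetical order.
by_name = df.groupby(df["timestamp"].dt.month_name())["error"].mean()

# Reorder the rows to follow the calendar instead of the alphabet.
by_name = by_name.reindex(month_order)
```

Sorting by value or by index cannot give this order, because the month names are plain strings; reindex works because it is handed the desired order explicitly.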

[Comments]: