Why can't Python recognize the month column in my dataset? Answers

【Question title】: Why can't python recognize my month column in a dataset?
【Posted】: 2016-12-20 15:42:14
【Question description】:

This is what the dataframe looks like:

          Date  Time (HHMM)         Site  Plot  Replicate  Temperature  \
0   2002-05-01          600  Barre Woods    16          5          4.5
1   2002-05-01          600  Barre Woods    21          7          4.5
2   2002-05-01          600  Barre Woods    31          9          6.5
3   2002-05-01          600  Barre Woods    10          2          5.3
4   2002-05-01          600  Barre Woods     2          1          4.0
5   2002-05-01          600  Barre Woods    13          4          5.5
6   2002-05-01          600  Barre Woods    11          3          5.0
7   2002-05-01          600  Barre Woods    28          8          5.0
8   2002-05-01          600  Barre Woods    18          6          4.5
9   2002-05-01         1400  Barre Woods     2          1         10.3
10  2002-05-01         1400  Barre Woods    31          9          9.0
11  2002-05-01         1400  Barre Woods    13          4         11.0

import pandas as pd
import datetime as dt
from datetime import datetime
df=pd.read_csv('F:/data32.csv',parse_dates=['Date'])
df['Date']=pd.to_datetime(df['Date'],format='%m/%d/%y')

This is where I get the error:

df2=df.groupby(pd.TimeGrouper(freq='M'))

The error says:

Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'
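The message is about the DataFrame's index. A grouper given only a frequency, as pd.TimeGrouper(freq='M') is here, bins rows by the index, and read_csv produces a default integer RangeIndex. The sketch below, using two made-up rows in the question's layout (the CSV text is hypothetical), checks the index type before and after moving Date into the index:

```python
import io
import pandas as pd

# Two hypothetical rows in the same layout as the question's CSV
data = io.StringIO("""Date,Time (HHMM),Site,Plot,Replicate,Temperature
2002-05-01,600,Barre Woods,16,5,4.5
2002-05-01,1400,Barre Woods,2,1,10.3
""")
df = pd.read_csv(data, parse_dates=["Date"])

# read_csv assigns a default integer index, which a time-based grouper cannot bin
print(type(df.index).__name__)   # RangeIndex

# Moving the parsed Date column into the index gives a DatetimeIndex
dfx = df.set_index("Date")
print(type(dfx.index).__name__)  # DatetimeIndex
```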

【Question discussion】:

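Please post your sample data as text, not as an image; an image makes it very hard for anyone to actually test and run code against your example in order to answer. Anyway, what exactly is the result you want?
@JonClements Thanks, I've fixed it. I want to split the data by month so that I can divide my dataframe into two parts: one containing the months (Jan, Feb, Mar, Oct, Dec) and the other containing all the remaining data.
Possible duplicate of pandas dataframe groupby datetime month
I tried using that as a reference, but I keep getting errors @Merlin
So what function do you want to apply in the groupby?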
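The split described in the comment above can be done with a boolean mask on the month number rather than a groupby. This is a minimal sketch, not code from the thread: the DataFrame is made-up sample data, and selected_months, selected_part and other_part are hypothetical names.

```python
import pandas as pd

# Hypothetical sample data; the question's full CSV is not available
df = pd.DataFrame({
    "Date": pd.to_datetime(["2002-01-15", "2002-05-01", "2002-10-03",
                            "2002-07-20", "2002-12-31", "2003-02-11"]),
    "Temperature": [-3.0, 4.5, 8.2, 21.4, -1.5, -6.0],
})

# Months for the first part: Jan, Feb, Mar, Oct, Dec
selected_months = [1, 2, 3, 10, 12]

# True where the row's calendar month is in the list
in_selected = df["Date"].dt.month.isin(selected_months)

selected_part = df[in_selected]   # rows from Jan, Feb, Mar, Oct, Dec
other_part = df[~in_selected]     # every remaining row

print(selected_part)
print(other_part)
```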

Tags: python datetime pandas group-by

【Solution 1】:

Group by df['Date'].dt.month. For example, to compute the mean temperature, you can do the following:

import io
import pandas as pd

data = io.StringIO('''\
Date,Time (HHMM),Site,Plot,Replicate,Temperature
0,2002-05-01,600,Barre Woods,16,5,4.5
1,2002-05-01,600,Barre Woods,21,7,4.5
2,2002-05-01,600,Barre Woods,31,9,6.5
3,2002-05-01,600,Barre Woods,10,2,5.3
4,2002-05-01,600,Barre Woods,2,1,4.0
5,2002-05-01,600,Barre Woods,13,4,5.5
6,2002-05-01,600,Barre Woods,11,3,5.0
7,2002-05-01,600,Barre Woods,28,8,5.0
8,2002-05-01,600,Barre Woods,18,6,4.5
9,2002-05-01,1400,Barre Woods,2,1,10.3
10,2002-05-01,1400,Barre Woods,31,9,9.0
11,2002-05-01,1400,Barre Woods,13,4,11.0
''')

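# The header names 6 columns but each data row has 7 fields,
# so read_csv uses the leading field (0, 1, 2, ...) as the index
df = pd.read_csv(data)
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')

# Group rows by month number (1-12) and average Temperature within each group
print(df.groupby(df['Date'].dt.month)['Temperature'].mean())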

Output:

Date
5    6.258333
Name: Temperature, dtype: float64
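A variant not taken from this answer: pd.TimeGrouper, used in the question, is deprecated in favour of pd.Grouper, whose key parameter names the datetime column to bin on, so no DatetimeIndex is needed. Below is a minimal sketch on made-up data spanning two years; it uses the "MS" (month start) alias because newer pandas releases warn about the bare "M" alias. The two approaches differ: Grouper keeps May 2002 and May 2003 apart, while .dt.month pools every May together.

```python
import io
import pandas as pd

# Hypothetical sample spanning two years so the two groupings differ
data = io.StringIO("""Date,Temperature
2002-05-01,4.5
2002-05-20,6.5
2002-06-03,12.0
2003-05-02,8.5
""")
df = pd.read_csv(data, parse_dates=["Date"])

# Bin on the Date column itself: one group per calendar month of each year
by_year_month = df.groupby(pd.Grouper(key="Date", freq="MS"))["Temperature"].mean()

# Group on the month number only: May 2002 and May 2003 share a group
by_month_number = df.groupby(df["Date"].dt.month)["Temperature"].mean()

print(by_year_month.dropna())
print(by_month_number)
```

Because Grouper produces one bin per month across the whole date range, months without rows may appear as NaN; calling .dropna() on the result removes them.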

【Discussion】:

【Solution 2】:

You can first use set_index:

dfx = df.set_index('Date')

Then you can group by:

dfx.groupby(lambda x: x.month).mean(numeric_only=True)  # .mean() is just an example; numeric_only=True skips text columns such as Site
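A self-contained version of this approach, using a few made-up rows in the question's layout (the CSV text below is hypothetical). numeric_only=True keeps .mean() from failing on the text column Site in recent pandas versions, and grouping on dfx.index.month gives the same groups as the lambda. Once the index is a DatetimeIndex, frequency-based grouping like the question's also works, here written with pd.Grouper:

```python
import io
import pandas as pd

# Hypothetical rows in the question's layout (some columns trimmed)
data = io.StringIO("""Date,Site,Plot,Temperature
2002-05-01,Barre Woods,16,4.5
2002-05-01,Barre Woods,21,4.5
2002-06-12,Barre Woods,16,12.0
""")
df = pd.read_csv(data, parse_dates=["Date"])

dfx = df.set_index("Date")  # the index is now a DatetimeIndex

# The lambda is called on each index label (a Timestamp) and returns its month
via_lambda = dfx.groupby(lambda x: x.month).mean(numeric_only=True)

# Same groups without a lambda: use the index's month numbers directly
via_index = dfx.groupby(dfx.index.month).mean(numeric_only=True)

# With a DatetimeIndex in place, a frequency grouper no longer raises
via_grouper = dfx.groupby(pd.Grouper(freq="MS")).mean(numeric_only=True)

print(via_lambda)
print(via_grouper)
```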

【Discussion】:

There is no Date Time column, so df.set_index('Date Time') raises a KeyError.