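【Posted】: 2020-04-09 23:53:28
【Problem description】:
I am working with the Metro Interstate Traffic Volume dataset (available here: http://archive.ics.uci.edu/ml/datasets/Metro+Interstate+Traffic+Volume), but I cannot resample it to show the average traffic volume per day instead of per hour.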
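import pandas as pd
# Load the dataset and parse the timestamp column
metro = pd.read_csv('Metro_Interstate_Traffic_Volume.csv')
metro['date_time'] = pd.to_datetime(metro['date_time'], format='%Y-%m-%d %H:%M:%S')
# Use the timestamps as the index
metro.set_index('date_time', inplace=True, drop=True)
# Attempted resampling (the result is not stored anywhere)
metro.resample('1Y').mean()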
This is what I get:
holiday temp ... weather_description traffic_volume
date_time ...
2012-10-02 09:00:00 None 288.28 ... scattered clouds 5545
2012-10-02 10:00:00 None 289.36 ... broken clouds 4516
2012-10-02 11:00:00 None 289.58 ... overcast clouds 4767
2012-10-02 12:00:00 None 290.13 ... overcast clouds 5026
2012-10-02 13:00:00 None 291.14 ... broken clouds 4918
... ... ... ... ... ...
2018-09-30 19:00:00 None 283.45 ... broken clouds 3543
2018-09-30 20:00:00 None 282.76 ... overcast clouds 2781
2018-09-30 21:00:00 None 282.73 ... proximity thunderstorm 2159
2018-09-30 22:00:00 None 282.09 ... overcast clouds 1450
2018-09-30 23:00:00 None 282.12 ... overcast clouds 954
[48204 rows x 8 columns]
Do you know how to fix this?
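For reference, here is a minimal sketch of what a daily average could look like. It is not taken from the original post. The frame below is a small hypothetical stand-in for the CSV with made-up values, and `weather_main` stands in for the dataset's text columns. It assumes that the daily frequency alias `'D'` is what is wanted rather than `'1Y'`, and it selects the numeric `traffic_volume` column, since recent pandas versions may refuse to average text columns.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: 48 hourly rows covering two full days.
index = pd.date_range(
    "2012-10-02", periods=48, freq=pd.Timedelta(hours=1), name="date_time"
)
metro = pd.DataFrame(
    {
        "weather_main": ["Clouds"] * 48,              # text column, cannot be averaged
        "traffic_volume": [1000] * 24 + [2000] * 24,  # made-up values
    },
    index=index,
)

# "D" groups the hourly rows by calendar day; only the numeric column is averaged.
daily = metro["traffic_volume"].resample("D").mean()
print(daily)
```

With the real data, the same call on `metro['traffic_volume']` after `set_index('date_time')` should give one row per day, assuming the index was parsed as datetimes.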
Edit:
I also checked the pandas resample reference (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html) and ran the following example code from it:
d = dict({'price': [10, 11, 9, 13, 14, 18, 17, 19],
          'volume': [50, 60, 40, 100, 50, 100, 40, 50]})
df = pd.DataFrame(d)
df['week_starting'] = pd.date_range('01/01/2018',
                                    periods=8,
                                    freq='W')
df
price volume week_starting
0 10 50 2018-01-07
1 11 60 2018-01-14
2 9 40 2018-01-21
3 13 100 2018-01-28
4 14 50 2018-02-04
5 18 100 2018-02-11
6 17 40 2018-02-18
7 19 50 2018-02-25
df.resample('M', on='week_starting').mean()
price volume
week_starting
2018-01-31 10.75 62.5
2018-02-28 17.00 60.0
However, for me the result is the same before and after resampling.
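One possible explanation, offered as an assumption rather than a confirmed diagnosis, is that `resample(...).mean()` returns a new DataFrame and leaves the original untouched, so displaying `df` (or `metro`) afterwards shows the unaggregated rows. The sketch below repeats the documentation example and keeps the result in a variable. The name `monthly` is introduced here for illustration, and `'MS'` (month start) replaces the `'M'` alias because newer pandas versions deprecate `'M'`, so the bin labels differ from the documentation output.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "price": [10, 11, 9, 13, 14, 18, 17, 19],
        "volume": [50, 60, 40, 100, 50, 100, 40, 50],
    }
)
df["week_starting"] = pd.date_range("01/01/2018", periods=8, freq="W")

# The aggregation result is a separate object; df itself is not modified.
monthly = df.resample("MS", on="week_starting").mean()

print(df.shape)       # original frame: still 8 weekly rows
print(monthly.shape)  # aggregated frame: one row per month
print(monthly)
```

If this is the cause, assigning the result (for example `metro_daily = metro[...].resample(...).mean()`) and printing that variable should show the aggregated rows.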
【Discussion】: