You need to pass the format string as an argument to to_datetime:
In[20]:
series.index = pd.to_datetime(series.index, format='%d-%m')
series.index
Out[20]:
DatetimeIndex(['1900-01-01', '1900-02-01', '1900-03-01', '1900-04-01',
'1900-05-01', '1900-06-01', '1900-07-01', '1900-08-01',
'1900-09-01', '1900-10-01', '1900-11-01', '1900-12-01',
'1900-01-02', '1900-02-02', '1900-03-02', '1900-04-02',
'1900-05-02', '1900-06-02', '1900-07-02', '1900-08-02',
'1900-09-02', '1900-10-02', '1900-11-02', '1900-12-02',
'1900-01-03', '1900-02-03', '1900-03-03', '1900-04-03',
'1900-05-03', '1900-06-03', '1900-07-03', '1900-08-03',
'1900-09-03', '1900-10-03', '1900-11-03', '1900-12-03'],
dtype='datetime64[ns]', name='Month', freq=None)
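By default, to_datetime tries to infer the format and assumes YYYY-MM-DD, so the string 01-01 is parsed as month 1 of year 1, which falls outside the range that nanosecond-resolution timestamps can represent.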
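As a small illustration of that range (not part of the original code), the bounds of a nanosecond-resolution timestamp can be printed directly. This is only a sketch that assumes pandas is installed; the exact error raised for an out-of-range date may differ between pandas versions, so none is shown here.

```python
import pandas as pd

# Bounds of a nanosecond-resolution timestamp stored in a 64-bit integer
print(pd.Timestamp.min)
print(pd.Timestamp.max)

# Year 1 lies far below the lower bound
print(pd.Timestamp.min.year > 1)
```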
If you want a monotonically increasing index, which is what your data actually looks like, you can prepend the string '20' to the index and then convert:
In[24]:
series.index = '20' + series.index
series.index
Out[24]:
Index(['2001-01', '2001-02', '2001-03', '2001-04', '2001-05', '2001-06',
'2001-07', '2001-08', '2001-09', '2001-10', '2001-11', '2001-12',
'2002-01', '2002-02', '2002-03', '2002-04', '2002-05', '2002-06',
'2002-07', '2002-08', '2002-09', '2002-10', '2002-11', '2002-12',
'2003-01', '2003-02', '2003-03', '2003-04', '2003-05', '2003-06',
'2003-07', '2003-08', '2003-09', '2003-10', '2003-11', '2003-12'],
dtype='object')
In[25]:
series.index = pd.to_datetime(series.index, format='%Y-%m')
series
Out[25]:
2001-01-01 266.0
2001-02-01 145.9
2001-03-01 183.1
2001-04-01 119.3
2001-05-01 180.3
2001-06-01 168.5
2001-07-01 231.8
2001-08-01 224.5
2001-09-01 192.8
2001-10-01 122.9
2001-11-01 336.5
2001-12-01 185.9
2002-01-01 194.3
2002-02-01 149.5
2002-03-01 210.1
2002-04-01 273.3
2002-05-01 191.4
2002-06-01 287.0
2002-07-01 226.0
2002-08-01 303.6
2002-09-01 289.9
2002-10-01 421.6
2002-11-01 264.5
2002-12-01 342.3
2003-01-01 339.7
2003-02-01 440.4
2003-03-01 315.9
2003-04-01 439.3
2003-05-01 401.3
2003-06-01 437.4
2003-07-01 575.5
2003-08-01 407.6
2003-09-01 682.0
2003-10-01 475.3
2003-11-01 581.3
2003-12-01 646.9
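The prepend-and-parse step can be reproduced on a tiny sample. The three labels below are hypothetical stand-ins for the real index, and the `%y-%m` variant is an alternative not used in the answer above: it relies on the two-digit-year directive mapping values 00 to 68 onto 2000 to 2068, so treat it as a sketch and check it against your own data.

```python
import pandas as pd

# Hypothetical sample in the same "YY-MM" shape as the index above
raw = pd.Index(['01-01', '01-02', '02-01'])

# Approach from the answer: prepend the century, then parse a four-digit year
prefixed = pd.to_datetime('20' + raw, format='%Y-%m')

# Alternative (an assumption, not from the answer): parse the two-digit year directly
two_digit = pd.to_datetime(raw, format='%y-%m')

print(prefixed.equals(two_digit))
print(prefixed.is_monotonic_increasing)
```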
Your code then works:
In[28]:
X = series.rename("actual").to_frame()
X = X.loc[~X.index.duplicated(keep='last')].asfreq('D', method='ffill')
X
Out[28]:
actual
2001-01-01 266.0
2001-01-02 266.0
2001-01-03 266.0
2001-01-04 266.0
2001-01-05 266.0
2001-01-06 266.0
2001-01-07 266.0
2001-01-08 266.0
2001-01-09 266.0
2001-01-10 266.0
2001-01-11 266.0
2001-01-12 266.0
2001-01-13 266.0
2001-01-14 266.0
2001-01-15 266.0
2001-01-16 266.0
2001-01-17 266.0
2001-01-18 266.0
2001-01-19 266.0
2001-01-20 266.0
2001-01-21 266.0
2001-01-22 266.0
2001-01-23 266.0
2001-01-24 266.0
2001-01-25 266.0
2001-01-26 266.0
2001-01-27 266.0
2001-01-28 266.0
2001-01-29 266.0
2001-01-30 266.0
...
2003-11-02 581.3
2003-11-03 581.3
2003-11-04 581.3
2003-11-05 581.3
2003-11-06 581.3
2003-11-07 581.3
2003-11-08 581.3
2003-11-09 581.3
2003-11-10 581.3
2003-11-11 581.3
2003-11-12 581.3
2003-11-13 581.3
2003-11-14 581.3
2003-11-15 581.3
2003-11-16 581.3
2003-11-17 581.3
2003-11-18 581.3
2003-11-19 581.3
2003-11-20 581.3
2003-11-21 581.3
2003-11-22 581.3
2003-11-23 581.3
2003-11-24 581.3
2003-11-25 581.3
2003-11-26 581.3
2003-11-27 581.3
2003-11-28 581.3
2003-11-29 581.3
2003-11-30 581.3
2003-12-01 646.9
[1065 rows x 1 columns]
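To make the two steps in that last line easier to follow, here is a minimal sketch on a hypothetical three-row frame (the values and the duplicated timestamp are made up for illustration). It first drops duplicate index labels, keeping the last one, and then upsamples to daily frequency while forward-filling each month's value.

```python
import pandas as pd

# Hypothetical data: the first timestamp appears twice on purpose
idx = pd.to_datetime(['2001-01-01', '2001-01-01', '2001-02-01'])
X = pd.Series([1.0, 266.0, 145.9], index=idx, name='actual').to_frame()

# Keep only the last row for each duplicated timestamp
X = X.loc[~X.index.duplicated(keep='last')]

# Upsample to one row per day, carrying the previous value forward
daily = X.asfreq('D', method='ffill')

print(len(daily))
print(daily.tail(3))
```

The deduplication matters because asfreq reindexes the frame, and reindexing requires unique index labels.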