【Question title】: Out of bounds nanosecond timestamp: 1-01-01 00:00:00
【Posted】: 2019-12-01 15:25:15
【Question description】:

I import data from GitHub with the following code:

series = read_csv('shampoo-sales.csv', header=0, index_col=0, squeeze=True)

I want to convert its index to a DatetimeIndex, so I use:

series.index = pd.to_datetime(series.index)

But Python gives me the following error:

Out of bounds nanosecond timestamp: 1-01-01 00:00:00

I don't know how to fix this error.


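Update: Thanks to EdChum for pointing out a way to convert the index to a DatetimeIndex. However, I have now run into another problem. Consider the following code.

X = series.rename("actual").to_frame()
X = X.loc[~X.index.duplicated(keep='last')].asfreq('d', 'ffill')

When I run this with my series as the input, it raises an error saying the index must be monotonic increasing or decreasing.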

【Discussion】:

    Tags: python pandas datetime type-conversion


    【Solution 1】:

    You need to pass a format string to to_datetime through its format argument:

    In[20]:
    series.index = pd.to_datetime(series.index, format='%d-%m')
    series.index
    
    Out[20]: 
    DatetimeIndex(['1900-01-01', '1900-02-01', '1900-03-01', '1900-04-01',
                   '1900-05-01', '1900-06-01', '1900-07-01', '1900-08-01',
                   '1900-09-01', '1900-10-01', '1900-11-01', '1900-12-01',
                   '1900-01-02', '1900-02-02', '1900-03-02', '1900-04-02',
                   '1900-05-02', '1900-06-02', '1900-07-02', '1900-08-02',
                   '1900-09-02', '1900-10-02', '1900-11-02', '1900-12-02',
                   '1900-01-03', '1900-02-03', '1900-03-03', '1900-04-03',
                   '1900-05-03', '1900-06-03', '1900-07-03', '1900-08-03',
                   '1900-09-03', '1900-10-03', '1900-11-03', '1900-12-03'],
                  dtype='datetime64[ns]', name='Month', freq=None)
    

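    By default, to_datetime tries to infer the format and assumes the strings are year-first (YYYY-MM-DD), so the string 01-01 is parsed as month 1 of year 1. That is far outside the range a nanosecond timestamp can represent (roughly the years 1677 to 2262). Note that format='%d-%m' reads the first number as the day and the second as the month, with the year defaulting to 1900, which is why the index shown in Out[20] is not monotonic.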
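
As a rough illustration of where that limit comes from: a datetime64[ns] value counts nanoseconds from 1970-01-01 in a signed 64-bit integer, so the representable range can be derived with NumPy alone. The helper names `year_of` and `fits_in_ns` are made up for this sketch, and the year check is deliberately conservative (it treats the two boundary years as out of range).

```python
import numpy as np

# datetime64[ns] stores nanoseconds since 1970-01-01 in a signed 64-bit integer.
I64 = np.iinfo(np.int64)
lower = np.datetime64(I64.min + 1, "ns")  # I64.min itself is reserved for NaT
upper = np.datetime64(I64.max, "ns")


def year_of(ts):
    """Calendar year of a numpy datetime64 value."""
    return int(ts.astype("datetime64[Y]").astype(np.int64)) + 1970


def fits_in_ns(year):
    """Conservative check: True only for years strictly inside the bounds."""
    return year_of(lower) < year < year_of(upper)


print(lower, upper)
print(fits_in_ns(1))     # year 1, as inferred from the string '01-01'
print(fits_in_ns(2001))  # year obtained after prepending '20'
```

Year 1 fails the check while 1900 and 2001 pass, which matches the behaviour described in this answer.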

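    If you want a monotonically increasing index, which is how your data is actually ordered, start again from the original string index, prepend the string '20' to each label, and then convert: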

    In[24]:
    series.index = '20' + series.index
    series.index
    
    Out[24]: 
    Index(['2001-01', '2001-02', '2001-03', '2001-04', '2001-05', '2001-06',
           '2001-07', '2001-08', '2001-09', '2001-10', '2001-11', '2001-12',
           '2002-01', '2002-02', '2002-03', '2002-04', '2002-05', '2002-06',
           '2002-07', '2002-08', '2002-09', '2002-10', '2002-11', '2002-12',
           '2003-01', '2003-02', '2003-03', '2003-04', '2003-05', '2003-06',
           '2003-07', '2003-08', '2003-09', '2003-10', '2003-11', '2003-12'],
          dtype='object')
    
    In[25]:
    series.index = pd.to_datetime(series.index, format='%Y-%m')
    series
    
    Out[25]: 
    2001-01-01    266.0
    2001-02-01    145.9
    2001-03-01    183.1
    2001-04-01    119.3
    2001-05-01    180.3
    2001-06-01    168.5
    2001-07-01    231.8
    2001-08-01    224.5
    2001-09-01    192.8
    2001-10-01    122.9
    2001-11-01    336.5
    2001-12-01    185.9
    2002-01-01    194.3
    2002-02-01    149.5
    2002-03-01    210.1
    2002-04-01    273.3
    2002-05-01    191.4
    2002-06-01    287.0
    2002-07-01    226.0
    2002-08-01    303.6
    2002-09-01    289.9
    2002-10-01    421.6
    2002-11-01    264.5
    2002-12-01    342.3
    2003-01-01    339.7
    2003-02-01    440.4
    2003-03-01    315.9
    2003-04-01    439.3
    2003-05-01    401.3
    2003-06-01    437.4
    2003-07-01    575.5
    2003-08-01    407.6
    2003-09-01    682.0
    2003-10-01    475.3
    2003-11-01    581.3
    2003-12-01    646.9
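
The two parses can be compared side by side in a small self-contained sketch. The three-row CSV text below is a hypothetical stand-in for shampoo-sales.csv (the "Sales" header is an assumption; the values are taken from the output above). It also assumes a recent pandas, where `squeeze=True` is no longer accepted by `read_csv` (it was removed in pandas 2.0) and squeezing is done on the result instead.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for shampoo-sales.csv; the "Sales" header is assumed.
csv_text = "Month,Sales\n01-01,266.0\n01-02,145.9\n02-01,194.3\n"

# Squeeze the one-column frame into a Series after reading,
# instead of passing squeeze=True to read_csv.
series = pd.read_csv(io.StringIO(csv_text), header=0, index_col=0).squeeze("columns")

labels = [str(label) for label in series.index]

# '%d-%m' reads "02-01" as day 2 of month 1; the year defaults to 1900.
day_month = pd.to_datetime(labels, format="%d-%m")

# Prefixing '20' turns "02-01" into "2002-01", which '%Y-%m' reads as year-month.
year_month = pd.to_datetime(["20" + label for label in labels], format="%Y-%m")

print(day_month.is_monotonic_increasing)   # False
print(year_month.is_monotonic_increasing)  # True

series.index = year_month
```

Checking `is_monotonic_increasing` before calling `asfreq` is a cheap way to see whether the chosen format string matched the real ordering of the data.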
    

    Then your code works:

    In[28]:
    X = series.rename("actual").to_frame() 
    X = X.loc[~X.index.duplicated(keep='last')].asfreq('d', 'ffill')
    X
    
    Out[28]: 
                actual
    2001-01-01   266.0
    2001-01-02   266.0
    2001-01-03   266.0
    2001-01-04   266.0
    2001-01-05   266.0
    2001-01-06   266.0
    2001-01-07   266.0
    2001-01-08   266.0
    2001-01-09   266.0
    2001-01-10   266.0
    2001-01-11   266.0
    2001-01-12   266.0
    2001-01-13   266.0
    2001-01-14   266.0
    2001-01-15   266.0
    2001-01-16   266.0
    2001-01-17   266.0
    2001-01-18   266.0
    2001-01-19   266.0
    2001-01-20   266.0
    2001-01-21   266.0
    2001-01-22   266.0
    2001-01-23   266.0
    2001-01-24   266.0
    2001-01-25   266.0
    2001-01-26   266.0
    2001-01-27   266.0
    2001-01-28   266.0
    2001-01-29   266.0
    2001-01-30   266.0
               ...
    2003-11-02   581.3
    2003-11-03   581.3
    2003-11-04   581.3
    2003-11-05   581.3
    2003-11-06   581.3
    2003-11-07   581.3
    2003-11-08   581.3
    2003-11-09   581.3
    2003-11-10   581.3
    2003-11-11   581.3
    2003-11-12   581.3
    2003-11-13   581.3
    2003-11-14   581.3
    2003-11-15   581.3
    2003-11-16   581.3
    2003-11-17   581.3
    2003-11-18   581.3
    2003-11-19   581.3
    2003-11-20   581.3
    2003-11-21   581.3
    2003-11-22   581.3
    2003-11-23   581.3
    2003-11-24   581.3
    2003-11-25   581.3
    2003-11-26   581.3
    2003-11-27   581.3
    2003-11-28   581.3
    2003-11-29   581.3
    2003-11-30   581.3
    2003-12-01   646.9
    
    [1065 rows x 1 columns]
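
The row count follows from the date range: 2001-01-01 through 2003-12-01 inclusive is 1065 days, and each monthly value is forward-filled until the next month begins. A minimal sketch of the same step on the first three months only (the values are copied from the output above, and the frequency alias is spelled `"D"` with `method=` passed by keyword, which is a stylistic choice here rather than a change in behaviour):

```python
import pandas as pd

# First three monthly values from the series above, on a month-start DatetimeIndex.
series = pd.Series(
    [266.0, 145.9, 183.1],
    index=pd.to_datetime(["2001-01", "2001-02", "2001-03"], format="%Y-%m"),
)

X = series.rename("actual").to_frame()

# Drop duplicate index labels (keeping the last), then upsample to daily
# frequency, forward-filling each month's value until the next month starts.
X = X.loc[~X.index.duplicated(keep="last")].asfreq("D", method="ffill")

print(len(X))  # 60 rows: 31 days in January + 28 in February + 1 March
print(X.head())
```

Because the last label is the first day of its month, the final value appears only once, just as 646.9 does in the full output.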
    

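    【Comments】:

    • Thanks, EdChum. How can I make sure the index is monotonically increasing or decreasing? When I run X = series.rename("actual").to_frame() followed by X = X.loc[~X.index.duplicated(keep='last')].asfreq('d', 'ffill') on my series, I get an error.
    • Do you care which year the data starts in?
    • Not really. It can start in any year you like.
    • What does the Month column actually represent here? Is it just a month? If so, we can prepend the string '20' to the Month column and then convert. Would that be OK?
    • Yes, that's fine. So after prepending the string '20' to the Month column, we get a year-month DatetimeIndex, right?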