【Question title】: pandas to_datetime converting 71 to 2071 instead of 1971
【Posted】: 2022-01-06 01:49:20
【Question】:

I have this dataframe, which is a time series dataframe:

           day month   time  year
index                             
0       03    25    03/25/93    93
1        6    18     6/18/85    85
2        7     8      7/8/71    71
3        9    27     9/27/75    75
4        2     6      2/6/96    96
5        7    06     7/06/79    79
6        5    18     5/18/78    78
7       10    24    10/24/89    89
8        3     7      3/7/86    86
9        4    10     4/10/71    71
10       5    11     5/11/85    85
11       4    09     4/09/75    75
12       8    01     8/01/98    98
13       1    26     1/26/72    72
14       5    24   5/24/1990  1990
15       1    25   1/25/2011  2011
16       4    12     4/12/82    82
17      10    13  10/13/1976  1976
18       4    24     4/24/98    98
19       5    21     5/21/77    77
20       7    21     7/21/98    98
21      10    21    10/21/79    79
22       3    03     3/03/90    90
23       2    11     2/11/76    76
24      07    25  07/25/1984  1984
25       4    13     4-13-82    82
26       9    22     9/22/89    89
27       9    02     9/02/76    76
28       9    12     9/12/71    71
29      10    24    10/24/86    86
...    ...   ...         ...   ...
470    NaN   NaN        1983  1983
471    NaN   NaN        1999  1999
472    NaN   NaN        2010  2010
473    NaN   NaN        1975  1975
474    NaN   NaN        1972  1972
475    NaN   NaN        2015  2015
476    NaN   NaN        1989  1989
477    NaN   NaN        1994  1994
478    NaN   NaN        1993  1993
479    NaN   NaN        1996  1996
480    NaN   NaN        2013  2013
481    NaN   NaN        1974  1974
482    NaN   NaN        1990  1990
483    NaN   NaN        1995  1995
484    NaN   NaN        2004  2004
485    NaN   NaN        1987  1987
486    NaN   NaN        1973  1973
487    NaN   NaN        1992  1992
488    NaN   NaN        1977  1977
489    NaN   NaN        1985  1985
490    NaN   NaN        2007  2007
491    NaN   NaN        2009  2009
492    NaN   NaN        1986  1986
493    NaN   NaN        1978  1978
494    NaN   NaN        2002  2002
495    NaN   NaN        1979  1979
496    NaN   NaN        2006  2006
497    NaN   NaN        2008  2008
498    NaN   NaN        2005  2005
499    NaN   NaN        1980  1980

When I convert it with to_datetime, a year written as 71 comes out as 2071 instead of 1971. What should I do to get 1971?

df['Date'] = pd.to_datetime(df['time'])  # no error, but '71' is parsed as 2071 instead of 1971

Here is the resulting dataframe:

           day month        time  year       Date
index                                        
0       03    25    03/25/93    93 1993-03-25
1        6    18     6/18/85    85 1985-06-18
2        7     8      7/8/71    71 2071-07-08
3        9    27     9/27/75    75 1975-09-27
4        2     6      2/6/96    96 1996-02-06
5        7    06     7/06/79    79 1979-07-06
6        5    18     5/18/78    78 1978-05-18
7       10    24    10/24/89    89 1989-10-24
8        3     7      3/7/86    86 1986-03-07
9        4    10     4/10/71    71 2071-04-10
10       5    11     5/11/85    85 1985-05-11
11       4    09     4/09/75    75 1975-04-09
12       8    01     8/01/98    98 1998-08-01
13       1    26     1/26/72    72 1972-01-26
14       5    24   5/24/1990  1990 1990-05-24
15       1    25   1/25/2011  2011 2011-01-25
16       4    12     4/12/82    82 1982-04-12
17      10    13  10/13/1976  1976 1976-10-13
18       4    24     4/24/98    98 1998-04-24
19       5    21     5/21/77    77 1977-05-21
20       7    21     7/21/98    98 1998-07-21
21      10    21    10/21/79    79 1979-10-21
22       3    03     3/03/90    90 1990-03-03
23       2    11     2/11/76    76 1976-02-11
24      07    25  07/25/1984  1984 1984-07-25
25       4    13     4-13-82    82 1982-04-13
26       9    22     9/22/89    89 1989-09-22
27       9    02     9/02/76    76 1976-09-02
28       9    12     9/12/71    71 2071-09-12
29      10    24    10/24/86    86 1986-10-24
...    ...   ...         ...   ...        ...
470    NaN   NaN        1983  1983 1983-01-01
471    NaN   NaN        1999  1999 1999-01-01
472    NaN   NaN        2010  2010 2010-01-01
473    NaN   NaN        1975  1975 1975-01-01
474    NaN   NaN        1972  1972 1972-01-01
475    NaN   NaN        2015  2015 2015-01-01
476    NaN   NaN        1989  1989 1989-01-01
477    NaN   NaN        1994  1994 1994-01-01
478    NaN   NaN        1993  1993 1993-01-01
479    NaN   NaN        1996  1996 1996-01-01
480    NaN   NaN        2013  2013 2013-01-01
481    NaN   NaN        1974  1974 1974-01-01
482    NaN   NaN        1990  1990 1990-01-01
483    NaN   NaN        1995  1995 1995-01-01
484    NaN   NaN        2004  2004 2004-01-01
485    NaN   NaN        1987  1987 1987-01-01
486    NaN   NaN        1973  1973 1973-01-01
487    NaN   NaN        1992  1992 1992-01-01
488    NaN   NaN        1977  1977 1977-01-01
489    NaN   NaN        1985  1985 1985-01-01
490    NaN   NaN        2007  2007 2007-01-01
491    NaN   NaN        2009  2009 2009-01-01
492    NaN   NaN        1986  1986 1986-01-01
493    NaN   NaN        1978  1978 1978-01-01
494    NaN   NaN        2002  2002 2002-01-01
495    NaN   NaN        1979  1979 1979-01-01
496    NaN   NaN        2006  2006 2006-01-01
497    NaN   NaN        2008  2008 2008-01-01
498    NaN   NaN        2005  2005 2005-01-01
499    NaN   NaN        1980  1980 1980-01-01

As you can see, 1971 became 2071. I looked through the documentation but could not find a parameter or option to specify the 1900s instead of the 2000s.
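For strings that all share one layout, a minimal sketch of a partial workaround: passing an explicit `format` with `%y` makes pandas use the strptime convention for two-digit years (00-68 map to 2000-2068, 69-99 map to 1969-1999), so 71 becomes 1971. This is an assumption-laden illustration on a few values copied from the `time` column, not a complete fix. The question's column mixes layouts ("5/24/1990", "4-13-82", a bare "1983"), and those rows would fail with this format unless handled separately (for example with `errors='coerce'` and a second pass).

```python
import pandas as pd

# A few two-digit-year strings taken from the question's `time` column.
s = pd.Series(["7/8/71", "4/10/71", "1/26/72", "3/03/90"])

# With an explicit format, %y follows the strptime rule:
# 00-68 -> 2000-2068, 69-99 -> 1969-1999.
parsed = pd.to_datetime(s, format="%m/%d/%y")
print(parsed.dt.year.tolist())  # [1971, 1971, 1972, 1990]
```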

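【Comments】:

  • What data determines the century in the set?
  • In the dataframe, it's the last part of the time column (the year).
  • Use 4-digit years in the first place. If you can't, I think custom parsing is the way to go (modifying the strings before parsing might do it).
  • What is your pivot year? 1900 if year > 22 else 2000?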
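The "modify the strings before parsing" idea combined with the pivot-year rule from the last comment could look roughly like this. It is a sketch under stated assumptions. `PIVOT = 22` and the helper `expand_two_digit_year` are hypothetical names chosen for illustration, and the regex assumes a two-digit year only ever appears as the last field after a `/` or `-`. Four-digit years and bare years such as "1983" are left untouched. The rewritten strings can then go to `pd.to_datetime`; because the layouts still differ, pandas 2.x probably needs `format="mixed"` there.

```python
import re
import pandas as pd

# Hypothetical pivot: two-digit years above PIVOT go to the 1900s,
# the rest to the 2000s ("1900 if year > 22 else 2000").
PIVOT = 22

def expand_two_digit_year(text: str, pivot: int = PIVOT) -> str:
    """Rewrite a trailing 2-digit year (after '/' or '-') as 4 digits."""
    def repl(m):
        yy = int(m.group(2))
        century = 1900 if yy > pivot else 2000
        return f"{m.group(1)}{century + yy}"
    # Only a separator followed by exactly two digits at the end matches,
    # so "5/24/1990" and a bare "1983" are returned unchanged.
    return re.sub(r"([/-])(\d{2})$", repl, text)

s = pd.Series(["7/8/71", "4-13-82", "5/24/1990", "1983", "1/25/2011"])
fixed = s.map(expand_two_digit_year)
print(fixed.tolist())
# ['7/8/1971', '4-13-1982', '5/24/1990', '1983', '1/25/2011']
```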

Tags: python pandas datetime


【Solution 1】:

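The year column is highly ambiguous: no century is given, so Python has to guess one, and this is how it interprets such dates. You can read the reasoning here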
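The exact split in the question (71 becomes 2071 while 72 becomes 1972) appears to match the sliding window in `dateutil`'s parser rather than the fixed strptime pivot. The assumption here is that pandas falls back to `dateutil.parser` when `to_datetime` gets mixed strings and no `format`. Under that rule, a two-digit year is placed in the current century and then moved by 100 years if it lands 50 or more years from the current year. The function below is a hypothetical re-implementation for illustration only, not dateutil's own code. It takes the current year as a parameter so the 2022 result can be reproduced.

```python
def dateutil_style_year(yy: int, today_year: int) -> int:
    """Sketch of a +/-50-year sliding window for two-digit years."""
    century = today_year // 100 * 100
    year = century + yy
    if year >= today_year + 50:    # too far in the future: previous century
        year -= 100
    elif year < today_year - 50:   # too far in the past: next century
        year += 100
    return year

# As of 2022, when the question was posted:
print(dateutil_style_year(71, 2022))  # 2071
print(dateutil_style_year(72, 2022))  # 1972
print(dateutil_style_year(93, 2022))  # 1993
```

Because the window moves with the calendar, the same input can produce different centuries depending on when the code runs. That is one more reason to prefer an explicit rule.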

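I found a partial solution here. Basically, you can offset the year by 100 (one century) to work around this. It is a janky fix. Apply it after you have built the second dataframe, the one with the Date column.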

import pandas as pd
import numpy as np

# Assumes df['Date'] was already created with pd.to_datetime(df['time']).
# Dates after the cutoff year (2022, the current year at the time of writing)
# get 100 years subtracted; raise the cutoff as the years go by.
cutoff_year = 2022
df['Date'] = np.where(df['Date'].dt.year > cutoff_year, df['Date'] - pd.offsets.DateOffset(years=100), df['Date'])
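Here is the same century shift on a three-row toy frame that reproduces the problem. The frame is made up for illustration. This sketch uses `Series.mask` in place of `np.where`. That swap is an idiom choice, not the answerer's code: it keeps the result a pandas Series with its datetime dtype, and replaces values only where the condition is true.

```python
import pandas as pd

# Toy frame: the first value is the mis-parsed '7/8/71'.
df = pd.DataFrame({"Date": pd.to_datetime(["2071-07-08", "1972-01-26", "2011-01-25"])})

cutoff_year = 2022  # assumption: no genuine dates after this year
too_late = df["Date"].dt.year > cutoff_year
df["Date"] = df["Date"].mask(too_late, df["Date"] - pd.DateOffset(years=100))
print(df["Date"].dt.year.tolist())  # [1971, 1972, 2011]
```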

【Comments】:

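  • This works well because your input mixes formats (some years have 4 digits, some have 2, and there is no explicit cutoff year). Since this looks like an old dataset, you may be able to find a better cutoff year. If you get new data (from this year), the answer above may fail.
  • @GiacomoCatenazzi Using current year + 1 in the inequality could be a workaround.
  • Personally, I would use 2000. I don't think anyone wrote two-digit years after 1999 (and the short dataset seems to confirm that late-199x years were written with 4 digits). But only the OP, who knows the dataset (its source and when it was compiled), can answer this and find the ideal year.
  • A cutoff of 2000 would also subtract 100 years from the 2008 dates in the OP's data.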
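One way to address the last two comments is to shift only the rows that were originally written with two digits. The question's `year` column keeps the original digits ("71" vs "1971"), so a genuine 2008 is never touched. This is a minimal sketch on made-up rows. It assumes `year` holds exactly those digits, whether as strings or integers; `astype(str)` treats both the same way. The 2000 cutoff for two-digit rows follows the commenter's reasoning and is an assumption about this dataset.

```python
import pandas as pd

# Toy rows mimicking the question: `year` keeps the original digits,
# `Date` is what pd.to_datetime produced.
df = pd.DataFrame({
    "year": ["71", "72", "2008", "1976"],
    "Date": pd.to_datetime(["2071-07-08", "1972-01-26", "2008-01-01", "1976-10-13"]),
})

# Only rows written with a two-digit year can have the wrong century.
two_digit = df["year"].astype(str).str.len() == 2
# Assumed cutoff: a two-digit year that landed in 2000 or later belongs to the 1900s.
wrong_century = two_digit & (df["Date"].dt.year >= 2000)
df["Date"] = df["Date"].mask(wrong_century, df["Date"] - pd.DateOffset(years=100))
print(df["Date"].dt.year.tolist())  # [1971, 1972, 2008, 1976]
```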