pandas filtering and comparing dates: answers

【Question title】: pandas filtering and comparing dates
【Posted】: 2019-10-27 22:24:54
【Question description】:

I have a SQL file containing the data below, which I read into pandas.

df = pandas.read_sql('Database count details', con=engine,
                     index_col='id', parse_dates='newest_available_date')

Output:

id       code   newest_date_available
9793708  3514   2015-12-24
9792282  2399   2015-12-25
9797602  7452   2015-12-25
9804367  9736   2016-01-20
9804438  9870   2016-01-20

The next line of code gets the date from one week ago:

date_before = datetime.date.today() - datetime.timedelta(days=7) # Which is 2016-01-20

What I want to do is compare date_before against df and print every row whose date is earlier than date_before:

if (df['newest_available_date'] < date_before): print(#all rows)

Obviously, this gives me the error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What should I do?

【Question comments】:

Also make sure your column is a datetime type. Check it with df.dtypes.
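A minimal sketch of that check, assuming the dates arrived as plain strings. The DataFrame below is a hand-built stand-in for the question's table (not the real database), and the conversion with pd.to_datetime is one common way to fix the dtype, not necessarily what the asker needed.

```python
import pandas as pd

# Hypothetical stand-in for the question's data, with dates still stored as strings.
df = pd.DataFrame(
    {
        "code": [3514, 2399, 7452, 9736, 9870],
        "newest_date_available": [
            "2015-12-24", "2015-12-25", "2015-12-25", "2016-01-20", "2016-01-20",
        ],
    },
    index=pd.Index([9793708, 9792282, 9797602, 9804367, 9804438], name="id"),
)

# Inspect the dtypes: the date column is not datetime64 yet.
print(df.dtypes)

# Convert the column so that date comparisons work on real timestamps.
df["newest_date_available"] = pd.to_datetime(df["newest_date_available"])

# Confirm the conversion took effect.
is_datetime = pd.api.types.is_datetime64_any_dtype(df["newest_date_available"])
print(is_datetime)
```

If the column already prints as datetime64 in df.dtypes, the conversion step is unnecessary.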

Tags: python pandas

【Solution 1】:

I would build a boolean mask like this:

a = df[df['newest_date_available'] < date_before]

With date_before = datetime.date(2016, 1, 19), this returns:

        id  code newest_date_available
0  9793708  3514            2015-12-24
1  9792282  2399            2015-12-25
2  9797602  7452            2015-12-25
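For readers who want to reproduce this, here is a self-contained sketch of the same mask. The DataFrame is rebuilt by hand from the question's table, and the cutoff is written as a pandas.Timestamp rather than datetime.date so that the comparison does not rely on pandas coercing a date object; both choices are assumptions made for illustration.

```python
import pandas as pd

# Hand-built copy of the question's table (an assumption, not the real SQL source).
df = pd.DataFrame(
    {
        "id": [9793708, 9792282, 9797602, 9804367, 9804438],
        "code": [3514, 2399, 7452, 9736, 9870],
        "newest_date_available": pd.to_datetime(
            ["2015-12-24", "2015-12-25", "2015-12-25", "2016-01-20", "2016-01-20"]
        ),
    }
)

# Fixed cutoff matching the example date used in the answer.
date_before = pd.Timestamp(2016, 1, 19)

# The comparison yields a boolean Series with one True/False per row.
mask = df["newest_date_available"] < date_before

# Indexing with the mask keeps only the rows where it is True.
a = df[mask]
print(a)
```

The key point is that the comparison produces one boolean per row, which is why it works as an indexer but not as the condition of a plain if statement.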

【Comments】:

I still get the error The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
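I copied your dataframe and read it back in with df = pd.read_clipboard(parse_dates=['newest_date_available']), and then the process worked fine. Try it that way and let me know if you still have problems. Keep in mind that the parse_dates argument expects a list, so in your case it would be pd.read_sql(parse_dates=['newest_date_available']).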
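I copied a = df[df['newest_date_available'] < date_before] together with print(a), and it worked! But I get the error when I try to put it into an if statement, because I want to do something like: if a is true, merge id with code.
Consider posting a new question; your original question was only about filtering and printing dates according to some rule. Thanks.
Why do < and <= work the same? What if I want to exclude the mentioned date?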
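The two follow-up points in these comments can be illustrated with a short sketch. It assumes a hand-built copy of the question's table and a cutoff of 2015-12-25 chosen because that date actually occurs in the data; < and <= only give different results when some row equals the cutoff exactly. The if statement uses .empty to reduce the filtered frame to a single True/False, which is one of the options the error message itself suggests.

```python
import pandas as pd

# Hand-built copy of the question's table (an assumption for illustration).
df = pd.DataFrame(
    {
        "id": [9793708, 9792282, 9797602, 9804367, 9804438],
        "code": [3514, 2399, 7452, 9736, 9870],
        "newest_date_available": pd.to_datetime(
            ["2015-12-24", "2015-12-25", "2015-12-25", "2016-01-20", "2016-01-20"]
        ),
    }
)

# Cutoff chosen to coincide with a date present in the data.
cutoff = pd.Timestamp(2015, 12, 25)

# Strict comparison excludes rows dated exactly on the cutoff.
strict = df[df["newest_date_available"] < cutoff]

# Inclusive comparison keeps them.
inclusive = df[df["newest_date_available"] <= cutoff]

# An if statement needs one boolean, so reduce the result explicitly.
if not strict.empty:
    print(strict)
```

With a cutoff such as 2016-01-19, which matches no row, both comparisons select the same rows, which may be why they appeared to behave identically.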

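【Solution 2】:

Using datetime.date(2019, 1, 10) works because pandas coerces the date to a datetime under the hood. However, this will no longer be the case in future versions of pandas.

From version 0.24 onward, it issues a warning: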

FutureWarning: Comparing Series of datetimes with 'datetime.date'. Currently, the 'datetime.date' is coerced to a datetime. In the future pandas will not coerce, and a TypeError will be raised.

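The better solution is the one proposed in its official documentation: pd.Timestamp, the pandas replacement for the Python datetime.datetime object.

To give an example based on the OP's initial dataset, this is how you would use it: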

import pandas as pd

# Boolean mask: rows dated before 10 January 2016
cond1 = df.newest_date_available < pd.Timestamp(2016, 1, 10)
df.loc[cond1]
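To connect this back to the question's "one week ago" cutoff, the following sketch computes the cutoff entirely with pandas types. The helper name rows_older_than_a_week and its today parameter are hypothetical, introduced only so the example gives the same result whenever it is run; the DataFrame is again a hand-built copy of the question's table.

```python
import pandas as pd

# Hand-built copy of the question's table (an assumption for illustration).
df = pd.DataFrame(
    {
        "id": [9793708, 9792282, 9797602, 9804367, 9804438],
        "code": [3514, 2399, 7452, 9736, 9870],
        "newest_date_available": pd.to_datetime(
            ["2015-12-24", "2015-12-25", "2015-12-25", "2016-01-20", "2016-01-20"]
        ),
    }
)


def rows_older_than_a_week(frame, today=None):
    """Return rows dated more than seven days before `today` (hypothetical helper)."""
    # Default to the current time; accept a fixed date for reproducible runs.
    today = pd.Timestamp.today() if today is None else pd.Timestamp(today)
    # normalize() resets the time of day to midnight before subtracting a week.
    date_before = today.normalize() - pd.Timedelta(days=7)
    return frame.loc[frame["newest_date_available"] < date_before]


# Pinning "today" to 2016-01-27 gives a cutoff of 2016-01-20.
result = rows_older_than_a_week(df, today="2016-01-27")
print(result)
```

Calling rows_older_than_a_week(df) without the second argument uses the current date instead of the pinned one.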

【Comments】: