【Question title】: '<' not supported between instances of 'datetime.date' and 'str'
【Posted】: 2018-09-08 07:38:32
【Question】:

I get a TypeError:

TypeError: '<' not supported between instances of 'datetime.date' and 'str'

when running the following code:

import requests
import re
import json
import pandas as pd

def retrieve_quotes_historical(stock_code):
    quotes = []
    url = 'https://finance.yahoo.com/quote/%s/history?p=%s' % (stock_code, stock_code)
    r = requests.get(url)
    m = re.findall('"HistoricalPriceStore":{"prices":(.*?), "isPending"', r.text)
    if m:
        quotes = json.loads(m[0])
        quotes = quotes[::-1]
    return [item for item in quotes if 'type' not in item]

quotes = retrieve_quotes_historical('INTC')
df = pd.DataFrame(quotes)

s = pd.Series(pd.to_datetime(df.date, unit='s'))
df.date = s.dt.date
df = df.set_index('date')

This part runs fine, but when I try to run the following:

df['2017-07-07':'2017-07-10']

I get the TypeError.

How can I fix this?

【Comments】:

Could you add more context to the question, rather than just code?

Tags: pandas dataframe indexing

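【Answer 1】:

The problem is that you are slicing with the string '2017-07-07', while your index holds datetime.date objects. The slice bounds must be of that same type.

You can do this by defining the start and end dates as follows: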

import pandas as pd

startdate = pd.to_datetime("2017-7-7").date()
enddate = pd.to_datetime("2017-7-10").date()
df.loc[startdate:enddate]

startdate and enddate are now datetime.date objects, so the slice works:

             adjclose      close       high        low       open    volume
date
2017-07-07  33.205006  33.880001  34.119999  33.700001  33.700001  18304500
2017-07-10  32.979588  33.650002  33.740002  33.230000  33.250000  29918400

You can also create the datetime.date objects without pandas:

import datetime

startdate = datetime.datetime.strptime('2017-07-07', "%Y-%m-%d").date()
enddate = datetime.datetime.strptime('2017-07-10', "%Y-%m-%d").date()
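
As a minimal sketch of the approach above, the helper below converts ISO date strings to datetime.date objects before slicing. The name slice_by_date is hypothetical, and the three-row frame is made-up sample data rather than the Yahoo quotes. It assumes Python 3.7+ for datetime.date.fromisoformat.

```python
import datetime

import pandas as pd

# Made-up sample frame whose index holds datetime.date objects (dtype object),
# mirroring the index built in the question.
df = pd.DataFrame(
    {"close": [33.88, 33.65, 33.40]},
    index=pd.Index(
        [
            datetime.date(2017, 7, 7),
            datetime.date(2017, 7, 10),
            datetime.date(2017, 7, 11),
        ],
        name="date",
    ),
)


def slice_by_date(frame, start, end):
    """Slice a frame indexed by datetime.date, given ISO date strings."""
    start_date = datetime.date.fromisoformat(start)
    end_date = datetime.date.fromisoformat(end)
    # Label-based slicing with .loc includes both endpoints.
    return frame.loc[start_date:end_date]


result = slice_by_date(df, "2017-07-07", "2017-07-10")
print(result)
```

Keeping the conversion in one place means callers can still pass strings, while the comparison against the index always happens between datetime.date objects.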

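【Comments】:

This is correct, useful, and helpful. Thank you. I can't help wondering why pandas has no shorthand for dates, so that we could write them as strings and have them understood as dates or datetimes without so much extra verbiage. Something like SAS's '23-01-05 5:00:00'dt, although you could not use single quotes for that.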

【Answer 2】:

In addition to Paul's answer, a few points are worth noting:

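pd.to_datetime(df['date'], unit='s') already returns a Series, so you do not need to wrap it.
Furthermore, when parsing succeeds, the Series returned by pd.to_datetime has dtype datetime64[ns] (timezone-naive) or datetime64[ns, tz] (timezone-aware). If parsing fails, it may still return a Series without raising an error, with dtype O for "object" (at least in pandas 1.2.4), which means it fell back to Python's stdlib datetime.datetime.
Filtering with strings, as in df['2017-07-07':'2017-07-10'], only works when the dtype of the index is datetime64[...], not when it is O (object).

So, with all of that in mind, your example works after changing only the last few lines: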

df = pd.DataFrame(quotes)
s = pd.to_datetime(df['date'],unit='s')   # no need to wrap in Series
assert str(s.dtype) == 'datetime64[ns]'   # VERY IMPORTANT!!!!
df.index = s
print(df['2020-08-01':'2020-08-10'])    # it now works!

It produces:

                           date       open  ...    volume   adjclose
date                                        ...
2020-08-03 13:30:00  1596461400  48.270000  ...  31767100  47.050617
2020-08-04 13:30:00  1596547800  48.599998  ...  29045800  47.859154
2020-08-05 13:30:00  1596634200  49.720001  ...  29438600  47.654583
2020-08-06 13:30:00  1596720600  48.790001  ...  23795500  47.634968
2020-08-07 13:30:00  1596807000  48.529999  ...  36765200  47.105358
2020-08-10 13:30:00  1597066200  48.200001  ...  37442600  48.272457
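
The same idea can be tried without the network call. This is a sketch under assumptions: the epoch seconds are taken from the output above plus one extra day, and the close values are made-up placeholders. It checks for a DatetimeIndex instead of comparing the dtype string, on the assumption that the exact resolution in the dtype name may differ between pandas versions.

```python
import pandas as pd

# Epoch seconds as in the output above; the close values are placeholders.
quotes = [
    {"date": 1596461400, "close": 1.0},  # 2020-08-03 13:30:00
    {"date": 1596547800, "close": 2.0},  # 2020-08-04 13:30:00
    {"date": 1597066200, "close": 3.0},  # 2020-08-10 13:30:00
    {"date": 1597152600, "close": 4.0},  # 2020-08-11 13:30:00
]

df = pd.DataFrame(quotes)
df.index = pd.to_datetime(df["date"], unit="s")

# String slicing needs a real DatetimeIndex, not an object index.
assert isinstance(df.index, pd.DatetimeIndex)

# A date-only string bound covers the whole day, so 2020-08-10 13:30 is included.
subset = df.loc["2020-08-04":"2020-08-10"]
print(subset)
```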

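Finally, note that if your datetime format somehow includes a time offset, it seems that a utc=True argument must be passed to pd.to_datetime (in pandas 1.2.4); otherwise the returned dtype will be 'O' even when parsing succeeds. I hope this improves in the future, because it is not intuitive at all.

See the to_datetime documentation for details.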
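
A small sketch of the utc=True case, assuming timestamp strings that carry different UTC offsets (the strings and the values 1.0 and 2.0 are made-up placeholders). With utc=True, both timestamps are converted to UTC and the result is timezone-aware, so string slicing works once it is used as an index.

```python
import pandas as pd

# Two made-up timestamps with different UTC offsets.
raw = pd.Series(["2020-08-03 09:30:00-04:00", "2020-11-03 09:30:00-05:00"])

# utc=True converts every value to UTC and yields a timezone-aware dtype.
parsed = pd.to_datetime(raw, utc=True)
print(parsed.dtype)

# Use the parsed values as an index and slice with plain date strings.
prices = pd.Series([1.0, 2.0], index=pd.DatetimeIndex(parsed))
august = prices.loc["2020-08-01":"2020-08-31"]
print(august)
```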

【Comments】: