【Question Title】: Compatibility of datetime with cudf and pandas for filtering by datetime in Python
【Posted】: 2021-12-28 19:17:26
【Question Description】:

I want to try out cudf, but I am stuck on the first simple task: filtering by datetime. The code works with pandas but not with cudf.

import pandas as pd
#import cudf as pd 
import time 
import datetime 
import dateutil

if __name__ == "__main__":
    Zeit_start = datetime.datetime.now()
    AGdata_search = pd.read_csv("testdata.csv",parse_dates=['Datetime'],infer_datetime_format=True,cache_dates=False)
    AGdata_TEST = AGdata_search.loc[(AGdata_search['Datetime'] >= dateutil.parser.parse("2021-11-02 13:44:00+00:00"))] 
    AGdata_TEST.to_csv("output.csv", encoding='utf-8',index=False)

testdata.csv looks like this:

Datetime,Open,High,Low,Close,Adj Close,Volume 
2021-10-22 13:30:00+00:00,149.69,149.75,149.01,149.04,149.04,4032096.0 
2021-10-22 13:40:00+00:00,149.69,150.175,148.845,149.92,149.92,19671400.0
2021-10-22 13:50:00+00:00,149.975,150.18,149.5601,149.75,149.75,11911828.0 
...

With cudf, the same code raises "KeyError: 'Datetime'".

Environment (Win11 with wsl2, Ubuntu and a Docker container)
conda version : 4.10.3
python version : 3.8.10.final.0
virtual packages : __cuda=11.5=0
               __linux=5.10.60.1=0
               __glibc=2.27=0
               __unix=0=0
               __archspec=1=x86_64
  user-agent : conda/4.10.3 requests/2.25.1 CPython/3.8.10 Linux/5.10.60.1-microsoft-standard-WSL2 ubuntu/18.04.6 glibc/2.27
stoic_snyder
rapidsai/rapidsai:21.12-cuda11.0-runtime-ubuntu18.04-py3.7
CUDA_VER=11.0 DASK_XGBOOST_VER=0.2* RAPIDS_VER=21.12

【Question Discussion】:

    Tags: python pandas datetime cudf


    【Solution 1】:

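    RAPIDS cuDF does not yet implicitly support UTC offsets in this format. So, per our documentation, you have to explicitly declare the datetime format codes before proceeding: https://docs.rapids.ai/api/cudf/stable/api_docs/api/cudf.to_datetime.html?highlight=datetime

    The format codes follow the ones listed here: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior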
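To make the format codes concrete, here is a minimal sketch that parses one timestamp from the sample data using only the Python standard library. It is an illustration of the codes, not cuDF's behavior: the names `STDLIB_FORMAT` and `parse_timestamp` are hypothetical, and in the standard library `%z` consumes the sign of the offset itself, so this format string has no literal `+`.

```python
from datetime import datetime, timezone

# Format codes for timestamps such as "2021-10-22 13:30:00+00:00".
# Assumption: Python 3.7+, where %z accepts a colon inside the UTC offset.
STDLIB_FORMAT = "%Y-%m-%d %H:%M:%S%z"


def parse_timestamp(text: str) -> datetime:
    """Parse one timestamp string into a timezone-aware datetime."""
    return datetime.strptime(text.strip(), STDLIB_FORMAT)


parsed = parse_timestamp("2021-10-22 13:30:00+00:00")
print(parsed.isoformat())   # 2021-10-22T13:30:00+00:00
print(parsed.utcoffset())   # 0:00:00
```

Each code maps to one field: `%Y-%m-%d` is the date, `%H:%M:%S` is the time, and `%z` is the UTC offset.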

    import cudf as cudf 
    import time 
    import datetime 
    import dateutil
    
    if __name__ == "__main__":
        Zeit_start = datetime.datetime.now()
        AGdata_search = cudf.read_csv("testdata.csv")
        AGdata_search['Datetime'] = cudf.to_datetime(AGdata_search['Datetime'], format='%Y-%m-%d %H:%M:%S+%z')  # the explicit format is what makes this work
        AGdata_TEST = AGdata_search.loc[(AGdata_search['Datetime'] >= dateutil.parser.parse("2021-11-02 13:44:00+00:00"))]
        AGdata_TEST.to_csv("output.csv", encoding='utf-8',index=False)
    

    where testdata.csv is:

    Datetime,Open,High,Low,Close,Adj Close,Volume 
    2021-10-22 13:30:00+00:00,149.69,149.75,149.01,149.04,149.04,4032096.0 
    2021-10-22 13:40:00+00:00,149.69,150.175,148.845,149.92,149.92,19671400.0
    2021-11-22 13:50:00+00:00,149.975,150.18,149.5601,149.75,149.75,11911828.0 
    

    I adjusted your testdata.csv so that output.csv actually contains some data for your search query.
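As a quick cross-check of what output.csv should contain for the adjusted data, the filter can be sketched with only the standard library, independent of pandas or cuDF. The names `SAMPLE_CSV`, `CUTOFF`, and `rows_on_or_after` are hypothetical, and the sample rows are embedded as a string rather than read from testdata.csv.

```python
import csv
import io
from datetime import datetime

# The three adjusted sample rows, embedded so the sketch is self-contained.
SAMPLE_CSV = """Datetime,Open,High,Low,Close,Adj Close,Volume
2021-10-22 13:30:00+00:00,149.69,149.75,149.01,149.04,149.04,4032096.0
2021-10-22 13:40:00+00:00,149.69,150.175,148.845,149.92,149.92,19671400.0
2021-11-22 13:50:00+00:00,149.975,150.18,149.5601,149.75,149.75,11911828.0
"""

# Standard-library format; %z parses the "+00:00" offset (Python 3.7+).
STDLIB_FORMAT = "%Y-%m-%d %H:%M:%S%z"
CUTOFF = datetime.strptime("2021-11-02 13:44:00+00:00", STDLIB_FORMAT)


def rows_on_or_after(csv_text, cutoff):
    """Return the CSV rows whose Datetime is at or after the cutoff."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in reader
        if datetime.strptime(row["Datetime"].strip(), STDLIB_FORMAT) >= cutoff
    ]


kept = rows_on_or_after(SAMPLE_CSV, CUTOFF)
for row in kept:
    print(row["Datetime"], row["Close"])  # 2021-11-22 13:50:00+00:00 149.75
```

Only the 2021-11-22 row falls on or after the cutoff, so that is the single row expected in output.csv.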

    I'm sure there is a cleaner way to do what you're doing, but I don't know your whole workflow, so this should help you get going!

    【Discussion】:
