【Question title】: Python Pandas DataFrame datetime column separation function
【Posted】: 2018-08-11 11:46:58
【Question】:

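Is there an existing library that can split a datetime column into columns that each hold a single component, such as year, month, day, hour, minute, and so on?

I am doing this to preprocess data that I plan to use for machine learning (the Kaggle New York City taxi fare dataset).

This is what the datetime column looks like in the dataset:

I have been able to do this with the following: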

df_raw["pickup_year"] = df_raw['pickup_datetime'].dt.year
df_raw["pickup_month"] = df_raw['pickup_datetime'].dt.month
df_raw["pickup_day"] = df_raw['pickup_datetime'].dt.day
df_raw["pickup_hour"] = df_raw['pickup_datetime'].dt.hour
df_raw["pickup_minute"] = df_raw['pickup_datetime'].dt.minute
df_raw["pickup_second"] = df_raw['pickup_datetime'].dt.second
df_raw["pickup_dayofyear"] = df_raw['pickup_datetime'].dt.dayofyear
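# Note: .dt.week and .dt.weekofyear were removed in pandas 2.0; on newer versions use .dt.isocalendar().week instead.
df_raw["pickup_week"] = df_raw['pickup_datetime'].dt.week
df_raw["pickup_weekofyear"] = df_raw['pickup_datetime'].dt.weekofyear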
df_raw["pickup_dayofweek"] = df_raw['pickup_datetime'].dt.dayofweek
df_raw["pickup_weekday"] = df_raw['pickup_datetime'].dt.weekday
df_raw["pickup_quarter"] = df_raw['pickup_datetime'].dt.quarter
df_raw.head()

But I would think this has surely been done before in some library?

【Comments】:

    Tags: python pandas datetime dataframe


    【Solution 1】:

    You can loop over a list of attribute names and create the new columns with getattr:

    L = ['year', 'month', 'day', 'hour', 'minute', 'second', 'dayofyear',
         'week', 'weekofyear', 'dayofweek', 'weekday', 'quarter']
    
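    for i in L:
        # look up each attribute on the .dt accessor by name
        df[i] = getattr(df['Dates'].dt, i)
    # using jpp's sample data (the same six rows as in the setup of Solution 2)
    print(df)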
                    Dates  year  month  day  hour  minute  second  dayofyear  \
    0 2017-12-11 01:00:00  2017     12   11     1       0       0        345   
    1 2017-12-12 01:00:01  2017     12   12     1       0       1        346   
    2 2019-05-12 15:15:00  2019      5   12    15      15       0        132   
    3 2019-06-22 03:25:14  2019      6   22     3      25      14        173   
    4 2020-05-11 04:40:02  2020      5   11     4      40       2        132   
    5 2020-11-30 01:00:00  2020     11   30     1       0       0        335   
    
       week  weekofyear  dayofweek  weekday  quarter  
    0    50          50          0        0        4  
    1    50          50          1        1        4  
    2    19          19          6        6        2  
    3    25          25          5        5        2  
    4    20          20          0        0        2  
    5    49          49          0        0        4  
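
On pandas 2.0 and later, `.dt.week` and `.dt.weekofyear` no longer exist, so the loop above raises an AttributeError for those two names. Below is a minimal sketch of one way around that, assuming a datetime column with no missing values. The helper name `add_datetime_parts` and its `prefix` parameter are illustrative and not part of pandas; the sample rows are two of those shown above.

```python
import pandas as pd

def add_datetime_parts(df, column, prefix=""):
    """Return a copy of df with one integer column per datetime component."""
    out = df.copy()
    dt = out[column].dt
    # Attributes that are still available on the .dt accessor.
    for name in ["year", "month", "day", "hour", "minute", "second",
                 "dayofyear", "dayofweek", "quarter"]:
        out[prefix + name] = getattr(dt, name)
    # ISO week number, standing in for the removed .dt.week / .dt.weekofyear.
    # The cast assumes there are no NaT values in the column.
    out[prefix + "week"] = dt.isocalendar().week.astype("int64")
    return out

df = pd.DataFrame({"Dates": pd.to_datetime(["2017-12-11 01:00:00",
                                            "2019-05-12 15:15:00"])})
result = add_datetime_parts(df, "Dates", prefix="pickup_")
print(result.filter(like="pickup_").T)
```

Returning a copy keeps the original frame untouched, which may be convenient when experimenting with different feature sets; assigning in place as in the loop above works just as well.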
    

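    【Comments】:

      【Solution 2】:

      The attributes you list each derive an array of integers from the underlying datetime series. So while there may be a Pandas-specific method for extracting several attributes at once, it is unlikely to be more efficient than using a list or dictionary mapping. Below is a solution using pd.concat.

      Setup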

      import pandas as pd

      df = pd.DataFrame({'Dates': ['2017-12-11 01:00:00', '2017-12-12 01:00:01',
                                   '2019-05-12 15:15:00', '2019-06-22 03:25:14',
                                   '2020-05-11 04:40:02', '2020-11-30 01:00:00']})
      
      df['Dates'] = pd.to_datetime(df['Dates'])
      

      Solution

      L = ['year', 'month', 'day', 'hour', 'minute', 'second', 'dayofyear',
           'week', 'weekofyear', 'dayofweek', 'weekday', 'quarter']
      
      df = df.join(pd.concat([getattr(df['Dates'].dt, i).rename(i) for i in L], axis=1))
      

      Result

      print(df)
      
                      Dates  year  month  day  hour  minute  second  dayofyear  \
      0 2017-12-11 01:00:00  2017     12   11     1       0       0        345   
      1 2017-12-12 01:00:01  2017     12   12     1       0       1        346   
      2 2019-05-12 15:15:00  2019      5   12    15      15       0        132   
      3 2019-06-22 03:25:14  2019      6   22     3      25      14        173   
      4 2020-05-11 04:40:02  2020      5   11     4      40       2        132   
      5 2020-11-30 01:00:00  2020     11   30     1       0       0        335   
      
         week  weekofyear  dayofweek  weekday  quarter  
      0    50          50          0        0        4  
      1    50          50          1        1        4  
      2    19          19          6        6        2  
      3    25          25          5        5        2  
      4    20          20          0        0        2  
      5    49          49          0        0        4  
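
The dictionary mapping mentioned in this answer could be sketched as follows. This is one possible reading, not the answerer's code: the dictionary maps each new column name to the name of a `.dt` attribute, which also makes it easy to use prefixed names such as `pickup_year` from the question. The variable name `mapping` and the choice of attributes are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Dates": pd.to_datetime(["2017-12-11 01:00:00",
                                            "2020-11-30 01:00:00"])})

# New column name -> name of the attribute on the .dt accessor.
mapping = {
    "pickup_year": "year",
    "pickup_month": "month",
    "pickup_hour": "hour",
    "pickup_dayofyear": "dayofyear",
    "pickup_quarter": "quarter",
}

# Build all new columns in one pass and attach them with a single assign call.
df = df.assign(**{col: getattr(df["Dates"].dt, attr)
                  for col, attr in mapping.items()})
print(df)
```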
      

      【Comments】:
