【Question title】: Python/Pandas Binning Data Timedelta
【Posted】: 2018-04-06 10:01:32
【Question description】:

I have a DataFrame with two columns:

    userID     duration
0   DSm7ysk    03:08:49
1   no51CdJ    00:35:50
2   ...

The 'duration' column is of type timedelta. I tried using

bins = [dt.timedelta(minutes=0), dt.timedelta(minutes=5),
        dt.timedelta(minutes=10), dt.timedelta(minutes=20),
        dt.timedelta(minutes=30), dt.timedelta(hours=4)]

labels = ['0-5min','5-10min','10-20min','20-30min','30min+']

df['bins'] = pd.cut(df['duration'], bins, labels = labels)

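However, the binned data does not use the specified bins; instead, a separate bin is created for each duration in the frame.

What is the simplest way to bin timedelta objects into irregular bins? Or am I just missing something obvious here?

【Question comments】:

    Tags: python pandas datetime timedelta binning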


    【Solution 1】:

    You can normalize the durations to seconds before binning. This reduces the task to binning plain numbers.

    import pandas as pd

    df = pd.DataFrame({'userID': ['A', 'B'],
                       'duration': pd.to_timedelta(['00:08:49', '00:35:50'])})
    
    L = ['00:00:00', '00:05:00', '00:10:00', '00:20:00', '00:30:00', '04:00:00']
    
    bins = pd.to_timedelta(L).total_seconds()
    cats = ['0-5min', '5-10min', '10-20min', '20-30min', '30min+']
    
    df['bins'] = pd.cut(df['duration'].dt.total_seconds(), bins, labels=cats)
    
    print(df)
    
    #    duration userID     bins
    # 0  00:08:49      A  5-10min
    # 1  00:35:50      B   30min+
    

    【Comments】:
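
One caveat with the 04:00:00 upper edge: pd.cut returns NaN for values that fall outside the outermost bin edges, so a duration above four hours gets no label even though the last category reads '30min+'. A minimal sketch, assuming an open-ended last bin is what is wanted, swaps the last edge for np.inf. The third row and the name `edges` are made up for illustration and are not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'userID': ['A', 'B', 'C'],
    # The 05:00:00 row is hypothetical; it exceeds the 4-hour edge used above.
    'duration': pd.to_timedelta(['00:08:49', '00:35:50', '05:00:00']),
})

# Bin edges in seconds; np.inf leaves the last bin open-ended.
edges = [0, 5 * 60, 10 * 60, 20 * 60, 30 * 60, np.inf]
cats = ['0-5min', '5-10min', '10-20min', '20-30min', '30min+']

# include_lowest=True keeps a duration of exactly zero in the first bin.
df['bins'] = pd.cut(df['duration'].dt.total_seconds(), edges,
                    labels=cats, include_lowest=True)
print(df)
```

With these edges the five-hour row should land in '30min+' instead of coming back as NaN.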

      【Solution 2】:

      Your approach works for me with pandas 0.23.4:

      import pandas as pd
      import numpy as np
      
      df = pd.DataFrame({
          'userID': ['DSm7ysk', 'no51CdJ', 'foo', 'bar'],
          'duration': [pd.Timedelta('3 hours 8 minutes 49 seconds'),
                       pd.Timedelta('35 minutes 50 seconds'),
                       pd.Timedelta('1 minutes 13 seconds'),
                       pd.Timedelta('6 minutes 43 seconds')]
      })
      
      bins = [
          pd.Timedelta(minutes = 0),
          pd.Timedelta(minutes = 5),
          pd.Timedelta(minutes = 10),
          pd.Timedelta(minutes = 20),
          pd.Timedelta(minutes = 30),
          pd.Timedelta(hours = 4)
      ]
      
      labels = ['0-5min', '5-10min', '10-20min', '20-30min', '30min+']
      
      df['bins'] = pd.cut(df['duration'], bins, labels = labels)
      

      Result:

      【Comments】:
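
The question does not show how the 'duration' column was built. If it holds plain Python datetime.timedelta objects in an object-dtype column (an assumption, not something the question confirms), converting it with pd.to_timedelta before calling pd.cut may be worth trying. A hedged sketch of that conversion, with the object-dtype column constructed explicitly for illustration:

```python
import datetime as dt
import pandas as pd

# Assumed starting point: an object-dtype column of datetime.timedelta values.
raw = pd.Series(
    [dt.timedelta(hours=3, minutes=8, seconds=49),
     dt.timedelta(minutes=35, seconds=50)],
    dtype=object,
)
df = pd.DataFrame({'userID': ['DSm7ysk', 'no51CdJ'], 'duration': raw})

# Convert the column to a native timedelta64 dtype.
df['duration'] = pd.to_timedelta(df['duration'])

# Same edges and labels as in the answers, built as a TimedeltaIndex.
bins = pd.to_timedelta(['00:00:00', '00:05:00', '00:10:00',
                        '00:20:00', '00:30:00', '04:00:00'])
labels = ['0-5min', '5-10min', '10-20min', '20-30min', '30min+']

df['bins'] = pd.cut(df['duration'], bins, labels=labels)
print(df.dtypes)
print(df)
```

Printing df.dtypes before and after the conversion is a quick way to check whether this assumption applies to your data.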
