【Question title】: Pandas, python: Count unique name-method occurrences each 5 minutes
【Posted】: 2018-02-07 18:15:12
【Question description】:

I have a pandas DataFrame in the following format:

             datetime                        name  mtd  code
0 2017-09-07 00:00:08                 profile/log  GET   300
1 2017-09-07 00:00:17                 profile/log  PUT   300
3 2017-09-07 00:00:19                     unknown  PUT   200
4 2017-09-07 00:00:21            extras/dashboard  GET   300
5 2017-09-07 00:00:23                extras/stats  GET   300
6 2017-09-07 00:00:26            extras/dashboard  GET   300
7 2017-09-07 00:00:29  extras/authz-profile/check  GET   200
8 2017-09-07 00:00:34                       about  PUT   300
9 2017-09-07 00:00:36                  extras/fav  GET   304
2 2017-09-07 00:00:44                extras/store  GET   200

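What I want to do is:

  • Count the occurrences of each name-mtd pair where the response code starts with 3, for every 5-second interval from 2017-09-07 00:00:10 to 2017-09-07 00:00:40

The desired output is: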

     datetime_start     pair                      3??_count
2017-09-07 00:00:10     profile/log - GET         2
2017-09-07 00:00:15     -                         0
2017-09-07 00:00:20     extras/dashboard - GET    1
2017-09-07 00:00:20     extras/stats - GET        1
2017-09-07 00:00:25     extras/dashboard - GET    1
2017-09-07 00:00:30     about - PUT               1
2017-09-07 00:00:35     extras/fav - GET          1
2017-09-07 00:00:40     -                         0   

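How can I do this with pandas?

I have already written a piece of code that creates the time periods shown in the desired output table, but I don't know how to count the 3?? name-mtd pairs for each 5-second period. I would really appreciate any help!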

data['datetime_start'] = pd.date_range(start="2017-09-07 00:00:10", end="2017-09-07 00:00:40", freq="5S")
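
A note on the line above: if `data` is the ten-row frame shown earlier, assigning a seven-element range to one of its columns would most likely fail with a length mismatch. A minimal sketch of an alternative, assuming the bin starts are kept as a standalone index and that `code` is an integer column (the three-row `df` below is only a stand-in for the real data):

```python
import pandas as pd

# Standalone grid of 5-second bin starts, kept separate from the data frame
bins = pd.date_range(start="2017-09-07 00:00:10",
                     end="2017-09-07 00:00:40", freq="5s")

# Stand-in sample with the same columns as the question's frame
df = pd.DataFrame({
    "datetime": pd.to_datetime(["2017-09-07 00:00:17",
                                "2017-09-07 00:00:19",
                                "2017-09-07 00:00:21"]),
    "name": ["profile/log", "unknown", "extras/dashboard"],
    "mtd": ["PUT", "PUT", "GET"],
    "code": [300, 200, 300],
})

# "Response code starts with 3": compare the first character of the code
is_3xx = df["code"].astype(str).str.startswith("3")
df_3xx = df[is_3xx]
```

The grid can then be used later to decide which intervals are reported, including the ones with no matching rows.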

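【Question comments】:

  • A tricky one! Sounds like an exam question or something. Could you share the groupby code you have already tried, so that someone can build on it?

Tags: python pandas pandas-groupby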


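【Solution 1】:

Create a start_date column by rounding each timestamp down to the start of its 5-second interval

import datetime
df['start_date'] = df['datetime'].apply(lambda dt: datetime.datetime(dt.year, dt.month, dt.day, dt.hour, dt.minute, 5*(dt.second//5)))

Then you can aggregate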

df.groupby(['start_date','name','mtd']).size()
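
The same binning can probably be done without a row-by-row `apply`. A minimal sketch, assuming `datetime` is already a datetime64 column, using `Series.dt.floor` to round each timestamp down to its 5-second bin (the four-row `df` is only a stand-in):

```python
import pandas as pd

# Stand-in sample with the same columns as the question's frame
df = pd.DataFrame({
    "datetime": pd.to_datetime(["2017-09-07 00:00:17", "2017-09-07 00:00:19",
                                "2017-09-07 00:00:21", "2017-09-07 00:00:23"]),
    "name": ["profile/log", "unknown", "extras/dashboard", "extras/stats"],
    "mtd": ["PUT", "PUT", "GET", "GET"],
    "code": [300, 200, 300, 300],
})

# Vectorized counterpart of 5 * (second // 5): floor to the 5-second boundary
df["start_date"] = df["datetime"].dt.floor("5s")

# Number of rows per (interval start, name, mtd) combination
counts = df.groupby(["start_date", "name", "mtd"]).size()
```

Note that neither this sketch nor the aggregation above restricts the rows to 3xx codes; that filter would have to be applied before grouping.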

【Comments】:

    【Solution 2】:

    Here is one way to solve this

    Create a column that combines name and mtd, as follows

    df['pair'] = df['name']+' - '+df['mtd']
    

    Then use PeriodIndex to define the periods by which the datetime column is grouped, as follows

    res = df.groupby([pd.PeriodIndex(df.datetime.dt.round('5s'),freq='5S'),
                    'pair'])['pair'].count()
    

    The output will be

    datetime             pair                            
    2017-09-07 00:00:10  profile/log - GET                   1
    2017-09-07 00:00:15  profile/log - PUT                   1
    2017-09-07 00:00:20  extras/dashboard - GET              1
                         unknown - PUT                       1
    2017-09-07 00:00:25  extras/dashboard - GET              1
                         extras/stats - GET                  1
    2017-09-07 00:00:30  extras/authz-profile/check - GET    1
    2017-09-07 00:00:35  about - PUT                         1
                         extras/fav - GET                    1
    2017-09-07 00:00:45  extras/store - GET                  1
    Name: pair, dtype: int64
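
The output above differs from what the question asks for in three ways: `dt.round('5s')` rounds to the nearest boundary (00:00:08 lands in 00:00:10 and 00:00:44 in 00:00:45), rows with non-3xx codes are counted, and empty intervals are not listed. A minimal sketch that addresses these points, assuming `code` is an integer column and using `dt.floor` instead of `round`; the column names `datetime_start` and `3xx_count` are illustrative:

```python
import pandas as pd

# The ten sample rows from the question
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2017-09-07 00:00:08", "2017-09-07 00:00:17", "2017-09-07 00:00:19",
        "2017-09-07 00:00:21", "2017-09-07 00:00:23", "2017-09-07 00:00:26",
        "2017-09-07 00:00:29", "2017-09-07 00:00:34", "2017-09-07 00:00:36",
        "2017-09-07 00:00:44",
    ]),
    "name": ["profile/log", "profile/log", "unknown", "extras/dashboard",
             "extras/stats", "extras/dashboard", "extras/authz-profile/check",
             "about", "extras/fav", "extras/store"],
    "mtd": ["GET", "PUT", "PUT", "GET", "GET", "GET", "GET", "PUT", "GET", "GET"],
    "code": [300, 300, 200, 300, 300, 300, 200, 300, 304, 200],
})

df["pair"] = df["name"] + " - " + df["mtd"]
hits = df[df["code"] // 100 == 3]          # keep only 3xx responses

# Count pairs per 5-second bin, flooring each timestamp to its bin start
counts = (
    hits.groupby([hits["datetime"].dt.floor("5s").rename("datetime_start"), "pair"])
        .size()
        .rename("3xx_count")
        .reset_index()
)

# Restrict to the requested window and add a placeholder row for each empty bin
bins = pd.date_range("2017-09-07 00:00:10", "2017-09-07 00:00:40", freq="5s")
counts = counts[counts["datetime_start"].isin(bins)]
empty = pd.DataFrame({
    "datetime_start": bins[~bins.isin(counts["datetime_start"])],
    "pair": "-",
    "3xx_count": 0,
})
result = (
    pd.concat([counts, empty], ignore_index=True)
      .sort_values(["datetime_start", "pair"])
      .reset_index(drop=True)
)
```

The counts follow from the ten sample rows as listed, so they need not match the question's illustrative table row for row.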
    

    【Comments】:
