【Question title】: Python: find the difference between dates in one pandas column?
【Posted】: 2017-03-21 07:19:07
【Question description】:

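I'm trying to come up with a way to calculate session duration. My sample data is below. I'm assuming that if someone logs in again, they are starting a new session, so the previous session must have ended. I will therefore treat the time from a login to the last action before the user's next login as the session duration.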

Action,Duration,_time,User
getForeignBugs,3,2016-11-07 15:45:18.992,savaithi
getServiceRequests,5,2016-11-07 15:45:18.902,savaithi
login,8088,2016-11-07 15:45:18.804,savaithi
getAuditTrail,550,2016-11-07 15:45:10.627,savaithi
getEnclosures,447,2016-11-07 15:45:09.994,savaithi
login,4810,2016-11-07 15:45:09.040,savaithi
getNoteTemplates,2,2016-11-07 15:45:04.220,savaithi
getQuickSearchInitInfo2,3,2016-11-07 15:45:01.995,savaithi
getQuickSearchInitInfo,3,2016-11-07 15:45:01.873,savaithi
login,0,2016-11-07 15:45:00.979,savaithi
getUserPreferences,2,2016-11-07 15:45:00.958,savaithi
getUserPreferences,2,2016-11-07 15:45:00.956,savaithi
SecurityServiceImpl.constructFromSession,2,2016-11-07 15:45:00.954,savaithi
setBooleanPreference,2,2016-11-07 15:45:00.954,savaithi
login,0,2016-11-07 15:45:00.658,savaithi
getPreference,1,2016-11-07 15:45:00.582,savaithi
getUserPreferences,129,2016-11-07 15:44:52.376,savaithi
login,2,2016-11-07 15:44:52.246,savaithi

How can I dynamically access the data between login and login[index-1]?

For the example below, I want to compute getPreference,1,2016-11-07 15:45:00.582 - login,2,2016-11-07 15:44:52.246

login,0,2016-11-07 15:45:00.658,savaithi
getPreference,1,2016-11-07 15:45:00.582,savaithi
getUserPreferences,129,2016-11-07 15:44:52.376,savaithi
login,2,2016-11-07 15:44:52.246,savaithi

【Question comments】:

    Tags: python datetime pandas session-cookies


    【Solution 1】:

    IIUC you can do it this way:

    First let's sort the DF:

    In [71]: x = df.sort_values(['User','_time']).reset_index()
    
    In [72]: x
    Out[72]:
        index                                    Action  Duration                   _time      User
    0      17                                     login         2 2016-11-07 15:44:52.246  savaithi
    1      16                        getUserPreferences       129 2016-11-07 15:44:52.376  savaithi
    2      15                             getPreference         1 2016-11-07 15:45:00.582  savaithi
    3      14                                     login         0 2016-11-07 15:45:00.658  savaithi
    4      12  SecurityServiceImpl.constructFromSession         2 2016-11-07 15:45:00.954  savaithi
    5      13                      setBooleanPreference         2 2016-11-07 15:45:00.954  savaithi
    6      11                        getUserPreferences         2 2016-11-07 15:45:00.956  savaithi
    7      10                        getUserPreferences         2 2016-11-07 15:45:00.958  savaithi
    8       9                                     login         0 2016-11-07 15:45:00.979  savaithi
    9       8                    getQuickSearchInitInfo         3 2016-11-07 15:45:01.873  savaithi
    10      7                   getQuickSearchInitInfo2         3 2016-11-07 15:45:01.995  savaithi
    11      6                          getNoteTemplates         2 2016-11-07 15:45:04.220  savaithi
    12      5                                     login      4810 2016-11-07 15:45:09.040  savaithi
    13      4                             getEnclosures       447 2016-11-07 15:45:09.994  savaithi
    14      3                             getAuditTrail       550 2016-11-07 15:45:10.627  savaithi
    15      2                                     login      8088 2016-11-07 15:45:18.804  savaithi
    16      1                        getServiceRequests         5 2016-11-07 15:45:18.902  savaithi
    17      0                            getForeignBugs         3 2016-11-07 15:45:18.992  savaithi
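For anyone who wants to reproduce the sorted frame, here is a minimal sketch. It assumes the question's sample is available as CSV text (the `RAW` string below is just that sample pasted in) and that `_time` should be parsed as a datetime; the answer does not show how `df` was loaded, so that part is a guess.

```python
import io

import pandas as pd

# The sample data from the question, pasted as CSV text (assumption: the
# original df was loaded from something equivalent to this).
RAW = """Action,Duration,_time,User
getForeignBugs,3,2016-11-07 15:45:18.992,savaithi
getServiceRequests,5,2016-11-07 15:45:18.902,savaithi
login,8088,2016-11-07 15:45:18.804,savaithi
getAuditTrail,550,2016-11-07 15:45:10.627,savaithi
getEnclosures,447,2016-11-07 15:45:09.994,savaithi
login,4810,2016-11-07 15:45:09.040,savaithi
getNoteTemplates,2,2016-11-07 15:45:04.220,savaithi
getQuickSearchInitInfo2,3,2016-11-07 15:45:01.995,savaithi
getQuickSearchInitInfo,3,2016-11-07 15:45:01.873,savaithi
login,0,2016-11-07 15:45:00.979,savaithi
getUserPreferences,2,2016-11-07 15:45:00.958,savaithi
getUserPreferences,2,2016-11-07 15:45:00.956,savaithi
SecurityServiceImpl.constructFromSession,2,2016-11-07 15:45:00.954,savaithi
setBooleanPreference,2,2016-11-07 15:45:00.954,savaithi
login,0,2016-11-07 15:45:00.658,savaithi
getPreference,1,2016-11-07 15:45:00.582,savaithi
getUserPreferences,129,2016-11-07 15:44:52.376,savaithi
login,2,2016-11-07 15:44:52.246,savaithi
"""

# parse_dates makes _time a datetime64 column, which .diff() needs later.
df = pd.read_csv(io.StringIO(RAW), parse_dates=["_time"])

# Sort oldest-first within each user; reset_index() keeps the original row
# position in a column named "index", as in the output above.
x = df.sort_values(["User", "_time"]).reset_index()
print(x)
```

Sorting by `User` first matters once the log contains more than one user, since otherwise one user's login would close another user's session.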
    

    Now let's keep only the rows where Action == 'login' or the next row's Action == 'login', plus the last row:

    In [34]: x.loc[(x.Action == 'login') | (x.Action.shift(-1) == 'login') | (x.index == x.index[-1])]
    Out[34]:
        index              Action  Duration                   _time      User
    0      17               login         2 2016-11-07 15:44:52.246  savaithi
    2      15       getPreference         1 2016-11-07 15:45:00.582  savaithi
    3      14               login         0 2016-11-07 15:45:00.658  savaithi
    7      10  getUserPreferences         2 2016-11-07 15:45:00.958  savaithi
    8       9               login         0 2016-11-07 15:45:00.979  savaithi
    11      6    getNoteTemplates         2 2016-11-07 15:45:04.220  savaithi
    12      5               login      4810 2016-11-07 15:45:09.040  savaithi
    14      3       getAuditTrail       550 2016-11-07 15:45:10.627  savaithi
    15      2               login      8088 2016-11-07 15:45:18.804  savaithi
    17      0      getForeignBugs         3 2016-11-07 15:45:18.992  savaithi
    
    In [35]: x.loc[(x.Action == 'login') | (x.Action.shift(-1) == 'login') | (x.index == x.index[-1]), '_time'].diff()
    Out[35]:
    0                NaT
    2    00:00:08.336000
    3    00:00:00.076000
    7    00:00:00.300000
    8    00:00:00.021000
    11   00:00:03.241000
    12   00:00:04.820000
    14   00:00:01.587000
    15   00:00:08.177000
    17   00:00:00.188000
    Name: _time, dtype: timedelta64[ns]
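The filter and the diff can be wrapped up as below. This is only a sketch on a trimmed, already sorted subset of the sample data; the helper name `boundary_diffs` is made up for illustration and is not part of pandas or of the answer.

```python
import pandas as pd

# Trimmed subset of the question's data, already sorted by time.
x = pd.DataFrame({
    "Action": ["login", "getUserPreferences", "getPreference",
               "login", "getUserPreferences",
               "login", "getNoteTemplates"],
    "_time": pd.to_datetime([
        "2016-11-07 15:44:52.246", "2016-11-07 15:44:52.376",
        "2016-11-07 15:45:00.582",
        "2016-11-07 15:45:00.658", "2016-11-07 15:45:00.958",
        "2016-11-07 15:45:00.979", "2016-11-07 15:45:04.220",
    ]),
    "User": "savaithi",
})


def boundary_diffs(frame):
    """Hypothetical helper: time differences between session boundary rows."""
    is_login = frame["Action"] == "login"
    # shift(-1) looks at the following row, so this flags the last action
    # before each login.
    before_login = frame["Action"].shift(-1) == "login"
    # The final row closes the last session, which has no following login.
    is_last = frame.index == frame.index[-1]
    mask = is_login | before_login | is_last
    return frame.loc[mask, "_time"].diff()


diffs = boundary_diffs(x)
print(diffs)
```

In the result, the values on the non-login rows are the session durations (last action minus login), while the values on the login rows are the gaps between the end of one session and the next login.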
    

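    【Comments】:

    • Yes! This looks like exactly what I want to do. And it's much simpler, too. Thanks Max!
    • Now that I've dug into it, the answer is slightly off: after assigning df3.loc[(df3.Action == 'login') | (df3.Action.shift(-1) == 'login') | (df3.index == df3.index[-1]), '_time'].diff() to a new column, it has to be shifted up by one. I used df3.session_duration = df3.session_duration.shift(-1) afterwards so that the session duration lines up with the login rather than with the action. Thanks for the help! Looks like I need to do some reading on loc, as I had never really used it before.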
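The last comment is about getting each duration onto the login row rather than onto the closing action. One alternative sketch, which is a different technique from the answer's filter-and-diff and from the commenter's `shift(-1)`, is to number the sessions with a cumulative sum over the login flags and take last minus first timestamp per session. The column names `session_id`, `start`, `end` and `session_duration` are invented here for illustration.

```python
import pandas as pd

# Trimmed subset of the question's data.
x = pd.DataFrame({
    "Action": ["login", "getUserPreferences", "getPreference",
               "login", "getUserPreferences",
               "login", "getNoteTemplates"],
    "_time": pd.to_datetime([
        "2016-11-07 15:44:52.246", "2016-11-07 15:44:52.376",
        "2016-11-07 15:45:00.582",
        "2016-11-07 15:45:00.658", "2016-11-07 15:45:00.958",
        "2016-11-07 15:45:00.979", "2016-11-07 15:45:04.220",
    ]),
    "User": "savaithi",
})

x = x.sort_values(["User", "_time"]).reset_index(drop=True)

# Every login starts a new session, so a running count of logins per user
# labels each row with its session. Rows before a user's first login, if
# any, would get session_id 0.
is_login = x["Action"].eq("login").astype(int)
x["session_id"] = is_login.groupby(x["User"]).cumsum()

# One row per session: first timestamp (the login) and last timestamp.
sessions = (
    x.groupby(["User", "session_id"])["_time"]
     .agg(["min", "max"])
     .rename(columns={"min": "start", "max": "end"})
     .reset_index()
)
sessions["session_duration"] = sessions["end"] - sessions["start"]
print(sessions)
```

A login that is immediately followed by another login ends up as a one-row session with a duration of zero, which may or may not be what is wanted.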