【Question title】: How to calculate the difference in hours between two timestamps and exclude weekends
【Posted】: 2022-01-01 22:23:55
【Question description】:

I have a DataFrame like this:

     Folder1                   Folder2                 
0   2021-11-22 12:00:00      2021-11-24 10:00:00
1   2021-11-23 10:30:00      2021-11-25 18:30:00    
2   2021-11-12 10:30:00      2021-11-15 18:30:00    
3   2021-11-23 10:00:00            NaN         

         

Using this code:

import pandas as pd
from datetime import datetime

def strfdelta(td: pd.Timedelta):
    seconds = td.total_seconds()
    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    seconds = int(seconds % 60)
    return f"{hours:02}:{minutes:02}:{seconds:02}"
            
df["Folder1"] = pd.to_datetime(df["Folder1"])
df["Folder2"] = pd.to_datetime(df["Folder2"])

bm1 = df["Folder1"].notna() & df["Folder2"].notna()
bm2 = df["Folder1"].notna() & df["Folder2"].isna()

df["Time1"] = (df.loc[bm1, "Folder2"] - df.loc[bm1, "Folder1"]).apply(strfdelta)
df["Time2"] = (datetime.now() - df.loc[bm2, "Folder1"]).apply(strfdelta)

I get this df:

     Folder1                   Folder2                           Time1     Time2
0   2021-11-22 12:00:00      2021-11-24 10:00:00                46:00:00    NaN
1   2021-11-23 10:30:00      2021-11-25 18:30:00                56:00:00    NaN
2   2021-11-12 10:30:00      2021-11-15 18:30:00                80:00:00    NaN
3   2021-11-23 10:00:00            NaN                             NaN     03:00:00

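This is basically what I want. However, how can I exclude weekend hours when calculating the difference between the Folder1 and Folder2 timestamps? What should I change to get a df like this: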

     Folder1                   Folder2                           Time1     Time2
0   2021-11-22 12:00:00      2021-11-24 10:00:00                46:00:00    NaN
1   2021-11-23 10:30:00      2021-11-25 18:30:00                56:00:00    NaN
2   2021-11-12 10:30:00      2021-11-15 18:30:00                32:00:00    NaN
3   2021-11-23 10:00:00            NaN                            NaN     03:00:00

In the row at index 2, 13 and 14 November fall on a weekend, so the difference in Time1 should be 32 hours instead of 80.

【Discussion】:

    Tags: python python-3.x pandas dataframe datetime


    【Solution 1】:

    I think you can combine the pandas.date_range function with pandas.tseries.offsets.CustomBusinessHour, like this:

    # import pandas and numpy
    import pandas as pd
    import numpy as np
    
    # construct dataframe
    df = pd.DataFrame()
    df["Folder1"] = pd.to_datetime(
        pd.Series(
            [
                "2021-11-22 12:00:00",
                "2021-11-23 10:30:00",
                "2021-11-12 10:30:00",
                "2021-11-23 10:00:00",
            ]
        )
    )
    df["Folder2"] = pd.to_datetime(
        pd.Series(
            [
                "2021-11-24 10:00:00", 
                "2021-11-25 18:30:00", 
                "2021-11-15 18:30:00", 
                np.nan  # np.NaN was removed in NumPy 2.0
            ]
        )
    )
    
    # define custom business hours
    cbh = pd.tseries.offsets.CustomBusinessHour(start="0:00", end="23:59")
    
    # actual calculation
    df["Time1"] = df[~(df["Folder1"].isnull() | df["Folder2"].isnull())].apply(
        lambda row: len(
            pd.date_range(
                start=row["Folder1"], 
                end=row["Folder2"], 
                freq=cbh)),
        axis=1,
    )
    
    df.head()
    

    For me this gives:

    print(df.head())
                  Folder1             Folder2  Time1
    0 2021-11-22 12:00:00 2021-11-24 10:00:00   46.0
    1 2021-11-23 10:30:00 2021-11-25 18:30:00   56.0
    2 2021-11-12 10:30:00 2021-11-15 18:30:00   32.0
    3 2021-11-23 10:00:00                 NaT    NaN
    

    As a bonus, you can use the same approach for the Time2 calculation:

    df["Time2"] = df[df["Folder2"].isnull()].apply(
        lambda row: len(
            pd.date_range(
                start=row["Folder1"],
                end=pd.Timestamp.now(),  # avoids needing a separate `import datetime`
                freq=cbh)),
        axis=1,
    )
    

    For me this produced (at 14:45 Central European Time):

    print(df.head())
                  Folder1             Folder2  Time1  Time2
    0 2021-11-22 12:00:00 2021-11-24 10:00:00   46.0    NaN
    1 2021-11-23 10:30:00 2021-11-25 18:30:00   56.0    NaN
    2 2021-11-12 10:30:00 2021-11-15 18:30:00   32.0    NaN
    3 2021-11-23 10:00:00                 NaT    NaN    5.0
    

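    【Discussion】:

    • Hi @Jonathan, thanks for your answer, it works fine. However, I need the Time1 and Time2 columns to show minutes and seconds as well, because I need to calculate an SLA. So instead of 46.0, can it be formatted as 46:00:00? I have cases where the value in Time1 looks like 12:34:23.
    • Ah, I missed that. In that case you may be able to combine it with this: stackoverflow.com/a/40276658/2186184
    • I suggest reworking your strfdelta function so that it returns a datetime.timedelta, built with datetime.timedelta(seconds=seconds_input), instead of a string.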
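To get HH:MM:SS values rather than whole hours, as requested in the comments above, one option is to compute the weekday-only elapsed time as a pd.Timedelta and format it afterwards. The following is a minimal sketch and not part of the original answer: the helper names weekday_timedelta and format_hms are hypothetical, and it assumes timezone-naive timestamps, start <= end, and that Saturday and Sunday are the only excluded days.

```python
import pandas as pd

def weekday_timedelta(start: pd.Timestamp, end: pd.Timestamp) -> pd.Timedelta:
    """Elapsed time between start and end, counting Monday-Friday only.

    Hypothetical helper; assumes start <= end and timezone-naive input.
    """
    total = pd.Timedelta(0)
    day = start.normalize()  # midnight of the start date
    while day < end:
        next_day = day + pd.Timedelta(days=1)
        if day.dayofweek < 5:  # 0..4 are Monday..Friday
            # Add only the part of this calendar day that lies inside [start, end]
            total += min(end, next_day) - max(start, day)
        day = next_day
    return total

def format_hms(td: pd.Timedelta) -> str:
    """Format a timedelta as HH:MM:SS, letting hours exceed 24."""
    seconds = int(td.total_seconds())
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02}:{minutes:02}:{secs:02}"

td = weekday_timedelta(pd.Timestamp("2021-11-12 10:30:00"),
                       pd.Timestamp("2021-11-15 18:30:00"))
print(format_hms(td))  # 32:00:00
```

Because the loop walks one calendar day at a time, it is simple to reason about but slow for long ranges or large frames; applied row by row with DataFrame.apply it should be adequate for small SLA reports.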
    【Solution 2】:
    import pandas as pd

    df = pd.DataFrame({'Folder1': ['2021-11-22 12:00:00', '2021-11-23 10:30:00', '2021-11-12 10:30:00', '2021-11-23 10:00:00'],
                       'Folder2': ['2021-11-24 10:00:00', '2021-11-25 18:30:00', '2021-11-15 18:30:00', None]})
    # astype('datetime64') without a unit is rejected by pandas 2.x, so convert with to_datetime
    df[['Folder1','Folder2']] = df[['Folder1','Folder2']].apply(pd.to_datetime)
    
    def strfdelta(t1, t2):
        hd = pd.date_range(t1, t2, freq='W-SAT').append(pd.date_range(t1, t2, freq='W-SUN'))
        sec = (t2-t1).total_seconds() - len(hd)*24*3600
        return f"{int(sec//3600):02d}:{int((sec%3600)//60):02d}:{int(sec%60):02d}"
    
    now = pd.to_datetime('now')
    df['Time1'] = df.fillna(now).apply(lambda x: strfdelta(x['Folder1'], x['Folder2']), axis=1)
    print(df) 
    

    Prints:

                  Folder1             Folder2     Time1
    0 2021-11-22 12:00:00 2021-11-24 10:00:00  46:00:00
    1 2021-11-23 10:30:00 2021-11-25 18:30:00  56:00:00
    2 2021-11-12 10:30:00 2021-11-15 18:30:00  32:00:00
    3 2021-11-23 10:00:00                 NaT  20:58:26
    

    【Discussion】:

    • I made a few modifications to add a Time2 column, and this is exactly what I wanted. Thank you very much!
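The comment above does not show the modification that was made. One plausible version, sketched here under assumptions, reuses the strfdelta function from Solution 2 and the closed/open split from the question: rows with a Folder2 value fill Time1, and rows without one fill Time2 measured against the current time. The variable names closed and now are illustrative, and now is pinned to a fixed timestamp only so the result is reproducible.

```python
import pandas as pd

def strfdelta(t1, t2):
    # Same logic as Solution 2: count Saturdays and Sundays in the range
    # and subtract 24 hours for each one found.
    hd = pd.date_range(t1, t2, freq="W-SAT").append(pd.date_range(t1, t2, freq="W-SUN"))
    sec = (t2 - t1).total_seconds() - len(hd) * 24 * 3600
    return f"{int(sec // 3600):02d}:{int((sec % 3600) // 60):02d}:{int(sec % 60):02d}"

df = pd.DataFrame({
    "Folder1": ["2021-11-22 12:00:00", "2021-11-23 10:30:00",
                "2021-11-12 10:30:00", "2021-11-23 10:00:00"],
    "Folder2": ["2021-11-24 10:00:00", "2021-11-25 18:30:00",
                "2021-11-15 18:30:00", None],
})
df["Folder1"] = pd.to_datetime(df["Folder1"])
df["Folder2"] = pd.to_datetime(df["Folder2"])

# Assumption: a fixed "now" for reproducibility; use pd.Timestamp.now() in practice.
now = pd.Timestamp("2021-11-23 13:00:00")

closed = df["Folder2"].notna()  # rows that have both timestamps
# Assigning a partial Series aligns on the index and leaves other rows as NaN.
df["Time1"] = df[closed].apply(lambda x: strfdelta(x["Folder1"], x["Folder2"]), axis=1)
df["Time2"] = df[~closed].apply(lambda x: strfdelta(x["Folder1"], now), axis=1)
print(df)
```

With the pinned timestamp, row 3 gets 03:00:00 in Time2, matching the layout requested in the question.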
    【Solution 3】:
    import numpy as np
    import pandas as pd

    df['Folder1'] = pd.to_datetime(df['Folder1'])
    df['Folder2'] = pd.to_datetime(df['Folder2']).fillna(df['Folder1'])

    # For each row, list the calendar days spanned by Folder1..Folder2
    df['missing'] = df.apply(lambda x: pd.date_range(start=x['Folder1'], end=x['Folder2'], freq='D'), axis=1)

    # Elapsed time in whole hours (astype('timedelta64[h]') is rejected by pandas 2.x)
    hours = (df['Folder2'] - df['Folder1']).dt.total_seconds() // 3600
    # True where the spanned days include a Saturday ('6') or a Sunday ('0')
    has_weekend = df['missing'].apply(lambda x: x.strftime('%w')).map(set).astype(str).str.contains('0|6')

    # Subtract 48 hours when a weekend falls inside the range; otherwise keep the plain difference
    df = df.assign(time=np.where(has_weekend, hours - 48, hours)).drop(columns=['missing'])
    print(df)
    
    
    
                  Folder1             Folder2  time
    0 2021-11-22 12:00:00 2021-11-24 10:00:00  46.0
    1 2021-11-23 10:30:00 2021-11-25 18:30:00  56.0
    2 2021-11-12 10:30:00 2021-11-15 18:30:00  32.0
    3 2021-11-23 10:00:00 2021-11-23 10:00:00   0.0
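Solution 3 subtracts a flat 48 hours, which fits ranges that contain exactly one full weekend. As a hedged generalization, the sketch below uses numpy.busday_count to count Monday-Friday dates, so it handles any number of weekends and also start or end times that fall on a weekend. The function name weekday_hours is hypothetical, timestamps are assumed to be timezone-naive, and unlike Solution 3 a missing Folder2 yields NaN rather than 0.

```python
import numpy as np
import pandas as pd

def weekday_hours(start: pd.Series, end: pd.Series) -> pd.Series:
    """Hours between start and end, counting Monday-Friday only (vectorized)."""
    valid = start.notna() & end.notna()
    s, e = start[valid], end[valid]
    s_day, e_day = s.dt.normalize(), e.dt.normalize()

    # Number of Monday-Friday dates in the half-open range [s_day, e_day)
    full_days = np.busday_count(
        s_day.to_numpy().astype("datetime64[D]"),
        e_day.to_numpy().astype("datetime64[D]"),
    )

    # Time of day counts only when the timestamp itself is on a weekday
    zero = pd.Timedelta(0)
    s_part = (s - s_day).where(s.dt.dayofweek < 5, zero)
    e_part = (e - e_day).where(e.dt.dayofweek < 5, zero)

    delta = pd.Series(pd.to_timedelta(full_days, unit="D"), index=s.index) + e_part - s_part
    # Rows with a missing timestamp come back as NaN
    return (delta.dt.total_seconds() / 3600).reindex(start.index)

df = pd.DataFrame({
    "Folder1": pd.to_datetime(["2021-11-22 12:00:00", "2021-11-23 10:30:00",
                               "2021-11-12 10:30:00", "2021-11-23 10:00:00"]),
    "Folder2": pd.to_datetime(["2021-11-24 10:00:00", "2021-11-25 18:30:00",
                               "2021-11-15 18:30:00", None]),
})
df["time"] = weekday_hours(df["Folder1"], df["Folder2"])
print(df)
```

The calculation treats weekday time as a running total: full weekdays before the end date, plus the end's time of day, minus the start's time of day. For the row at index 2 that is 24 + 18.5 - 10.5 = 32 hours.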
    
