【Question title】: Creating a new column for long weekend with existing public holiday and weekend columns
【Posted】: 2019-09-10 06:42:52
【Question description】:

I have a pandas DataFrame with date, weekday, public holiday and weekend columns.

 weekday Date           Public_Holiday? Weekend?
   5     2015-01-10              no      yes
   0     2015-01-12              no       no
   1     2015-01-13              no       no
   2     2015-01-14              no       no
   3     2015-01-15              no       no
   4     2015-01-16              no       no
   5     2015-01-17              no      yes
   6     2015-01-18              no      yes
   0     2015-01-19              no       no
   1     2015-01-20              no       no
   2     2015-01-21              no       no
   3     2015-01-22              no       no
   4     2015-01-23              yes      no
   5     2015-01-24              no      yes
   6     2015-01-25              no      yes
   1     2015-01-27              no       no
   2     2015-01-28              no       no
   3     2015-01-29              no       no
   4     2015-01-30              no       no
   5     2015-01-31              no      yes
   0     2015-02-02              no       no
   1     2015-02-03              no       no
   2     2015-02-04              no       no
   3     2015-02-05              no       no
   4     2015-02-06              no       no
   5     2015-02-07              no      yes
   6     2015-02-08              no      yes
   0     2015-02-09              yes      no
   1     2015-02-10              no       no
   2     2015-02-11              no       no

I need to add an extra column holding a long-weekend flag. The output should look like this.

    long_weekend  weekday   Date          Public_Holiday? Weekend?
            0        5     2015-01-10              no      yes
            0        0     2015-01-12              no       no
            0        1     2015-01-13              no       no
            0        2     2015-01-14              no       no
            0        3     2015-01-15              no       no
            0        4     2015-01-16              no       no
            0        5     2015-01-17              no      yes
            0        6     2015-01-18              no      yes
            0        0     2015-01-19              no       no
            0        1     2015-01-20              no       no
            0        2     2015-01-21              no       no
            0        3     2015-01-22              no       no
            1        4     2015-01-23              yes      no
            1        5     2015-01-24              no      yes
            1        6     2015-01-25              no      yes
            0        1     2015-01-27              no       no
            0        2     2015-01-28              no       no
            0        3     2015-01-29              no       no
            0        4     2015-01-30              no       no
            0        5     2015-01-31              no      yes
            0        0     2015-02-02              no       no
            0        1     2015-02-03              no       no
            0        2     2015-02-04              no       no
            0        3     2015-02-05              no       no
            0        4     2015-02-06              no       no
            1        5     2015-02-07              no      yes
            1        6     2015-02-08              no      yes
            1        0     2015-02-09              yes      no
            0        1     2015-02-10              no       no
            0        2     2015-02-11              no       no

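A regular weekend does not count as a long weekend. The whole run of days counts as a long weekend only when the Friday or the Monday (and, in some cases, the Thursday or the Tuesday) is a holiday.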

Here is what I have tried:

df['long_weekend'] = np.where((df['Public_Holiday?'] == 'yes') | (df['Weekend?'] == 'yes'), 1, 0)
df['weekday'] = df['Predicted_Date'].dt.dayofweek
df['long_weekend'] = np.where(((df['long_weekend'] == 1) & (df['weekday'] == 4)) | ((df['long_weekend'] == 1) & (df['weekday'] == 0)), 'yes', 'no')

This gives me the output below, where even regular weekends are flagged as 1.

    long_weekend  weekday         Date   Public_Holiday? Weekend?
            1        5     2015-01-10              no      yes
            0        0     2015-01-12              no       no
            0        1     2015-01-13              no       no
            0        2     2015-01-14              no       no
            0        3     2015-01-15              no       no
            0        4     2015-01-16              no       no
            1        5     2015-01-17              no      yes
            1        6     2015-01-18              no      yes
            0        0     2015-01-19              no       no
            0        1     2015-01-20              no       no
            0        2     2015-01-21              no       no
            0        3     2015-01-22              no       no
            1        4     2015-01-23              yes      no
            1        5     2015-01-24              no      yes
            1        6     2015-01-25              no      yes
            0        1     2015-01-27              no       no
            0        2     2015-01-28              no       no
            0        3     2015-01-29              no       no
            0        4     2015-01-30              no       no
            1        5     2015-01-31              no      yes
            0        0     2015-02-02              no       no
            0        1     2015-02-03              no       no
            0        2     2015-02-04              no       no
            0        3     2015-02-05              no       no
            0        4     2015-02-06              no       no
            1        5     2015-02-07              no      yes
            1        6     2015-02-08              no      yes
            1        0     2015-02-09              yes      no
            0        1     2015-02-10              no       no
            0        2     2015-02-11              no       no

How can I make this work? Any help would be great. Thanks in advance.

【Comments on the question】:

  • The output you posted for df['long_weekend'] uses 1/0, but your code assigns yes/no to df['long_weekend']. Can you confirm which one it is?

Tags: python python-3.x pandas


【Solution 1】:

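Following @jezraer's very nice solution, I implemented a method that discovers long weekends automatically. As the OP asked, it also handles the case of a Tuesday or Thursday holiday.

The method is available in this project: https://github.com/kryptonite0/python-long-weekends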

import holidays as holidays_api
from long_weekends.long_weekends import spot_holiday_bridges

start = '2021-01-01'
end = '2021-12-31'
holidays = holidays_api.CH(prov='TI', years=[2020, 2021, 2022])
bridges, long_weekends = spot_holiday_bridges(
    start=start, end=end, holidays=holidays)
bridges
[Timestamp('2021-05-14 00:00:00'),
 Timestamp('2021-06-04 00:00:00'),
 Timestamp('2021-06-28 00:00:00')]


long_weekends
[Timestamp('2021-01-01 00:00:00'),
 Timestamp('2021-01-02 00:00:00'),
 Timestamp('2021-01-03 00:00:00'),
 Timestamp('2021-03-19 00:00:00'),
 Timestamp('2021-03-20 00:00:00'),
 Timestamp('2021-03-21 00:00:00'),
 Timestamp('2021-04-02 00:00:00'),
 Timestamp('2021-04-03 00:00:00'),
 Timestamp('2021-04-04 00:00:00'),
 Timestamp('2021-04-05 00:00:00'),
 Timestamp('2021-05-13 00:00:00'),
 Timestamp('2021-05-14 00:00:00'),
 Timestamp('2021-05-15 00:00:00'),
 Timestamp('2021-05-16 00:00:00'),
 Timestamp('2021-05-22 00:00:00'),
 Timestamp('2021-05-23 00:00:00'),
 Timestamp('2021-05-24 00:00:00'),
 Timestamp('2021-06-03 00:00:00'),
 Timestamp('2021-06-04 00:00:00'),
 Timestamp('2021-06-05 00:00:00'),
 Timestamp('2021-06-06 00:00:00'),
 Timestamp('2021-06-26 00:00:00'),
 Timestamp('2021-06-27 00:00:00'),
 Timestamp('2021-06-28 00:00:00'),
 Timestamp('2021-06-29 00:00:00'),
 Timestamp('2021-10-30 00:00:00'),
 Timestamp('2021-10-31 00:00:00'),
 Timestamp('2021-11-01 00:00:00')]
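
To make the bridge idea concrete, here is a minimal pandas-only sketch. It is not the implementation used by the project above. `find_bridges` is a hypothetical helper, and the rule it applies (a Friday after a Thursday holiday, or a Monday before a Tuesday holiday) is an assumption based on the description in this answer. The three input dates are inferred from the output shown above.

```python
import pandas as pd

def find_bridges(holidays):
    """Return working days squeezed between a holiday and a weekend.

    Hypothetical helper, not part of any library: a Friday following a
    Thursday holiday, or a Monday preceding a Tuesday holiday, is
    treated as a bridge day.
    """
    holiday_set = set(pd.to_datetime(list(holidays)))
    bridges = set()
    for day in holiday_set:
        if day.dayofweek == 3:        # Thursday holiday -> Friday is the bridge
            candidate = day + pd.Timedelta(days=1)
        elif day.dayofweek == 1:      # Tuesday holiday -> Monday is the bridge
            candidate = day - pd.Timedelta(days=1)
        else:
            continue
        # A day that is itself a holiday is not a bridge
        if candidate not in holiday_set:
            bridges.add(candidate)
    return sorted(bridges)

# 2021-05-13 and 2021-06-03 are Thursdays, 2021-06-29 is a Tuesday
print(find_bridges(["2021-05-13", "2021-06-03", "2021-06-29"]))
```

For these three dates the sketch returns 2021-05-14, 2021-06-04 and 2021-06-28, the same days listed in `bridges` above.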

【Comments】:

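    【Solution 2】:

    The idea is to build groups of consecutive rows with shift and cumsum, count the size of each group with map and value_counts, and keep only the groups with more than 2 rows: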

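    import numpy as np
    # mark every day off, then give each run of consecutive equal values its own id
    long = (df['Public_Holiday?'] == 'yes') | (df['Weekend?'] == 'yes')
    s = long.ne(long.shift()).cumsum()
    # flag days off that belong to a run of more than 2 rows
    df['long_weekend'] = np.where((s.map(s.value_counts()) > 2) & long, 1, 0)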
    

    print (df)
        weekday Predicted_Date Public_Holiday? Weekend?  long_weekend
    0         5     2015-01-10              no      yes             0
    1         0     2015-01-12              no       no             0
    2         1     2015-01-13              no       no             0
    3         2     2015-01-14              no       no             0
    4         3     2015-01-15              no       no             0
    5         4     2015-01-16              no       no             0
    6         5     2015-01-17              no      yes             0
    7         6     2015-01-18              no      yes             0
    8         0     2015-01-19              no       no             0
    9         1     2015-01-20              no       no             0
    10        2     2015-01-21              no       no             0
    11        3     2015-01-22              no       no             0
    12        4     2015-01-23             yes       no             1
    13        5     2015-01-24              no      yes             1
    14        6     2015-01-25              no      yes             1
    15        1     2015-01-27              no       no             0
    16        2     2015-01-28              no       no             0
    17        3     2015-01-29              no       no             0
    18        4     2015-01-30              no       no             0
    19        5     2015-01-31              no      yes             0
    20        0     2015-02-02              no       no             0
    21        1     2015-02-03              no       no             0
    22        2     2015-02-04              no       no             0
    23        3     2015-02-05              no       no             0
    24        4     2015-02-06              no       no             0
    25        5     2015-02-07              no      yes             1
    26        6     2015-02-08              no      yes             1
    27        0     2015-02-09             yes       no             1
    28        1     2015-02-10              no       no             0
    29        2     2015-02-11              no       no             0
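
The grouping above treats neighbouring rows as neighbouring days, but the sample data skips some dates (for example 2015-01-11 and 2015-01-26). The following is a sketch of a date-aware variant, assuming the frame is sorted by date. `flag_long_weekends`, its `date_col` and `min_len` parameters, and the small `sample` frame are illustrative names and data, not part of the original answer.

```python
import pandas as pd

def flag_long_weekends(df, date_col="Date", min_len=3):
    """Return a 0/1 Series marking runs of at least `min_len` consecutive days off.

    Assumes `df` is sorted by `date_col`.
    """
    off = (df["Public_Holiday?"] == "yes") | (df["Weekend?"] == "yes")
    dates = pd.to_datetime(df[date_col])
    # A new run starts when the day-off flag flips or when the
    # previous row is not exactly one calendar day earlier.
    new_run = off.ne(off.shift()) | dates.diff().ne(pd.Timedelta(days=1))
    run_id = new_run.cumsum()
    run_len = run_id.map(run_id.value_counts())
    return (off & (run_len >= min_len)).astype(int)

# Small illustrative frame built from a few rows of the question's data
sample = pd.DataFrame({
    "Date": ["2015-01-22", "2015-01-23", "2015-01-24", "2015-01-25",
             "2015-01-27", "2015-01-31", "2015-02-02"],
    "Public_Holiday?": ["no", "yes", "no", "no", "no", "no", "no"],
    "Weekend?": ["no", "no", "yes", "yes", "no", "yes", "no"],
})
sample["long_weekend"] = flag_long_weekends(sample)
print(sample)
```

Only the Friday holiday 2015-01-23 and the weekend that follows it are flagged; the lone Saturday 2015-01-31 stays 0. Note that with the date check, a weekend whose Saturday or Sunday row is missing from the data can no longer join a run, so whether this variant is preferable depends on how complete the calendar is.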
    

    【Comments】:
