【Question title】:Create a weekday/weekend time series DataFrame based on a daily time series DataFrame
【Posted】:2018-08-13 13:50:18
【Question】:

For example, I have created a DataFrame that contains time series information:

Time      daily-bill
2012-01-01   200
2012-01-02  300
2012-01-03   100
2012-01-04    500
….

I want to create another time series DataFrame based on the time series above. How can I do this in Pandas?

Time(weekday-and-weekend)                       total-bill
Monday-Friday
Weekend
Monday-Friday
Weekend
Monday-Friday
Weekend

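In other words, the time steps will be a consecutive, alternating sequence of weekday and weekend periods. A weekday period consists of Monday to Friday, and a weekend period consists of Saturday and Sunday. The total-bill column will store the sum of the bills incurred on the corresponding days, taken from the existing time series.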

【Comments】:

    Tags: python python-3.x pandas numpy


    【Solution 1】:

    Use:

    print (df)
            Time  daily-bill
    0 2012-01-01         200
    1 2012-01-02         300
    2 2012-01-03         100
    3 2012-01-04         500
    4 2012-01-05         200
    5 2012-01-06         300
    6 2012-01-07         100
    7 2012-01-08         500
    8 2012-01-09         500
    
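    # weekday is 0 (Monday) to 6 (Sunday), so values above 4 are Saturday/Sunday
    arr = np.where(df['Time'].dt.weekday > 4, 'Weekend','Monday-Friday')
    
    s = pd.Series(arr)
    # give every run of consecutive equal labels its own id
    s1 = s.ne(s.shift()).cumsum()
    
    # sum per (run id, label), then drop the helper run-id level
    df = (df['daily-bill'].groupby([s1,s.rename('Time')])
                         .sum()
                         .reset_index(level=0, drop=True)
                         .reset_index())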
    print (df)
                Time  daily-bill
    0        Weekend         200
    1  Monday-Friday        1400
    2        Weekend         600
    3  Monday-Friday         500
    

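    Explanation

    1. First, create a Series of labels from the weekday with numpy.where
    2. Then create a second Series that tells consecutive runs apart: compare s with its shift, then take the cumsum
    3. Aggregate with sum, and remove the first index level with reset_index(level=0, drop=True)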
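
The three steps above can be put together as a self-contained script. This is only a sketch: the sample data is rebuilt with pd.date_range from the dates and values printed in the answer, and the names labels, block_id and result are illustrative, not part of the original code.

```python
import numpy as np
import pandas as pd

# Sample input rebuilt from the values printed in the answer.
df = pd.DataFrame({
    "Time": pd.date_range("2012-01-01", periods=9, freq="D"),
    "daily-bill": [200, 300, 100, 500, 200, 300, 100, 500, 500],
})

# Step 1: label each day. weekday is 0 (Monday) to 6 (Sunday),
# so values above 4 are Saturday and Sunday.
labels = pd.Series(
    np.where(df["Time"].dt.weekday > 4, "Weekend", "Monday-Friday"),
    index=df.index,
)

# Step 2: a new block starts wherever the label differs from the previous
# row; the cumulative sum of those change flags numbers the blocks.
block_id = labels.ne(labels.shift()).cumsum()

# Step 3: sum per (block, label), then drop the helper block level.
result = (
    df["daily-bill"]
    .groupby([block_id, labels.rename("Time")])
    .sum()
    .reset_index(level=0, drop=True)
    .reset_index()
)
print(result)
```

Grouping on the block id as well as the label is what keeps the two separate Weekend blocks apart; grouping on the label alone would merge every weekend into one row.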

    Details

    print (s)
    0          Weekend
    1    Monday-Friday
    2    Monday-Friday
    3    Monday-Friday
    4    Monday-Friday
    5    Monday-Friday
    6          Weekend
    7          Weekend
    8    Monday-Friday
    dtype: object
    
    print (s1)
    0    1
    1    2
    2    2
    3    2
    4    2
    5    2
    6    3
    7    3
    8    4
    dtype: int32
    

    Edit:

    If the input DataFrame has a DatetimeIndex instead of a Time column:

    print (df)
                daily-bill
    Time                  
    2012-01-01         200
    2012-01-02         300
    2012-01-03         100
    2012-01-04         500
    2012-01-05         200
    2012-01-06         300
    2012-01-07         100
    2012-01-08         500
    2012-01-09         500
    
    arr = np.where(df.index.weekday > 4, 'Weekend','Monday-Friday')
    
    s = pd.Series(arr, index=df.index)
    s1 = s.ne(s.shift()).cumsum()
    
    df = (df['daily-bill'].groupby([s1,s.rename('Time')])
                         .sum()
                         .reset_index(level=0, drop=True)
                         .reset_index())
    print (df)
                Time  daily-bill
    0        Weekend         200
    1  Monday-Friday        1400
    2        Weekend         600
    3  Monday-Friday         500
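
The result above keeps only the period label, so the rows no longer carry a date. If a date is also wanted, one possible variation (not part of the original answer) is to aggregate per block and keep the first date of each block. The column names start, period and total_bill below are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer, this time with a DatetimeIndex.
df = pd.DataFrame(
    {"daily-bill": [200, 300, 100, 500, 200, 300, 100, 500, 500]},
    index=pd.date_range("2012-01-01", periods=9, freq="D", name="Time"),
)

labels = pd.Series(
    np.where(df.index.weekday > 4, "Weekend", "Monday-Friday"),
    index=df.index,
)
# Number each run of consecutive equal labels.
block_id = labels.ne(labels.shift()).cumsum()

# Hypothetical output layout: one row per block with its first date,
# its label and the summed bill.
dates = df.index.to_series()
summary = pd.DataFrame({
    "start": dates.groupby(block_id).min(),
    "period": labels.groupby(block_id).first(),
    "total_bill": df["daily-bill"].groupby(block_id).sum(),
}).reset_index(drop=True)
print(summary)
```

Setting start as the index of summary would give a DataFrame indexed by block start dates, if that suits later processing better.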
    

    【Comments】:
