【Question title】: How to count consecutive events in a long time series
【Posted】: 2015-07-20 11:52:36
【Question】:

I have a DataFrame with the columns year, month, day and prec. For each month, how can I compute the longest run of consecutive days on which "prec" is 0?

datasub = data[data['prec'] ==0.0]

datasub.groupby(['year','month'])['prec'].count()

This code does not give the result I expect.

The data looks like this:

Out[70]: 
      year  month  day  prec
0     1981      1    1   1.5
1     1981      1    2   0.0
2     1981      1    3   0.0
3     1981      1    4   0.4
4     1981      1    5   0.0
5     1981      1    6   1.0
6     1981      1    7   1.9
7     1981      1    8   0.6
8     1981      1    9   3.7
9     1981      1   10   0.0
10    1981      1   11   0.0
11    1981      1   12   0.0
12    1981      1   13   0.0
13    1981      1   14  12.2
14    1981      1   15   1.7
15    1981      1   16   0.6
16    1981      1   17   0.9
17    1981      1   18   0.6
18    1981      1   19   0.4
19    1981      1   20   0.2
20    1981      1   21   1.4
21    1981      1   22   3.2
22    1981      1   23   0.0
23    1981      1   24   0.2
24    1981      1   25   1.2
25    1981      1   26   0.0
26    1981      1   27   0.0
27    1981      1   28   0.0
28    1981      1   29   0.0
29    1981      1   30   0.2
...    ...    ...  ...   ...
3987  1991     12    2   0.0
3988  1991     12    3   0.0
3989  1991     12    4   0.0
3990  1991     12    5   0.5
3991  1991     12    6   0.4
3992  1991     12    7   1.2
3993  1991     12    8   0.0
3994  1991     12    9   0.0
3995  1991     12   10   0.0
3996  1991     12   11   0.0
3997  1991     12   12   0.0

【Comments】:

    Tags: python pandas dataframe


    【Solution 1】:
    import pandas as pd
    import numpy as np
    
    # simulate some artificial data
    # ============================================
    np.random.seed(0)
    df = pd.DataFrame(np.random.randn(4000), columns=['prec'], index=pd.date_range('1981-01-01', periods=4000, freq='D'))
    df['prec'] = np.where(df['prec'] > 0, df['prec'], 0.0)
    df['year'] = df.index.year
    df['month'] = df.index.month
    df['day'] = df.index.day
    df
    
                  prec  year  month  day
    1981-01-01  1.7641  1981      1    1
    1981-01-02  0.4002  1981      1    2
    1981-01-03  0.9787  1981      1    3
    1981-01-04  2.2409  1981      1    4
    1981-01-05  1.8676  1981      1    5
    1981-01-06  0.0000  1981      1    6
    1981-01-07  0.9501  1981      1    7
    1981-01-08  0.0000  1981      1    8
    1981-01-09  0.0000  1981      1    9
    1981-01-10  0.4106  1981      1   10
    1981-01-11  0.1440  1981      1   11
    1981-01-12  1.4543  1981      1   12
    1981-01-13  0.7610  1981      1   13
    1981-01-14  0.1217  1981      1   14
    1981-01-15  0.4439  1981      1   15
    ...            ...   ...    ...  ...
    1991-11-30  0.9764  1991     11   30
    1991-12-01  0.1772  1991     12    1
    1991-12-02  0.0000  1991     12    2
    1991-12-03  0.1067  1991     12    3
    1991-12-04  0.0000  1991     12    4
    1991-12-05  0.0000  1991     12    5
    1991-12-06  0.5765  1991     12    6
    1991-12-07  0.0653  1991     12    7
    1991-12-08  0.0000  1991     12    8
    1991-12-09  0.3949  1991     12    9
    1991-12-10  0.0000  1991     12   10
    1991-12-11  1.7796  1991     12   11
    1991-12-12  0.0000  1991     12   12
    1991-12-13  1.5771  1991     12   13
    1991-12-14  0.0000  1991     12   14
    
    [4000 rows x 4 columns]
    
    # processing
    # ===========================================
    def func(group):
        return (group.prec != 0).astype(int).cumsum().value_counts().values[0] - 1
    
    df.groupby(['year', 'month']).apply(func)
    
    year  month
    1981  1        2
          2        5
          3        4
          4        2
          5        3
          6        4
          7        3
          8        5
          9        5
          10       2
          11       6
          12       6
    1982  1        5
          2        3
          3        4
                  ..
    1990  10       9
          11       4
          12       5
    1991  1        6
          2        4
          3        4
          4        4
          5        4
          6        9
          7        3
          8        5
          9        6
          10       6
          11       3
          12       2
    dtype: int64
    

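    The idea is to emit an impulse (a 1) at every non-zero value and take the cumulative sum, which gives a step function: the value stays flat across a run of zeros, so all days in the same run share one label.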
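A minimal sketch of the impulse and step-function idea on a tiny series. The values in `prec` are made-up toy data, not taken from the question, and the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical toy data: ten days of precipitation values.
prec = pd.Series([1.5, 0.0, 0.0, 0.4, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0])

# Impulse: 1 on every non-zero day, 0 on every zero day.
impulse = (prec != 0).astype(int)

# Step function: the running total only rises on non-zero days, so each
# zero day inherits the label of the non-zero day that precedes it.
step = impulse.cumsum()

# Keep only the zero days and count how many share each label;
# every count is the length of one run of zeros.
run_lengths = step[prec == 0].value_counts()

longest = int(run_lengths.max())
print(step.tolist())
print(longest)
```

Filtering to the zero days before counting means no `- 1` correction is needed, because the non-zero day that opens each step is never counted.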

    # take a look at a sample group
    # ===========================================
    # copy the group so that adding a column below does not modify a view of df
    group = df.groupby(['year', 'month']).get_group((1981, 1)).copy()
    group
    # create a step function
    group['step_func'] = (group.prec != 0).astype(int).cumsum()
    
                  prec  year  month  day  step_func
    1981-01-01  1.7641  1981      1    1          1
    1981-01-02  0.4002  1981      1    2          2
    1981-01-03  0.9787  1981      1    3          3
    1981-01-04  2.2409  1981      1    4          4
    1981-01-05  1.8676  1981      1    5          5
    1981-01-06  0.0000  1981      1    6          5
    1981-01-07  0.9501  1981      1    7          6
    1981-01-08  0.0000  1981      1    8          6
    1981-01-09  0.0000  1981      1    9          6
    1981-01-10  0.4106  1981      1   10          7
    1981-01-11  0.1440  1981      1   11          8
    1981-01-12  1.4543  1981      1   12          9
    1981-01-13  0.7610  1981      1   13         10
    1981-01-14  0.1217  1981      1   14         11
    1981-01-15  0.4439  1981      1   15         12
    1981-01-16  0.3337  1981      1   16         13
    1981-01-17  1.4941  1981      1   17         14
    1981-01-18  0.0000  1981      1   18         14
    1981-01-19  0.3131  1981      1   19         15
    1981-01-20  0.0000  1981      1   20         15
    1981-01-21  0.0000  1981      1   21         15
    1981-01-22  0.6536  1981      1   22         16
    1981-01-23  0.8644  1981      1   23         17
    1981-01-24  0.0000  1981      1   24         17
    1981-01-25  2.2698  1981      1   25         18
    1981-01-26  0.0000  1981      1   26         18
    1981-01-27  0.0458  1981      1   27         19
    1981-01-28  0.0000  1981      1   28         19
    1981-01-29  1.5328  1981      1   29         20
    1981-01-30  1.4694  1981      1   30         21
    1981-01-31  0.1549  1981      1   31         22
    
    # value_counts, pick the max value and subtract 1
    group['step_func'].value_counts().values[0] - 1
    
    2
    

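    Update:

    Using .values[0] invites confusion with integer indexing. Replace it with .iloc[0]. The version below also keeps only the zero rows before counting, so the "- 1" correction is no longer needed.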

    # processing
    # ===========================================
    def func(group):
        return (group.prec != 0).astype(int).cumsum()[group.prec == 0].value_counts().iloc[0]
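One edge case worth guarding against: if a month contains no zero days at all, the filtered series is empty and `.iloc[0]` raises an `IndexError`. The following is a hedged sketch of a variant that returns 0 in that case. The function name `longest_zero_run` and the `toy` frame are hypothetical, and the function takes the `prec` column rather than the whole group.

```python
import pandas as pd

def longest_zero_run(prec):
    """Longest run of consecutive zeros in a Series, or 0 if there is none."""
    is_zero = prec == 0
    if not is_zero.any():
        return 0
    # Every non-zero row bumps the label; zeros in one run share a label.
    step = (~is_zero).cumsum()
    return int(step[is_zero].value_counts().max())

# Hypothetical toy data: month 1 starts with two zeros, month 2 has none.
toy = pd.DataFrame({
    'year':  [1981] * 8,
    'month': [1, 1, 1, 1, 2, 2, 2, 2],
    'prec':  [0.0, 0.0, 1.2, 0.0, 0.3, 1.1, 0.4, 2.0],
})

result = toy.groupby(['year', 'month'])['prec'].apply(longest_zero_run)
print(result)
```

Using `.max()` instead of relying on the sort order of `value_counts()` makes the intent explicit. Month 1 here also shows why the zero rows are filtered first: a run of zeros at the very start of a month has no preceding non-zero row, so subtracting 1 from the raw count would undercount it.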
    

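    【Comments】:

    • Thank you very much, Jianxun Li, it works now :)
    • And how can I find which year and month has this maximum, i.e. the largest number of zeros?
    • Could you give me an idea of how to count the non-zero values? I tried def func(group): return (group.prec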
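The two follow-up questions above could be approached along these lines. This is only a sketch under assumptions: `longest_run` is a hypothetical helper that generalizes the answer's idea to any boolean condition, and `toy` is made-up data.

```python
import pandas as pd

def longest_run(flags):
    """Length of the longest run of True values in a boolean Series."""
    if not flags.any():
        return 0
    # Every False row starts a new label; consecutive True rows share one.
    labels = (~flags).cumsum()
    return int(labels[flags].value_counts().max())

# Hypothetical toy data covering two months.
toy = pd.DataFrame({
    'year':  [1981] * 10,
    'month': [1] * 5 + [2] * 5,
    'prec':  [0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.5, 0.7],
})

grouped = toy.groupby(['year', 'month'])['prec']
dry = grouped.apply(lambda s: longest_run(s == 0))  # longest zero run
wet = grouped.apply(lambda s: longest_run(s != 0))  # longest non-zero run

# idxmax on a MultiIndex Series returns the (year, month) label of the maximum.
worst_month = dry.idxmax()
print(dry, wet, worst_month, sep='\n')
```

If several months tie for the maximum, `idxmax()` returns only the first; `dry[dry == dry.max()]` lists all of them.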