【Question title】: Enumerating a dataframe based on a column
【Posted】: 2019-05-12 16:52:42
【Question】:

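I am working with a time-series dataframe that looks like the one below, except that it has many thousands of rows. I want to create a new column that enumerates the blocks of consecutive rows sharing the same "sign" value, i.e. row 0 would be 0, rows 1 through 23 would be 1, rows 24 through 30 would be 2, and so on (time order matters). What is the most Pythonic way to do this? Thanks in advance.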

    Date       sign
0   2011-01-27  1
1   2011-01-28  -1
2   2011-01-31  -1
3   2011-02-01  -1
4   2011-02-02  -1
5   2011-02-07  -1
6   2011-02-08  -1
7   2011-02-09  -1
8   2011-02-10  -1
9   2011-02-11  -1
10  2011-02-14  -1
11  2011-02-15  -1
12  2011-02-16  -1
13  2011-02-17  -1
14  2011-02-18  -1
15  2011-02-21  -1
16  2011-02-22  -1
17  2011-02-23  -1
18  2011-02-24  -1
19  2011-02-25  -1
20  2011-02-28  -1
21  2011-03-01  -1
22  2011-03-02  -1
23  2011-03-03  -1
24  2011-03-04  1
25  2011-03-07  1
26  2011-03-08  1
27  2011-03-09  1
28  2011-03-10  1
29  2011-03-11  1
30  2011-03-14  1
31  2011-03-15  -1
32  2011-03-16  -1
33  2011-03-17  -1
34  2011-03-18  -1
35  2011-03-21  -1
36  2011-03-22  -1
37  2011-03-23  -1
38  2011-03-24  -1
39  2011-03-25  -1
40  2011-03-28  -1
41  2011-03-29  1
42  2011-03-30  1

【Comments】:

    Tags: python python-3.x pandas time-series enumerate


    【Solution 1】:

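    You can take the cumsum of the positions where the sign changes, found with diff. The first row's diff is NaN, which also compares unequal to 0, so 1 is subtracted to start the numbering at 0: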

    df['new_column'] = (df.sign.diff()!=0).cumsum()-1
    
    >>> df
              Date  sign  new_column
    0   2011-01-27     1      0
    1   2011-01-28    -1      1
    2   2011-01-31    -1      1
    3   2011-02-01    -1      1
    4   2011-02-02    -1      1
    5   2011-02-07    -1      1
    6   2011-02-08    -1      1
    7   2011-02-09    -1      1
    8   2011-02-10    -1      1
    9   2011-02-11    -1      1
    10  2011-02-14    -1      1
    11  2011-02-15    -1      1
    12  2011-02-16    -1      1
    13  2011-02-17    -1      1
    14  2011-02-18    -1      1
    15  2011-02-21    -1      1
    16  2011-02-22    -1      1
    17  2011-02-23    -1      1
    18  2011-02-24    -1      1
    19  2011-02-25    -1      1
    20  2011-02-28    -1      1
    21  2011-03-01    -1      1
    22  2011-03-02    -1      1
    23  2011-03-03    -1      1
    24  2011-03-04     1      2
    25  2011-03-07     1      2
    26  2011-03-08     1      2
    27  2011-03-09     1      2
    28  2011-03-10     1      2
    29  2011-03-11     1      2
    30  2011-03-14     1      2
    31  2011-03-15    -1      3
    32  2011-03-16    -1      3
    33  2011-03-17    -1      3
    34  2011-03-18    -1      3
    35  2011-03-21    -1      3
    36  2011-03-22    -1      3
    37  2011-03-23    -1      3
    38  2011-03-24    -1      3
    39  2011-03-25    -1      3
    40  2011-03-28    -1      3
    41  2011-03-29     1      4
    42  2011-03-30     1      4
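
    To see the intermediate steps, here is a minimal sketch on a shortened stand-in for the data. The eight-row frame and the name `changed` are made up for illustration; only the diff/cumsum expression comes from the answer.

```python
import pandas as pd

# Hypothetical, shortened stand-in for the question's data.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2011-01-27", "2011-01-28", "2011-01-31", "2011-02-01",
        "2011-03-04", "2011-03-07", "2011-03-15", "2011-03-16",
    ]),
    "sign": [1, -1, -1, -1, 1, 1, -1, -1],
})

# Row order defines the blocks, so sort by time first if needed.
df = df.sort_values("Date").reset_index(drop=True)

# True wherever the value differs from the previous row.
# The first row's diff is NaN, and NaN != 0 is True.
changed = df["sign"].diff() != 0

# Running count of changes, shifted so the numbering starts at 0.
df["new_column"] = changed.cumsum() - 1

print(df)
```

    Sorting up front is an assumption about the input; if the frame is already in chronological order, that line can be dropped.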
    

    【Comments】:

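    • Thanks @sacul. It seems to work. I wonder how the same approach behaves for a column whose values are not just 1 and -1.
    • Yes, that should be fine! The count increments whenever the value changes, whether it goes from -1 to 1 or from -9999 to 9999 (or between any other numeric values).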
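
    The comment above covers numeric values. For a column that is not numeric (strings, for example), subtraction is not defined, so diff is not a good fit. A sketch of the same idea that compares each row with the previous one instead, on a made-up `state` column (the column name and values are assumptions, not from the question):

```python
import pandas as pd

# Hypothetical non-numeric column; values are invented for illustration.
s = pd.Series(["up", "down", "down", "flat", "flat", "flat", "up"], name="state")

# Compare each value with its predecessor; the first row is compared
# against NaN and is therefore marked as a change.
changed = s.ne(s.shift())

# Zero-based block number.
block = changed.cumsum() - 1

print(block.tolist())
```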
    【Solution 2】:

    You could do this (the numbering starts at 1; subtract 1 if it should start at 0 as in the question):

    df['count'] = df.sign.ne(df.sign.shift(1)).cumsum()
    
              Date  sign  count
    0   2011-01-27     1      1
    1   2011-01-28    -1      2
    2   2011-01-31    -1      2
    3   2011-02-01    -1      2
    4   2011-02-02    -1      2
    5   2011-02-07    -1      2
    .
    .
    .
    23  2011-03-03    -1      2
    24  2011-03-04     1      3
    25  2011-03-07     1      3
    26  2011-03-08     1      3
    27  2011-03-09     1      3
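
    Once each block has its own id, the id can serve as a grouping key, which the raw sign cannot because the same sign value recurs in several blocks. A hedged sketch on a shortened stand-in for the data; the six-row frame and the `summary` table are illustrative additions, not part of the answer:

```python
import pandas as pd

# Hypothetical, shortened stand-in for the question's data.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2011-01-27", "2011-01-28", "2011-01-31",
        "2011-03-04", "2011-03-07", "2011-03-15",
    ]),
    "sign": [1, -1, -1, 1, 1, -1],
})

# Block id via comparison with the previous row (numbering starts at 1).
df["count"] = df["sign"].ne(df["sign"].shift()).cumsum()

# Matches the diff-based expression before its final "- 1".
via_diff = (df["sign"].diff() != 0).cumsum()
assert df["count"].tolist() == via_diff.tolist()

# Per-block summary: sign, first and last date, number of rows.
summary = df.groupby("count").agg(
    sign=("sign", "first"),
    start=("Date", "min"),
    end=("Date", "max"),
    rows=("sign", "size"),
)
print(summary)
```

    Note that the column has to be accessed as df["count"], since df.count refers to the DataFrame method of the same name.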
    

    【Comments】:

    • Thanks @nixon.