【Question title】: Dividing data frame into multiple (consecutive) time series
【Posted】: 2017-11-15 01:34:05
【Question】:

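I'm running an experiment in which I take measurements while opening and closing a valve. I have limit switches that indicate fully open and fully closed. I am only interested in the data while the valve is opening or closing, i.e. while neither switch is active. My pandas dataset looks like this (simplified):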

Time                       Flow_A    Flow_B      Open closed            
2017-06-12 09:46:31.068    0.000933  295.933070  1    0
2017-06-12 09:46:31.660    0.287122  292.727820  1    0
2017-06-12 09:46:32.252    0.256170  288.869600  0    0
2017-06-12 09:46:32.844    0.052523  284.265850  0    0
2017-06-12 09:46:33.437    0.367495  278.394200  0    1
2017-06-12 09:46:34.029    1.956472  270.846450  0    1
2017-06-12 09:46:34.621    5.265860  260.768250  0    0
2017-06-12 09:46:35.214   12.328835  248.132450  0    0
2017-06-12 09:46:35.807   22.592590  232.688620  1    0
2017-06-12 09:46:36.400   35.768205  214.997420  1    0
2017-06-12 09:46:36.992   51.623265  195.298150  1    0
2017-06-12 09:46:37.584   70.855590  174.048000  1    0

I have figured out how to get the region of interest in Python:

# rows where neither limit switch is active (each comparison in its own parentheses)
mask = (data['Open'] == 0) & (data['closed'] == 0)
data.loc[mask]
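For reference, a minimal self-contained sketch that rebuilds the simplified sample above as a DataFrame and applies the two-condition mask. It assumes `Time` should be parsed as a datetime column and that the switch columns are named `Open` and `closed`, as in the table. The parentheses around each comparison matter because `&` binds more tightly than `==` in Python.

```python
import pandas as pd

# Rebuild the simplified sample from the question (Time parsed as datetime).
data = pd.DataFrame({
    'Time': pd.to_datetime([
        '2017-06-12 09:46:31.068', '2017-06-12 09:46:31.660',
        '2017-06-12 09:46:32.252', '2017-06-12 09:46:32.844',
        '2017-06-12 09:46:33.437', '2017-06-12 09:46:34.029',
        '2017-06-12 09:46:34.621', '2017-06-12 09:46:35.214',
        '2017-06-12 09:46:35.807', '2017-06-12 09:46:36.400',
        '2017-06-12 09:46:36.992', '2017-06-12 09:46:37.584',
    ]),
    'Flow_A': [0.000933, 0.287122, 0.256170, 0.052523, 0.367495, 1.956472,
               5.265860, 12.328835, 22.592590, 35.768205, 51.623265, 70.855590],
    'Flow_B': [295.933070, 292.727820, 288.869600, 284.265850, 278.394200, 270.846450,
               260.768250, 248.132450, 232.688620, 214.997420, 195.298150, 174.048000],
    'Open':   [1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    'closed': [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0],
})

# & binds more tightly than ==, so without these parentheses the expression would be
# grouped as data['Open'] == (0 & ...) instead of combining two separate tests.
mask = (data['Open'] == 0) & (data['closed'] == 0)
moving = data.loc[mask]
print(moving.index.tolist())  # [2, 3, 6, 7]
```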

This gives me:

Time                       Flow_A    Flow_B      Open closed
2017-06-12 09:46:32.252    0.256170  288.869600  0    0
2017-06-12 09:46:32.844    0.052523  284.265850  0    0
2017-06-12 09:46:34.621    5.265860  260.768250  0    0
2017-06-12 09:46:35.214   12.328835  248.132450  0    0

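The question is how to split/partition/group/subset this into the two consecutive data sets. The length of each period is not known in advance, and the interval between log entries is not exactly constant. I suspect the consecutive runs can be identified from the mask, but I don't know how to do that.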

【Discussion】:

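  • I'm not sure I understand what you mean by consecutive time series. Do you need to label all consecutive rows filtered by the mask, e.g. in a new column as in my answer? Or something else?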

Tags: python pandas


【Solution 1】:

I think you need:

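# rows where neither limit switch is active, i.e. the valve is moving
mask = (data['Open']==0) & (data['closed'] == 0)
# a run starts wherever mask changes value; cumsum over the masked rows numbers the runs 1, 2, ...
data.loc[mask, 'groups'] = mask.ne(mask.shift())[mask].cumsum()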
print (data)
                    Time     Flow_A     Flow_B  Open  closed  groups
2017-06-12  09:46:31.068   0.000933  295.93307     1       0     NaN
2017-06-12  09:46:31.660   0.287122  292.72782     1       0     NaN
2017-06-12  09:46:32.252   0.256170  288.86960     0       0     1.0
2017-06-12  09:46:32.844   0.052523  284.26585     0       0     1.0
2017-06-12  09:46:33.437   0.367495  278.39420     0       1     NaN
2017-06-12  09:46:34.029   1.956472  270.84645     0       1     NaN
2017-06-12  09:46:34.621   5.265860  260.76825     0       0     2.0
2017-06-12  09:46:35.214  12.328835  248.13245     0       0     2.0
2017-06-12  09:46:35.807  22.592590  232.68862     1       0     NaN
2017-06-12  09:46:36.400  35.768205  214.99742     1       0     NaN
2017-06-12  09:46:36.992  51.623265  195.29815     1       0     NaN
2017-06-12  09:46:37.584  70.855590  174.04800     1       0     NaN

print (data[mask])
                    Time     Flow_A     Flow_B  Open  closed  groups
2017-06-12  09:46:32.252   0.256170  288.86960     0       0     1.0
2017-06-12  09:46:32.844   0.052523  284.26585     0       0     1.0
2017-06-12  09:46:34.621   5.265860  260.76825     0       0     2.0
2017-06-12  09:46:35.214  12.328835  248.13245     0       0     2.0
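The grouping line packs three steps together. A small sketch on a hypothetical boolean Series (not the question's data) shows the intermediate values:

```python
import pandas as pd

# Hypothetical boolean mask with two separate runs of True values.
mask = pd.Series([False, False, True, True, False, False, True, True, False])

# 1. True wherever the value differs from the previous row (a run boundary).
#    shift() puts NaN in the first position, so row 0 always counts as a boundary.
starts = mask.ne(mask.shift())
# 2. Keep only the rows inside the mask; a True here marks the first row of a run.
starts_in_mask = starts[mask]
# 3. The running count of run starts gives each contiguous run its own label.
labels = starts_in_mask.cumsum()

print(labels.index.tolist())  # [2, 3, 6, 7]
print(labels.tolist())        # [1, 1, 2, 2]
```

Because every run contributes exactly one `True` in `starts_in_mask`, the cumulative sum goes up by one at each new run and stays constant inside it, regardless of how many rows a run has or how far apart the timestamps are.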

If you need integer labels, they can also start from 0 (rows outside the mask get -1):

data.loc[mask, 'groups'] = mask.ne(mask.shift())[mask].cumsum()
# shift labels to start at 0; rows outside the mask become -1
data['groups'] = data['groups'].fillna(0).astype(int) - 1
print (data)
                    Time     Flow_A     Flow_B  Open  closed  groups
2017-06-12  09:46:31.068   0.000933  295.93307     1       0      -1
2017-06-12  09:46:31.660   0.287122  292.72782     1       0      -1
2017-06-12  09:46:32.252   0.256170  288.86960     0       0       0
2017-06-12  09:46:32.844   0.052523  284.26585     0       0       0
2017-06-12  09:46:33.437   0.367495  278.39420     0       1      -1
2017-06-12  09:46:34.029   1.956472  270.84645     0       1      -1
2017-06-12  09:46:34.621   5.265860  260.76825     0       0       1
2017-06-12  09:46:35.214  12.328835  248.13245     0       0       1
2017-06-12  09:46:35.807  22.592590  232.68862     1       0      -1
2017-06-12  09:46:36.400  35.768205  214.99742     1       0      -1
2017-06-12  09:46:36.992  51.623265  195.29815     1       0      -1
2017-06-12  09:46:37.584  70.855590  174.04800     1       0      -1

print (data[mask])
                    Time     Flow_A     Flow_B  Open  closed  groups
2017-06-12  09:46:32.252   0.256170  288.86960     0       0       0
2017-06-12  09:46:32.844   0.052523  284.26585     0       0       0
2017-06-12  09:46:34.621   5.265860  260.76825     0       0       1
2017-06-12  09:46:35.214  12.328835  248.13245     0       0       1
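Once the runs are labelled, each one can be pulled out as its own DataFrame with `groupby`, which also gives the start, end and duration of each run despite the irregular logging interval. A sketch, assuming a trimmed copy of the sample with `Time` parsed as a datetime column; `labels`, `segments` and `summary` are illustrative names:

```python
import pandas as pd

# Trimmed copy of the question's sample: Time, Flow_A and the two limit-switch columns.
data = pd.DataFrame({
    'Time': pd.to_datetime([
        '2017-06-12 09:46:31.068', '2017-06-12 09:46:31.660',
        '2017-06-12 09:46:32.252', '2017-06-12 09:46:32.844',
        '2017-06-12 09:46:33.437', '2017-06-12 09:46:34.029',
        '2017-06-12 09:46:34.621', '2017-06-12 09:46:35.214',
        '2017-06-12 09:46:35.807', '2017-06-12 09:46:36.400',
        '2017-06-12 09:46:36.992', '2017-06-12 09:46:37.584',
    ]),
    'Flow_A': [0.000933, 0.287122, 0.256170, 0.052523, 0.367495, 1.956472,
               5.265860, 12.328835, 22.592590, 35.768205, 51.623265, 70.855590],
    'Open':   [1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    'closed': [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0],
})

# Rows where neither limit switch is active, labelled 1, 2, ... per contiguous run.
mask = (data['Open'] == 0) & (data['closed'] == 0)
labels = mask.ne(mask.shift())[mask].cumsum()

# One DataFrame per run; grouping by a Series aligns it with data[mask] on the index.
moving = data[mask]
segments = [seg for _, seg in moving.groupby(labels)]

# Start, end and duration of each run, read from the Time column.
summary = moving.groupby(labels)['Time'].agg(['min', 'max'])
summary['duration'] = summary['max'] - summary['min']

print(len(segments))  # 2
print(summary['duration'].dt.total_seconds().round(3).tolist())  # [0.592, 0.593]
```

Grouping by the label Series rather than by a column avoids adding a helper column and the NaN/-1 handling for rows outside the mask.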

【Discussion】:
