【Question title】:Numpy version of finding the highest and lowest value locations within an interval of another column?
【Posted】:2017-04-04 21:48:18
【Question】:

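Given the following numpy array, how can I use numpy to find the positions of the highest and lowest values of column 0 within each interval of column 1 (an interval being a run of consecutive rows where column 1 is not NaN)?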

import numpy as np
data = np.array([
        [1879.289,np.nan],[1879.281,np.nan],[1879.292,1],[1879.295,1],[1879.481,1],[1879.294,1],[1879.268,1],
        [1879.293,1],[1879.277,1],[1879.285,1],[1879.464,1],[1879.475,1],[1879.971,1],[1879.779,1],
        [1879.986,1],[1880.791,1],[1880.29,1],[1879.253,np.nan],[1878.268,np.nan],[1875.73,1],[1876.792,1],
        [1875.977,1],[1876.408,1],[1877.159,1],[1877.187,1],[1883.164,1],[1883.171,1],[1883.495,1],
        [1883.962,1],[1885.158,1],[1885.974,1],[1886.479,np.nan],[1885.969,np.nan],[1884.693,1],[1884.977,1],
        [1884.967,1],[1884.691,1],[1886.171,1],[1886.166,np.nan],[1884.476,np.nan],[1884.66,1],[1882.962,1],
        [1881.496,1],[1871.163,1],[1874.985,1],[1874.979,1],[1871.173,np.nan],[1871.973,np.nan],[1871.682,np.nan],
        [1872.476,np.nan],[1882.361,1],[1880.869,1],[1882.165,1],[1881.857,1],[1880.375,1],[1880.66,1],
        [1880.891,1],[1880.377,1],[1881.663,1],[1881.66,1],[1877.888,1],[1875.69,1],[1875.161,1],
        [1876.697,np.nan],[1876.671,np.nan],[1879.666,np.nan],[1877.182,np.nan],[1878.898,1],[1878.668,1],[1878.871,1],
        [1878.882,1],[1879.173,1],[1878.887,1],[1878.68,1],[1878.872,1],[1878.677,1],[1877.877,1],
        [1877.669,1],[1877.69,1],[1877.684,1],[1877.68,1],[1877.885,1],[1877.863,1],[1877.674,1],
        [1877.676,1],[1877.687,1],[1878.367,1],[1878.179,1],[1877.696,1],[1877.665,1],[1877.667,np.nan],
        [1878.678,np.nan],[1878.661,1],[1878.171,1],[1877.371,1],[1877.359,1],[1878.381,1],[1875.185,1],
        [1875.367,np.nan],[1865.492,np.nan],[1865.495,1],[1866.995,1],[1866.672,1],[1867.465,1],[1867.663,1],
        [1867.186,1],[1867.687,1],[1867.459,1],[1867.168,1],[1869.689,1],[1869.693,1],[1871.676,1],
        [1873.174,1],[1873.691,np.nan],[1873.685,np.nan]
    ])

In the last column (Min/Max) of the table below you can see the maximum (1) and minimum (-1) of each interval.

+-------+----------+-----------+---------+
| index |  Value   | Intervals | Min/Max |
+-------+----------+-----------+---------+
|     0 | 1879.289 | np.nan    |         |
|     1 | 1879.281 | np.nan    |         |
|     2 | 1879.292 | 1         |         |
|     3 | 1879.295 | 1         |         |
|     4 | 1879.481 | 1         |         |
|     5 | 1879.294 | 1         |         |
|     6 | 1879.268 | 1         | -1      | min
|     7 | 1879.293 | 1         |         |
|     8 | 1879.277 | 1         |         |
|     9 | 1879.285 | 1         |         |
|    10 | 1879.464 | 1         |         |
|    11 | 1879.475 | 1         |         |
|    12 | 1879.971 | 1         |         |
|    13 | 1879.779 | 1         |         |
|    17 | 1879.986 | 1         |         |
|    18 | 1880.791 | 1         |  1      | max
|    19 |  1880.29 | 1         |         |
|    55 | 1879.253 | np.nan    |         |
|    56 | 1878.268 | np.nan    |         |
|    57 |  1875.73 | 1         | -1      |min
|    58 | 1876.792 | 1         |         |
|    59 | 1875.977 | 1         |         | 
|    60 | 1876.408 | 1         |         |
|    61 | 1877.159 | 1         |         |
|    62 | 1877.187 | 1         |         |
|    63 | 1883.164 | 1         |         |
|    64 | 1883.171 | 1         |         |
|    65 | 1883.495 | 1         |         |
|    66 | 1883.962 | 1         |         |
|    67 | 1885.158 | 1         |         |
|    68 | 1885.974 | 1         |  1      | max
|    69 | 1886.479 | np.nan    |         |
|    70 | 1885.969 | np.nan    |         |
|    71 | 1884.693 | 1         |         |
|    72 | 1884.977 | 1         |         |
|    73 | 1884.967 | 1         |         |
|    74 | 1884.691 | 1         | -1      | min
|    75 | 1886.171 | 1         |  1      | max
|    76 | 1886.166 | np.nan    |         |
|    77 | 1884.476 | np.nan    |         |
|    78 |  1884.66 | 1         |  1      | max
|    79 | 1882.962 | 1         |         |
|    80 | 1881.496 | 1         |         |
|    81 | 1871.163 | 1         | -1      | min
|    82 | 1874.985 | 1         |         |
|    83 | 1874.979 | 1         |         |
|    84 | 1871.173 | np.nan    |         |
|    85 | 1871.973 | np.nan    |         |
|    86 | 1871.682 | np.nan    |         |
|    87 | 1872.476 | np.nan    |         |
|    88 | 1882.361 | 1         |  1      | max
|    89 | 1880.869 | 1         |         |
|    90 | 1882.165 | 1         |         |
|    91 | 1881.857 | 1         |         |
|    92 | 1880.375 | 1         |         |
|    93 |  1880.66 | 1         |         |
|    94 | 1880.891 | 1         |         |
|    95 | 1880.377 | 1         |         |
|    96 | 1881.663 | 1         |         |
|    97 |  1881.66 | 1         |         |
|    98 | 1877.888 | 1         |         |
|    99 |  1875.69 | 1         |         |
|   100 | 1875.161 | 1         | -1      | min
|   101 | 1876.697 | np.nan    |         |
|   102 | 1876.671 | np.nan    |         |
|   103 | 1879.666 | np.nan    |         |
|   111 | 1877.182 | np.nan    |         |
|   112 | 1878.898 | 1         |         |
|   113 | 1878.668 | 1         |         |
|   114 | 1878.871 | 1         |         |
|   115 | 1878.882 | 1         |         |
|   116 | 1879.173 | 1         |  1      | max
|   117 | 1878.887 | 1         |         |
|   118 |  1878.68 | 1         |         |
|   119 | 1878.872 | 1         |         |
|   120 | 1878.677 | 1         |         |
|   121 | 1877.877 | 1         |         |
|   122 | 1877.669 | 1         |         |
|   123 |  1877.69 | 1         |         |
|   124 | 1877.684 | 1         |         |
|   125 |  1877.68 | 1         |         |
|   126 | 1877.885 | 1         |         |
|   127 | 1877.863 | 1         |         |
|   128 | 1877.674 | 1         |         |
|   129 | 1877.676 | 1         |         |
|   130 | 1877.687 | 1         |         |
|   131 | 1878.367 | 1         |         |
|   132 | 1878.179 | 1         |         |
|   133 | 1877.696 | 1         |         |
|   134 | 1877.665 | 1         | -1      | min
|   135 | 1877.667 | np.nan    |         |
|   136 | 1878.678 | np.nan    |         |
|   137 | 1878.661 | 1         |  1      | max
|   138 | 1878.171 | 1         |         |
|   139 | 1877.371 | 1         |         |
|   140 | 1877.359 | 1         |         |
|   141 | 1878.381 | 1         |         |
|   142 | 1875.185 | 1         | -1      | min
|   143 | 1875.367 | np.nan    |         |
|   144 | 1865.492 | np.nan    |         |
|   145 | 1865.495 | 1         |  -1     | min
|   146 | 1866.995 | 1         |         |
|   147 | 1866.672 | 1         |         |
|   148 | 1867.465 | 1         |         |
|   149 | 1867.663 | 1         |         |
|   150 | 1867.186 | 1         |         |
|   151 | 1867.687 | 1         |         |
|   152 | 1867.459 | 1         |         |
|   153 | 1867.168 | 1         |         |
|   154 | 1869.689 | 1         |         |
|   155 | 1869.693 | 1         |         |
|   156 | 1871.676 | 1         |         |
|   157 | 1873.174 | 1         | 1       | max
|   158 | 1873.691 | np.nan    |         |
|   159 | 1873.685 | np.nan    |         |
+-------+----------+-----------+---------+
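
For reference, the labelling shown in the table can be written as a plain Python loop. This is only a baseline sketch for checking faster versions, not a suggested solution; the function name `label_extrema_loop` and the small `demo` array are made up for illustration.

```python
import numpy as np

def label_extrema_loop(data):
    """Mark the min (-1) and max (1) of column 0 inside each run of
    rows whose column 1 is not NaN. Slow loop, meant as a baseline."""
    valid = ~np.isnan(data[:, 1])
    out = np.full(len(data), np.nan)
    start = None
    # Go one step past the end so a run touching the last row is closed too.
    for i in range(len(data) + 1):
        inside = i < len(data) and valid[i]
        if inside and start is None:
            start = i                       # an interval opens here
        elif not inside and start is not None:
            seg = data[start:i, 0]          # values of the interval just closed
            out[start + seg.argmin()] = -1
            out[start + seg.argmax()] = 1   # wins if min and max share a row
            start = None
    return out

# Hypothetical toy input: two intervals separated by NaN rows.
demo = np.array([
    [5.0, np.nan],
    [3.0, 1], [1.0, 1], [4.0, 1],
    [9.0, np.nan],
    [2.0, 1], [7.0, 1],
])
print(label_extrema_loop(demo))
```

On `demo`, rows 2 and 5 are marked -1 and rows 3 and 6 are marked 1; all other rows stay NaN.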

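I should say up front that this question has already been answered here with a pandas solution. For a table of about 1 million rows, that solution runs in a tolerable 300 seconds or so. After some testing, however, I found that once the table exceeds 3 million rows, the execution time rises sharply to more than 2500 seconds. That is clearly too long for such a simple task. How can the same problem be solved with numpy?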

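【Comments on the question】:

  • Shouldn't the min of the second interval be `1875.73`?
  • Likewise, for the last interval the min and max look mistakenly swapped.
  • Yes, you're right.
  • It would be good of you to make those corrections, for future reference.

Tags: python performance numpy scipy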


【Solution 1】:

Here is a NumPy approach:

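# True for rows that belong to an interval (column 1 is not NaN)
mask = ~np.isnan(data[:,1])

# First row of each interval (s0) and one past its last row (s1);
# this relies on data starting and ending with a NaN row, as in the question
s0 = np.flatnonzero(mask[1:] > mask[:-1])+1
s1 = np.flatnonzero(mask[1:] < mask[:-1])+1
lens = s1 - s0

# Sort the valid rows by interval id first, then by value
tags = np.repeat(np.arange(len(lens)), lens)
idx  = np.lexsort((data[mask,0], tags))

# Boundaries of each interval within the masked (NaN-free) rows
starts = np.r_[0,lens.cumsum()]

# Number of NaN rows before each interval, used to map back to rows of data
offsets = np.r_[s0[0], s0[1:] - s1[:-1]]
offsets_cumsum = offsets.cumsum()

min_ids = idx[starts[:-1]] + offsets_cumsum
max_ids = idx[starts[1:]-1] + offsets_cumsum

# -1 marks each interval's minimum, 1 its maximum, NaN everything else
out = np.full(data.shape[0], np.nan)
out[min_ids] = -1
out[max_ids] = 1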
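
The snippet above works on the question's data because column 1 starts and ends with NaN rows; if an interval touched the first or last row, `s0` and `s1` would no longer pair up. The following is a minimal sketch of the same sorting idea wrapped in a function, assuming it is acceptable to pad the mask with `False` on both sides. The name `label_extrema` and the `demo` array are hypothetical.

```python
import numpy as np

def label_extrema(data):
    """Sketch: same lexsort idea, with the mask padded so that
    intervals touching either end of the array are handled."""
    mask = ~np.isnan(data[:, 1])
    out = np.full(len(data), np.nan)

    # Pad with False so every run of True has a visible start and end.
    padded = np.r_[False, mask, False]
    s0 = np.flatnonzero(padded[1:] > padded[:-1])   # first row of each run
    s1 = np.flatnonzero(padded[1:] < padded[:-1])   # one past the last row
    if s0.size == 0:
        return out                                   # no valid rows at all

    lens = s1 - s0
    tags = np.repeat(np.arange(len(lens)), lens)     # interval id per valid row
    idx = np.lexsort((data[mask, 0], tags))          # by interval, then value

    ends = lens.cumsum()
    starts = ends - lens                             # run starts in masked rows
    shift = s0 - starts                              # NaN rows before each run

    out[idx[starts] + shift] = -1                    # smallest value per run
    out[idx[ends - 1] + shift] = 1                   # largest value per run
    return out

# Hypothetical toy input that begins and ends inside an interval.
demo = np.array([
    [3.0, 1], [1.0, 1], [4.0, 1],
    [9.0, np.nan],
    [2.0, 1], [7.0, 1],
])
print(label_extrema(demo))
```

On `demo`, rows 1 and 4 are marked -1 and rows 2 and 5 are marked 1.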

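【Comments】:

  • Thanks again @Divakar :)
  • Just a quick question @Divakar: if the maximum or minimum value is duplicated within an interval, the script currently picks the last duplicate in the interval. How can I make it pick the first max/min encountered instead of the last? I just noticed that pandas picks the first one, while here it is the last.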
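
One possible way to address the tie question, offered as a sketch rather than as the answerer's own fix: `np.lexsort` is a stable sort, so the first element of each sorted group is already the first of several equal minima, while the last element is the last of several equal maxima. Sorting a second time on the negated values and again taking the first element of each group yields the first maximum. The name `label_extrema_first` and the `demo` array are hypothetical.

```python
import numpy as np

def label_extrema_first(data):
    """Sketch: mark the first occurrence of the min (-1) and of the
    max (1) in each run of rows whose column 1 is not NaN."""
    mask = ~np.isnan(data[:, 1])
    out = np.full(len(data), np.nan)

    padded = np.r_[False, mask, False]
    s0 = np.flatnonzero(padded[1:] > padded[:-1])   # first row of each run
    s1 = np.flatnonzero(padded[1:] < padded[:-1])   # one past the last row
    if s0.size == 0:
        return out

    lens = s1 - s0
    tags = np.repeat(np.arange(len(lens)), lens)
    vals = data[mask, 0]
    starts = lens.cumsum() - lens                    # run starts in masked rows
    shift = s0 - starts                              # NaN rows before each run

    asc = np.lexsort((vals, tags))     # stable: ties keep original order
    desc = np.lexsort((-vals, tags))   # same, but largest value first

    out[asc[starts] + shift] = -1      # first minimum of each run
    out[desc[starts] + shift] = 1      # first maximum of each run
    return out

# Hypothetical toy input with a repeated max (5.0) and a repeated min (2.0).
demo = np.array([
    [0.0, np.nan],
    [5.0, 1], [2.0, 1], [5.0, 1], [2.0, 1],
    [0.0, np.nan],
])
print(label_extrema_first(demo))
```

On `demo`, row 1 (the first 5.0) is marked 1 and row 2 (the first 2.0) is marked -1; the later duplicates in rows 3 and 4 stay NaN.
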
【Solution 2】:

This is cheating a little, since it uses scipy:

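import numpy as np
from scipy import ndimage

markers = np.isnan(data[:, 1])
# Every NaN row bumps the running count, so the rows of one interval share a
# label; the NaN rows themselves get label 0 and are left out of the search.
groups = np.where(markers, 0, np.cumsum(markers) + 1)

# min_idx and max_idx are lists of 1-tuples holding row positions in data.
mins, maxs, min_idx, max_idx = ndimage.extrema(
    data[:, 0], labels=groups, index=np.unique(groups[groups > 0]))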
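
To get from the positions returned by `extrema` to the -1/1 column the question asks for, something like the following sketch could be used. It assumes `scipy.ndimage.extrema` is available at the top level of `scipy.ndimage`, labels NaN rows as 0 so they are never selected, and uses a hypothetical function name `label_extrema_ndimage` and a made-up `demo` array.

```python
import numpy as np
from scipy import ndimage

def label_extrema_ndimage(data):
    """Sketch: build the -1/1 marker column from ndimage.extrema."""
    nan_rows = np.isnan(data[:, 1])
    out = np.full(len(data), np.nan)

    # Rows of one interval share a label >= 1; NaN rows get label 0.
    labels = np.where(nan_rows, 0, np.cumsum(nan_rows) + 1)
    index = np.unique(labels[labels > 0])
    if index.size == 0:
        return out                      # no valid rows at all

    # For 1-D input the positions come back as a list of 1-tuples.
    _, _, min_pos, max_pos = ndimage.extrema(
        data[:, 0], labels=labels, index=index)

    out[np.asarray(min_pos).ravel()] = -1
    out[np.asarray(max_pos).ravel()] = 1
    return out

# Hypothetical toy input: two intervals separated by NaN rows.
demo = np.array([
    [5.0, np.nan],
    [3.0, 1], [1.0, 1], [4.0, 1],
    [9.0, np.nan],
    [2.0, 1], [7.0, 1],
])
print(label_extrema_ndimage(demo))
```

On `demo`, rows 2 and 5 are marked -1 and rows 3 and 6 are marked 1.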

【Comments】:

  • Thanks for the answer, @StephenRauch :)