【问题标题】:Find the highest and lowest value locations within an interval on a column?查找列上某个区间内的最高和最低值位置?
【发布时间】:2017-08-26 08:12:41
【问题描述】:

鉴于此 pandas 数据框有两列,“值”和“间隔”。如何获得第三列“MinMax”,指示该值是该间隔内的最大值还是最小值?对我来说挑战是间隔长度和间隔之间的距离不是固定的,因此我发布了这个问题。

import pandas as pd
import numpy as np


data = pd.DataFrame([
        [1879.289,np.nan],[1879.281,np.nan],[1879.292,1],[1879.295,1],[1879.481,1],[1879.294,1],[1879.268,1],
        [1879.293,1],[1879.277,1],[1879.285,1],[1879.464,1],[1879.475,1],[1879.971,1],[1879.779,1],
        [1879.986,1],[1880.791,1],[1880.29,1],[1879.253,np.nan],[1878.268,np.nan],[1875.73,1],[1876.792,1],
        [1875.977,1],[1876.408,1],[1877.159,1],[1877.187,1],[1883.164,1],[1883.171,1],[1883.495,1],
        [1883.962,1],[1885.158,1],[1885.974,1],[1886.479,np.nan],[1885.969,np.nan],[1884.693,1],[1884.977,1],
        [1884.967,1],[1884.691,1],[1886.171,1],[1886.166,np.nan],[1884.476,np.nan],[1884.66,1],[1882.962,1],
        [1881.496,1],[1871.163,1],[1874.985,1],[1874.979,1],[1871.173,np.nan],[1871.973,np.nan],[1871.682,np.nan],
        [1872.476,np.nan],[1882.361,1],[1880.869,1],[1882.165,1],[1881.857,1],[1880.375,1],[1880.66,1],
        [1880.891,1],[1880.377,1],[1881.663,1],[1881.66,1],[1877.888,1],[1875.69,1],[1875.161,1],
        [1876.697,np.nan],[1876.671,np.nan],[1879.666,np.nan],[1877.182,np.nan],[1878.898,1],[1878.668,1],[1878.871,1],
        [1878.882,1],[1879.173,1],[1878.887,1],[1878.68,1],[1878.872,1],[1878.677,1],[1877.877,1],
        [1877.669,1],[1877.69,1],[1877.684,1],[1877.68,1],[1877.885,1],[1877.863,1],[1877.674,1],
        [1877.676,1],[1877.687,1],[1878.367,1],[1878.179,1],[1877.696,1],[1877.665,1],[1877.667,np.nan],
        [1878.678,np.nan],[1878.661,1],[1878.171,1],[1877.371,1],[1877.359,1],[1878.381,1],[1875.185,1],
        [1875.367,np.nan],[1865.492,np.nan],[1865.495,1],[1866.995,1],[1866.672,1],[1867.465,1],[1867.663,1],
        [1867.186,1],[1867.687,1],[1867.459,1],[1867.168,1],[1869.689,1],[1869.693,1],[1871.676,1],
        [1873.174,1],[1873.691,np.nan],[1873.685,np.nan]
    ])

在下面的第三列中,您可以看到每个区间的最大值和最小值。

+-------+----------+-----------+---------+
| index |  Value   | Intervals | Min/Max |
+-------+----------+-----------+---------+
|     0 | 1879.289 | np.nan    |         |
|     1 | 1879.281 | np.nan    |         |
|     2 | 1879.292 | 1         |         |
|     3 | 1879.295 | 1         |         |
|     4 | 1879.481 | 1         |         |
|     5 | 1879.294 | 1         |         |
|     6 | 1879.268 | 1         | min     |
|     7 | 1879.293 | 1         |         |
|     8 | 1879.277 | 1         |         |
|     9 | 1879.285 | 1         |         |
|    10 | 1879.464 | 1         |         |
|    11 | 1879.475 | 1         |         |
|    12 | 1879.971 | 1         |         |
|    13 | 1879.779 | 1         |         |
|    17 | 1879.986 | 1         |         |
|    18 | 1880.791 | 1         | max     |
|    19 |  1880.29 | 1         |         |
|    55 | 1879.253 | np.nan    |         |
|    56 | 1878.268 | np.nan    |         |
|    57 |  1875.73 | 1         |         |
|    58 | 1876.792 | 1         |         |
|    59 | 1875.977 | 1         | min     |
|    60 | 1876.408 | 1         |         |
|    61 | 1877.159 | 1         |         |
|    62 | 1877.187 | 1         |         |
|    63 | 1883.164 | 1         |         |
|    64 | 1883.171 | 1         |         |
|    65 | 1883.495 | 1         |         |
|    66 | 1883.962 | 1         |         |
|    67 | 1885.158 | 1         |         |
|    68 | 1885.974 | 1         | max     |
|    69 | 1886.479 | np.nan    |         |
|    70 | 1885.969 | np.nan    |         |
|    71 | 1884.693 | 1         |         |
|    72 | 1884.977 | 1         |         |
|    73 | 1884.967 | 1         |         |
|    74 | 1884.691 | 1         | min     |
|    75 | 1886.171 | 1         | max     |
|    76 | 1886.166 | np.nan    |         |
|    77 | 1884.476 | np.nan    |         |
|    78 |  1884.66 | 1         | max     |
|    79 | 1882.962 | 1         |         |
|    80 | 1881.496 | 1         |         |
|    81 | 1871.163 | 1         | min     |
|    82 | 1874.985 | 1         |         |
|    83 | 1874.979 | 1         |         |
|    84 | 1871.173 | np.nan    |         |
|    85 | 1871.973 | np.nan    |         |
|    86 | 1871.682 | np.nan    |         |
|    87 | 1872.476 | np.nan    |         |
|    88 | 1882.361 | 1         | max     |
|    89 | 1880.869 | 1         |         |
|    90 | 1882.165 | 1         |         |
|    91 | 1881.857 | 1         |         |
|    92 | 1880.375 | 1         |         |
|    93 |  1880.66 | 1         |         |
|    94 | 1880.891 | 1         |         |
|    95 | 1880.377 | 1         |         |
|    96 | 1881.663 | 1         |         |
|    97 |  1881.66 | 1         |         |
|    98 | 1877.888 | 1         |         |
|    99 |  1875.69 | 1         |         |
|   100 | 1875.161 | 1         | min     |
|   101 | 1876.697 | np.nan    |         |
|   102 | 1876.671 | np.nan    |         |
|   103 | 1879.666 | np.nan    |         |
|   111 | 1877.182 | np.nan    |         |
|   112 | 1878.898 | 1         |         |
|   113 | 1878.668 | 1         |         |
|   114 | 1878.871 | 1         |         |
|   115 | 1878.882 | 1         |         |
|   116 | 1879.173 | 1         | max     |
|   117 | 1878.887 | 1         |         |
|   118 |  1878.68 | 1         |         |
|   119 | 1878.872 | 1         |         |
|   120 | 1878.677 | 1         |         |
|   121 | 1877.877 | 1         |         |
|   122 | 1877.669 | 1         |         |
|   123 |  1877.69 | 1         |         |
|   124 | 1877.684 | 1         |         |
|   125 |  1877.68 | 1         |         |
|   126 | 1877.885 | 1         |         |
|   127 | 1877.863 | 1         |         |
|   128 | 1877.674 | 1         |         |
|   129 | 1877.676 | 1         |         |
|   130 | 1877.687 | 1         |         |
|   131 | 1878.367 | 1         |         |
|   132 | 1878.179 | 1         |         |
|   133 | 1877.696 | 1         |         |
|   134 | 1877.665 | 1         | min     |
|   135 | 1877.667 | np.nan    |         |
|   136 | 1878.678 | np.nan    |         |
|   137 | 1878.661 | 1         | max     |
|   138 | 1878.171 | 1         |         |
|   139 | 1877.371 | 1         |         |
|   140 | 1877.359 | 1         |         |
|   141 | 1878.381 | 1         |         |
|   142 | 1875.185 | 1         | min     |
|   143 | 1875.367 | np.nan    |         |
|   144 | 1865.492 | np.nan    |         |
|   145 | 1865.495 | 1         | max     |
|   146 | 1866.995 | 1         |         |
|   147 | 1866.672 | 1         |         |
|   148 | 1867.465 | 1         |         |
|   149 | 1867.663 | 1         |         |
|   150 | 1867.186 | 1         |         |
|   151 | 1867.687 | 1         |         |
|   152 | 1867.459 | 1         |         |
|   153 | 1867.168 | 1         |         |
|   154 | 1869.689 | 1         |         |
|   155 | 1869.693 | 1         |         |
|   156 | 1871.676 | 1         |         |
|   157 | 1873.174 | 1         | min     |
|   158 | 1873.691 | np.nan    |         |
|   159 | 1873.685 | np.nan    |         |
+-------+----------+-----------+---------+

【问题讨论】:

  • 请使您的数据可重复,随机种子数字将是最好的。您的 data = ... 行太长(1024 个字符?)它在我的外壳中炸毁了复制和粘贴。
  • 我说复制和粘贴那行会炸毁我的外壳;行太长:1024 个字符或更多。那条线使我的外壳崩溃。这就是为什么我建议您使用随机种子数字。
  • 对不起,是的,我在发表评论后才知道:)
  • 问题是我无法用随机种子重现这些数据,因为间隔列可以是任意大小和任意长度,所以我不知道如何制作。这就是我粘贴整个数据农场的原因
  • 如果有帮助的话,我可以放一个较短的版本

标签: pandas max grouping min series


【解决方案1】:
isnull = data.iloc[:, 1].isnull()
minmax = data.groupby(isnull.cumsum()[~isnull])[0].agg(['idxmax', 'idxmin'])
data.loc[minmax['idxmax'], 'MinMax'] = 'max'
data.loc[minmax['idxmin'], 'MinMax'] = 'min'
data.MinMax = data.MinMax.fillna('')
print(data)

            0    1 MinMax
0    1879.289  NaN       
1    1879.281  NaN       
2    1879.292  1.0       
3    1879.295  1.0       
4    1879.481  1.0       
5    1879.294  1.0       
6    1879.268  1.0    min
7    1879.293  1.0       
8    1879.277  1.0       
9    1879.285  1.0       
10   1879.464  1.0       
11   1879.475  1.0       
12   1879.971  1.0       
13   1879.779  1.0       
14   1879.986  1.0       
15   1880.791  1.0    max
16   1880.290  1.0       
17   1879.253  NaN       
18   1878.268  NaN       
19   1875.730  1.0    min
20   1876.792  1.0       
21   1875.977  1.0       
22   1876.408  1.0       
23   1877.159  1.0       
24   1877.187  1.0       
25   1883.164  1.0       
26   1883.171  1.0       
27   1883.495  1.0       
28   1883.962  1.0       
29   1885.158  1.0       
..        ...  ...    ...
85   1877.687  1.0       
86   1878.367  1.0       
87   1878.179  1.0       
88   1877.696  1.0       
89   1877.665  1.0    min
90   1877.667  NaN       
91   1878.678  NaN       
92   1878.661  1.0    max
93   1878.171  1.0       
94   1877.371  1.0       
95   1877.359  1.0       
96   1878.381  1.0       
97   1875.185  1.0    min
98   1875.367  NaN       
99   1865.492  NaN       
100  1865.495  1.0    min
101  1866.995  1.0       
102  1866.672  1.0       
103  1867.465  1.0       
104  1867.663  1.0       
105  1867.186  1.0       
106  1867.687  1.0       
107  1867.459  1.0       
108  1867.168  1.0       
109  1869.689  1.0       
110  1869.693  1.0       
111  1871.676  1.0       
112  1873.174  1.0    max
113  1873.691  NaN       
114  1873.685  NaN       

[115 rows x 3 columns]

【讨论】:

    【解决方案2】:
    data.columns=['Value','Interval']
    
    data['Ingroup'] = (data['Interval'].notnull() + 0)
    
    Use data['Interval'].notnull() to separate the groups...
    Use cumsum() to number them with `groupno`...
    Use groupby(groupno)..
    
    Finally you want something using apply/idxmax/idxmin to label the max/min
    
    But of course a for-loop as you suggested is the non-Pythonic but possibly simpler hack.
    

    【讨论】:

    • 所以我想要的是“间隔”列中的每个间隔(可以在 1 到 3 或 -1 到 3 之间)从单独的列中的“值”列中获取最大值和最小值。这是否有助于更好地理解?
    • 我现在更改了数据并将所有值替换为1。这更简单。你能看一下吗。也许它激发了一些获得最小值和最大值的想法
    • 感谢您的建议和指导 :) 非常感谢。如果可以的话,我会接受这两个答案都是正确的;)
    猜你喜欢
    • 1970-01-01
    • 2013-03-22
    • 2017-09-29
    • 1970-01-01
    • 1970-01-01
    • 2021-04-18
    • 1970-01-01
    • 2012-06-13
    • 1970-01-01
    相关资源
    最近更新 更多