【Question title】: Determining the cumulative maximum of a column
【Posted】: 2017-08-25 18:53:40
【Question】:

I am trying the following code:

df = pd.DataFrame([[23, 52], [36, 49], [52, 61], [75, 82], [97, 12]], columns=['A', 'B'])
df['C'] = np.where(df['A'] > df['C'].shift(), df['A'], df['C'].shift())
print(df)

The first df['C'].shift() operation should be treated as 0 (because df['C'] does not exist yet).

Expected output:

    A   B   C
0  23  52  23
1  36  49  36
2  12  61  36
3  75  82  75
4  70  12  75

But I get a KeyError exception:

Traceback (most recent call last):
  File "C:\Program Files\Python36\lib\site-packages\pandas\core\indexes\base.py", line 2442, in get_loc
    return self._engine.get_loc(key)
  File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5280)
  File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5126)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20523)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20477)
KeyError: 'C'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Development\workspace\TestPython\TestPython.py", line 6, in <module>
    df['C'] = np.where(df['A'] > df['C'].shift(), df['B'].shift(), df['A'])
  File "C:\Program Files\Python36\lib\site-packages\pandas\core\frame.py", line 1964, in __getitem__
    return self._getitem_column(key)
  File "C:\Program Files\Python36\lib\site-packages\pandas\core\frame.py", line 1971, in _getitem_column
    return self._get_item_cache(key)
  File "C:\Program Files\Python36\lib\site-packages\pandas\core\generic.py", line 1645, in _get_item_cache
    values = self._data.get(item)
  File "C:\Program Files\Python36\lib\site-packages\pandas\core\internals.py", line 3590, in get
    loc = self.items.get_loc(item)
  File "C:\Program Files\Python36\lib\site-packages\pandas\core\indexes\base.py", line 2444, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5280)
  File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5126)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20523)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20477)
KeyError: 'C'

As I understand it, this happens because column C does not exist the first time around, so shifting that column raises the exception.
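To make that explanation concrete, here is a minimal sketch that reproduces the failure. It assumes the values of column A shown in the expected-output table (an assumption, since the constructor call above uses different numbers). Python evaluates the right-hand side before the assignment happens, so df['C'] is looked up while the column is still missing.

```python
import numpy as np
import pandas as pd

# Column values taken from the expected-output table (an assumption).
df = pd.DataFrame([[23, 52], [36, 49], [12, 61], [75, 82], [70, 12]],
                  columns=['A', 'B'])

raised = False
try:
    # The right-hand side runs first, so df['C'] is read before it is ever assigned.
    df['C'] = np.where(df['A'] > df['C'].shift(), df['A'], df['C'].shift())
except KeyError as exc:
    raised = True
    print('KeyError:', exc)
```

Because the lookup fails before anything is assigned, the frame is left unchanged and still has only columns A and B.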

My question: is there an alternative way to solve this?

【Comments】:

  • Er.. you created a df without a 'C' column, so it is no surprise that you get a KeyError. What is the expected output? Shouldn't you be using column 'B'?
  • Did you mean np.where(df['A'] > df['B'].shift(), df['B'].shift(), df['A'])
  • I need it this way. So, df.loc[0: 'C'] can be 0 or df['A'], and then from df.loc[1: 'C'] onward df['C'] = np.where(df['A'] > df['C'].shift(), df['C'].shift(), df['A']) takes over
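  • For the df above, please show us the desired output. shift may not be the way to go.
  • @cᴏʟᴅsᴘᴇᴇᴅ I have updated the original post to show what the expected output should be. If shift is not the way, another suggestion would be much appreciated. Thanks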

Tags: python pandas dataframe


【Solution 1】:

You need cummax:

df['C'] = df.A.cummax()
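As a quick check, the sketch below applies cummax to the values from the expected-output table in the question (an assumption, since the constructor call in the question uses different numbers) and compares the result with an explicit loop over the recurrence the asker described, starting from 0. The loop and the name `running` are illustrative only.

```python
import pandas as pd

# Column values taken from the expected-output table (an assumption).
df = pd.DataFrame([[23, 52], [36, 49], [12, 61], [75, 82], [70, 12]],
                  columns=['A', 'B'])

# Running maximum of column A, computed without referring to C itself.
df['C'] = df['A'].cummax()

# Explicit version of the intended recurrence: C[i] = max(A[i], C[i-1]), with C[-1] = 0.
running = 0
loop_result = []
for a in df['A']:
    running = max(running, int(a))
    loop_result.append(running)

print(df)
print(loop_result == df['C'].tolist())
```

The two agree here because every value in A is positive. With the 0 starting value the loop would clamp negative values to 0, whereas cummax simply starts from the first element.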

【Comments】:

  • Nice! Didn't know about this. (+1)