Answers: Pythonic way to loop through DataFrame

【Question title】: Pythonic way to loop through DataFrame
【Posted】: 2021-12-19 20:51:40
【Question description】:

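The DataFrame df holds debit and credit transactions ordered by non-decreasing id. The transactions are random, but the account is guaranteed never to be in net debit, i.e. at any row the cumulative debit amount does not exceed the cumulative credit amount. What is the most efficient algorithm to deduct each debit entry from the credit entries, such that the credit entry with the lowest 'id' is exhausted to zero before any later credit entry is debited?

For example: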

import pandas as pd
df = pd.DataFrame({'id':[1,1,2,3,4,4,5], 'type':['CREDIT','DEBIT','DEBIT','DEBIT','CREDIT','DEBIT','DEBIT' ], 'amount':[10.0,1.0,4.0,2.0,15.0,4.0,1.0]})
df
   id    type  amount
0   1  CREDIT    10.0
1   1   DEBIT     1.0
2   2   DEBIT     4.0
3   3   DEBIT     2.0
4   4  CREDIT    15.0
5   4   DEBIT     4.0
6   5   DEBIT     1.0

The desired output is:

   id    type  amount
0   1  CREDIT     0.0  # This is 10.0 - 1.0 - 4.0 - 2.0 -(4.0-1.0)
4   4  CREDIT    13.0  # This is 15.0 - (4.0 - (4.0-1.0)) - 1.0
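
The arithmetic in the comments above can be traced row by row. The following is only an illustrative sketch in plain Python, not code from the original post: the function name `settle_fifo` and the tuple layout `(id, type, amount)` are assumptions made for this example. Each debit is taken from the oldest credit that still has a balance, and spills over to the next credit once that one reaches zero.

```python
def settle_fifo(rows):
    """Apply debits to credits oldest-first and return the remaining credit balances.

    `rows` is a list of (id, type, amount) tuples in row order (hypothetical layout).
    """
    balances = []  # [id, remaining amount] for every credit seen so far
    oldest = 0     # position of the oldest credit that may still have a balance
    for row_id, kind, amount in rows:
        if kind == "CREDIT":
            balances.append([row_id, amount])
            continue
        remaining = amount  # part of this debit not yet allocated
        while remaining > 0:
            if oldest >= len(balances):
                raise ValueError("account went net debit")
            take = min(remaining, balances[oldest][1])
            balances[oldest][1] -= take
            remaining -= take
            if balances[oldest][1] == 0:
                oldest += 1  # this credit is exhausted, move to the next one
    return [(row_id, left) for row_id, left in balances]


rows = [
    (1, "CREDIT", 10.0), (1, "DEBIT", 1.0), (2, "DEBIT", 4.0), (3, "DEBIT", 2.0),
    (4, "CREDIT", 15.0), (4, "DEBIT", 4.0), (5, "DEBIT", 1.0),
]
print(settle_fifo(rows))
```

For the sample data, the debit of 4.0 at id 4 takes the last 3.0 from the first credit and 1.0 from the second, which is the `(4.0-1.0)` term in the comments.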

【Question discussion】:

What is the expected output if the DataFrame looks like this: df = pd.DataFrame({'id':[1,1,2,3], 'type':['CREDIT','DEBIT','CREDIT','DEBIT'], 'amount':[10,7,10,7]})
@foneyoscar It would be (id, type, amount) (1, CREDIT, 0.0) (2, CREDIT, 6.0), as the sample code below shows.

Tags: python-3.x algorithm dataframe

【Solution 1】:

Here is my non-Pythonic solution. I am not sure it is the best algorithm in terms of time and space complexity (especially as df grows).

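import numpy as np

# Copy the slices so the assignments below work on independent frames
# and do not trigger pandas' SettingWithCopyWarning.
df_deb = df[df['type'] == 'DEBIT'].copy()
df_cred = df[df['type'] == 'CREDIT'].copy()
if not df_deb.empty:
    # Running total of debits; its maximum is the debit still to be allocated.
    df_deb['cum_debit'] = df_deb['amount'].cumsum()
    for idx, row in df_cred.iterrows():
        outstanding = np.nanmax(df_deb['cum_debit'])
        if outstanding >= row['amount']:
            # This credit is fully consumed; carry the rest of the debit forward.
            df_deb['cum_debit'] = (df_deb['cum_debit'] - row['amount']).clip(lower=0.0)
            df_cred.loc[idx, 'amount'] = 0.0
        else:
            # This credit absorbs what is left; later credits stay untouched.
            df_cred.loc[idx, 'amount'] = row['amount'] - outstanding
            break
df_cred  # Required output.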

   id    type  amount
0   1  CREDIT     0.0
4   4  CREDIT    13.0
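
If only the final balances are needed, the loop can probably be avoided. The sketch below is not part of the original answer; the helper name `remaining_credits` is made up for illustration. It relies on the question's guarantee that the account is never in net debit, in which case only the total debit matters: the credit at position i keeps whatever part of the cumulative credit up to i exceeds the total debit, capped at its own amount.

```python
import numpy as np
import pandas as pd


def remaining_credits(df):
    """Return the CREDIT rows with 'amount' replaced by the unspent balance.

    Assumes rows are ordered by id and the account is never in net debit.
    """
    credits = df[df["type"] == "CREDIT"].copy()
    total_debit = df.loc[df["type"] == "DEBIT", "amount"].sum()
    # Cumulative credit left over once all debits are paid, oldest credit first.
    surplus = (credits["amount"].cumsum() - total_debit).clip(lower=0.0)
    # A single credit cannot hold more than its own original amount.
    credits["amount"] = np.minimum(credits["amount"], surplus)
    return credits


df = pd.DataFrame({
    "id": [1, 1, 2, 3, 4, 4, 5],
    "type": ["CREDIT", "DEBIT", "DEBIT", "DEBIT", "CREDIT", "DEBIT", "DEBIT"],
    "amount": [10.0, 1.0, 4.0, 2.0, 15.0, 4.0, 1.0],
})
result = remaining_credits(df)
print(result)
```

With the sample data the total debit is 12.0 and the cumulative credits are 10.0 and 25.0, so the surpluses are 0.0 and 13.0. This version uses a cumulative sum plus element-wise operations instead of iterrows, so it should scale better on large frames, though that has not been benchmarked here.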

【Discussion】: