[Posted]: 2019-06-10 23:36:57
[Question]:
I am looping over a DataFrame with iterrows and comparing row n with row n+1. The algorithm is as follows:
if columns 0,1,2 of row_n != columns 0,1,2 of row_n+1
    output row = row_n
    then check row_n+1 vs row_n+2...
if columns 0,1,2 of row_n == columns 0,1,2 of row_n+1
    output row columns 0,1,2,3 = row_n columns 0,1,2,3
    output row column 4 = (row_n column 4 + row_n+1 column 4)
    then "skip one row" and check row_n+2 vs row_n+3...
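My current code works for the first comparison but then stops working. My guess is that the problem occurs when I try to "skip one row". I tried using index = index+1, but the output is not what I expected. How can I fix this?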
row_iterator = TSG_table_sorted.iterrows()
_, row_n1 = row_iterator.__next__()
for index, row_n0 in row_iterator:
    Terminal_ID_n0 = row_n0['Terminal_ID'];
    TSG_n0 = row_n0['TSG'];
    Date_n0 = row_n0['Date'];
    Vol_n0 = row_n0['Vol'];
    Terminal_no_n0 = row_n0['Terminal_no'];
    Terminal_ID_n1 = row_n1['Terminal_ID'];
    TSG_n1 = row_n1['TSG'];
    Date_n1 = row_n1['Date'];
    Vol_n1 = row_n1['Vol'];
    if (Terminal_ID_n0==Terminal_ID_n1 and TSG_n0==TSG_n1 and Date_n0==Date_n1):
        new_vol=Vol_n0+Vol_n1;
        output_table.loc[i]=[Terminal_ID_n0,TSG_n0,Date_n0,Terminal_no_n0,new_vol]
        i=i+1;
    else:
        output_table.loc[i]=[Terminal_ID_n0,TSG_n0,Date_n0,Terminal_no_n0,Vol_n0]
        i=i+1;
    index=index+1;
Input:
       Terminal_ID  TSG                Date       Terminal_no  Vol
508    t_tel_003    CashCheck          10/1/2018  003          61
9605   t_tel_003    CashCheck          10/1/2018  003          3
2309   t_tel_003    CommercialDeposit  10/1/2018  003          12
4439   t_tel_003    CommercialDeposit  10/1/2018  003          10
9513   t_tel_003    CommercialDeposit  10/1/2018  003          122
12282  t_tel_003    CommercialDeposit  10/1/2018  003          1
Current output:
   Terminal_ID  TSG                Date       Terminal_no  Vol
0  t_tel_003    CashCheck          10/1/2018  003          64
1  t_tel_003    CommercialDeposit  10/1/2018  003          12
2  t_tel_003    CommercialDeposit  10/1/2018  003          10
3  t_tel_003    CommercialDeposit  10/1/2018  003          122
4  t_tel_003    CommercialDeposit  10/1/2018  003          1
Expected output:
   Terminal_ID  TSG                Date       Terminal_no  Vol
0  t_tel_003    CashCheck          10/1/2018  003          64
1  t_tel_003    CommercialDeposit  10/1/2018  003          22
3  t_tel_003    CommercialDeposit  10/1/2018  003          123
[Comments]:
-
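I think your approach is flawed.
row_iterator is being traversed, so index and row_n0 are always set from the iterator's current value. You also set row_n1 outside the loop and never update it, which makes the comparison static rather than dynamic. I haven't used pandas before, but you should restructure your code so that row_n1 is updated inside the loop. Also, what is i? I don't see it being initialized, only used.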
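To illustrate the point about the comparison staying static: reassigning index inside a for loop does not advance the underlying iterator, so index=index+1 cannot skip a row. Below is a minimal sketch of one possible restructuring, not necessarily what the asker ended up with. It uses an explicit position counter so that both rows of the comparison move forward and a merged pair advances by two. The function name merge_adjacent_pairs and its parameters key_cols and sum_col are made up for this illustration, and it works on plain dict records rather than on the DataFrame directly, which keeps it free of dependencies.

```python
def merge_adjacent_pairs(rows, key_cols=("Terminal_ID", "TSG", "Date"), sum_col="Vol"):
    """Compare row n with row n+1. If the key columns match, emit one row
    with the summed volume and skip the next row; otherwise emit row n unchanged."""
    output = []
    n = 0
    while n < len(rows):
        current = dict(rows[n])  # copy so the input records are not mutated
        has_next = n + 1 < len(rows)
        if has_next and all(current[c] == rows[n + 1][c] for c in key_cols):
            current[sum_col] += rows[n + 1][sum_col]
            n += 2  # "skip one row": row n+1 has already been consumed
        else:
            n += 1  # no match: the next comparison starts at row n+1
        output.append(current)
    return output


# Sample data copied from the question's input table.
rows = [
    {"Terminal_ID": "t_tel_003", "TSG": "CashCheck", "Date": "10/1/2018", "Terminal_no": "003", "Vol": 61},
    {"Terminal_ID": "t_tel_003", "TSG": "CashCheck", "Date": "10/1/2018", "Terminal_no": "003", "Vol": 3},
    {"Terminal_ID": "t_tel_003", "TSG": "CommercialDeposit", "Date": "10/1/2018", "Terminal_no": "003", "Vol": 12},
    {"Terminal_ID": "t_tel_003", "TSG": "CommercialDeposit", "Date": "10/1/2018", "Terminal_no": "003", "Vol": 10},
    {"Terminal_ID": "t_tel_003", "TSG": "CommercialDeposit", "Date": "10/1/2018", "Terminal_no": "003", "Vol": 122},
    {"Terminal_ID": "t_tel_003", "TSG": "CommercialDeposit", "Date": "10/1/2018", "Terminal_no": "003", "Vol": 1},
]

result = merge_adjacent_pairs(rows)
print([r["Vol"] for r in result])  # [64, 22, 123]
```

With pandas, the records could presumably be obtained with TSG_table_sorted.to_dict("records") and the result turned back into a table with pd.DataFrame(result). Note that this merges rows strictly in pairs, as the expected output requires (22 and 123 rather than a single total of 145), so a plain groupby sum over the three key columns would not give the same result.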