Appending to a list based on criteria but avoiding adding duplicates (answers)

[Question title]: Appending to a list based on criteria but avoiding adding duplicates
[Posted]: 2021-10-16 04:13:10
[Question description]:

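I am trying to create an optimized lineup order for an MLB team based on each player's stats. I have a dataframe of stats that I pull from, adding players to an empty list in order of importance, and then I reorder the list to create the batting lineup.

Here is the batting order:

Highest OBP
Highest OPS
Second-highest SLG
Second-highest OPS
Highest SLG
Third-highest OPS
Fourth-highest OPS
Fifth-highest OPS
Sixth-highest OPS

Now, the order in which the lineup spots are filled also matters for the optimization. That order is 2, 4, 1, 5, 3, 6, 7, 8, 9. The second spot in the lineup is the most important, so the player with the highest OPS should be added to the list first; that way he cannot be placed in any other spot based on his stats.

So I have an empty list, and I start pulling out the most important players based on their stats, then drop them from the dataframe so that I can pull again without picking them twice.

Here is my code: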

opt_lineup = []
opt_lineup.append((chosen_team[chosen_team['OPS']==chosen_team['OPS'].max()]['Player']))
chosen_team.drop(chosen_team['OPS'].idxmax(), inplace = True)
opt_lineup.append((chosen_team[chosen_team['OPS']==chosen_team['OPS'].max()]['Player']))
chosen_team.drop(chosen_team['OPS'].idxmax(), inplace = True)
opt_lineup.append((chosen_team[chosen_team['OBP']==chosen_team['OBP'].max()]['Player']))
chosen_team.drop(chosen_team['OBP'].idxmax(), inplace = True)
opt_lineup.append((chosen_team[chosen_team['SLG']==chosen_team['SLG'].max()]['Player']))
chosen_team.drop(chosen_team['SLG'].idxmax(), inplace = True)
opt_lineup.append((chosen_team[chosen_team['SLG']==chosen_team['SLG'].max()]['Player']))
chosen_team.drop(chosen_team['SLG'].idxmax(), inplace = True)
chosen_team.head(30)

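The problem I am running into is that when I add a player to the empty list and two players are tied for the highest stat, both of them get added to the list, when I only want to take the first one.

Essentially, I am looking for a solution similar to keep='first', in the way idxmax() picks a single row for .drop(), but for appending to a list.

Thanks!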


Tags: python pandas append

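[Solution 1]:

So your problem is that several players share the highest stat and you only want to add the first one. The built-in max() function returns the largest value in a list, and the list.index() method returns the first index at which a given value appears. So let's find the max OPS value (for example) and then find the first index of that value in the list; that is the index we will use.

Here is a code example: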

ops_list = [400, 500, 500, 243]

max_ops = max(ops_list)  # find the max value in the list above

max_index = ops_list.index(max_ops)  # first index holding the max value we found

battingorder[1] = players[max_index]  # assumes battingorder is defined and players shares indices with the stat lists

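Edit: make sure you remove that index from every list after adding the player to the batting order, otherwise you may end up with duplicate players.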
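
The pick-then-remove step described above can be sketched as follows. This is only an illustration under assumptions: the player names, the stat values, the `stats` dictionary layout and the `pop_best` helper are all hypothetical and not part of the original answer.

```python
# Hypothetical sample data: players and every stat list share the same indices.
players = ["A", "B", "C", "D"]
stats = {
    "OPS": [400, 500, 500, 243],
    "OBP": [310, 350, 390, 300],
}

def pop_best(names, stat_lists, key):
    """Remove and return the first player holding the maximum of stat `key`."""
    values = stat_lists[key]
    best = values.index(max(values))  # index() returns the first match, so ties go to the earliest entry
    for stat_values in stat_lists.values():
        stat_values.pop(best)  # drop the chosen player's entry from every stat list
    return names.pop(best)

picks = [
    pop_best(players, stats, "OPS"),
    pop_best(players, stats, "OPS"),
    pop_best(players, stats, "OBP"),
]
print(picks)
```

Because each pick is removed from `players` and from every stat list in the same step, the indices stay in sync and a player cannot be chosen twice.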


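[Solution 2]:

To make sure that only one of the maximum entries (the first) is selected, rather than every entry that matches the maximum, you can change your code to use idxmax(), similar to what you already do for .drop().

From the official documentation of idxmax():

If multiple values equal the maximum, the first row label with that value is returned.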
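
A small sketch of the difference, assuming a hypothetical three-player frame with a default integer index (the names and values are made up for illustration):

```python
import pandas as pd

# Hypothetical data: players B and C tie for the highest OPS.
team = pd.DataFrame({"Player": ["A", "B", "C"], "OPS": [0.900, 0.950, 0.950]})

# The boolean mask keeps every row equal to the maximum, so both tied players come back.
mask_pick = team[team["OPS"] == team["OPS"].max()]["Player"]

# idxmax() returns one label: the first row that holds the maximum.
label = team["OPS"].idxmax()
single_pick = team.loc[label, "Player"]

print(len(mask_pick))
print(single_pick)
```

Here `mask_pick` is a Series with two entries, while `single_pick` is a single player name, which is what gets appended to the list.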

opt_lineup = []
opt_lineup.append(chosen_team.loc[chosen_team['OPS'].idxmax(), 'Player'])   #changed
chosen_team.drop(chosen_team['OPS'].idxmax(), inplace = True)
opt_lineup.append(chosen_team.loc[chosen_team['OPS'].idxmax(), 'Player'])   #changed
chosen_team.drop(chosen_team['OPS'].idxmax(), inplace = True)
opt_lineup.append(chosen_team.loc[chosen_team['OBP'].idxmax(), 'Player'])   #changed
chosen_team.drop(chosen_team['OBP'].idxmax(), inplace = True)
opt_lineup.append(chosen_team.loc[chosen_team['SLG'].idxmax(), 'Player'])   #changed
chosen_team.drop(chosen_team['SLG'].idxmax(), inplace = True)
opt_lineup.append(chosen_team.loc[chosen_team['SLG'].idxmax(), 'Player'])   #changed
chosen_team.drop(chosen_team['SLG'].idxmax(), inplace = True)
chosen_team.head(30)
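
The repeated append/drop pairs above could also be written as a loop. The following is a sketch rather than a drop-in replacement: the `pick_lineup` helper, the sample players and their stat values are hypothetical, and it assumes the frame has a unique index so that dropping a label removes exactly one row.

```python
import pandas as pd

def pick_lineup(team, stat_order):
    """Pick one player per stat in stat_order, never picking the same player twice."""
    remaining = team.copy()  # work on a copy so the caller's frame is left untouched
    picks = []
    for stat in stat_order:
        label = remaining[stat].idxmax()  # first row label with the maximum, even on ties
        picks.append(remaining.loc[label, "Player"])
        remaining = remaining.drop(label)  # remove the chosen player before the next pick
    return picks

# Hypothetical sample data with ties in every stat.
team = pd.DataFrame({
    "Player": ["A", "B", "C", "D", "E", "F"],
    "OPS": [0.950, 0.950, 0.800, 0.780, 0.700, 0.650],
    "OBP": [0.400, 0.390, 0.410, 0.410, 0.300, 0.290],
    "SLG": [0.550, 0.560, 0.390, 0.370, 0.450, 0.450],
})

lineup = pick_lineup(team, ["OPS", "OPS", "OBP", "SLG", "SLG"])
print(lineup)
```

The result is in selection order (most important spot first), so it would still need to be reordered into the actual batting order as described in the question.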

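Note that I also changed your code to use the .loc format, e.g.

chosen_team.loc[chosen_team['OPS'].idxmax(), 'Player']

rather than the chained-indexing format:

chosen_team[chosen_team['OPS'] == chosen_team['OPS'].max()]['Player']

This helps avoid SettingWithCopyWarning and also tends to give better execution time and memory use.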
