How to filter a dataframe by multiple columns and add a value

[Question title]: How to filter a dataframe by multiple columns and add a value
[Posted]: 2017-12-06 16:41:59
[Question]:

df = mdb.read_table(mdbfile, "table")
invoices = pd.read_csv(file, delimiter=';')

lst = df[(df['El4'] == el4)] #contains specific rows of df

for i, row in lst.iterrows():
    prop = row['propertyid']
    mouvement = (row['Mouvements']*-1)

    a = invoices[(invoices['propertyReference'] == prop) & (invoices.invoiceGrossAmount == mouvement)]
    invoiceid = a['invoiceId'].values

    mouvement = (mouvement*-1)

    if df[(df.propertyid == prop) & (df.Mouvements == mouvement)]:
        df['id'] = invoiceid

I get the following error:

The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I want to fill in a specific value (invoiceId) in the rows of the dataframe where propertyid equals prop and Mouvements equals mouvement.

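[Question comments]:

Does the error occur on the line if df[(df.propertyid == prop) & (df.Mouvements == mouvement)]: ?
It seems you could do this with merge. Specify the merge criteria you outlined in your post.
@dubbdan Yes, on exactly that line.
Can you post samples of lst and invoices? I think you need to merge the dataframes rather than use a logical statement.
@dubbbdan Thank you very much for your reply! You almost got it right. Actually, I don't want to merge my tables. I only need to fill in the invoiceid (table 1) in table 2 when 'propertyid' in df matches 'propertyReference' in invoices and 'Mouvements' in df matches 'invoiceGrossAmount'.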

Tags: python pandas dataframe filter

[Solution 1]:

Following up on my comment: it looks like you just want a join (or, in pandas terms, a merge).

Let's start with your source data:

df = mdb.read_table(mdbfile, "table")
invoices = pd.read_csv(file, delimiter=';')

From here, we want to join the data:

df = df.merge(invoices, how='left', left_on=['propertyid', 'Mouvements'], right_on=['propertyReference', 'invoiceGrossAmount'])

In this join, I'm assuming that 'propertyid' in df matches 'propertyReference' in invoices, and that 'Mouvements' in df matches 'invoiceGrossAmount' in invoices. Adjust as needed.
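To make the join concrete, here is a minimal sketch on made-up data. The sample values and the `lookup` name are assumptions for illustration only, and it assumes the amounts already carry the same sign in both tables (the question's loop flips the sign of Mouvements first, so the real data may need that step before merging).

```python
import pandas as pd

# Hypothetical sample data; the real tables come from mdb.read_table / pd.read_csv.
df = pd.DataFrame({
    "propertyid": [1, 2, 3],
    "Mouvements": [100.0, 250.0, 75.0],
})
invoices = pd.DataFrame({
    "propertyReference": [1, 2],
    "invoiceGrossAmount": [100.0, 250.0],
    "invoiceId": ["INV-A", "INV-B"],
})

# Bring over only the key columns plus invoiceId, so df gains a single new column.
lookup = invoices[["propertyReference", "invoiceGrossAmount", "invoiceId"]]

merged = df.merge(
    lookup,
    how="left",
    left_on=["propertyid", "Mouvements"],
    right_on=["propertyReference", "invoiceGrossAmount"],
).drop(columns=["propertyReference", "invoiceGrossAmount"])  # drop the duplicated keys

print(merged)
```

Rows of df without a matching invoice keep a missing value in invoiceId. If invoices contains several rows with the same key pair, the merge produces one output row per match, so it may be worth de-duplicating invoices first.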

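We use a left join because we want to keep the rows of df even when no match is found in invoices, rather than dropping those rows (in which case we would use an inner join instead).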

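This way there is no need for a for loop. I remember reading somewhere that if you find yourself looping in pandas, there is a good chance a built-in pandas method does the job better.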
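For completeness, the error in the question comes from putting a filtered DataFrame directly into an `if`: pandas cannot reduce a whole table to a single True or False. If the loop is kept, one possible fix is to build a boolean mask, test it explicitly with `.any()`, and assign through `.loc` so that only the matching rows are written. The values of `prop`, `mouvement`, and `invoiceid` below are hypothetical stand-ins for what the loop would compute.

```python
import pandas as pd

# Hypothetical sample data standing in for the table read from the .mdb file.
df = pd.DataFrame({
    "propertyid": [1, 2, 3],
    "Mouvements": [100.0, 250.0, 75.0],
})
prop, mouvement, invoiceid = 2, 250.0, "INV-B"  # hypothetical loop values

# A boolean Series with one True/False per row.
mask = (df["propertyid"] == prop) & (df["Mouvements"] == mouvement)

# Ask an explicit question of the mask instead of testing the DataFrame itself.
if mask.any():
    # Write only to the matching rows; the other rows get a missing value in "id".
    df.loc[mask, "id"] = invoiceid

print(df)
```

Note that the original `df['id'] = invoiceid` would have written to every row of df, not just the matching ones, which is another reason to assign through the mask.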

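[Comments]:

Thank you very much for your reply! You almost got it right. Actually, I don't want to merge my tables. I only need to fill in the invoiceid (table 1) in table 2 when 'propertyid' in df matches 'propertyReference' in invoices and 'Mouvements' in df matches 'invoiceGrossAmount'.
@ChristianVandeKoppel, what are table 1 and table 2? In your original post you talked about df and invoices. Could you provide an example of the contents of both tables, showing what you want the result to look like?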

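[Solution 2]:

Another idea is to use the combine_first method.

To use this method, you need to make sure both dataframes are indexed on the same keys. For example: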

# It's not clear whether the sign on Mouvements needs to be flipped to match invoices. If not, comment out the line below.
df.loc[:, 'Mouvements'] = df.loc[:, 'Mouvements'] * -1
# set_index takes a list of column names to build a multi-column index
df = df.set_index(['propertyid', 'Mouvements'])
invoices = invoices.set_index(['propertyReference', 'invoiceGrossAmount'])
df = df.combine_first(invoices)

This is very similar to the merge approach suggested by @RagingRoosevelt.
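A small sketch of how combine_first behaves on made-up data follows. The sample values and the `left`, `right`, and `combined` names are assumptions for illustration; the key columns of invoices are renamed first so that both index levels carry the same names, which is a choice made here and not something the answer above requires.

```python
import pandas as pd

# Hypothetical sample data; signs are assumed to already agree between the tables.
df = pd.DataFrame({
    "propertyid": [1, 2, 3],
    "Mouvements": [100.0, 250.0, 75.0],
    "El4": ["a", "b", "c"],
})
invoices = pd.DataFrame({
    "propertyReference": [1, 2],
    "invoiceGrossAmount": [100.0, 250.0],
    "invoiceId": ["INV-A", "INV-B"],
})

# Index both frames on the same pair of keys.
left = df.set_index(["propertyid", "Mouvements"])
right = (
    invoices
    .rename(columns={"propertyReference": "propertyid",
                     "invoiceGrossAmount": "Mouvements"})
    .set_index(["propertyid", "Mouvements"])
)

# Values from `left` win; gaps (including whole missing columns) are filled from `right`.
combined = left.combine_first(right)

print(combined.reset_index())
```

One difference from a left merge: combine_first takes the union of both indexes, so invoices that match no row of df would also appear in the result as extra rows.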

[Comments]: