Compare two Excel files in Pandas and return the rows that have the same values in two columns

【Question title】: Compare two excel files in Pandas and return the rows which have the same values in TWO columns
【Posted】: 2016-12-15 10:17:32
【Question description】:

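I have a couple of Excel files. Both files have two columns in common: Customer_Name and Customer_No. The first Excel file has about 800k rows, while the second has only 460. I want to get a DataFrame containing the data common to both files, i.e. the rows from the first file whose Customer_Name and Customer_No are both found in the second file. I tried using .isin, but so far I have only found examples that use a single variable (column). Thanks in advance!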

【Question comments】:

Tags: python excel python-2.7 pandas

【Solution 1】:

Use merge:

df = pd.merge(df1, df2, on=['Customer_Name','Customer_No'])

If the column names differ between the two files, use left_on and right_on:

df = pd.merge(df1, 
              df2, 
              left_on=['Customer_Name','Customer_No'], 
              right_on=['Customer_head','Customer_Id'])
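
To make the behavior of the merge concrete, here is a minimal sketch on two small frames. The data and the Amount column are made up for illustration and stand in for the two Excel files; by default pd.merge does an inner join, so only rows whose (Customer_Name, Customer_No) pair occurs in both frames survive.

```python
import pandas as pd

# Hypothetical stand-ins for the two Excel files.
df1 = pd.DataFrame({
    'Customer_Name': ['Ann', 'Bob', 'Bob', 'Cid'],
    'Customer_No': [1, 2, 3, 4],
    'Amount': [10, 20, 30, 40],
})
df2 = pd.DataFrame({
    'Customer_Name': ['Bob', 'Cid', 'Dee'],
    'Customer_No': [2, 4, 5],
})

# Inner join (the default): a row is kept only if BOTH key columns match.
# ('Bob', 3) is dropped because df2 only contains ('Bob', 2).
common = pd.merge(df1, df2, on=['Customer_Name', 'Customer_No'])
print(common)
```

Matching on the pair rather than on each column separately is what distinguishes this from two independent single-column .isin checks.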

【Comments】:

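If my answer was helpful, don't forget to accept it. Thanks.
But I am getting a KeyError. I checked the data types of the columns and made sure they are the same as well. :(
Please check the column names with df2.columns; maybe there is some whitespace before the text, e.g. ' Customer_head'. Also check that the left_on columns are the ones in df1 and the right_on columns are the ones in df2.
That did it. Thank you very much! :)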
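
The whitespace check suggested in the comments can be sketched as follows. The frames here are hypothetical, with stray spaces deliberately placed in the second frame's headers; printing the labels as a list makes the spaces visible, and str.strip removes them before merging.

```python
import pandas as pd

df1 = pd.DataFrame({'Customer_Name': ['Ann', 'Bob'], 'Customer_No': [1, 2]})
# Hypothetical frame whose headers picked up stray spaces on import.
df2 = pd.DataFrame({' Customer_head': ['Bob'], 'Customer_Id ': [2]})

# Inspect the exact labels; the quotes in the list output reveal the spaces.
print(df2.columns.tolist())

# Remove leading/trailing whitespace from every column label.
df2.columns = df2.columns.str.strip()

df = pd.merge(df1, df2,
              left_on=['Customer_Name', 'Customer_No'],
              right_on=['Customer_head', 'Customer_Id'])
print(df)
```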

【Solution 2】:

IIUC, you don't need the extra columns from the second file, since it is used only for joining, so you can do this:

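import pandas as pd

common_cols = ['Customer_Name','Customer_No']

# Read only the key columns from the second file, then inner-merge on them.
# (DataFrame.join with on= matches against the other frame's index, so merge is used here.)
df = (pd.read_excel(filename1)
        .merge(pd.read_excel(filename2, usecols=common_cols),
               on=common_cols))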
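
Since the question asked about .isin with more than one column, here is one possible sketch of that approach, assuming a pandas version that provides MultiIndex.from_frame. The sample data is invented. Each frame's key columns are turned into a MultiIndex of (Customer_Name, Customer_No) pairs, and the first frame is filtered by pair membership. Unlike a merge, this never duplicates rows of the first frame when the second frame contains a repeated key pair.

```python
import pandas as pd

common_cols = ['Customer_Name', 'Customer_No']

# Hypothetical stand-ins for the two files; df2 repeats one key pair.
df1 = pd.DataFrame({
    'Customer_Name': ['Ann', 'Bob', 'Bob', 'Cid'],
    'Customer_No': [1, 2, 3, 4],
    'Amount': [10, 20, 30, 40],
})
df2 = pd.DataFrame({
    'Customer_Name': ['Bob', 'Bob', 'Cid'],
    'Customer_No': [2, 2, 4],
})

# Build one tuple-like key per row from the two columns.
keys1 = pd.MultiIndex.from_frame(df1[common_cols])
keys2 = pd.MultiIndex.from_frame(df2[common_cols])

# Boolean mask: True where df1's (name, number) pair also occurs in df2.
mask = keys1.isin(keys2)
result = df1[mask]
print(result)
```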

【Comments】:

【Solution 3】:

I think the most straightforward way is something like this:

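df_file1 = pd.read_csv(file1, index_col='Customer_No')  # index by Customer_No
df_file2 = pd.read_csv(file2, index_col='Customer_No')  # index by Customer_No
count = 0
for index, row in df_file1.iterrows():
    # Check whether this row's customer name also occurs in the second file
    if row['Customer_Name'] in df_file2['Customer_Name'].values:
        count += 1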

Here you can simply keep an integer count, or do something more elaborate as needed, such as adding [index, row] to a result df.

【Comments】:
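
As a rough illustration of the "collect matching rows" variant, the sketch below uses invented in-memory frames indexed by Customer_No in place of the files. It checks both Customer_No and Customer_Name (the loop in the answer checks only the name) by looking each row up in a set of pairs, which is an assumption about what the asker wants. Note that a row-by-row loop over roughly 800k rows is likely to be much slower than a merge.

```python
import pandas as pd

# Hypothetical stand-ins for the two files, indexed by Customer_No.
df_file1 = pd.DataFrame(
    {'Customer_Name': ['Ann', 'Bob', 'Cid'], 'Amount': [10, 20, 30]},
    index=pd.Index([1, 2, 3], name='Customer_No'))
df_file2 = pd.DataFrame(
    {'Customer_Name': ['Bob', 'Cid']},
    index=pd.Index([2, 9], name='Customer_No'))

# Set of (Customer_No, Customer_Name) pairs present in the second file.
pairs = set(zip(df_file2.index, df_file2['Customer_Name']))

# Collect the index labels of rows whose pair occurs in the second file.
matched = []
for index, row in df_file1.iterrows():
    if (index, row['Customer_Name']) in pairs:
        matched.append(index)

# Build the result frame from the collected labels.
result = df_file1.loc[matched]
print(result)
```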