Comparing CSV matching rows with Python

【Question title】: Comparing CSV matching rows with Python
【Posted】: 2015-11-06 20:36:18
【Question】:

I have two CSV files, each containing a single column:

littleListIPs.csv:

10.187.172.140
10.187.172.141
10.187.172.142
10.187.172.143
10.187.172.144
10.187.172.145
10.187.172.154
10.187.172.155

(...)

BigListIPs.csv:

10.187.172.146
10.187.172.147
10.187.172.148
10.187.172.149
10.187.172.150
10.187.172.151
10.187.172.152
10.187.172.153
10.187.172.154
10.187.172.155

(...)

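I need a script that compares them and creates a third file (output.csv) containing every line of littleListIPs.csv plus a column stating whether that IP is present in BigListIPs.csv, as in the output below (you may use ";" instead of "|"):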

10.187.172.140 | Not present in BigListIPs.csv
10.187.172.141 | Not present in BigListIPs.csv
10.187.172.142 | Not present in BigListIPs.csv
10.187.172.143 | Not present in BigListIPs.csv
10.187.172.144 | Not present in BigListIPs.csv
10.187.172.145 | Not present in BigListIPs.csv
10.187.172.154 | Present in BigListIPs.csv
10.187.172.155 | Present in BigListIPs.csv

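I saw a similar case on Stack (Python: Comparing two CSV files and searching for similar items), but I could not adapt it to my needs, even though my case is simpler. Thanks for your help.

【Comments】:

Please post what you have tried, and we can help you from there.

Tags: python linux csv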

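【Solution 1】:

Written in Python 2.x, because that is what I am comfortable with.

Load the BigListIPs entries into a set. Checking `in` against a list is O(n); checking `in` against a set is O(1) on average.
Open the files with `with`. It is good practice and ensures the files are closed properly.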
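
To make the lookup-cost point concrete, here is a rough timing sketch. The data is made up (100,000 generated addresses, not the files from the question), and the names `ips_list`, `ips_set` and `missing` are only for illustration. Absolute timings will vary by machine, so none are quoted here.

```python
import timeit

# Hypothetical data: 100,000 generated addresses, not taken from the question.
ips_list = ["10.%d.%d.%d" % (i // 65536, (i // 256) % 256, i % 256)
            for i in range(100000)]
ips_set = set(ips_list)

# An address that is in neither container: the worst case for the list,
# because every element has to be compared before giving up.
missing = "192.168.0.1"

list_time = timeit.timeit(lambda: missing in ips_list, number=100)
set_time = timeit.timeit(lambda: missing in ips_set, number=100)

print("list: %.6f s, set: %.6f s" % (list_time, set_time))
```

Both containers give the same answer; only the amount of work per lookup differs.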

Code:

#!/usr/bin/env python

import csv

little_ip_filename = "littleListIPs.csv"
big_ip_filename = "BigListIPs.csv"
output_filename = "results.csv"

# Load all the entries from BigListIPs into a set for quick lookup.
big_ips = set()

with open(big_ip_filename, 'r') as f:
    big_ip = csv.reader(f)
    for csv_row in big_ip:
        big_ips.add(csv_row[0])

# print big_ips

with open(little_ip_filename, 'r') as input_file, open(output_filename, 'w') as output_file:
    input_csv = csv.reader(input_file)
    output_csv = csv.writer(output_file)
    for csv_row in input_csv:
        ip = csv_row[0]
        status = "Present" if ip in big_ips else "Not Present"
        output_csv.writerow([ip, status + " in BigListIPs.csv"])

littleListIPs.csv:

10.187.172.140
10.187.172.141
10.187.172.142
10.187.172.143
10.187.172.144
10.187.172.145
10.187.172.154
10.187.172.155

BigListIPs.csv:

10.187.172.146
10.187.172.147
10.187.172.148
10.187.172.149
10.187.172.150
10.187.172.151
10.187.172.152
10.187.172.153
10.187.172.154
10.187.172.155

results.csv:

10.187.172.140,Not Present in BigListIPs.csv
10.187.172.141,Not Present in BigListIPs.csv
10.187.172.142,Not Present in BigListIPs.csv
10.187.172.143,Not Present in BigListIPs.csv
10.187.172.144,Not Present in BigListIPs.csv
10.187.172.145,Not Present in BigListIPs.csv
10.187.172.154,Present in BigListIPs.csv
10.187.172.155,Present in BigListIPs.csv
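
For anyone on Python 3, a minimal sketch of the same approach follows. It is not the answer's original code. The function name `compare_ip_lists` and its parameters are assumptions made for illustration; it writes ";" as the delimiter (which the question allows) and skips blank lines instead of failing on them.

```python
import csv


def compare_ip_lists(little_path, big_path, output_path, delimiter=";"):
    """Write each IP from little_path followed by a Present/Not present status."""
    # Build the lookup set once. csv.reader yields an empty list for a blank
    # line, so "if row" skips those.
    with open(big_path, newline="") as f:
        big_ips = {row[0].strip() for row in csv.reader(f) if row}

    with open(little_path, newline="") as src, \
            open(output_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter=delimiter)
        for row in csv.reader(src):
            if not row:
                continue  # blank line in the input
            ip = row[0].strip()
            status = "Present" if ip in big_ips else "Not present"
            writer.writerow([ip, status + " in BigListIPs.csv"])
```

Calling `compare_ip_lists("littleListIPs.csv", "BigListIPs.csv", "output.csv")` would then produce the file the question asks for. Opening the files with `newline=""` is what the `csv` module documentation recommends on Python 3.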

【Comments】:

【Solution 2】:

You can use `in` to check whether each IP is in the big list, then write the result to the third file:

littlelistIPs = ['10.187.172.140', '10.187.172.141', '10.187.172.142', '10.187.172.143',
                '10.187.172.144', '10.187.172.145', '10.187.172.154', '10.187.172.155']

biglistIPs = ['10.187.172.146', '10.187.172.147', '10.187.172.148', '10.187.172.149',
              '10.187.172.150', '10.187.172.151', '10.187.172.152', '10.187.172.153',
              '10.187.172.154', '10.187.172.155']

with open('output.csv', 'w') as f:
    for i in littlelistIPs:
        if i in biglistIPs:
            f.write(i + ' | Present in BigListIPs.csv\n')
        else:
            f.write(i + ' | Not present in BigListIPs.csv\n')

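【Comments】:

Assuming BigListIPs is large, converting it to a set would make the lookups much cheaper.
@spazm Yes, that is very true if there are duplicates, but I am assuming there won't be any.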
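
A small sketch of what the first comment suggests, applied to the list-based code in this answer. The helper name `label_ips` is hypothetical. Note that the set speeds up each `in` check whether or not the big list contains duplicates; duplicates simply collapse into one entry.

```python
def label_ips(little_ips, big_ips):
    """Return one 'ip | status' line per entry in little_ips."""
    big_set = set(big_ips)  # built once; each lookup is then O(1) on average
    lines = []
    for ip in little_ips:
        status = "Present" if ip in big_set else "Not present"
        lines.append("%s | %s in BigListIPs.csv" % (ip, status))
    return lines


little = ["10.187.172.145", "10.187.172.154"]
# The duplicate entry is deliberate, to show it does not affect the result.
big = ["10.187.172.153", "10.187.172.154", "10.187.172.154"]

for line in label_ips(little, big):
    print(line)
```

Writing the returned lines to output.csv then works the same way as in the answer's `with open(...)` block.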