Adding specific rows from one CSV file to another and applying specific conditions: answers

【Question title】: Adding specific rows from one CSV file to another and applying specific conditions
【Posted】: 2020-03-06 09:52:34
【Question】:

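I have a CSV file with a "flag" column that contains 0 and 1 values. My goal is to move every row whose flag is 0 to another CSV file. The script will be scheduled to run once an hour, and on each run it should move the rows with the value "0" to the other file.

So far I have written the following code: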

import csv

with open("path/to/my/input/file.csv", "rt", encoding="utf8") as f:
    reader = csv.DictReader(f, delimiter=',')
    with open("/path/to/my/output/file.csv", "a+", encoding="utf8") as f_out:
        writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames, delimiter=",")
        # this call runs on every invocation, so a header row is appended each time
        writer.writeheader()
        for row in reader:
            if row['flag'] == '0':
                writer.writerow(row)

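With the help of @Raghvendra below, adding "a+" to my code lets me append rows to my output.csv file. However, every time the script runs it also appends the header row to the output file. In addition, how can I prevent rows with matching IDs from being added? Is it possible to replace the rows in output.csv whose ID matches an ID in input.csv, instead of appending rows with duplicate IDs to output.csv?

Can someone help me with this? Thanks in advance!

input file.csv: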

id       date          data1     data2    flag
1     2020-03-01      mydata    mydata1    0
2     2020-03-02      mydata     mydata    1
3     2020-03-03      mydata    mydata1    0

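【Comments】:

Your code has delimiter=',', but your example file does not.
Your example file also has no 'MyColumn', but flag instead. It is needlessly hard to help when the details do not match.
Is the ID an integer value, and if so, what is its range?
Do the rows in the input file change every hour, or are rows only added? If the latter, can added rows have the same ID as earlier rows?
Hi Armali, the rows in the input file change from time to time, but rows with new IDs are also added. The IDs are not integers but strings. Thanks!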
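On the delimiter mismatch raised in the comments above, here is a small sketch of why it matters. It assumes the file on disk really looks like the whitespace-aligned sample (which may only be how the post was rendered); `sample_as_shown`, `sample_with_commas` and `column_names` are names made up for this illustration.

```python
import csv
import io

# The sample exactly as shown in the question (whitespace-aligned, no commas).
sample_as_shown = (
    "id       date          data1     data2    flag\n"
    "1     2020-03-01      mydata    mydata1    0\n"
)
# The same data as a comma-separated file, which is what delimiter=',' expects.
sample_with_commas = (
    "id,date,data1,data2,flag\n"
    "1,2020-03-01,mydata,mydata1,0\n"
)

def column_names(text):
    """Return the column names DictReader finds when splitting on commas."""
    return csv.DictReader(io.StringIO(text), delimiter=",").fieldnames

print(column_names(sample_as_shown))     # a single column holding the whole header line
print(column_names(sample_with_commas))  # five separate columns
```

With the first layout, row['flag'] would raise a KeyError, because the only key is the entire header line.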

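Tags: python csv

【Solution 1】:

Now my problem is to prevent records with duplicate IDs from being added to my output.csv. If possible, I need to overwrite the records with matching IDs.

To match IDs, we cannot avoid reading the output file.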

import csv

data = dict()
# first read the output file in (if one exists already)
try:
    with open("output file.csv", encoding="utf8", newline="") as f_out:
        for row in csv.DictReader(f_out):
            data[row['id']] = row
except OSError:
    pass  # no output file yet, so start with an empty dict

# now add the new rows from the input file; rows with existing id are replaced
with open("input file.csv", encoding="utf8", newline="") as f:
    reader = csv.DictReader(f)
    for row in reader:
        if row['flag'] == '0':  # the column is named 'flag' in the sample file
            data[row['id']] = row

# rewrite the whole output file, so the header is written exactly once;
# newline="" leaves line endings to the csv module, as its documentation recommends
with open("output file.csv", "w", encoding="utf8", newline="") as f_out:
    writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in data.values():
        writer.writerow(row)
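To make the replace-by-ID behaviour of the answer above easier to see, here is a minimal in-memory sketch of the same idea, run on the sample data from the question. The function name `merge_flagged`, its parameters, and the "old" row in `existing_output` are made up for illustration; real code would read and write files as in the answer.

```python
import csv
import io

def merge_flagged(input_text, output_text, key="id", flag="flag"):
    """Merge flag == '0' rows from input_text into output_text, keyed by ID."""
    data = {}
    # Existing output rows go in first, indexed by their ID.
    for row in csv.DictReader(io.StringIO(output_text)):
        data[row[key]] = row
    # Input rows with flag '0' are added; a row with a known ID replaces the old one.
    reader = csv.DictReader(io.StringIO(input_text))
    for row in reader:
        if row[flag] == "0":
            data[row[key]] = row
    # Write everything back out with a single header row.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(data.values())
    return buf.getvalue()

input_csv = (
    "id,date,data1,data2,flag\n"
    "1,2020-03-01,mydata,mydata1,0\n"
    "2,2020-03-02,mydata,mydata,1\n"
    "3,2020-03-03,mydata,mydata1,0\n"
)
# Hypothetical earlier output that already holds an older version of ID 1.
existing_output = (
    "id,date,data1,data2,flag\n"
    "1,2020-02-28,old,old,0\n"
)
print(merge_flagged(input_csv, existing_output))
```

The older row for ID 1 is overwritten rather than duplicated, ID 3 is added, and ID 2 is skipped because its flag is 1.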

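【Comments】:

【Solution 2】:

To append new rows to the file instead of overwriting its contents, try opening the file in append (a) mode rather than write (w) mode.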

with open("/path/to/my/output/file.csv", "a+", encoding="utf8") as f_out:

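There is no need to write t, since it stands for text mode, which is the default.

The modes are listed in the documentation of the built-in open() function: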

Character   Meaning
    'r'     open for reading (default)
    'w'     open for writing, truncating the file first
    'x'     open for exclusive creation, failing if the file already exists
    'a'     open for writing, appending to the end of the file if it exists
    'b'     binary mode
    't'     text mode (default)
    '+'     open a disk file for updating (reading and writing)
    'U'     universal newlines mode (deprecated)
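The difference between 'w' and 'a' in the table above can be checked with a short experiment. This is only a sketch: `write_twice` is a hypothetical helper that opens a temporary file twice in the given mode and returns what ends up in it.

```python
import os
import tempfile

def write_twice(mode):
    """Open a fresh temp file twice with the given mode and return its final content."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        for line in ("first\n", "second\n"):
            with open(path, mode, encoding="utf8") as f:
                f.write(line)
        with open(path, encoding="utf8") as f:
            return f.read()
    finally:
        os.remove(path)

print(repr(write_twice("w")))  # 'w' truncates on every open, so only the last write survives
print(repr(write_twice("a")))  # 'a' keeps earlier content and adds to the end
```

This is also why a header written on every run accumulates in append mode: nothing that was written earlier is ever removed.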

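The second part of your question is not very clear. Could you elaborate a bit?

【Comments】:

Thanks for your help so far! I tried 'a+' and was able to append rows instead of overwriting the existing rows in my output.csv file. However, it also appends the header row, and it does not stop me from adding rows with duplicate IDs. I actually need to replace the rows with matching IDs rather than duplicate them. Can you help? Thanks!
To avoid adding the header row, try removing writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames, delimiter=",")
Removing the writer.writeheader() part solved this, but now there is no header at all (no problem, I can add it separately). Now my problem is to prevent records with duplicate IDs from being added to my output.csv. If possible, I need to overwrite the records with matching IDs.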
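If the append approach is kept, one way to get exactly one header is to write it only when the output file is missing or empty. This is a sketch under that assumption; `append_flagged_rows` and its parameters are hypothetical names, and it does nothing about duplicate IDs, which still require reading the output file first.

```python
import csv
import os

def append_flagged_rows(input_path, output_path, flag="flag"):
    """Append rows whose flag is '0'; write the header only if the output is new or empty."""
    need_header = not os.path.exists(output_path) or os.path.getsize(output_path) == 0
    with open(input_path, encoding="utf8", newline="") as f, \
         open(output_path, "a", encoding="utf8", newline="") as f_out:
        reader = csv.DictReader(f)
        writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames)
        if need_header:
            writer.writeheader()
        for row in reader:
            if row[flag] == "0":
                writer.writerow(row)
```

Calling it twice on the same input produces a single header followed by the matching rows from both runs, so the duplicate rows remain.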