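【Question title】: Loop creates unwanted duplicates
【Posted】: 2016-09-06 10:49:17
【Question】:

I am trying to pull data from an input file and loop through a symbol file to build the rows of an output file, but my code creates unwanted duplicates in the output file. The input file is very large, so I need to filter the input first and then look each filtered row up in the symbol (city/state) file to generate the output.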

import csv

i_file = 'InputFile.csv'
o_file = 'OutputFile.csv'
symbol_file = 'SymbolFile.csv'
city = 'Tampa'
state = 'FL'

with open(symbol_file, 'r') as symfile:
    with open(i_file, 'r') as infile:
        with open(o_file, 'w') as outfile:

            reader = csv.reader(infile)
            symbol = csv.reader(symfile)
            writer = csv.writer(outfile, delimiter = ',')

            for row in reader:
                if (row[2] == city and row[3] == state):

                    for line in symbol:
                        if (row[4] == line[0]):
                            nline = ([str(city)] + [str(line[3])])
                            writer.writerow(nline)
                    symfile.seek(0)

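【Comments】:

  • for line in symbolreader: what is symbolreader?
  • Are you sure the loop is causing the duplicates, and not the input file in some way?
  • Sharad - that was a typo. Ev. Kounis - the input file has no duplicates.
  • Could you also post a sample of both input files? Two or three lines of each would do. Also, the writing happens inside the if (row[2] == city and row[3] == state): block, so duplicate values in the output file mean that this condition evaluated to True at least twice.
  • This will create one row for every match between the input file and the symbol file. Do you perhaps want only one row per row of the input file, IF the symbol file contains a matching row?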
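The point made in the last comment can be reproduced with a small sketch. The real files were never posted, so the sample rows below are invented purely for illustration, and the helper name nested_loop_join is hypothetical. It keeps the question's logic (one output row per matching pair of input row and symbol line) but reads from in-memory strings instead of files:

```python
import csv
import io

# Hypothetical sample data: the symbol file lists the key "A1" twice.
input_csv = "1,x,Tampa,FL,A1\n2,y,Tampa,FL,B2\n3,z,Miami,FL,A1\n"
symbol_csv = "A1,_,_,alpha\nA1,_,_,alpha\nB2,_,_,beta\n"


def nested_loop_join(input_text, symbol_text, city, state):
    """Same logic as the question: one output row per (input row, symbol line) match."""
    out = []
    symbol_rows = list(csv.reader(io.StringIO(symbol_text)))
    for row in csv.reader(io.StringIO(input_text)):
        if row[2] == city and row[3] == state:
            # Every symbol line with a matching key produces its own output row.
            for line in symbol_rows:
                if row[4] == line[0]:
                    out.append([city, line[3]])
    return out


rows = nested_loop_join(input_csv, symbol_csv, "Tampa", "FL")
print(rows)
```

With this made-up data, the single input row carrying A1 is written twice because the symbol data contains A1 twice. Whether that is what happens with the real files cannot be confirmed from the thread.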

Tags: python python-3.x loops


【Solution 1】:

I only want one row for each row in the input file, if the symbol file contains a matching row.

Then try it like this:

import csv

i_file = 'InputFile.csv'
o_file = 'OutputFile.csv'
symbol_file = 'SymbolFile.csv'

city = 'Tampa'
state = 'FL'

# load the symbols from the symbol file and store them in a dictionary
symbols = {}
with open(symbol_file, 'r') as symfile:
    for line in csv.reader(symfile):
        # key is line[0] which is the thing we match against
        # value is line[3] which appears to be the only thing of interest later
        symbols[line[0]] = line[3]

# now read the other files
with open(i_file, 'r') as infile, open(o_file, 'w') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile, delimiter = ',')

    for row in reader:
        # check if `row[4] in symbols`
        # which essentially checks whether row[4] is equal to a line[0] in the symbols file
        if row[2] == city and row[3] == state and row[4] in symbols:
            # get the symbol
            symbol = symbols[row[4]]

            # write output
            nline = [city, symbol]
            writer.writerow(nline)
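To check the dictionary approach without touching the real files, here is a minimal sketch that wraps the same idea in two functions and runs it on in-memory data. The function names build_symbol_map and matching_rows and the sample rows are assumptions made for illustration; they are not part of the original answer. The length checks are an added guard for short rows, also an assumption:

```python
import csv
import io


def build_symbol_map(symfile):
    """Map column 0 to column 3; if a key repeats, the last value seen wins."""
    return {line[0]: line[3] for line in csv.reader(symfile) if len(line) > 3}


def matching_rows(infile, symbols, city, state):
    """Yield [city, symbol] at most once per input row."""
    for row in csv.reader(infile):
        if len(row) > 4 and row[2] == city and row[3] == state and row[4] in symbols:
            yield [city, symbols[row[4]]]


# Hypothetical sample data: the key "A1" appears twice in the symbol data.
symbols = build_symbol_map(io.StringIO("A1,_,_,alpha\nA1,_,_,alpha2\nB2,_,_,beta\n"))
infile = io.StringIO("1,x,Tampa,FL,A1\n2,y,Tampa,FL,B2\n3,z,Miami,FL,A1\n")
result = list(matching_rows(infile, symbols, "Tampa", "FL"))
print(result)
```

Because a dictionary holds one value per key, a repeated key in the symbol file no longer multiplies the output. Note that rows repeated in the input file itself would still appear more than once.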

【Comments】:
