Trying to stepwise iterate through 2 files in Python: answers

【Question title】: Trying to stepwise iterate through 2 files in python
【Posted】: 2019-09-13 23:57:32
【Question】:

I'm trying to merge two LARGE input files into 1 output file, sorting as I go.

## Above I counted the number of lines in each table

print("Processing Table Lines: table 1 has " + str(count1) + " and table 2 has " + str(count2) )
newLine, compare, line1, line2 = [], 0, [], []

while count1 + count2 > 0:
    if count1 > 0 and compare <= 0: count1, line1 = count1 - 1, ifh1.readline().rstrip().split('\t')
    else: line1 = []
    if count2 > 0 and compare >= 0: count2, line2 = count2 - 1, ifh2.readline().rstrip().split('\t')
    else: line2 = []

    compare = compareTableLines( line1, line2 )
    newLine = mergeLines( line1, line2, compare, tIndexes )

    # join the fields first, then append the newline (a list cannot be added to a str)
    ofh.write('\t'.join(newLine) + '\n')

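What I expect to happen is that, as lines are written to the output, I pull the next line from whichever file I just consumed, if one is available. I also expect the loop to exit once both files are exhausted.

However, I keep getting this error: ValueError: Mixing iteration and read methods would lose data

I just can't figure out how to get around it. Either file is too large to hold in memory, so I want to process the lines as I read them.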

【Comments】:

Are the two input files sorted? Can you show us some examples? What does compareTableLines() do?

Tags: python loops readlines

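【Solution 1】:

Here is an example of merging two sorted files, CSV files in this case, using heapq.merge() and itertools.groupby(). Both inputs must already be sorted by the key column. Given 2 CSV files:

x.csv: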

key1,99
key2,100
key4,234

y.csv:

key1,345
key2,4
key3,45

Running:

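import csv, heapq, itertools

# Rows are compared and grouped by their first column.
keyfun = lambda row: row[0]

# newline="" is the csv module's recommended way to open files for it.
with open("x.csv", newline="") as inf1, open("y.csv", newline="") as inf2, open("z.csv", "w", newline="") as outf:
    in1, in2, out = csv.reader(inf1), csv.reader(inf2), csv.writer(outf)
    # heapq.merge() lazily interleaves the two sorted row streams;
    # groupby() then collects consecutive rows that share a key.
    for key, rows in itertools.groupby(heapq.merge(in1, in2, key=keyfun), keyfun):
        out.writerow([key, sum(int(r[1]) for r in rows)])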

we get:

z.csv:

key1,444
key2,104
key3,45
key4,234
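For comparison, here is a rough sketch of the hand-written two-way merge the question was attempting, without heapq. The ValueError in the question typically appears when readline() is called on a file that has already been iterated (for example, to count its lines), so this version never counts lines and only pulls rows with next(). The function name merge_sorted_tables and the key_col parameter are made up for illustration. It assumes tab-separated input with no blank lines, sorted by plain string comparison on the key column. Since mergeLines() is not shown in the question, rows with equal keys are written one after another rather than combined.

```python
import csv
import os
import tempfile


def merge_sorted_tables(path1, path2, out_path, key_col=0):
    """Merge two tab-separated files that are each sorted on key_col.

    Hypothetical helper: rows are pulled with next() only, so iteration
    and readline() are never mixed, and no up-front line count is needed.
    """
    with open(path1, newline="") as f1, open(path2, newline="") as f2, \
            open(out_path, "w", newline="") as out:
        rows1 = csv.reader(f1, delimiter="\t")
        rows2 = csv.reader(f2, delimiter="\t")
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")

        # next(it, None) returns None once a reader is exhausted.
        row1, row2 = next(rows1, None), next(rows2, None)
        while row1 is not None or row2 is not None:
            # Take from file 1 when file 2 is exhausted or file 1's key is
            # not larger; ties go to file 1, which keeps the merge stable.
            if row2 is None or (row1 is not None
                                and row1[key_col] <= row2[key_col]):
                writer.writerow(row1)
                row1 = next(rows1, None)
            else:
                writer.writerow(row2)
                row2 = next(rows2, None)


if __name__ == "__main__":
    # Small usage example with throwaway files mirroring x.csv and y.csv.
    with tempfile.TemporaryDirectory() as tmp:
        p1 = os.path.join(tmp, "t1.tsv")
        p2 = os.path.join(tmp, "t2.tsv")
        p3 = os.path.join(tmp, "out.tsv")
        with open(p1, "w") as f:
            f.write("key1\t99\nkey2\t100\nkey4\t234\n")
        with open(p2, "w") as f:
            f.write("key1\t345\nkey2\t4\nkey3\t45\n")
        merge_sorted_tables(p1, p2, p3)
        with open(p3) as f:
            print(f.read(), end="")
```

Only one row per file is held in memory at a time, so this should scale to inputs that do not fit in memory, just like the heapq.merge() version.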

【Comments】: