Manipulating text files: answers

【Question title】: Manipulating text files
【Posted】: 2014-06-23 21:17:27
【Question description】:

I have two text files:

file1.txt:

a,1
b,3
c,5
d,-4

and file2.txt:

sample1
a,12 
b,10
c,4
d,6

sample2
a,5 
b,8
c,6
d,12

sample3
a,3 
b,6
c,9
d,10

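What I want to do is subtract the value given for each letter in file1.txt from the value of the corresponding letter in every sample in file2.txt, and create one output file per sample, so the output looks like this: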

The first file, for sample1, is sample1.txt:

sample1.txt
a,11 # 12-1 as 1 from file1.txt was subtracted from 12 in file2.txt
b,7 # 10-3
c,-1 # 4-5
d,10 # 6-(-4)

Then a separate file for sample2, sample2.txt:

sample2.txt
a,4 # 5-1 as 1 from file1.txt was subtracted from 5 in file2.txt
b,5 # 8-3
c,1 # 6-5
d,16 # 12-(-4)

And likewise for sample3.

I tried looping over file2.txt, but since my real file2.txt has more than 1000 samples, that takes a very long time. Is there a faster, more Pythonic way to do this?

Cheers, Kate

【Question discussion】:

Iterate over each file and build a dict() (one per file); then you can easily pull out the relevant letter values.

Tags: python text data-manipulation

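【Solution 1】:

Interesting! Let's take a look.

The design is very simple: read the files into dictionaries, operate on the dictionaries, then write the files out.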

with open('file1.txt') as in_:
    mapping = {}
    for line in in_:
        key,value = line.strip().split(',')
        mapping[key] = int(value)

mapping is now {"a":1, "b":3, "c":5, "d":-4}. Now let's read in the second file.

values = {}
with open('file2.txt') as in_:
    for _ in range(3):
        # This is ugly, but it's a quick hack. I'd improve it later.
        cur_dict = next(in_).strip()
        while not cur_dict:
            # Skip any blank separator lines between samples.
            cur_dict = next(in_).strip()
        values[cur_dict] = {}
        for __ in range(4):
            key, value = next(in_).strip().split(',')
            values[cur_dict][key] = int(value)

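Sheesh, that may be the ugliest code I've ever written, but values is now {"sample1": {"a":12, "b":10, "c":4, "d":6}, "sample2": ...}

Now for the manipulation. This is actually easy. Let's write the files out in the same loop too, since that step is pretty elementary.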

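for dataset in values:
    for key, value in mapping.items():
        # Subtract the file1.txt value from the matching sample value.
        values[dataset][key] -= value
    # Open in write mode; open() defaults to read-only.
    with open(dataset + ".txt", "w") as out:
        out.write(dataset + "\n")
        for key, value in values[dataset].items():
            out.write("{},{}\n".format(key, value))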
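As a quick sanity check of the subtraction step, here is a minimal sketch that applies it to the sample1 values from the question, using in-memory dictionaries instead of files. The helper name subtract_mapping is made up for this illustration; it assumes every key in the sample also exists in mapping, and it returns a new dictionary rather than modifying the input.

```python
def subtract_mapping(sample, mapping):
    """Return a new dict with mapping[key] subtracted from each sample value.

    Hypothetical helper for illustration; assumes every key in `sample`
    also appears in `mapping`.
    """
    return {key: value - mapping[key] for key, value in sample.items()}


mapping = {"a": 1, "b": 3, "c": 5, "d": -4}     # contents of file1.txt
sample1 = {"a": 12, "b": 10, "c": 4, "d": 6}    # sample1 from file2.txt

result = subtract_mapping(sample1, mapping)
print(result)  # {'a': 11, 'b': 7, 'c': -1, 'd': 10}
```

The result matches the sample1.txt values the question asks for (11, 7, -1, 10).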

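【Discussion】:

This hasn't been tested, but it makes sense to me.
I suppose that instead of the ugly sin of for _ in range(3): for __ in range(4): I could do for line in in_ and test if line and "," not in line to tell where each outer dictionary starts.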
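The idea in that comment could look roughly like the sketch below, which would cope with any number of samples and rows. The function name parse_samples is hypothetical, and the sketch assumes that any non-blank line without a comma is a sample name and that the file starts with one. io.StringIO stands in for the real file so the example runs on its own.

```python
import io


def parse_samples(lines):
    """Group 'key,value' rows under the most recent header line.

    Hypothetical helper: a non-blank line without a comma is treated as
    a sample name, and blank lines are skipped.
    """
    values = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank separator between samples
        if "," not in line:
            current = values.setdefault(line, {})  # start a new sample
            continue
        key, value = line.split(",")
        current[key.strip()] = int(value)
    return values


# io.StringIO stands in for open('file2.txt') so the sketch is self-contained.
demo = io.StringIO("sample1\na,12 \nb,10\n\nsample2\na,5 \nb,8\n")
parsed = parse_samples(demo)
print(parsed)  # {'sample1': {'a': 12, 'b': 10}, 'sample2': {'a': 5, 'b': 8}}
```

With a real file, passing the open file object (for example parse_samples(in_)) should work the same way, since the function only iterates over lines.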
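Quick comment: in a line like key, value = next(in_).strip().split(','), if you call int() on value, then you don't need to call strip(). int('5\n') == 5.
@kevinsa5 But " \n \t a,5 \r\n\t ".split(',') gives " \n \t a", "5 \r\n\t ". int only takes care of the value, not the key!
@Adam_Smith Oops! "Leading whitespace" indeed. Cheers.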
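The point of that exchange can be checked with a few lines. This is only an illustration using the string from the comment above: int() tolerates surrounding whitespace on the value, but without strip() the key keeps its leading whitespace.

```python
row = " \n \t a,5 \r\n\t "

# Without strip(): the value still converts, but the key is polluted.
raw_key, raw_value = row.split(",")
print(repr(raw_key))   # ' \n \t a'
print(int(raw_value))  # 5

# With strip(): the key is clean as well.
key, value = row.strip().split(",")
print(repr(key))       # 'a'
```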