Append files having different variables: answers

【Question title】: Append files having different variables
【Posted】: 2013-06-26 19:58:53
【Question description】:

I have 3 files in the following format:

File 1:

ID Var1 Var2
001 5 10
002 12 6

File 2:

ID Var1 Var3 Var5
003 5 10 9
004 12 6 1

File 3:

ID Var3 Var4
005 5 10
006 12 6

I want the output in the following format:

ID Var1 Var2 Var3 Var4 Var5
001 5 10 0 0 0
002 12 6 0 0 0
003 5 0 10 0 9
004 12 0 6 0 1
005 0 0 5 10 0
006 0 0 12 6 0

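Please tell me how to do this in Python.

【Discussion】:

Welcome to Stack Overflow! It looks like you want us to write some code for you. While many users are willing to produce code for a coder in distress, they usually only help when the poster has already tried to solve the problem on their own. A good way to demonstrate this effort is to include the code you've written so far, example input (if there is any), the expected output, and the output you actually get (console output, stack traces, compiler errors - whatever is applicable). The more detail you provide, the more answers you are likely to receive.
Are the values in the files separated by spaces or tabs? Have you looked at the csv module? It can do this more or less out of the box.
@TimPietzcker - the values in the files are separated by spaces. @MartijnPieters - I am new to Python, so any pointers to useful functions would be appreciated.
@abhishekraghuvanshi: If you're new, I'd start with the Python tutorial first.

Tags: python

【Solution 1】:

As mentioned, you should take a look at the csv module. Here is something to get you started.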

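import os

folder = r"\path\to\my\files"   # raw string so backslashes are not treated as escapes
with open("output.txt", "w") as outfile:
    for file_ in os.listdir(folder):
        # os.listdir returns bare names, so join them with the folder
        with open(os.path.join(folder, file_)) as f:
            for line_number, line in enumerate(f):  # iterate over the file object, not its name
                if line_number > 0:  # omit the headers
                    outfile.write(line)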

Also, manipulating files with Python seems to be a fairly common question on SO, so you could search for some of those questions to see how others have done it.
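
The snippet above only concatenates the data lines; it does not line up columns that differ between files. As a rough sketch of how the csv module could handle that part, the example below reads each table with csv.DictReader and writes the union of columns with csv.DictWriter, whose restval argument fills in missing values. The function name merge_tables and the in-memory io.StringIO inputs are made up for illustration, and the sketch assumes fields are separated by exactly one space.

```python
import csv
import io

def merge_tables(sources, out, fill="0"):
    """Merge space-delimited tables with differing columns into one table."""
    rows, columns = [], set()
    for src in sources:
        reader = csv.DictReader(src, delimiter=" ")
        for row in reader:
            rows.append(row)
        # fieldnames is None for an empty input, hence the fallback
        columns.update(reader.fieldnames or [])
    columns.discard("ID")
    header = ["ID"] + sorted(columns)
    # restval is written for every column a row does not have
    writer = csv.DictWriter(out, fieldnames=header, delimiter=" ",
                            restval=fill, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)

# Sample data from the question, held in memory for illustration.
file1 = io.StringIO("ID Var1 Var2\n001 5 10\n002 12 6\n")
file2 = io.StringIO("ID Var1 Var3 Var5\n003 5 10 9\n004 12 6 1\n")
file3 = io.StringIO("ID Var3 Var4\n005 5 10\n006 12 6\n")

result = io.StringIO()
merge_tables([file1, file2, file3], result)
print(result.getvalue())
```

With real files, the same function should work on objects opened with open(name, newline=""), which is how the csv documentation recommends opening files.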

【Discussion】:

【Solution 2】:

#use fileinput module if you're reading multiple files at once
import fileinput
dic = {}         # initialize an empty dict. This will be used to store the value of
                 # (id,var) pair fetched from the file.

for line in fileinput.input(['file1','file2','file3']):

    #if 'ID' is present in the line then it means it is the header line
    if 'ID' in line:
        vars = line.split()[1:] # extract the vars from it
                                # for file1 vars would be ['Var1', 'Var2']

    else:                            #else it is normal line
        spl = line.split()           # split the line at whitespaces
                                     # for the line '001 5 10\n' this would return
                                     # ['001', '5', '10'] 

        idx, vals = spl[0], spl[1:]  # assign the first value from spl 
                                     # to idx and rest to vals

        #now use zip to iterate over vars and vals, zip will return
        #item on the same index from the iterables passed to it.
        for x, y in zip(vars, vals): 
            dic[idx,x] = y          # use a tuple ('001','Var1') as key and 
                                    # assign the value '5' to it. Similarly
                                    # ('001','Var2') will be assigned '10'

#get a sorted list of unique vars and Ids
vars = sorted(set(item[1] for item in dic))
idxs = sorted(set(item[0] for item in dic), key = int)

print(" ".join(vars))  # print header (variable names only, no ID column)
# now iterate over the IDs; for each ID, look up every var in vars and print
# the value stored under (id, var)
for x in idxs:
    # dict.get will return the default value '0' if a
    # combination of (id, var) is not found in dict.
    print(x + " " + " ".join(dic.get((x, y), '0') for y in vars))

    #use string formatting for better looking output.

Output:

Var1 Var2 Var3 Var4 Var5
001 5 10 0 0 0
002 12 6 0 0 0
003 5 0 10 0 9
004 12 0 6 0 1
005 0 0 5 10 0
006 0 0 12 6 0
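
The closing comment in the code suggests string formatting for better-looking output. One possible way to do that, sketched here with a hypothetical helper format_table, is to pad each column to the width of its longest cell using str.format. It takes a dict keyed by (id, var) tuples like dic above; adding an "ID" column to the header is an assumption of this sketch, not something the answer's code does.

```python
def format_table(dic, fill="0"):
    """Return aligned lines for a dict keyed by (id, var) tuples."""
    names = sorted(set(key[1] for key in dic))
    ids = sorted(set(key[0] for key in dic), key=int)
    header = ["ID"] + names
    rows = [[i] + [dic.get((i, n), fill) for n in names] for i in ids]
    table = [header] + rows
    # Each column is as wide as its longest cell, header included.
    widths = [max(len(r[c]) for r in table) for c in range(len(header))]
    # "{:<{}}" left-aligns a cell in a field whose width is the second argument.
    return [" ".join("{:<{}}".format(cell, w) for cell, w in zip(r, widths)).rstrip()
            for r in table]

# A small subset of the question's data, for illustration only.
sample = {("001", "Var1"): "5", ("001", "Var2"): "10", ("003", "Var3"): "10"}
for line in format_table(sample):
    print(line)
```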

【Discussion】:

【Solution 3】:

To merge several files, you can use a function like this one, which makes use of Python's defaultdict:

def read_from_file(filename, dictionary):
    with open(filename) as f:
        lines = f.read().splitlines()
        head, body = lines[0].split(), lines[1:]
        for line in body:
            for i, item in enumerate(line.split()):
                if i == 0:
                    d = dictionary[item]
                else:
                    d[head[i]] = item

from collections import defaultdict
from pprint import pprint
d = defaultdict(defaultdict)
read_from_file("file1", d)
read_from_file("file2", d)
read_from_file("file3", d)
pprint(dict(d))

Output:

{'001': defaultdict(None, {'Var1': '5', 'Var2': '10'}),
 '002': defaultdict(None, {'Var1': '12', 'Var2': '6'}),
 '003': defaultdict(None, {'Var5': '9', 'Var1': '5', 'Var3': '10'}),
 '004': defaultdict(None, {'Var5': '1', 'Var1': '12', 'Var3': '6'}),
 '005': defaultdict(None, {'Var4': '10', 'Var3': '5'}),
 '006': defaultdict(None, {'Var4': '6', 'Var3': '12'})}

Now all that is left to do is to pretty-print this dictionary as a table.
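
That last step could look roughly like the sketch below. The helper name table_lines is made up for illustration, and it assumes the nested mapping has the shape shown in the output above (ID, then variable name, then value). It uses .get for lookups because the inner defaultdicts have no default factory, so indexing a missing variable would raise KeyError.

```python
def table_lines(data, fill="0"):
    """Turn {id: {var: value}} into space-separated table lines."""
    # Union of all variable names seen in any row, in sorted order.
    columns = sorted(set(var for row in data.values() for var in row))
    lines = [" ".join(["ID"] + columns)]
    # String sort is enough for zero-padded IDs such as '001'.
    for key in sorted(data):
        cells = [data[key].get(col, fill) for col in columns]
        lines.append(" ".join([key] + cells))
    return lines

# Two rows from the output above, written as plain dicts for illustration.
example = {
    "001": {"Var1": "5", "Var2": "10"},
    "005": {"Var4": "10", "Var3": "5"},
}
for line in table_lines(example):
    print(line)
```

Passing the d built by read_from_file instead of example should work the same way, since defaultdict supports the same .get and iteration behaviour as dict.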

【Discussion】: