【Question title】: Append files having different variables
【Posted】: 2013-06-26 19:58:53
【Question】:

I have 3 files in the following format:

File 1:

ID Var1 Var2
001 5 10
002 12 6

File 2:

ID Var1 Var3 Var5
003 5 10 9
004 12 6 1

File 3:

ID Var3 Var4
005 5 10
006 12 6

I want the output in the following format:

ID Var1 Var2 Var3 Var4 Var5
001 5 10 0 0 0
002 12 6 0 0 0
003 5 0 10 0 9
004 12 0 6 0 1
005 0 0 5 10 0
006 0 0 12 6 0

Please tell me how to do this in Python.

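【Comments】:

  • Welcome to Stack Overflow! It looks like you want us to write some code for you. While many users are willing to produce code for a coder in distress, they usually only help when the poster has already tried to solve the problem on their own. A good way to demonstrate this effort is to include the code you've written so far, example input (if there is any), the expected output, and the output you actually get (console output, stack traces, compiler errors - whatever is applicable). The more detail you provide, the more answers you are likely to receive.
  • Are the values in the files separated by spaces or tabs? Have you looked at the csv module? It can do this more or less out of the box.
  • @TimPietzcker - Those are spaces in the files. @MartijnPieters - I am new to Python, so any help finding useful functions would be appreciated.
  • @abhishekraghuvanshi: If you are new, I'd start with the Python tutorial first.

Tags: python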


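【Answer 1】:

As mentioned, you should take a look at the csv module; here is something to get you started.

import os

directory = r"\path\to\my\files"  # raw string, so "\t" and "\f" are not read as escapes
with open("output.txt", "w") as outfile:
    for name in os.listdir(directory):
        # os.listdir returns bare file names, so join them with the directory
        with open(os.path.join(directory, name)) as f:
            for line_number, line in enumerate(f):
                if line_number > 0:  # omit the header line
                    # copies data rows only; columns are not aligned across files
                    outfile.write(line)

Also, manipulating files with Python seems to be a fairly common question on SO; maybe you can search for some of those to see how others have done it.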
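
Since both the comments and this answer point at the csv module, here is a minimal sketch of how it could produce the requested table. It assumes the fields are separated by exactly one space (the asker said the files use spaces) and that every file has an `ID` column. The helper name `merge_files` and its `fill` parameter are made up for illustration and are not part of the original answer.

```python
import csv


def merge_files(in_paths, out_path, fill="0"):
    """Append space-delimited files whose columns differ (hypothetical helper)."""
    rows, columns = [], set()
    for path in in_paths:
        with open(path, newline="") as f:
            # DictReader takes the column names from each file's header line
            reader = csv.DictReader(f, delimiter=" ")
            for row in reader:
                rows.append(row)
                columns.update(row)  # remember every column name seen so far

    # put ID first, then the remaining variable names in sorted order
    fieldnames = ["ID"] + sorted(columns - {"ID"})
    with open(out_path, "w", newline="") as f:
        # restval is written for any column that a given row does not have
        writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter=" ",
                                restval=fill, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
    return fieldnames
```

Calling `merge_files(["file1", "file2", "file3"], "merged.txt")` on the three sample files should write the table from the question, with rows in the order the files were read. The file names here are assumptions borrowed from the other answers.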

【Discussion】:

    【Answer 2】:
    # use the fileinput module if you're reading multiple files at once
    import fileinput

    dic = {}         # initialize an empty dict. It stores the value of each
                     # (id, var) pair fetched from the files.

    for line in fileinput.input(['file1', 'file2', 'file3']):

        # if 'ID' is present in the line, it is the header line
        if 'ID' in line:
            header = line.split()[1:]    # extract the variable names from it
                                         # for file1 this is ['Var1', 'Var2']

        else:                            # otherwise it is a data line
            spl = line.split()           # split the line at whitespace
                                         # for the line '001 5 10\n' this returns
                                         # ['001', '5', '10']

            idx, vals = spl[0], spl[1:]  # assign the first value from spl
                                         # to idx and the rest to vals

            # now use zip to iterate over header and vals; zip returns the
            # items at the same index from the iterables passed to it.
            for x, y in zip(header, vals):
                dic[idx, x] = y          # use a tuple ('001', 'Var1') as key and
                                         # assign the value '5' to it. Similarly
                                         # ('001', 'Var2') will be assigned '10'

    # get sorted lists of the unique variable names and IDs
    all_vars = sorted(set(key[1] for key in dic))
    idxs = sorted(set(key[0] for key in dic), key=int)

    print("ID", " ".join(all_vars))  # print the header
    # now iterate over the IDs and, for each ID, print the value stored
    # for every (id, var) pair
    for x in idxs:
        # dict.get returns the default value '0' if a
        # combination of (id, var) is not found in the dict.
        print(x, " ".join(dic.get((x, y), '0') for y in all_vars))

    # use string formatting for better-looking output.


    Output:

    ID Var1 Var2 Var3 Var4 Var5
    001 5 10 0 0 0
    002 12 6 0 0 0
    003 5 0 10 0 9
    004 12 0 6 0 1
    005 0 0 5 10 0
    006 0 0 12 6 0
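
The closing comment in the code above suggests string formatting for nicer output. A minimal sketch of that idea follows, assuming the same `(id, var)`-keyed dictionary the answer builds. The function name `format_table` and the sample data are illustrative only and are not part of the original answer.

```python
def format_table(dic, fill="0"):
    """Return an {(id, var): value} mapping as column-aligned text lines."""
    names = sorted({var for _, var in dic})
    ids = sorted({idx for idx, _ in dic}, key=int)

    # build the full grid first so that column widths can be measured
    table = [["ID"] + names]
    for idx in ids:
        table.append([idx] + [dic.get((idx, var), fill) for var in names])

    # each column is as wide as its longest cell
    widths = [max(len(cell) for cell in column) for column in zip(*table)]
    return [
        " ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in table
    ]


# sample data for illustration only
sample = {("001", "Var1"): "5", ("001", "Var2"): "10", ("003", "Var5"): "9"}
print("\n".join(format_table(sample)))
```

Padding with `str.ljust` keeps the columns lined up even when values have different lengths, which the plain `" ".join(...)` output does not guarantee.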
    

    【Discussion】:

      【Answer 3】:

      To merge several files, you can use a function like this, which makes use of Python's defaultdict:

      def read_from_file(filename, dictionary):
          with open(filename) as f:
              lines = f.read().splitlines()
              head, body = lines[0].split(), lines[1:]
              for line in body:
                  for i, item in enumerate(line.split()):
                      if i == 0:
                          d = dictionary[item]
                      else:
                          d[head[i]] = item
      
      from collections import defaultdict
      from pprint import pprint
      d = defaultdict(defaultdict)
      read_from_file("file1", d)
      read_from_file("file2", d)
      read_from_file("file3", d)
      pprint(dict(d))
      

      Output:

      {'001': defaultdict(None, {'Var1': '5', 'Var2': '10'}),
       '002': defaultdict(None, {'Var1': '12', 'Var2': '6'}),
       '003': defaultdict(None, {'Var5': '9', 'Var1': '5', 'Var3': '10'}),
       '004': defaultdict(None, {'Var5': '1', 'Var1': '12', 'Var3': '6'}),
       '005': defaultdict(None, {'Var4': '10', 'Var3': '5'}),
       '006': defaultdict(None, {'Var4': '6', 'Var3': '12'})}
      

      Now all that is left to do is to pretty-print this dictionary as a table.

      【Discussion】:
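
One way that last step might look, offered as a sketch rather than the answerer's own code: collect every variable name that occurs, then emit one row per ID with "0" for missing values, matching the output the question asks for. The function name `as_table` is hypothetical, and the small `merged` dictionary stands in for the `d` built above.

```python
def as_table(data, fill="0"):
    """Turn {id: {var: value}} into the lines of a space-separated table."""
    # union of the variable names seen under any ID
    names = sorted({var for row in data.values() for var in row})
    lines = [" ".join(["ID"] + names)]
    for idx in sorted(data, key=int):
        row = data[idx]
        # .get supplies the fill value for variables this ID does not have
        lines.append(" ".join([idx] + [row.get(var, fill) for var in names]))
    return lines


# stand-in for the merged dictionary, for illustration only
merged = {
    "001": {"Var1": "5", "Var2": "10"},
    "003": {"Var1": "5", "Var3": "10", "Var5": "9"},
    "005": {"Var3": "5", "Var4": "10"},
}
print("\n".join(as_table(merged)))
```

Using `.get` rather than indexing matters here because the inner `defaultdict` objects were created without a default factory, so indexing a missing variable would raise `KeyError`.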
