【Question title】: Join Multiple Files Dictionary
【Posted】: 2015-09-12 02:38:58
【Question】:

I have a master table containing some fields. I want to join it with a number of other CSV files.

The current data looks like this:

File 1:

Key  Attrib1  Attrib2  Attrib3  Attrib4

File 2:

Key Attrib5

File 3:

Key Attrib6

I'd like my final output to look like:

Key   Attrib1  Attrib2  Attrib3  Attrib4 Attrib5 Attrib6, etc.

Not all files contain every key.

Current code:

master = "in.csv"
file1 = "file.csv"
file2 = "file2.csv"
prime = list()
D1 = {}

with open(master) as f:
    for k in csv.reader(f):
        prime.append(k[0])

for k in prime:
    with open(file1,'r') as csvfile:
        rd = csv.reader(csvfile,delimiter=",")
        for row in rd:
            if row[0] ==k:
                D1 = dict((row[0],row[1]) for rows in rd)
    with open(file2,'r') as csvfile:
        rd = csv.reader(csvfile,delimiter=",")
        for row in rd:
            if row[0] ==k:
                D1 = D1+dict((row[0],row[1]) for rows in rd)

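【Comments】:

  • Is File 1 what you call master in your code? If not, what does it look like?
  • How do you know what the attributes in the other files are? Does each of them have only one?
  • Yes, File 1 is what I call the master file. It looks like: Key Attrib1 Attrib2 Attrib3 Attrib4
  • The other files each have 2 columns, and I know their names, although they differ from file to file. Column 1 is always the key, but column 2 could be any of various things.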

Tags: python csv join merge python-2.5


【Answer 1】:

I think this does it, or at least comes close if it isn't exactly what you want:

from __future__ import with_statement  # needed only on Python 2.5
import csv

master = "in.csv"
filelist = "file.csv", "file2.csv"
joined = "joined.csv"
dict1 = {}

with open(master, 'rb') as csvfile:  # Python 2: open csv files in binary mode
    for row in csv.reader(csvfile):
        key = row[0]
        dict1[key] = row[1:]  # note this does not check for duplicate keys

for filename in filelist:
    with open(filename, 'rb') as csvfile:
        seen = set()
        for row in csv.reader(csvfile):
            key = row[0]
            if key in dict1:
                if key in seen:
                    print('Error: duplicate key %r in file %r - ignored' %
                                   (key, filename))
                else:
                    dict1[key].append(row[1])
                    seen.add(key)
            else:  # key not in master
                pass  # ignore    

        # add null entry for any keys not present in this file
        for key in dict1:
            if key not in seen:
                dict1[key].append(None)

# write the data in the merged dictionary into a new csv file
with open(joined, 'wb') as newcsvfile:
    csv.writer(newcsvfile).writerows(
        ([key]+attrlist) for key, attrlist in sorted(dict1.iteritems()))
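
The code above targets Python 2 (iteritems, binary file modes). A minimal Python 3 sketch of the same dictionary-based join might look like the following. It assumes the files have no header row and that each extra file has the key in column 1 and a single attribute in column 2. The function name join_csv_files is made up for illustration and is not part of the original answer.

```python
import csv


def join_csv_files(master_path, other_paths, joined_path):
    """Left-join two-column CSV files onto a master CSV, keyed on column 1."""
    merged = {}
    with open(master_path, newline="") as f:
        for row in csv.reader(f):
            if row:  # skip blank lines
                # as in the original answer, duplicate master keys are not checked
                merged[row[0]] = row[1:]

    for path in other_paths:
        seen = set()
        with open(path, newline="") as f:
            for row in csv.reader(f):
                if not row:
                    continue
                key = row[0]
                # keep only keys present in the master, first occurrence wins
                if key in merged and key not in seen:
                    merged[key].append(row[1] if len(row) > 1 else "")
                    seen.add(key)
        # pad keys missing from this file so the columns stay aligned
        for key in merged:
            if key not in seen:
                merged[key].append("")

    with open(joined_path, "w", newline="") as f:
        csv.writer(f).writerows(
            [key] + attrs for key, attrs in sorted(merged.items()))
```

With the file names from the question, it could be called as join_csv_files("in.csv", ["file.csv", "file2.csv"], "joined.csv").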

【Comments】:

    【Answer 2】:

    The idea here is to open all three files and write them into a new .csv file. The general way I would go about joining csv files is this:

    import glob
    import csv

    # get all the files in the current directory that end with .csv
    csv_files = glob.glob('*.csv')

    # create the new csv file, which will be your output
    # (on a re-run, a leftover 'filename.csv' would also match '*.csv' above)
    with open('filename.csv', 'w') as outfile:
        writer = csv.writer(outfile, delimiter=',')

        for csv_file in csv_files:
            with open(csv_file) as infile:
                reader = csv.reader(infile, delimiter=',')
                for row in reader:
                    writer.writerow(row)
    

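    You will have to adjust exactly what "row" contains so that it matches how your data is laid out (creating empty columns for files that lack the columns you need).

    One possible solution is to build a tuple layout for each file, leaving blank slots where that file has no data. Writing the tuples as rows would work like this.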

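    for row in reader:

        # glob returns names with their extension, e.g. 'file1.csv'
        if csv_file == 'file1.csv':
            # '' represents a blank field in column
            data_to_write = (row[0], row[1], '', row[2])

        elif csv_file == 'file2.csv':
            data_to_write = ('', row[0], row[1], row[2])

        else:
            data_to_write = row  # any other file: write the row unchanged

        writer.writerow(data_to_write)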
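
As an alternative to hand-built tuples, csv.DictWriter can fill in the blank columns itself through its restval argument. The following is a rough sketch, assuming every input file starts with a header row (the question does not say whether it does). The helper name stack_csv_files is hypothetical. Like the code above, it stacks the rows from all files under a shared set of columns and does not merge them by key.

```python
import csv


def stack_csv_files(paths, out_path):
    """Write the rows of several header-bearing CSVs into one file,
    leaving blank any column that a given input file does not have."""
    fieldnames = []  # union of all column names, in first-seen order
    rows = []
    for path in paths:
        with open(path, newline="") as infile:
            reader = csv.DictReader(infile)
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            rows.extend(reader)  # each row is a dict keyed by column name

    with open(out_path, "w", newline="") as outfile:
        # restval is written for any column missing from a row's dict
        writer = csv.DictWriter(outfile, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
```

Passing an explicit list of paths, instead of globbing '*.csv', keeps the output file from being picked up as an input on a later run.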
    

    【Comments】:
