Merging columns from different CSV files into a single file: answers

【Question title】: Merge column from different CSV into a single file
【Posted】: 2018-03-17 13:37:58
【Question】:

Basically, I have two CSV files, as follows:

File 1:      File 2:          Current output:
   Num          Num2          Num
    1            1             1
    2            2             2
    3            3             3
    4            4             4
                                  Num2
                                   1
                                   2
                                   3
                                   4

I want to merge them into a single CSV file, like this:

Expected File 3:
    Num Num2
    1   1
    2   2
    3   3
    4   4

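However, when I merge the files, the data from the second file starts below the end of file 1's data. How can I make it start at column 2, row 1 instead of underneath?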

import csv

inputs = ["asd.csv", "b.csv"]  # etc

# First determine the field names from the top line of each input file
# Comment 1 below
fieldnames = []
for filename in inputs:
  with open(filename, "r", newline="") as f_in:
    reader = csv.reader(f_in)
    headers = next(reader)
    for h in headers:
      if h not in fieldnames:
        fieldnames.append(h)

# Then copy the data
with open("out.csv", "w", newline="") as f_out:   # Comment 2 below
  writer = csv.DictWriter(f_out, fieldnames=fieldnames)
  for filename in inputs:
    with open(filename, "r", newline="") as f_in:
      reader = csv.DictReader(f_in)  # Uses the field names in this file
      for line in reader:
        # Comment 3 below
        writer.writerow(line)

【Comments】:

Do you want 'Num 1 Num 2' in a single column?

Tags: python csv merge

【Solution 1】:

You can use zip:

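import csv
inputs = ["asd.csv", "b.csv"]
tables = []
for name in inputs:
  with open(name, newline='') as f:
    tables.append(list(csv.reader(f)))  # all rows of one file
# zip pairs the i-th row of each file; a + b puts their cells side by side.
# filter(None, ...) drops empty cells, as in the original one-liner.
new_data = [list(filter(None, a + b)) for a, b in zip(*tables)]
with open('filename.csv', 'w', newline='') as f:
  write = csv.writer(f)
  write.writerows(new_data)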

Output:

Num,Num2
1,1
2,2
3,3
4,4
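
The snippet above handles exactly two files. Below is a minimal sketch of the same zip idea for any number of inputs. It uses in-memory `io.StringIO` objects in place of real files, the sample contents are assumed, and the helper name `merge_rows` is hypothetical. Note that `zip` stops at the shortest input, so extra rows in a longer file are silently dropped.

```python
import csv
import io
from itertools import chain


def merge_rows(sources):
    """Merge CSV sources side by side; stops at the shortest source."""
    tables = [list(csv.reader(src)) for src in sources]
    # zip yields one tuple of rows per line number; chain flattens it into one row
    return [list(chain.from_iterable(rows)) for rows in zip(*tables)]


# In-memory stand-ins for asd.csv and b.csv (assumed contents)
file1 = io.StringIO("Num\n1\n2\n3\n4\n")
file2 = io.StringIO("Num2\n1\n2\n3\n")  # one row shorter on purpose

merged = merge_rows([file1, file2])

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(merged)
print(out.getvalue())
```

Because the second input is one row shorter, the last row of the first input does not appear in the result.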

【Discussion】:

【Solution 2】:

Using pandas:

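import pandas as pd
inputs = ["asd.csv", "b.csv"]
df1 = pd.read_csv(inputs[0])
df2 = pd.read_csv(inputs[1])
# Join the two columns into a single space-separated string column.
# The values are read as integers, so convert them to str before concatenating.
df3 = pd.DataFrame()
df3["Num Num2"] = df1["Num"].astype(str) + " " + df2["Num2"].astype(str)
df3.to_csv("your_output_path", index=False)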

【Discussion】:

【Solution 3】:

A solution using pandas:

In [20]: cat a.csv
Num
1
2
3
4

In [21]: cat b.csv
Num2
1
2
3
4

In [22]: import pandas as pd

In [23]: df = pd.concat((pd.read_csv(f) for f in ['a.csv', 'b.csv']), axis=1)

In [24]: df.to_csv('results.csv', index=False, sep=' ')

In [25]: cat results.csv
Num Num2
1 1
2 2
3 3
4 4

【Discussion】:

【Solution 4】:

The following approach should work:

from itertools import zip_longest    
import csv

data = []    

for csv_filename in ['asd.csv', 'b.csv']:
    with open(csv_filename, 'r', newline='') as f_input:
        data.append([row[0] for row in csv.reader(f_input)])

with open('out.csv', 'w', newline='') as f_output:        
    csv.writer(f_output).writerows(zip_longest(*data))

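It first reads the single column from each file in the list into the data list (one sub-list per file), then combines them with zip_longest to create the output CSV file.

By using zip_longest, it also handles the case where the files in your list contain different numbers of rows.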
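
To see the padding behaviour in isolation, here is a small sketch with two in-memory columns of unequal length (the sample values are illustrative, not data from the question). By default `zip_longest` fills missing cells with `None`, which `csv.writer` writes as an empty field; passing `fillvalue=""` makes that explicit.

```python
import csv
import io
from itertools import zip_longest

# Two columns with different lengths (illustrative data)
col_a = ["Num", "1", "2", "3", "4"]
col_b = ["Num2", "1", "2"]

# Pad the shorter column with empty strings so every row has two cells
rows = list(zip_longest(col_a, col_b, fillvalue=""))

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(rows)
print(out.getvalue())
```

The rows past the end of the shorter column are still written, with an empty second field.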

【Discussion】: