python script to update existing column values in a csv from the other csv

【Question title】: python script to update existing column values in a csv from the other csv
【Posted】: 2020-05-22 10:02:09
【Question】:

If I have 2 csv files as follows:

csv1.csv:

1,Bob,Bob@gmail.com,baseball
2,Tom,Tom@gmail.com,football
3,Bill,Bill@gmail.com,softball
...

csv2.csv:

baseball, b1
football, f1
...

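I am looking for a Python way to replace the wrong values in csv1 (the last column of csv1, wherever it equals the first column of csv2) with the correct values from csv2 (its second column).

The result should look like this: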

1,Bob,Bob@gmail.com,b1
2,Tom,Tom@gmail.com,f1
3,Bill,Bill@gmail.com,softball

My code doesn't work.

import csv

table1 = r'data.csv'
table2 = r'facebook_creo.csv'

creo_desc = dict()

with open(table2) as tbl2:
    t2 = csv.reader(tbl2, delimiter=',')
    next(t2) 

    for t2row in t2:
        wrong_creo = t2row[0]
        desc = t2row[1]

        creo_desc[wrong_creo] = desc

with open(table1) as tbl1:
    t1 = csv.reader(tbl1,  delimiter=',')
    for t1row in t1:
        wrong_creo = t1row[8]

    t1.writerow(t1row[8])

pandas version:

import pandas as pd
data = pd.read_csv(r'data.csv')
creo = pd.read_csv(r'creo.csv')
adset = pd.read_csv(r'adset.csv')
campaign = pd.read_csv(r'campaign.csv')
CreoDict = pd.Series(creo.iloc[:,1].values,index=creo.iloc[:,0]).to_dict()
AdsetDict = pd.Series(adset.iloc[:,1].values,index=adset.iloc[:,0]).to_dict()
CampaignDict = pd.Series(campaign.iloc[:,1].values,index=campaign.iloc[:,0]).to_dict()
data.iloc[:,8] = data.iloc[:,8].replace(CreoDict)
data.iloc[:,6] = data.iloc[:,6].replace(AdsetDict)
data.iloc[:,4] = data.iloc[:,4].replace(CampaignDict)
data.to_csv(r'total.csv')

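【Comments】:

If I understand correctly, csv2.csv contains a translation table for every possible value in the last column of csv1.csv?
@HampusLarsson Yes, right. This is just a sample. In reality, the first table is much bigger.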

Tags: python python-3.x pandas csv

【Solution 1】:

I would use pandas to read the 2 tables, then turn the second table into a dictionary of replacement values and use it to remap csv1.

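import pandas as pd

# Read in the 2 csv files (the samples have no header row)
csv1 = pd.read_csv('csv1.csv', header=None)
# skipinitialspace drops the blank after each comma in csv2.csv
csv2 = pd.read_csv('csv2.csv', header=None, skipinitialspace=True)


# Create dictionary from csv2: {wrong value: correct value}
replaceDict = pd.Series(csv2.iloc[:,1].values,index=csv2.iloc[:,0]).to_dict()

# Use dictionary to replace values in the last column
csv1.iloc[:,-1] = csv1.iloc[:,-1].replace(replaceDict)

# Write to file without the index and header rows
csv1.to_csv('csv1_new.csv', index=False, header=False)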

Output:

print (csv1)
   0     1               2         3
0  1   Bob   Bob@gmail.com  baseball
1  2   Tom   Tom@gmail.com  football
2  3  Bill  Bill@gmail.com  softball

print (csv2)
          0   1
0  baseball  b1
1  football  f1

Then, after the replacement:

print (csv1)
   0     1               2         3
0  1   Bob   Bob@gmail.com        b1
1  2   Tom   Tom@gmail.com        f1
2  3  Bill  Bill@gmail.com  softball

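【Discussion】:

It works. Thanks a lot. One more question: if I need to update existing column values in a csv from 3 or 5 other csv documents, should the steps be the same?
Yes, the concept/steps are the same. What I would probably do is first merge all of those "reference" csvs into 1 dictionary, then apply it to csv1 once.
Even if I need to replace values in different columns?
You will have to adjust .iloc[:,-1] in the code to reflect the column you want. Could you edit your original question to provide the additional scenario, so I can show/adjust my solution?
Thanks. That would be great. I added another version to my question.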
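
As a rough illustration of the multi-file case discussed above, the sketch below keeps one mapping per column instead of a single merged dictionary, so a value is only replaced in the column it belongs to. The sample rows, the column positions, and the helper name to_mapping are all made up for the example; they are not taken from the asker's real files.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-ins for the data file and two reference files.
data = pd.read_csv(
    io.StringIO("1,camp_a,adset_a,creo_a\n2,camp_b,adset_x,creo_b\n"),
    header=None,
)
creo = pd.read_csv(io.StringIO("creo_a,c1\ncreo_b,c2\n"), header=None)
adset = pd.read_csv(io.StringIO("adset_a,a1\n"), header=None)


def to_mapping(ref):
    """Build {wrong value: correct value} from the first two columns of ref."""
    return dict(zip(ref.iloc[:, 0], ref.iloc[:, 1]))


# Column position in data -> mapping to apply to that column (assumed layout).
replacements = {3: to_mapping(creo), 2: to_mapping(adset)}

for col, mapping in replacements.items():
    # Values without an entry in the mapping are left unchanged.
    data.iloc[:, col] = data.iloc[:, col].replace(mapping)

print(data)
```

If every reference file targets the same column, the per-column dictionaries can simply be merged into one before the loop.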

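【Solution 2】:

It would be better if you attached the error message, but I think you should use csv.writer, not csv.reader, at the point where you want to make the changes.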
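
A minimal sketch of that suggestion, assuming neither file has a header row: the rows are read with csv.reader and written with a separate csv.writer. In-memory buffers stand in for the real files here so the snippet runs on its own; with files on disk you would open the output with newline=''.

```python
import csv
import io

# In-memory stand-ins for the two files from the question (sample rows only).
table1 = io.StringIO("1,Bob,Bob@gmail.com,baseball\n3,Bill,Bill@gmail.com,softball\n")
table2 = io.StringIO("baseball,b1\nfootball,f1\n")

# Lookup table: wrong value -> correct value
creo_desc = {row[0]: row[1] for row in csv.reader(table2) if row}

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")  # the writer, not the reader, has writerow()
for row in csv.reader(table1):
    if not row:
        continue  # skip blank lines
    # Keep the original value when no replacement exists.
    row[-1] = creo_desc.get(row[-1], row[-1])
    writer.writerow(row)

result = out.getvalue()
print(result)
```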

【Discussion】:

【Solution 3】:

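import csv

table1 = r'a.csv'
table2 = r'b.csv'

# Lookup table: wrong value -> correct value
creo_desc = dict()

# skipinitialspace drops the blank after each comma in b.csv
with open(table2, newline='') as tbl2:
  t2 = csv.reader(tbl2, delimiter=',', skipinitialspace=True)
  for t2row in t2:
    creo_desc[t2row[0]] = t2row[1]

print(creo_desc)

# Read every row of a.csv, replacing the last field when a correction exists
ans = []
with open(table1, 'r', newline='') as tbl1:
  t1 = csv.reader(tbl1,  delimiter=',')
  for t1row in t1:
    if t1row[-1] in creo_desc:
      t1row[-1] = creo_desc[t1row[-1]]
    ans.append(t1row)

# Overwrite a.csv; newline='' avoids blank lines between rows on Windows
with open(table1, 'w', newline='') as tbl1:
  writer = csv.writer(tbl1)
  writer.writerows(ans)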

1) a.csv

1,Bob,Bob@gmail.com,baseball
2,Tom,Tom@gmail.com,football
3,Bill,Bill@gmail.com,softball

2) b.csv

baseball, b1
football, f1
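
One detail worth checking with this sample: each line of b.csv has a space after the comma. The sketch below, using an in-memory copy of those two lines, shows that csv.reader keeps that space as part of the field by default and that the skipinitialspace option removes it.

```python
import csv
import io

# In-memory copy of the b.csv sample, including the space after each comma.
sample = "baseball, b1\nfootball, f1\n"

# Default behaviour: the leading space stays in the second field.
plain = {key: value for key, value in csv.reader(io.StringIO(sample))}

# skipinitialspace=True ignores whitespace immediately after the delimiter.
trimmed = {
    key: value
    for key, value in csv.reader(io.StringIO(sample), skipinitialspace=True)
}

print(plain)
print(trimmed)
```

Without the option, the replacement values written back to a.csv would carry the leading space.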

【Discussion】: