【Question Title】: Searching & comparing values in CSV files with Python
【Posted】: 2011-03-18 10:09:30
【Question】:

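I have two CSV files: a master file and an update file. I want to take specific columns from the update file and check their values against the master file.

Both files have the same columns and look roughly like this: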

Listed Company's English Name,Listed Company's Chinese Name,Stock Code,Listing Status,Director's English Name,Director's Chinese Name,Capacity,Position,Appointment Date (yyyy-mm-dd),Resignation Date (yyyy-mm-dd)  
C.P. Lotus Corporation,________,00122,Current,CHEARAVANONT Dhanin,___,Executive Director,,2009-12-31,
C.P. Lotus Corporation,________,00121,Current,CHEARAVANON Narong,___,Executive Director,,2001-02-01,  
C.P. Lotus Corporation,________,00121,Current,CHEARAVANONT Soopakij,___,Executive Director,CEO,2000-04-14,  

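Basically, I want to loop over the update file, take each Stock Code value from it, and check whether it exists in the master file.

Then, for each matching stock code, I need to check the Director name values for differences and keep track of those that do not match.

I followed this example, but it doesn't seem to do quite what I need (or I don't fully understand it...): Python: Comparing two CSV files and searching for similar items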

f1 = file(csvHKX, 'rU')
f2 = file(csvWRHK, 'rU')
f3 = file('results.csv', 'w')

csv1 = csv.reader(f1)
csv2 = csv.reader(f2)
csv3 = csv.writer(f3)

scode = [row for row in csv2]

for hkx_row in csv1:
  for wrhk_row in scode:
    if hkx_row[2] != wrhk_row[2]:
      print 'HKX:', hkx_row
    continue

f1.close()
f2.close()
f3.close()

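The update file contains the following stock codes: "00121" and "01003" (for testing).

It seems the code loops through the list, compares each pair of rows, and prints a row whenever the stock codes of that pair do not match. So when the first column reads "00121", it prints out the row containing "01003", and vice versa.

But I am only interested in the case where hkx_row[2] is not found ANYWHERE in wrhk_row[2].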

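【Comments】:

  • What do you need to do if they differ? Update the master?
  • It might be easier for us to answer if you spelled out what is missing from the linked example, or what about it is not to your liking.
  • "But it doesn't seem to do quite what I need"? Why not? What actual problem do you have with your actual code? Please post your code and your error or problem.
  • Perhaps, if your code is sensitive, you could at least put together some pseudocode with the variable names changed?
  • Sorry, I've added some details about what isn't working, plus the code. For now, if the stock code doesn't exist in the master, I would just write that row to a new file. Thanks!

Tags: python csv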


【Solution 1】:

Does this help you?

File master.csv

Listed Company's English Name,Listed Company's Chinese Name,Stock Code,Listing Status,Director's English Name,Director's Chinese Name,Capacity,Position,Appointment Date (yyyy-mm-dd),Resignation Date (yyyy-mm-dd)  
C.P. Lotus Corporation,________,00122,Current,CHEARAVANONT Dhanin,___,Executive Director,,2009-12-31,
C.P. Lotus Corporation,________,00121,Current,CHEARAVANON Narong,___,Executive Director,,2001-02-01,  
C.P. Lotus Corporation,________,00121,Current,CHEARAVANONT Soopakij,___,Executive Director,CEO,2000-04-14,  
C.P. Lotus Corporation,________,00123,Current,DEANINO James,___,Pilot,,2009-06-25,
C.P. Lotus Corporation,________,00129,Current,GINGE Ivy,___,Dental Technician,,2010-07-27,
C.P. Lotus Corporation,________,00127,Current,ERATOR Jane,___,Engineer,,2005-12-04,
C.P. Lotus Corporation,________,00119,Current,FIELD Mary,___,Pastrycook,,2009-06-25,

File update.csv

Listed Company's English Name,Listed Company's Chinese Name,Stock Code,Listing Status,Director's English Name,Director's Chinese Name,Capacity,Position,Appointment Date (yyyy-mm-dd),Resignation Date (yyyy-mm-dd)  
C.P. Lotus Corporation,________,00133,Current,THOMPSON Sarah,___,Cosmonaut,,2004-01-20,
C.P. Lotus Corporation,________,00122,Current,CHEARAVANONT Dhanin,___,Executive Director,,2009-12-31,
C.P. Lotus Corporation,________,00121,Current,CHEARAVANON Narong,___,Executive Director,,2001-02-01,  
C.P. Lotus Corporation,________,00121,Current,BEARD Sophia,___,Executive Director,CEO,2010-04-26,   
C.P. Lotus Corporation,________,00127,Current,ERATOR Jane,___,Engineer,,2005-12-04,
C.P. Lotus Corporation,________,00129,Current,MISTOUKI Hassan,___,Folk Singer,,2010-07-27,

Code

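import csv

# Python 2 code: the csv module there expects files opened in binary mode
mas = csv.reader(open('master.csv','rb'))
upd = csv.reader(open('update.csv','rb'))

# set of (stock code, director's English name) pairs found in the master
set24 = set((row[2],row[4]) for row in mas)
print set24
print

# keep only the update rows whose pair does not appear in the master
updkept = [ row for row in upd if (row[2],row[4]) not in set24]
print '\n'.join(map(str,updkept))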

Result

set([('00127', 'ERATOR Jane'), ('00121', 'CHEARAVANONT Soopakij'), ('00121', 'CHEARAVANON Narong'), ('00119', 'FIELD Mary'), ('00122', 'CHEARAVANONT Dhanin'), ('Stock Code', "Director's English Name"), ('00129', 'GINGE Ivy'), ('00123', 'DEANINO James')])

['C.P. Lotus Corporation', '________', '00133', 'Current', 'THOMPSON Sarah', '___', 'Cosmonaut', '', '2004-01-20', '']
['C.P. Lotus Corporation', '________', '00121', 'Current', 'BEARD Sophia', '___', 'Executive Director', 'CEO', '2010-04-26', '   ']
['C.P. Lotus Corporation', '________', '00129', 'Current', 'MISTOUKI Hassan', '___', 'Folk Singer', '', '2010-07-27', '']
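
For Python 3, the same idea can be sketched as follows. This is only a sketch under stated assumptions, not the code from the answer: the function name `rows_not_in_master` and its `key_cols` parameter are hypothetical, and it assumes both files share the column layout shown above (stock code at index 2, director's English name at index 4). Unlike the code above, it skips the header row so that the header pair does not end up in the set. The sample data is a shortened, in-memory stand-in for the two files.

```python
import csv
import io


def rows_not_in_master(master_file, update_file, key_cols=(2, 4)):
    """Return the update rows whose key columns do not appear in the master.

    key_cols is a hypothetical parameter: the column indexes that form the
    lookup key (stock code and director's English name by default).
    """
    master = csv.reader(master_file)
    next(master, None)  # skip the header row
    seen = {tuple(row[i] for i in key_cols) for row in master if row}

    update = csv.reader(update_file)
    next(update, None)  # skip the header row
    return [row for row in update
            if row and tuple(row[i] for i in key_cols) not in seen]


# Shortened in-memory sample with the same column positions as the files above.
MASTER = """Name,Chinese Name,Stock Code,Status,Director
C.P. Lotus Corporation,________,00122,Current,CHEARAVANONT Dhanin
C.P. Lotus Corporation,________,00121,Current,CHEARAVANON Narong
"""
UPDATE = """Name,Chinese Name,Stock Code,Status,Director
C.P. Lotus Corporation,________,00133,Current,THOMPSON Sarah
C.P. Lotus Corporation,________,00121,Current,CHEARAVANON Narong
C.P. Lotus Corporation,________,00121,Current,BEARD Sophia
"""

new_rows = rows_not_in_master(io.StringIO(MASTER), io.StringIO(UPDATE))
for row in new_rows:
    print(row)
```

With real files, the Python 3 csv documentation recommends opening them in text mode with `newline=''`, for example `open('master.csv', newline='')`. Passing `key_cols=(2,)` compares on the stock code alone, which matches the question's first requirement: rows whose stock code is not found anywhere in the master.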

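【Comments】:

  • Yes. That seems to do the trick! I wasn't familiar with set (total n00b here...). Thanks!
  • @Nathan li24 = [(row[2],row[4]) for row in mas] would also work, but a set is better because its elements are stored by hash, which, as far as I know, allows faster lookups (I think my English wording is poor, please forgive and correct me)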
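
The point in the last comment can be illustrated with a small timing sketch. A list and a set give the same answer to `in`, but a list is scanned element by element, while a set uses a hash lookup that takes roughly constant time on average. The data below is made up, and the names `pairs_list`, `pairs_set`, and `missing` are illustrative only; actual timings depend on the machine.

```python
import timeit

# Made-up (stock code, director) pairs, only to illustrate lookup cost.
pairs_list = [(f"{i:05d}", f"DIRECTOR {i}") for i in range(20000)]
pairs_set = set(pairs_list)

# A key that is in neither container: the worst case for the list scan.
missing = ("99999", "NOBODY")

# Both containers agree on membership.
same_answer = (missing in pairs_list) == (missing in pairs_set)

# Time 200 membership tests against each container.
list_time = timeit.timeit(lambda: missing in pairs_list, number=200)
set_time = timeit.timeit(lambda: missing in pairs_set, number=200)

print(f"same answer: {same_answer}")
print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```

For a handful of rows the difference is negligible; it starts to matter when the master file has many thousands of rows and every update row triggers a lookup.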