【Question Title】: Search for string in CSV Files using python and write the results
【Posted】: 2014-03-12 23:19:12
【Question】:
#!/usr/bin/python

import csv
import re

string_1 = ('OneTouch AT')
string_2 = ('LinkRunner AT')
string_3 = ('AirCheck')

#searched = ['OneTouch AT', 'LinkRunner AT', 'AirCheck']
print "hello Python! "

#def does_match(string):
#    stringl = string.lower()
#    return any(s in stringl for s in searched)

inFile  = open('data.csv', "rb")
reader = csv.reader(inFile)
outFile  = open('data2.csv', "wb")
writer = csv.writer(outFile, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)

for row in reader:
    found = False
    for col in row:
        if col in [string_1, string_2, string_3] and not found:
            writer.writerow(row)
            found = True


#for row in reader:
 #   if any(does_match(col) for col in row):
  #      writer.writerow(row[:2]) # write only 2 first columns

inFile.close()
outFile.close()

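I am trying to figure out how to search a CSV file for 3 items. If any of those items is present, the row should be printed. Ideally, I would like only columns 1 and 3 to be written to a new file.

Sample data file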

LinkRunner AT Video,10,20
Wireless Performance Video OneTouch AT,1,2
Wired OneTouch AT,200,300
LinkRunner AT,200,300
AirCheck,200,300

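【Comments】:

  • The only thing not working at the moment is that I want to print only the first and third columns. Ideally, it would print the AirCheck rows first, then the LinkRunner rows, then the OneTouch AT rows.

Tags: python string search csv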


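【Solution 1】:

I am trying to figure out how to search a CSV file for 3 items. If any of those items is present, the row should be printed. Ideally, I would like only columns 1 and 3 to be written to a new file.

Try this: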

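import csv

search_for = ['OneTouch AT','LinkRunner AT','AirCheck']

# newline='' lets the csv module handle line endings itself (Python 3)
with open('in.csv', newline='') as inf, open('out.csv', 'w', newline='') as outf:
    reader = csv.reader(inf)
    writer = csv.writer(outf, delimiter='\t', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    for row in reader:
        # exact match on the first column only; empty rows are skipped
        if row and row[0] in search_for:
            print('Found: {}'.format(row))
            writer.writerow(row)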
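
Note that `row[0] in search_for` matches only when the first column is exactly one of the search terms, so a sample row such as `Wired OneTouch AT,200,300` would be skipped, and the whole row is written rather than columns 1 and 3. The following is a minimal sketch, assuming Python 3, substring matching on the first column, and in-memory text standing in for the files. The helper name `filter_rows` and the extra non-matching row are illustrative assumptions, not part of the original answer.

```python
import csv
import io

SEARCH_FOR = ['OneTouch AT', 'LinkRunner AT', 'AirCheck']

def filter_rows(rows, terms=SEARCH_FOR, cols=(0, 2)):
    """Yield the selected columns of every row whose first column contains a term."""
    for row in rows:
        # Skip blank lines and rows too short to hold the requested columns.
        if len(row) <= max(cols):
            continue
        if any(term in row[0] for term in terms):
            yield [row[c] for c in cols]

# In-memory stand-in for the sample data file from the question.
# The last row is an added, non-matching row for illustration only.
sample = io.StringIO(
    "LinkRunner AT Video,10,20\n"
    "Wireless Performance Video OneTouch AT,1,2\n"
    "Wired OneTouch AT,200,300\n"
    "LinkRunner AT,200,300\n"
    "AirCheck,200,300\n"
    "Unrelated product,5,6\n"
)
out = io.StringIO()
writer = csv.writer(out, delimiter='\t', quoting=csv.QUOTE_MINIMAL)
writer.writerows(filter_rows(csv.reader(sample)))
result = out.getvalue()
print(result)
```

With real files, the two `io.StringIO` objects could be replaced by `open('in.csv', newline='')` and `open('out.csv', 'w', newline='')`.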

【Comments】:

    【Solution 2】:
    #!/usr/bin/python
    
    import csv
    import numpy as np
    
    class search_csv(object):
        def __init__(self, infile, outfile):
            infile = open(infile, 'rb')
            read_infile = [i for i in csv.reader(infile, delimiter='\t', quotechar='"', quoting=csv.QUOTE_MINIMAL)]
            self.non_numpy_data = read_infile
            self.data = np.array(read_infile, dtype=None)
            self.outfile = open(outfile, 'wb')
            self.writer_ = csv.writer(self.outfile, delimiter='\t', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    
        def write_to(self, matched_values):
            self.writer_.writerows(matched_values)
            print ' Matched Values Written '
            return True
    
        def searcher(self, items, return_cols=[0,2]): ##// items should be passed as list -> ['OneTouch AT', 'LinkRunner AT', 'AirCheck']
            find_these = np.array(items, dtype=None)
            matching_y = np.in1d(self.data, find_these).reshape(self.data.shape).nonzero()[0]
            matching_data = self.data[matching_y][:,return_cols]
            self.write_to(matching_data)
            self.outfile.close()
            return True
    
        def non_numpy_search(self, items, return_cols=[0,2]):
            lst = []
            for i in self.non_numpy_data:
                for ii in items:
                    if ii in i:
                        z = []
                        for idx in return_cols:
                            z.append(i[idx])
                        lst.append(z)
                        break  # stop after the first matching item so a row is added only once
            self.write_to(lst)
            self.outfile.close()
            return True
    
    
    ### now use the class ###
    
    SEARCHING_FOR = ['OneTouch AT', 'LinkRunner AT', 'AirCheck']
    
    IN_FILE = 'in_file.csv'
    OUT_FILE = 'out_file.csv'
    
    search_csv(IN_FILE, OUT_FILE).non_numpy_search(SEARCHING_FOR)
    

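    Based on the wording of your question, I assume you just want to get the task done and don't care how. So copy and paste this, use your data file as the 'IN_FILE' value and the name of the file to write to as the 'OUT_FILE' value. Then put the values you want to search for in the 'SEARCHING_FOR' list.

    Things to note: SEARCHING_FOR should be a list.

    Values in SEARCHING_FOR are matched exactly, so 'A' will not match 'a'. If you want to use regex or something more complex, let me know.

    The function 'non_numpy_search' has a 'return_cols' parameter. It defaults to the first and third columns.

    If you don't have numpy, let me know.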
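
The notes above say matching is exact and case-sensitive. One way to relax that is to compile the search terms into a single case-insensitive regular expression and test each column against it. This is a minimal sketch, assuming Python 3. `build_pattern` and `row_matches` are hypothetical helper names and are not part of the class above.

```python
import re

def build_pattern(terms):
    """Compile one case-insensitive pattern that matches any of the literal terms."""
    # re.escape keeps characters such as '.' or '+' in a term from acting as regex syntax.
    return re.compile('|'.join(re.escape(t) for t in terms), re.IGNORECASE)

def row_matches(row, pattern):
    """Return True if any column of the row contains a match."""
    return any(pattern.search(col) for col in row)

pattern = build_pattern(['OneTouch AT', 'LinkRunner AT', 'AirCheck'])
print(row_matches(['wired onetouch at', '200', '300'], pattern))  # True
print(row_matches(['Some other device', '1', '2'], pattern))      # False
```

Unlike the exact comparison in the class, this also matches a term that appears inside a longer column value.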

    【Comments】:

    • I get the following message: Traceback (most recent call last): File "csvparserV2.0.py", line 34, in search_csv(IN_FILE, OUT_FILE).searcher(SEARCHING_FOR) File "csvparserV2.0.py", line 21, in searcher matching_y = np.in1d(self.data, find_these).reshape(self.data.shape).nonzero()[0] File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/lib/arraysetops.py", line 335, in in1d order = ar.argsort(kind='mergesort') TypeError: requested sort not available for type
    • Do you know which version of numpy you are running? print np.__version__
    • If the file isn't very large, the edit I just posted should work fine.
    • Also, with 'non_numpy_search', if you are trying to match a row based on multiple values in 'SEARCHING_FOR', you will have to modify the code.
    • >>> import numpy >>> print numpy.__version__ 1.6.2 >>>
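
One of the comments above mentions that matching a row on several search values at once needs a code change. This is a minimal sketch of one possible change, assuming Python 3 and the same exact, per-column comparison that `non_numpy_search` uses. `row_contains_all` is a hypothetical helper name.

```python
def row_contains_all(row, terms):
    """Return True only if every term is equal to some column of the row."""
    return all(term in row for term in terms)

# Illustrative rows, not taken from the question's data file.
print(row_contains_all(['OneTouch AT', 'AirCheck', '300'], ['OneTouch AT', 'AirCheck']))  # True
print(row_contains_all(['OneTouch AT', '200', '300'], ['OneTouch AT', 'AirCheck']))       # False
```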
    【Solution 3】:
    #!/usr/bin/python
    
    import csv
    import re
    import sys
    import gdata.docs.service
    
    
    #string_1 = ('OneTouch AT')
    #string_2 = ('LinkRunner AT')
    #string_3 = ('AirCheck')
    
    searched = ['aircheck', 'linkrunner at', 'onetouch at']
    
    def find_group(row):
        """Return the group index of a row
            0 if the row contains searched[0]
            1 if the row contains searched[1]
            etc
            -1 if not found
        """
        for col in row:
            col = col.lower()
            for j, s in enumerate(searched):
                if s in col:
                    return j
        return -1
    
    def does_match(string):
        stringl = string.lower()
        return any(s in stringl for s in searched)
    
    #Opens Input file for read and output file to write.
    inFile  = open('data.csv', "rb")
    reader = csv.reader(inFile)
    outFile  = open('data2.csv', "wb")
    writer = csv.writer(outFile, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)
    
    
    #for row in reader:
    #   found = False
    #   for col in row:
    #       if col in [string_1, string_2, string_3] and not found:
    #           writer.writerow(row)
    #           found = True
    
    
    
    """Build a list of items to sort. If row 12 contains 'LinkRunner AT' (group 1),
        one stores a triple (1, 12, row)
        When the triples are sorted later, all rows in group 0 will come first, then
        all rows in group 1, etc.
    """
    stored = []
    for i, row in enumerate(reader):
        g = find_group(row)
        if g >= 0:
            stored.append((g, i, row))
    stored.sort()
    
    for g, i, row in stored:
        writer.writerow(tuple(row[k] for k in (0,2))) # output columns 1 & 3
    
    #for row in reader:
     #   if any(does_match(col) for col in row):
      #      writer.writerow(row[:2]) # write only 2 first columns
    
    # Closing Input and Output files.
    inFile.close()
    outFile.close()
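
The grouping and sorting idea above can also be written with a key-based sort. This is a minimal sketch, assuming Python 3 and rows already held in memory. `find_group` is restated with the not-found return placed after the column loop, so every column is checked, and `ordered_rows` is a hypothetical helper name.

```python
SEARCHED = ['aircheck', 'linkrunner at', 'onetouch at']

def find_group(row, searched=SEARCHED):
    """Return the index of the first search term found in any column, or -1."""
    for col in row:
        lowered = col.lower()
        for index, term in enumerate(searched):
            if term in lowered:
                return index
    return -1  # reached only after every column has been checked

def ordered_rows(rows, cols=(0, 2)):
    """Return the selected columns of matching rows, ordered by group."""
    tagged = [(find_group(row), row) for row in rows]
    matched = [(group, row) for group, row in tagged if group >= 0]
    # list.sort is stable, so rows in the same group keep their input order.
    matched.sort(key=lambda pair: pair[0])
    return [[row[c] for c in cols] for _, row in matched]

# Rows from the question's sample data file.
rows = [
    ['LinkRunner AT Video', '10', '20'],
    ['Wireless Performance Video OneTouch AT', '1', '2'],
    ['Wired OneTouch AT', '200', '300'],
    ['LinkRunner AT', '200', '300'],
    ['AirCheck', '200', '300'],
]
for out_row in ordered_rows(rows):
    print(out_row)
```

Sorting on the group index alone avoids comparing the row lists themselves, which the tuple sort in the original only avoids because the row number breaks ties first.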
    

    【Comments】:

    • After more googling and reading, I came up with another way to do this. :-) Goal achieved. The next goal is to update a spreadsheet on Google Docs.