【Question title】: Similarities between two csv files
【Posted】: 2014-07-09 16:11:21
【Question】:

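I'm pretty new to Python and am having a lot of trouble writing this program that finds matches between two csv files. For example, I have two csv files. The first is called "list" and the second is called "example".

The file "list" contains the following in its first row: leg, knee, thigh, shin, ankle, hip, foot, toe, calf, feet, patella, tibia, fibula

The file "example" contains: Student broke leg yesterday, Student broke arm today, Student hurt thigh today, Student twisted elbow, Student rolled ankle today

So basically, if the csv file "example" contains any of the words in the csv file "list", the program should write the matching sentence from "example" to a new csv file, but it doesn't.

Here is my code so far: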

    import csv

    with open("list.csv", "U") as file1, open("example.csv", "rb") as file2, open("finalOutput.csv", "wb") as outputfile:
        reader1 = csv.reader(file1, delimiter=';')
        reader2 = csv.reader(file2, delimiter='|')
        writer = csv.writer(outputfile, delimiter='|')

        rows2 = [row for row in reader2]
        for row1 in reader2:
            for row2 in rows2:
                if row1[0] == row2[0]:
                    data = [row1[0], row2[0]]
                    print data
                    writer.writerow(data)

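【Comments】:

    Tags: python csv


    【Solution 1】:

    Why not try something like this (assuming you want to print the whole line if any word matches a word in the second file)? Basically, you turn the row from the second file into a string, then check whether any word from the first file is in that string. If it is, write it out.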

    import csv

    with open("list.csv", "U") as file1, open("example.csv", "rb") as file2, open("output.csv", "wb+") as file3:
        reader1 = csv.reader(file1)
        reader2 = csv.reader(file2)
        writer = csv.writer(file3)
    
        reader1_rows = [row for row in reader1]
        reader2_rows = [row for row in reader2]
    
        for num, row in enumerate(reader1_rows):
            if ([word for word in row if word in ' '.join(reader2_rows[num])]):
                writer.writerow([row, reader2_rows[num]])
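
The truthiness test on the list comprehension above can also be written with `any()`, which stops at the first match. This is a minimal Python 3 sketch of just that check; the helper name `row_has_keyword` is hypothetical and not part of the original answer.

```python
def row_has_keyword(key_words, cells):
    """Return True if any keyword is a substring of the joined cells."""
    text = " ".join(cells)
    # any() short-circuits on the first hit instead of building a full list.
    return any(word in text for word in key_words)


if __name__ == "__main__":
    print(row_has_keyword(["leg", "knee"], ["Student broke leg yesterday"]))
    print(row_has_keyword(["leg", "knee"], ["Student broke arm today"]))
```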
    

    Based on your follow-up comment, I believe this should give you the output you want:

    with open("list.csv", "U") as file1, open("example.csv", "rb") as file2, open("output.csv", "wb+") as file3:
        reader1 = csv.reader(file1)
        reader2 = csv.reader(file2)
        writer = csv.writer(file3)
    
        reader1_rows = [row for row in reader1]
        reader2_rows = [row for row in reader2]
    
        for num, row in enumerate(reader1_rows):
            for word in reader2_rows[num]:
                for item in row:
                    if item in word:
                        writer.writerow([item, word])
    

    A more "pythonic" way might be the following:

    with open("list.csv", "U") as file1, open("example.csv", "rb") as file2, open("output.csv", "wb+") as file3:
        reader1 = csv.reader(file1)
        reader2 = csv.reader(file2)
        writer = csv.writer(file3)
    
        reader1_rows = [row for row in reader1]
        reader2_rows = [row for row in reader2]
    
        for rowA, rowB in zip(reader1_rows, reader2_rows):
            for word in rowA:
                for item in (item for item in rowB if word in item):
                    writer.writerow([word, item])
    


    If you want all the data aligned in columns (which you probably should) and the data looks like this:
    leg
    knee
    thigh
    shin
    ankle
    hip
    foot
    toe
    calf
    feet
    patella
    tibia
    fibula
    

    ..and..

    Student broke leg yesterday
    Student broke arm today
    Student hurt thigh today
    Student twisted elbow
    Student rolled ankle today
    

    ..then you could do this:

    with open("example.csv") as file1, open("list.csv") as file2, open("output.csv", "wb+") as file3:
        writer = csv.writer(file3)
        key_words = [word.strip() for word in file2.readlines()]
        for row in file1:
            row = row.strip()
            for key in (key for key in key_words if key in row):
                writer.writerow([key, row])
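
Note that `key in row` is a plain substring test, so a keyword such as `hip` would also match a sentence containing `ship`. If that matters, a whole-word check is one option. The following is a minimal Python 3 sketch under that assumption; the helper name `find_matches`, the use of `re` word boundaries, and the "ship" sentence are illustrative additions, not part of the original answer.

```python
import re


def find_matches(key_words, sentences):
    """Return (keyword, sentence) pairs where the keyword occurs as a whole word."""
    matches = []
    for sentence in sentences:
        for key in key_words:
            # \b anchors the keyword at word boundaries; re.escape guards
            # against keywords that contain regex metacharacters.
            if re.search(r"\b" + re.escape(key) + r"\b", sentence):
                matches.append((key, sentence))
    return matches


if __name__ == "__main__":
    key_words = ["leg", "thigh", "ankle", "hip"]
    sentences = [
        "Student broke leg yesterday",
        "Student broke arm today",
        "Student boarded ship today",  # made-up sentence to show the substring pitfall
    ]
    for key, sentence in find_matches(key_words, sentences):
        print(key, "|", sentence)
```

The matching here is case-sensitive, like the substring test it replaces.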
    

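    【Comments】:

    • This helped me get some output into the file, but not necessarily what I was trying to get. For example, "example" contains a sentence saying "Student broke arm today", but none of those words are in "list", so I don't want it printed. If you could help with that, it would be great!
    • Please try my edited answer and let me know if that is the output you are looking for. I'm still not sure what your files look like, but I believe (based on your description) this covers what you are trying to accomplish. The output syntax may need slight tweaking.
    • Thank you so much, that's great! This is exactly what I was looking for, but I couldn't get it!
    • No problem, glad I could help.
    • Row 2 in example would correspond to row 2 in list. Did you want the single-row list applied to all rows in example? That's where I'm unsure how you intend to format these files.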
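
If the keyword list really is a single row that should be checked against every sentence (rather than row N of one file against row N of the other), a sketch along these lines may fit. It is written for Python 3 and uses `io.StringIO` in place of real files so that it is self-contained; the function name `match_all_rows` is hypothetical.

```python
import csv
import io


def match_all_rows(list_file, example_file, out_file):
    """Check every keyword from list_file against every cell of example_file."""
    # Flatten all rows of the keyword file into one list of keywords.
    key_words = [w.strip() for row in csv.reader(list_file) for w in row if w.strip()]
    writer = csv.writer(out_file)
    for row in csv.reader(example_file):
        for sentence in row:
            for key in key_words:
                if key in sentence:  # plain substring test, as in the answer
                    writer.writerow([key, sentence.strip()])


if __name__ == "__main__":
    list_file = io.StringIO("leg,knee,thigh,ankle\n")
    example_file = io.StringIO(
        "Student broke leg yesterday,Student broke arm today,Student hurt thigh today\n"
    )
    out_file = io.StringIO()
    match_all_rows(list_file, example_file, out_file)
    print(out_file.getvalue())
```

With real files in Python 3, the `csv` documentation recommends opening them in text mode with `newline=""` rather than the `"rb"`/`"wb"` modes used in the Python 2 code above.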
    【Solution 2】:

    From what I understand of the structure of your csv files, I don't think you should use a csv reader to load your example file and your words...

    import csv
    
    with open("list.csv", "U") as file1, open("example.csv", "rb") as file2, open("finalOutput.csv", "wb") as outputfile:
    
        writer = csv.writer(outputfile,delimiter='|')
    
        words = set(file1.read().split(','))
    
        # examples are split by "," so read the whole file and split it by ","
        examples = file2.read().split(',')
    
        for word in file1:
            for example in examples:
                # if the word happens to be within the example
                if word in example:
                       # add it to your output file
                       data = [word,example]
                       print data
                       writer.writerow(data)
    

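    【Comments】:

    • This didn't output anything to the finalOutput.csv file. It's empty.
    • OK. @NYJets8778, what do `words` and `examples` look like if you print them? Maybe the different read modes are the problem: open("list.csv", "U") versus open("example.csv", "rb")?
    • @Bob actually answered my question with his code. Thanks.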
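
One plausible cause of the empty output reported in this thread: `file1.read()` consumes the whole file, so the later `for word in file1` loop has nothing left to iterate over. Below is a Python 3 sketch that loops over the parsed words instead. The helper name `match_words`, the `strip()` calls, and the sorted return value are additions for illustration, not the answerer's code.

```python
def match_words(list_text, example_text):
    """Return sorted (word, example) pairs for comma-separated inputs."""
    # Split both texts on "," and strip whitespace/newlines left over from the file.
    words = {w.strip() for w in list_text.split(",") if w.strip()}
    examples = [e.strip() for e in example_text.split(",") if e.strip()]
    # Iterate over the parsed words, not over the already-consumed file object.
    return sorted(
        (word, example)
        for word in words
        for example in examples
        if word in example
    )


if __name__ == "__main__":
    list_text = "leg,knee,thigh,ankle\n"
    example_text = (
        "Student broke leg yesterday,Student broke arm today,Student rolled ankle today\n"
    )
    for word, example in match_words(list_text, example_text):
        print(word + "|" + example)
```

In the original code, `list_text` and `example_text` would come from `file1.read()` and `file2.read()`.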