Python - Checking whether all the words in a line exist in an array

【Question title】: Python - Checking whether all the words in a line exist in an array
【Posted】: 2016-11-01 18:16:53
【Question description】:

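Note: for this problem I cannot use any imports other than sys and io.

For an assignment, I have to accept two files as command-line arguments. Both files contain lines of strings.

To get the assignment started, I want to read one file a line at a time and check whether every word in that line exists in the other file.

Here are the files: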

g1.ecfg

S -> NP VP
NP -> Det N
NP -> PN
Det -> "the" 
N -> "dog" 
N -> "rat" 
N -> "elephant"
PN -> "Alice"
PN -> "Bob"
VP -> V NP
V -> "admired" 
V -> "bit" 
V -> "chased"

u1a.utt

the aardvark bit the dog
the dog bit the man
Bob killed Alice

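So, I want to read each line in u1a.utt and check whether every word in that line is found in g1.ecfg.

I thought the quotation marks in g1 might be a problem, so I put all the quoted words into an array, without the quotes.

My current code always evaluates to false: it prints "No valid parse" even for strings that should print "Parsing!!!".

Can anyone help me understand how to compare the words in each line against the g1 file?

Here is my code: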

import sys
import io

# usage = python CKYdet.py g#.ecfg u#L.utt

# Command Line Arguments - argv[0], argv[1], argv[2]
script = sys.argv[0]
grammarFile = open(sys.argv[1])
utteranceFile = open(sys.argv[2])

# Initialize rules from grammarFile

ruleArray = []
wordsInQuotes = []
uttWords = []

for line in grammarFile:
    rule = line.rstrip('\n')
    start = line.find('"') + 1
    end = line.find('"', start)
    ruleArray.append(rule)
    wordsInQuotes.append(line[start:end])    #create a set of words from grammar file


for line in utteranceFile:
    x = line.split()
    print x
    if (all(x in grammarFile for x in line)):    #if all words found in grammarFile
        print "Parsing!!!"
    else:
        print "No valid parse"

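I think this may have something to do with whether my list is hashable, or it may be a scope problem, but I'm struggling to find an alternative that works for me.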

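【Discussion】:

all(x in grammarFile for x in line): you are checking whether each character in line is in grammarFile. That doesn't seem to be what you want to do. You probably want something more like x in wordsInQuotes for x in line.split().
That worked! Thank you so much. It was just a small logic error, but I wasted hours wondering what was wrong.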
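
The fix from the comment above can be sketched roughly as follows. This is a minimal illustration rather than the asker's final code: the helper name `all_words_known`, the sample `words_in_quotes` list, and the in-memory `grammar_file` are assumptions made for the example. It also shows the two reasons the original check could never succeed: iterating over a string yields characters, and a file object that has already been read to the end has no lines left to test against.

```python
import io


def all_words_known(line, known_words):
    """Return True if every whitespace-separated word in line is in known_words."""
    return all(word in known_words for word in line.split())


# Hypothetical stand-in for the wordsInQuotes list built from g1.ecfg.
words_in_quotes = ["the", "dog", "rat", "elephant", "Alice", "Bob",
                   "admired", "bit", "chased"]

# Pitfall 1: iterating over a string yields single characters, not words.
chars = [x for x in "the dog"]
print(chars[:3])  # ['t', 'h', 'e']

# Pitfall 2: a file object is consumed by the first loop over it, so a
# later membership test finds nothing left to iterate.
grammar_file = io.StringIO('N -> "dog"\n')
for _ in grammar_file:
    pass
exhausted_result = 'N -> "dog"\n' in grammar_file
print(exhausted_result)  # False

# The corrected check: compare words against the collected list.
print(all_words_known("the dog bit the rat", words_in_quotes))  # True
print(all_words_known("Bob killed Alice", words_in_quotes))     # False
```

Note that `all()` over an empty sequence is True, so a blank utterance line would count as fully matched under this sketch.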

Tags: python arrays parsing line words

【Solution 1】:

Let's use a set to store the items whose membership we'll check later, and use str.split to find the words in quotes.

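with open('grammarfile') as f:
    words = set()
    for line in f:
        # keep only the tokens that contain a quote, e.g. '"dog"'
        line = [a for a in line.split() if '"' in a]
        for a in line:
            # strip the quotes before storing the word
            words.add(a.replace('"', ''))

with open('utterancefile') as f:
    for line in f:
        # every word of the utterance must be a known grammar word
        if all(a in words for a in line.split()):
            print("Good Parse")
        else:
            print("Word not found")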
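
To try the same approach without files on disk, here is a rough self-contained sketch. It uses `io.StringIO` as an in-memory stand-in for the two files, which stays within the question's sys/io restriction. The function names `load_terminals` and `check_utterances` and the shortened sample data are assumptions made for illustration, not part of the original answer.

```python
import io


def load_terminals(grammar_lines):
    """Collect the quoted words from the grammar rules into a set."""
    words = set()
    for line in grammar_lines:
        for token in line.split():
            if '"' in token:
                words.add(token.replace('"', ''))
    return words


def check_utterances(utterance_lines, words):
    """Yield (utterance, ok) pairs; ok is True when every word is known."""
    for line in utterance_lines:
        yield line.strip(), all(t in words for t in line.split())


# In-memory stand-ins for a grammar file and an utterance file (assumed data).
grammar = io.StringIO(
    'S -> NP VP\n'
    'Det -> "the" \n'
    'N -> "dog" \n'
    'N -> "rat" \n'
    'V -> "bit" \n'
)
utterances = io.StringIO("the dog bit the rat\nthe aardvark bit the dog\n")

words = load_terminals(grammar)
results = list(check_utterances(utterances, words))
for text, ok in results:
    print(text, "->", "Good Parse" if ok else "Word not found")
```

Rules without quoted words, such as `S -> NP VP`, simply contribute nothing to the set. A set is used so that each membership test does not have to scan the whole collection the way a list lookup would.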

【Comments】:

Thanks. I used the answer from the comment above, but yours works too, so I marked it as correct.