Get matched item from generator expression (answer)

[Question title]: get matched item from generator expression
[Posted]: 2015-12-06 12:28:06
[Question]:

I wrote an if condition using a generator expression.

self.keyword_list = ['Buzz', 'Heard on the street', 'familiar with the development', 'familiar with the matter', 'Sources' ,'source', 'Anonymous', 'anonymity', 'Rumour', 'Scam', 'Fraud', 'In talks', 'Likely to', 'Cancel', 'May', 'Plans to', 'Raids' ,'raid', 'search', 'Delisting', 'delist', 'Block', 'Exit', 'Cheating', 'Scouts', 'scouting', 'Default', 'defaulted', 'defaulter', 'Calls off', 'Lease out', 'Pick up', 'delay', 'arrest', 'arrested', 'inks', 'in race', 'enters race', 'mull', 'consider', 'final stage', 'final deal', 'eye', 'eyes', 'probe', 'vie for', 'detects', 'allege', 'alleges', 'alleged', 'fabricated', 'inspection', 'inspected', 'to monetise', 'cancellation', 'control', 'pact', 'warning', 'IT scanner', 'Speculative', 'Divest', 'Buzz', 'Heard on the street', 'familiar with the development', 'familiar with the matter', 'Sources', 'source', 'Anonymous', 'anonymity', 'Rumour', 'Scam', 'Fraud', 'In talks', 'Likely to', 'Cancel', 'May', 'Plans to ', 'Raids', 'raid', 'search', 'Delisting', 'delist', 'Block', 'Exit', 'Cheating', 'Scouts','scouting', 'Default', 'defaulted', 'defaulter', 'Calls off', 'Lease out', 'Pick up', 'delay', 'arrest', 'arrested', 'inks', 'in race', 'enters race', 'mull', 'consider', 'final stage', 'final deal', 'eye', 'eyes', 'probe', 'vie for', 'detects', 'allege', 'alleges', 'alleged', 'fabricated', 'inspection', 'inspected', 'monetise', 'cancellation', 'control', 'pact', 'warning', 'IT scanner', 'Speculative', 'Divest']
if any(re.search(item.lower(), record['title'].lower()+' '+record['description'].lower()) for item in self.keyword_list):
    #for which value of item condition became true?
    #print item does not work
    print record

If the condition is true, I want to print the name of the item that matched. How do I get it?

[Discussion]:

It's a generator expression; there are no tuple comprehensions in Python. Also, you don't need else: pass. That block is entirely optional and can be omitted.

Tags: python if-statement generator-expression
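To see why "print item" fails in the question: any() only returns a bool, and the loop variable of a generator expression lives in the generator's own scope, so it is not visible after the call. A minimal sketch with made-up keywords and text (written for Python 3; the scoping rule is the same in Python 2):

```python
import re
import types

keywords = ['Scam', 'Fraud', 'probe']
text = 'Regulator opens probe into alleged fraud'.lower()

# Parentheses around a comprehension build a lazy generator, not a tuple.
gen = (kw for kw in keywords if re.search(kw.lower(), text))
print(isinstance(gen, types.GeneratorType))  # True

# any() consumes a generator and only reports True/False;
# it does not hand back the element that satisfied the test.
found = any(re.search(kw.lower(), text) for kw in keywords)
print(found)  # True

# The loop variable belongs to the generator's own scope,
# so no name 'kw' exists at module level afterwards.
print('kw' in globals())  # False
```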

[Answer 1]:

Don't use any(). Instead, turn the generator expression into a filter (move the test to the end), then use next() to get the first match:

matches = (item for item in self.keyword_list if re.search(item.lower(), record['title'].lower() + ' ' + record['description'].lower()))
first_match = next(matches, None)
if first_match is not None:
    print(first_match)  # the keyword that matched
    print(record)
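The same next()-with-a-default idea, wrapped in a small standalone function so it can be run outside the class. The function name first_matching_keyword and the sample record are hypothetical; like the answer's code, it treats each keyword as a regular expression:

```python
import re

def first_matching_keyword(keywords, text):
    """Return the first keyword (in list order) found in text, or None."""
    haystack = text.lower()
    # A filtering generator: it yields only the keywords that match.
    matches = (kw for kw in keywords if re.search(kw.lower(), haystack))
    # next() pulls just the first yielded value; the default avoids StopIteration.
    return next(matches, None)

record = {'title': 'Company in talks', 'description': 'Sources say a deal is likely'}
hit = first_matching_keyword(['Scam', 'In talks', 'Sources'],
                             record['title'] + ' ' + record['description'])
print(hit)  # In talks
```

Because the generator is lazy, keywords after the first hit are never tested.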

Or you could just use a for loop and break after the first match:

for item in self.keyword_list:
    if re.search(item.lower(), record['title'].lower() + ' ' + record['description'].lower()):
        print(item)  # the keyword that matched
        print(record)
        break
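If you also need to handle the case where nothing matched, Python's for ... else fits the loop version: the else block runs only when the loop finishes without hitting break. A sketch with a hypothetical helper (matched_keyword) and sample data:

```python
import re

def matched_keyword(record, keywords):
    """Return the first keyword found in the record's title/description, or None."""
    text = (record['title'] + ' ' + record['description']).lower()
    for item in keywords:
        if re.search(item.lower(), text):
            # item is an ordinary loop variable, so it is still bound here.
            found = item
            break
    else:
        # Reached only when the loop ran to completion without a break.
        found = None
    return found

record = {'title': 'Alleged fraud at lender', 'description': 'Probe under way'}
print(matched_keyword(record, ['Scam', 'Fraud', 'Probe']))  # Fraud
print(matched_keyword(record, ['Delisting']))  # None
```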

You can clean up either variant further by building the text to search only once and using the re.IGNORECASE flag, so you don't have to lowercase everything:

# Each keyword is the pattern; the combined record text is what gets searched.
text = u'{} {}'.format(record['title'], record['description'])
matches = (item for item in self.keyword_list
           if re.search(item, text, flags=re.IGNORECASE))
first_match = next(matches, None)
if first_match is not None:
    print(first_match)
    print(record)

Or:

text = u'{} {}'.format(record['title'], record['description'])
for item in self.keyword_list:
    if re.search(item, text, flags=re.IGNORECASE):
        print(item)
        print(record)
        break
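A different technique, not part of the answer above: join all keywords into one alternation, compile it once outside the per-record loop, and recover which keyword matched from match.lastindex. This sketch assumes the keywords are meant as literal text (hence re.escape) and that you are happy with the keyword appearing earliest in the text, rather than the one listed first. The helper name build_keyword_regex is hypothetical:

```python
import re

def build_keyword_regex(keywords):
    # Wrap each escaped keyword in its own capturing group: (kw1)|(kw2)|...
    alternatives = '|'.join('({})'.format(re.escape(kw)) for kw in keywords)
    return re.compile(alternatives, re.IGNORECASE)

keywords = ['Scam', 'In talks', 'Sources']
keyword_re = build_keyword_regex(keywords)  # compile once, reuse for every record

text = u'Firm IN TALKS over scam probe'
m = keyword_re.search(text)
if m is not None:
    # Only one alternative's group takes part in a match,
    # so lastindex is that group's 1-based number.
    matched_kw = keywords[m.lastindex - 1]
    print(matched_kw)  # In talks
    print(m.group())   # IN TALKS
```

The trade-off is one regex pass per record instead of one search per keyword, at the cost of dropping regex syntax inside keywords.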

【讨论】：

@Martijin: Thanks a lot. Now I'm trying to fix AttributeError: 'module' object has no attribute 'IGNORE'
I added flags=re.IGNORECASE and then got UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 97: ordinal not in range(128)
@cyclic: Sorry, I misremembered the flag name. If you are working with Unicode objects, use u'{} {}'.format() instead (so use a unicode literal as the format string), and use flags=re.IGNORECASE | re.UNICODE.
@Martijin: Thanks, but still the same error. Is this correct? re.compile('{} {}'.format(record['title'].encode('utf-8').strip(), record['description'].encode('utf-8').strip()),flags=re.IGNORECASE | re.UNICODE)
@cyclic: The items in your keyword list are Unicode too, presumably? I would not encode to UTF-8, because then you are matching UTF-8 bytes, not characters. That can lead to very strange results, and you can't match anything beyond plain ASCII characters case-insensitively.
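The last comment's point about UTF-8 bytes can be checked directly. A small sketch, assuming Python 3 (where str patterns are Unicode-aware by default, so the re.UNICODE flag mentioned above is implied); the sample words are made up:

```python
import re

keyword = u'Société'
text = u'SOCIÉTÉ GÉNÉRALE in talks'

# Text (str) matching: IGNORECASE folds non-ASCII letters such as É/é.
as_text = re.search(keyword, text, re.IGNORECASE) is not None

# Byte matching on UTF-8: IGNORECASE only folds ASCII letters, and É/é
# encode to different byte sequences (b'\xc3\x89' vs b'\xc3\xa9').
as_bytes = re.search(keyword.encode('utf-8'), text.encode('utf-8'),
                     re.IGNORECASE) is not None

print(as_text)   # True
print(as_bytes)  # False
```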