Python name grabber: answers

【Question title】: Python name grabber
【Posted】: 2009-11-02 06:43:20
【Question】:

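If I have a string in the format

(static string) name (different static string) message (last static string)

what is the best way to search the messages for a word and build an array of all the names whose message contains that word?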

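【Comments on the question】:

Give a clearer example. Are those parentheses actually in your data file? What is a "static string"? In any case, show a sample and describe precisely the output you want.

Tags: python regex parsing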

【Solution 1】:

>>> s="(static string) name (different static string ) message (last static string)"
>>> _,_,s=s.partition("(static string)")
>>> name,_,s=s.partition("(different static string )")
>>> message,_,s=s.partition("(last static string)")
>>> name
' name '
>>> message
' message '
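
The session above pulls one name and one message out of a single string. To get all the way to the list the question asks for, the same partition() calls could be wrapped in a helper and run over many records. This is only a sketch under assumptions: the function names parse_record and names_mentioning, the sample records, and the choice to treat a "word" as a whitespace-separated token are all illustrative and not part of the original answer.

```python
def parse_record(line, first, middle, last):
    """Return (name, message) from one record, or None if a marker is missing."""
    _, found_first, rest = line.partition(first)
    name, found_middle, rest = rest.partition(middle)
    message, found_last, _ = rest.partition(last)
    # partition() returns an empty separator when the marker was not found.
    if not (found_first and found_middle and found_last):
        return None
    return name.strip(), message.strip()


def names_mentioning(lines, word, first, middle, last):
    """Collect names whose message contains `word` as a whitespace-separated token."""
    names = []
    for line in lines:
        parsed = parse_record(line, first, middle, last)
        if parsed is None:
            continue  # skip malformed records
        name, message = parsed
        if word in message.split():
            names.append(name)
    return names


# Hypothetical sample data in the format described in the question.
records = [
    "(static string) alice (different static string ) hello world (last static string)",
    "(static string) bob (different static string ) goodbye (last static string)",
]
markers = ("(static string)", "(different static string )", "(last static string)")
print(names_mentioning(records, "hello", *markers))  # ['alice']
```

Unlike a plain split(), this keeps multi-word messages intact, because everything between two markers is taken as one field.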

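【Comments】:

This is better than using a regex: you should reach for regular expressions only when you have complex pattern matching that other string operations cannot handle easily. Check the string methods before using a regex.

【Solution 2】: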

Given this string:

Foo NameA Bar MessageA Baz

this regex will match:

Foo\s+(\w+)\s+Bar\s+(\w+)\s+Baz

Group 1 will be the name and group 2 the message. Foo, Bar, and Baz are the static parts.

Here it is in the Python REPL:

Python 2.6.1 (r261:67517, Dec  4 2008, 16:51:00) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> s = "Foo NameA Bar MessageA Baz"
>>> m = re.match(r"Foo\s+(\w+)\s+Bar\s+(\w+)\s+Baz", s)
>>> m.group(0)
'Foo NameA Bar MessageA Baz'
>>> m.group(1)
'NameA'
>>> m.group(2)
'MessageA'
>>>
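
The pattern above assumes single-word names and messages and static parts that contain no regex metacharacters. A possible variation, sketched here with the same placeholder markers Foo, Bar, and Baz, escapes the static parts with re.escape() and uses non-greedy groups so that multi-word fields and several records in one string are handled. The sample text and variable names are made up for illustration.

```python
import re

# Placeholder markers from the answer; substitute the real static text.
FIRST, MIDDLE, LAST = "Foo", "Bar", "Baz"

# re.escape() protects any regex metacharacters in the static parts,
# which matters if they contain parentheses as in the question.
# (.+?) is non-greedy, so each group stops at the next marker.
pattern = re.compile(
    re.escape(FIRST) + r"\s+(.+?)\s+"
    + re.escape(MIDDLE) + r"\s+(.+?)\s+"
    + re.escape(LAST)
)

text = "Foo NameA Bar first message Baz Foo NameB Bar second note Baz"

# With two groups, findall() returns a list of (name, message) tuples.
pairs = pattern.findall(text)
print(pairs)  # [('NameA', 'first message'), ('NameB', 'second note')]
```

Note that "." does not match a newline by default, so this sketch expects each record to sit on a single line.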

【Comments】:

【Solution 3】:

Here is a complete answer showing how to do it with replace().

strings = ['(static string) name (different static string ) message (last static string)',
           '(static string) name (different static string ) message (last static string)',
           '(static string) name (different static string ) message (last static string)',
           '(static string) name (different static string ) message (last static string)',
           '(static string) name (different static string ) message (last static string)',
           '(static string) name (different static string ) message (last static string)']

results = []
target_word = 'message'
separators = ['(static string)', '(different static string )', '(last static string)']

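for s in strings:
    # Strip every static marker, leaving just "name message".
    for sep in separators:
        s = s.replace(sep, '')
    # Assumes the name and the message are each a single word.
    name, message = s.split()
    # Substring test: also true when target_word is part of a longer word.
    if target_word in message:
        results.append((name, message))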

>>> results
[('name', 'message'), ('name', 'message'), ('name', 'message'), ('name', 'message'), ('name', 'message'), ('name', 'message')]

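Note that this matches any message that contains target_word as a substring. It does not look for word boundaries: compare a run with target_word = 'message' against one with target_word = 'sag', and both produce the same result. If your word matching is more involved, you may need a regex.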
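
If whole-word matching is what you need, one way to do it is with the \b word-boundary anchor from the re module. The helper name contains_word and the sample message are hypothetical, and the sketch assumes the target word starts and ends with a letter, digit, or underscore, since \b only marks the edge between word and non-word characters.

```python
import re

def contains_word(message, word):
    """True if `word` occurs in `message` as a whole word, not inside a longer one."""
    # re.escape() keeps any special characters in `word` literal.
    return re.search(r"\b" + re.escape(word) + r"\b", message) is not None

print(contains_word("a short message", "message"))  # True
print(contains_word("a short message", "sag"))      # False
print("sag" in "a short message")                   # True (plain substring test)
```

In the loop above, the test `if target_word in message:` could then be swapped for `if contains_word(message, target_word):`.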

【Comments】:

【Solution 4】:

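with open("file") as f:
    for line in f:
        # Split on ")" so each piece ends just before the next "(...)" marker.
        for item in line.split(")"):
            try:
                # Keep only the text that precedes the next "(".
                print(item[:item.index("(")])
            except ValueError:
                # No "(" in this piece (e.g. the trailing newline); skip it.
                pass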

Output:

$ more file
(static string) name (different static string ) message (last static string)
(static string) name (different static string ) message (last static string)
(static string) name (different static string ) message (last static string)
(static string) name (different static string ) message (last static string)
$ python python.py

 name
 message

 name
 message

 name
 message

 name
 message
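
The blank lines in the output come from the empty piece in front of the first "(" on each line. If you would rather collect the fields than print them, a possible alternative (not the approach of this answer) is to split on the parenthesized groups with re.split(). The function name fields_between_markers is made up for this sketch, and it assumes the static parts never contain nested parentheses.

```python
import re

def fields_between_markers(line):
    """Return the non-empty text pieces found between "(...)" groups in `line`."""
    # \([^)]*\) matches one parenthesized group with no nested parentheses.
    pieces = re.split(r"\([^)]*\)", line)
    # Drop the empty leading piece and the trailing newline piece.
    return [piece.strip() for piece in pieces if piece.strip()]

line = "(static string) name (different static string ) message (last static string)\n"
print(fields_between_markers(line))  # ['name', 'message']
```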

【Comments】: