What is the most efficient way to check if a word starts with certain prefixes? Answers

【Question title】: What is the most efficient way to check if a word starts with certain prefixes?
【Posted】: 2019-07-29 23:26:52
【Question description】:

I want to check whether a hyphenated word starts with a prefix from the following set. For example, "de-salt".

prefixes = {
    'de-', 'dis-', 'il-', 'im-', 'ir-', 'inter-',
    'mid-', 'mis-', 'non-', 'pre-', 'pro-', 're-',
    'semi-', 'sub-', 'tele-', 'trans-',
    'un-', 'e-'
}

Here is my code:

import re

def prefix(word):
    # Grab the leading lowercase run up to and including the first hyphen.
    match = re.match(r"[a-z]+-", word)
    if match:
        if match.group() in prefixes:
            return True
    return False

word = "e-mail"
print(prefix(word))

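【Question comments】:

Do you want "telephone" to match?
This already has an answer here; it is built into Python.
No. All the words are hyphenated.
Nice. Thanks!
Is this question about coding efficiency or runtime efficiency? [That said, I would be surprised if there were a better answer than Jab's comment.]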
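
The comments point to a Python built-in without naming it. A plausible reading, and it is only an assumption here, is str.startswith, which accepts a tuple of candidate prefixes. A minimal sketch under that assumption (the names PREFIXES and has_prefix are illustrative, not from the thread):

```python
# Assumption: the built-in mentioned in the comments is str.startswith,
# which accepts a tuple of candidate prefixes.
PREFIXES = (
    'de-', 'dis-', 'il-', 'im-', 'ir-', 'inter-',
    'mid-', 'mis-', 'non-', 'pre-', 'pro-', 're-',
    'semi-', 'sub-', 'tele-', 'trans-',
    'un-', 'e-',
)

def has_prefix(word):
    # True if word starts with any entry of the tuple.
    return word.startswith(PREFIXES)

print(has_prefix("e-mail"))     # True
print(has_prefix("telephone"))  # False
```

Because every prefix ends in a hyphen, "telephone" does not match 'tele-'.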

Tags: python regex

【Solution 1】:

You can sort the prefixes first, so that the bisect.bisect_left method can find, in O(log n) time, the closest prefix that sorts before the given word:

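from bisect import bisect_left

# Sort once so that binary search can be used for every lookup.
prefixes = sorted(prefixes)

def prefix(prefixes, word):
    # Index of the first entry >= word; a matching prefix, if any, sits just
    # before it (assumes the word is longer than the prefix and that no
    # prefix is itself the start of another prefix).
    i = bisect_left(prefixes, word)
    if i and word.startswith(prefixes[i - 1]):
        return prefixes[i - 1]
    raise ValueError("No prefix found for '%s'." % word)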

So that:

print(prefix(prefixes, 'non-word'))
print(prefix(prefixes, 'tele-video'))
print(prefix(prefixes, 'e-mail'))

outputs:

non-
tele-
e-

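【Comments】:

【Solution 2】:

Bisect scales better than this. However, its stated runtime does not account for the cost of comparing the prefixes themselves. (The runtime is O(n log(n)) if you take the shared leading characters of the prefixes into account. For the example given, though, it is the better solution.)

The most efficient approach is to take only the first n characters of the word (with n = the length of the longest prefix) [optional: the state machine can do this for you as well] and feed each of those characters to a state machine.

The state machine has to decide which prefixes are still reachable.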

E.g. to be tested: "prefix" with your list of prefixes
You start with "" -> everything is possible
You read the "p" -> {pro, pre} are possible prefixes now
You read the "r" -> still the same, both start with "pr"
You read the "e" -> pro is not possible and pre has been found.

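The state machine can be generated from the list of prefixes, but I won't go into the details.

It should yield a set of states and a transition table that depends on the current state and the next character read.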

An example:
Let me add prof to your list of prefixes.

0:
p -> 1
? -> to be added, there are more prefixes

1:
r -> 2
? -> terminate, nothing found

2:
e -> terminate, found pre
o -> 3, found pro
? -> -1

3:
f -> terminate, found pro and prof
? -> terminate, found pro

How to read it: state: character read -> next state, what has been found; ? = any other character
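
One common way to realize such a state machine is a trie built from the prefix list, where each node is a state and each edge is a character transition. The answer does not give an implementation, so the following is only a sketch under that assumption; build_trie, longest_prefix, and the use of None as an end-of-prefix marker are hypothetical choices, not the author's code.

```python
def build_trie(prefixes):
    # Each node is a dict mapping a character to the next node (state).
    # The key None (an illustrative choice) marks a state where a
    # complete prefix ends and stores that prefix.
    root = {}
    for p in prefixes:
        node = root
        for ch in p:
            node = node.setdefault(ch, {})
        node[None] = p
    return root

def longest_prefix(trie, word):
    # Feed the word to the machine one character at a time.
    node = trie
    found = None
    for ch in word:
        if ch not in node:
            break  # no transition for this character: terminate
        node = node[ch]
        if None in node:
            found = node[None]  # remember the longest prefix seen so far
    return found

trie = build_trie(['pre', 'pro', 'prof'])
print(longest_prefix(trie, 'prefix'))     # pre
print(longest_prefix(trie, 'professor'))  # prof
print(longest_prefix(trie, 'prism'))      # None
```

The loop stops as soon as no transition exists, so it never reads more than one character beyond the longest prefix, which matches the "first n characters" idea above.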

【Comments】:

【Solution 3】:

In your case, I guess hashing would be efficient.

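# Store the part of each prefix before the hyphen in a set for O(1) lookups.
m = set()
for x in prefixes:
    m.add(x.split('-')[0])

def prefix(word):
    # Compare the part of the word before the first hyphen against the set.
    return word.split('-')[0] in m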
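
Note that splitting on the hyphen and keeping only the first part also accepts a bare word such as "e", which has no hyphen at all. A possible variant, sketched here as an assumption and not part of the answer, keeps the hyphen in the lookup key by using str.partition; PREFIX_SET and has_prefix_hashed are illustrative names.

```python
# Illustrative names; the set contents are the prefixes from the question.
PREFIX_SET = frozenset({
    'de-', 'dis-', 'il-', 'im-', 'ir-', 'inter-',
    'mid-', 'mis-', 'non-', 'pre-', 'pro-', 're-',
    'semi-', 'sub-', 'tele-', 'trans-',
    'un-', 'e-',
})

def has_prefix_hashed(word):
    # str.partition splits at the first hyphen into (head, '-', tail).
    head, sep, _ = word.partition('-')
    # sep is '' when the word has no hyphen, so head + sep cannot match then.
    return head + sep in PREFIX_SET

print(has_prefix_hashed("e-mail"))     # True
print(has_prefix_hashed("telephone"))  # False
print(has_prefix_hashed("e"))          # False
```

The lookup is a single hash probe regardless of how many prefixes the set holds.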

【Comments】: