Patterns do not behave as expected: answers

【Question title】: Patterns do not behave as expected
【Posted】: 2019-09-08 16:57:48
【Question description】:

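The real patterns are not in English, so I created this simplified example to reproduce the problem. There are 3 levels of annotation (the real application needs them), and the level 3 pattern does not work as expected. The phrase to recognize is: a b c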

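What I expect:

Level 1: "a" is annotated as A, and "b" is annotated as B.
Level 2: if annotations A and B are both present, annotate them together as AB.
Level 3: if there is at least one AB annotation followed by the word "c", annotate all of them as C.

The patterns are shown below.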

# 1.
{  pattern: (/a/), action: (Annotate($0, name, "A")) }
{  pattern: (/b/), action: (Annotate($0, name, "B")) }
# 2.
{  pattern: (([name:A]) ([name:B])), action: (Annotate($0, name, "AB")) }
# 3.
{  pattern: (([name:AB]+) /c/), action: (Annotate($0, name, "C")) }

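#1 and #2 work, and "a b" is annotated:

Matched token: NamedEntitiesToken{word='a' name='AB' beginPosition=0 endPosition=1}
Matched token: NamedEntitiesToken{word='b' name='AB' beginPosition=2 endPosition=3}

But pattern #3 does not match, even though there are clearly 2 tokens annotated "AB", which is exactly what pattern #3 expects. If I change #1 to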

{  pattern: (/a/), action: (Annotate($0, name, "AB")) }
{  pattern: (/b/), action: (Annotate($0, name, "AB")) }

pattern #3 works correctly:

Matched token: NamedEntitiesToken{word='a' name='C' beginPosition=0 endPosition=1}
Matched token: NamedEntitiesToken{word='b' name='C' beginPosition=2 endPosition=3}
Matched token: NamedEntitiesToken{word='c' name='C' beginPosition=4 endPosition=5}

I cannot find any difference between the matched tokens when I use

# In this case #3 pattern works
{  pattern: (/a/), action: (Annotate($0, name, "AB")) }
{  pattern: (/b/), action: (Annotate($0, name, "AB")) }

and when I use

# In this case #3 pattern doesn't work
# 1.
{  pattern: (/a/), action: (Annotate($0, name, "A")) }
{  pattern: (/b/), action: (Annotate($0, name, "B")) }
# 2.
{  pattern: (([name:A]) ([name:B])), action: (Annotate($0, name, "AB")) }

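In both cases the tokens end up with the same "AB" annotations, yet pattern #3 matches in the first scenario and not in the second. What am I doing wrong?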

【Question discussion】:

Tags: stanford-nlp tokenize

【Solution 1】:

This works for me:

# these Java classes will be used by the rules
ner = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NamedEntityTagAnnotation" }

ENV.defaults["stage"] = 1

{ ruleType: "tokens", pattern: (/a/), action: Annotate($0, ner, "A") }
{ ruleType: "tokens", pattern: (/b/), action: Annotate($0, ner, "B") }

ENV.defaults["stage"] = 2

{ ruleType: "tokens", pattern: ([{ner: "A"}] [{ner: "B"}]), action: Annotate($0, ner, "AB") }

ENV.defaults["stage"] = 3

{ ruleType: "tokens", pattern: ([{ner: "AB"}]+ /c/), action: Annotate($0, ner, "ABC") }

There is a write-up on TokensRegex here:

https://stanfordnlp.github.io/CoreNLP/tokensregex.html

【Discussion】:

Yes, simply adding ENV.defaults["stage"] solved the problem. Adding ENV.defaults["stage"] = 2 just before rule #3 (the rule that annotates ABC in your example) was enough. Thanks!
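
Applied to the rules from the question, the minimal change described in this comment might look like the sketch below. It is an illustration, not a verified rule file: it assumes `name` is already bound to the asker's annotation class elsewhere in the file (the question does not show that binding), and it relies on the commenter's report that rules without an explicit stage are applied before stage 2.

```
# 1. and 2. are unchanged and carry no explicit stage, as in the question
{  pattern: (/a/), action: (Annotate($0, name, "A")) }
{  pattern: (/b/), action: (Annotate($0, name, "B")) }
{  pattern: (([name:A]) ([name:B])), action: (Annotate($0, name, "AB")) }

# Per the comment: put rule 3 in a later stage so that it runs
# after the AB annotations have been written
ENV.defaults["stage"] = 2

# 3.
{  pattern: (([name:AB]+) /c/), action: (Annotate($0, name, "C")) }
```

The accepted answer gives each level its own stage (1, 2, 3), which makes the ordering explicit and is probably the safer layout if more levels are added later.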