Antlr4 discards remaining tokens instead of bailing out

【Question title】: Antlr4 discards remaining tokens instead of bailing out
【Posted】: 2013-02-28 03:49:41
【Question】:

I'm using Antlr4, and here is a simplified grammar I wrote:

grammar BooleanExpression;

/*******************************
 *      Parser Rules
 *******************************/
booleanTerm
    : booleanLiteral (KW_OR booleanLiteral)+
    | booleanLiteral
    ;

id
    : IDENTIFIER
    ;

booleanLiteral
    : KW_TRUE
    | KW_FALSE
    ;

/*******************************
 *         Lexer Rules
 *******************************/
KW_TRUE
    : 'true'
    ;

KW_FALSE
    : 'false'
    ;

KW_OR
    : 'or'
    ;   

IDENTIFIER
    : (SIMPLE_LATIN)+
    ;

fragment 
SIMPLE_LATIN
    : 'A' .. 'Z'
    | 'a' .. 'z'
    ;

WHITESPACE
    : [ \t\n\r]+ -> skip
    ;
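The lexer rules above are case-sensitive: the literals 'true', 'false' and 'or' match only lowercase text. ANTLR's lexer picks the longest match and breaks ties by rule order, so "true" becomes KW_TRUE (listed before IDENTIFIER), while "OR" can only be matched by IDENTIFIER. A minimal sketch that mimics these rules by hand is below; BooleanLexerSketch is a hypothetical illustration, not ANTLR-generated code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer mimicking the BooleanExpression lexer rules.
// It is NOT generated by ANTLR; it only illustrates how the rules classify input.
class BooleanLexerSketch {

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            // WHITESPACE : [ \t\n\r]+ -> skip
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                i++;
                continue;
            }
            // Longest run of SIMPLE_LATIN letters, i.e. what IDENTIFIER would match.
            int start = i;
            while (i < input.length() && isSimpleLatin(input.charAt(i))) {
                i++;
            }
            if (start == i) {
                throw new IllegalArgumentException("no lexer rule matches index " + i);
            }
            String word = input.substring(start, i);
            // A keyword wins only on an equal-length match (rule order tie-break),
            // and the comparison is case-sensitive, just like the grammar literals.
            switch (word) {
                case "true":  tokens.add("KW_TRUE");  break;
                case "false": tokens.add("KW_FALSE"); break;
                case "or":    tokens.add("KW_OR");    break;
                default:      tokens.add("IDENTIFIER");
            }
        }
        return tokens;
    }

    private static boolean isSimpleLatin(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    public static void main(String[] args) {
        System.out.println(tokenize("true OR false")); // [KW_TRUE, IDENTIFIER, KW_FALSE]
        System.out.println(tokenize("true or false")); // [KW_TRUE, KW_OR, KW_FALSE]
    }
}
```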

I used a BailErrorStrategy and a BailLexer, as shown below:

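import org.antlr.v4.runtime.DefaultErrorStrategy;
import org.antlr.v4.runtime.InputMismatchException;
import org.antlr.v4.runtime.Parser;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Token;

public class BailErrorStrategy extends DefaultErrorStrategy {
    /**
     * Instead of recovering from exception e, rethrow it wrapped in a generic
     * IllegalArgumentException so it is not caught by the rule function catches.
     * Exception e is the "cause" of the IllegalArgumentException.
     */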
    @Override
    public void recover(Parser recognizer, RecognitionException e) {
        throw new IllegalArgumentException(e);
    }

    /**
     * Make sure we don't attempt to recover inline; if the parser successfully
     * recovers, it won't throw an exception.
     */
    @Override
    public Token recoverInline(Parser recognizer) throws RecognitionException {
        throw new IllegalArgumentException(new InputMismatchException(recognizer));
    }

    /** Make sure we don't attempt to recover from problems in subrules. */
    @Override
    public void sync(Parser recognizer) {
    }

    @Override
    protected Token getMissingSymbol(Parser recognizer) {
        throw new IllegalArgumentException(new InputMismatchException(recognizer));
    }
}



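import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.LexerNoViableAltException;
import org.antlr.v4.runtime.RecognitionException;

public class BailLexer extends BooleanExpressionLexer {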
    public BailLexer(CharStream input) {
        super(input);
        //removeErrorListeners();
        //addErrorListener(new ConsoleErrorListener());
    }

    @Override
    public void recover(LexerNoViableAltException e) {
        throw new IllegalArgumentException(e); // Bail out
    }

    @Override
    public void recover(RecognitionException re) {
        throw new IllegalArgumentException(re); // Bail out
    }
}

Everything works fine except in one case. I tried the following expression:

true OR false

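I expected this expression to be rejected with an IllegalArgumentException, because the "or" token should be lowercase, not uppercase. But it turns out Antlr4 does not reject it. The expression is tokenized as "KW_TRUE IDENTIFIER KW_FALSE", which is expected (the uppercase "OR" is treated as an IDENTIFIER), but the parser throws no error while processing this token stream: it parses it into a tree containing only "true" and discards the remaining "IDENTIFIER KW_FALSE" tokens. I tried different prediction modes, but they all behave the same way. I couldn't figure out why it works like this, so I did some debugging, which eventually led me to this code in Antlr: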

ATNConfigSet reach = computeReachSet(previous, t, false);

if ( reach==null ) {
    // if any configs in previous dipped into outer context, that
    // means that input up to t actually finished entry rule
    // at least for SLL decision. Full LL doesn't dip into outer
    // so don't need special case.
    // We will get an error no matter what so delay until after
    // decision; better error message. Also, no reachable target
    // ATN states in SLL implies LL will also get nowhere.
    // If conflict in states that dip out, choose min since we
    // will get error no matter what.
    int alt = getAltThatFinishedDecisionEntryRule(previousD.configs);
    if ( alt!=ATN.INVALID_ALT_NUMBER ) {
        // return w/o altering DFA
        return alt;
    }
    throw noViableAlt(input, outerContext, previous, startIndex);
}

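The line "int alt = getAltThatFinishedDecisionEntryRule(previousD.configs);" returns the second alternative of booleanTerm (because "true" matches the second alternative, "booleanLiteral"), and since that is not equal to ATN.INVALID_ALT_NUMBER, noViableAlt is not thrown immediately. According to the Java comments there, "We will get an error no matter what so delay until after decision", but it seems the error is never actually thrown.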
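To make the decision concrete, here is a hypothetical recursive-descent rendering of booleanTerm in plain Java. BooleanTermSketch and its token-name strings are illustrative assumptions, not ANTLR's generated parser or its ATN simulator. Both alternatives start with booleanLiteral, so the choice depends on the token after it; when that token is not KW_OR, alternative 2 is already complete and the rule simply returns, so booleanTerm by itself has no error to report.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical recursive-descent version of the booleanTerm rule over token names.
// It illustrates why alternative 2 is chosen for "KW_TRUE IDENTIFIER KW_FALSE"
// without an error: nothing inside booleanTerm requires more input.
class BooleanTermSketch {
    private final List<String> tokens;
    private int pos = 0;

    BooleanTermSketch(List<String> tokens) { this.tokens = tokens; }

    // One token of lookahead; "EOF" once the list is exhausted.
    private String la1() { return pos < tokens.size() ? tokens.get(pos) : "EOF"; }

    // booleanLiteral : KW_TRUE | KW_FALSE ;
    private void booleanLiteral() {
        String t = la1();
        if (!t.equals("KW_TRUE") && !t.equals("KW_FALSE")) {
            throw new IllegalArgumentException("no viable alternative at " + t);
        }
        pos++;
    }

    // booleanTerm : booleanLiteral (KW_OR booleanLiteral)+ | booleanLiteral ;
    // After the shared booleanLiteral, KW_OR selects alternative 1; any other token
    // (IDENTIFIER, EOF, ...) means alternative 2, which has already finished the rule.
    int booleanTerm() {
        booleanLiteral();
        int alt = la1().equals("KW_OR") ? 1 : 2;
        while (la1().equals("KW_OR")) {
            pos++;
            booleanLiteral();
        }
        return alt;
    }

    int position() { return pos; }

    public static void main(String[] args) {
        BooleanTermSketch p = new BooleanTermSketch(Arrays.asList("KW_TRUE", "IDENTIFIER", "KW_FALSE"));
        int alt = p.booleanTerm();
        System.out.println("alt=" + alt + ", tokens consumed=" + p.position()); // alt=2, tokens consumed=1
    }
}
```

Only whoever calls booleanTerm could notice the leftover IDENTIFIER; with no enclosing rule that expects anything afterwards, nothing does.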

I really don't know how to make Antlr report an error in this case. Can anyone shed some light on this? Any help is appreciated, thanks.

【Question discussion】:

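Maybe not all of the tokens are being consumed? What happens if you force the parser to go all the way to the end of the input: parse : booleanTerm EOF;
Why don't you use the built-in BailErrorStrategy?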

Tags: error-handling antlr4

【Answer 1】:

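If your top-level rule does not end with an explicit EOF, ANTLR is not required to parse to the end of the input sequence. Instead of throwing an exception, it simply parses the valid portion of the sequence you give it.

The following start rule forces it to parse the entire input sequence as a single booleanTerm.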

start : booleanTerm EOF;
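A plain-Java sketch of the difference the EOF makes is below. StartRuleSketch and the token-name strings are hypothetical stand-ins for the generated parser and its token stream: booleanTerm() stops after the valid prefix, while start() additionally requires the next token to be EOF, so the trailing "IDENTIFIER KW_FALSE" becomes an error.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical recursive-descent sketch of:
//   start       : booleanTerm EOF ;
//   booleanTerm : booleanLiteral (KW_OR booleanLiteral)+ | booleanLiteral ;
class StartRuleSketch {
    private final List<String> tokens;
    private int pos = 0;

    StartRuleSketch(List<String> tokens) { this.tokens = tokens; }

    private String la1() { return pos < tokens.size() ? tokens.get(pos) : "EOF"; }

    // Consume the current token if it is one of the expected types.
    private void match(String... expected) {
        if (!Arrays.asList(expected).contains(la1())) {
            throw new IllegalArgumentException(
                "expected one of " + Arrays.toString(expected) + " but found " + la1());
        }
        pos++;
    }

    // Parses only as much as booleanTerm needs; trailing tokens are left untouched.
    void booleanTerm() {
        match("KW_TRUE", "KW_FALSE");
        while (la1().equals("KW_OR")) {
            pos++;
            match("KW_TRUE", "KW_FALSE");
        }
    }

    // The explicit EOF check turns leftover tokens into a syntax error.
    void start() {
        booleanTerm();
        if (!la1().equals("EOF")) {
            throw new IllegalArgumentException("expected EOF but found " + la1());
        }
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("KW_TRUE", "IDENTIFIER", "KW_FALSE");
        new StartRuleSketch(tokens).booleanTerm(); // returns normally: "true" is a valid prefix
        try {
            new StartRuleSketch(tokens).start();
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // expected EOF but found IDENTIFIER
        }
    }
}
```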

Also, a BailErrorStrategy is provided by the ANTLR 4 runtime, and it throws a more informative ParseCancellationException than the one shown in your example.

【Discussion】:

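Thanks a million. This is indeed the solution to my problem. I did some more searching and found this wiki page for Antlr 3, antlr.org/wiki/pages/viewpage.action?pageId=4554943, which describes exactly the same problem.
I think the problem is that there really is no official documentation describing this. I read the Antlr 4 online documentation (there isn't much of it) and The Definitive ANTLR 4 Reference book, but I don't remember reading anything that mentions using the 'EOF' token like this. None of the examples in The Definitive ANTLR 4 Reference has an EOF at the end of the start rule, and it isn't mentioned in the "Error Reporting and Recovery" section either :(
I didn't realize I could use the built-in BailErrorStrategy; thanks for pointing that out. I'll give it a try.
@280Z28 Hey, I'm running into the same problem. But in my case I sometimes need to parse a subrule (not the start rule), with input that contains only the content of that target subrule. The parser also discards the remaining tokens. How can I solve this? Adding EOF to every subrule isn't an option.
I think I may be having the same problem as @Stoneboy. stackoverflow.com/questions/29834489/…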
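For the subrule case, one possible workaround (a sketch under assumptions, not something stated in the answer above) is to leave the grammar alone and check, after invoking the subrule, that the next token is EOF. With the ANTLR 4 Java runtime that check would compare parser.getCurrentToken().getType() with Token.EOF. The self-contained model below avoids the real runtime; ParseFullySketch and its parseFully helper are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: run any entry rule, then insist that all input was consumed,
// instead of adding EOF to every rule in the grammar.
class ParseFullySketch {
    final List<String> tokens;
    int pos = 0;

    ParseFullySketch(List<String> tokens) { this.tokens = tokens; }

    String la1() { return pos < tokens.size() ? tokens.get(pos) : "EOF"; }

    // booleanLiteral : KW_TRUE | KW_FALSE ;   (used here as the "subrule" entry point)
    void booleanLiteral() {
        if (!la1().equals("KW_TRUE") && !la1().equals("KW_FALSE")) {
            throw new IllegalArgumentException("unexpected " + la1());
        }
        pos++;
    }

    // Generic wrapper: works for whichever rule method is passed in.
    static void parseFully(List<String> tokens, Consumer<ParseFullySketch> rule) {
        ParseFullySketch p = new ParseFullySketch(tokens);
        rule.accept(p);
        if (!p.la1().equals("EOF")) {
            throw new IllegalArgumentException("input not fully consumed: next token is " + p.la1());
        }
    }

    public static void main(String[] args) {
        parseFully(Arrays.asList("KW_FALSE"), ParseFullySketch::booleanLiteral); // accepted
        try {
            parseFully(Arrays.asList("KW_TRUE", "IDENTIFIER"), ParseFullySketch::booleanLiteral);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // input not fully consumed: next token is IDENTIFIER
        }
    }
}
```

The design choice is to keep the "consume everything" requirement outside the grammar, so every rule stays reusable as a subrule while any of them can still serve as an entry point.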