[Question title]: ANTLR: lexing bash files, especially heredoc
[Posted]: 2021-05-30 01:35:17
[Question description]:

I am using ANTLR to process bash files (for syntax highlighting). Is it possible to write a lexer rule with a dynamic end, such as for a heredoc:

cat <<ENDTEXT
hello world, 
this text may contain 
any letters, even ' and "
ENDTEXT

cat <<FOO
here a different end-word
is used
FOO

[Question comments]:

    Tags: antlr antlr4 heredoc


    [Solution 1]:

    This is only possible with a predicate.

    Here is a simple example:

    lexer grammar BashLexer;
    
    @members {
      private boolean heredocEndAhead(String partialHeredoc) {
        if (this.getCharPositionInLine() != 0) {
          // If the lexer is not at the start of a line, no end-delimiter can be possible
          return false;
        }
    
        // Get the delimiter
        String firstLine = partialHeredoc.split("\r?\n|\r")[0];
        String delimiter = firstLine.replaceAll("^<<-?\\s*", "");
    
        // Compare every character of the delimiter (note the <=, so the last one is checked too)
        for (int n = 1; n <= delimiter.length(); n++) {
          if (this._input.LA(n) != delimiter.charAt(n - 1)) {
            return false;
          }
        }
    
        // If we get to this point, we know there is an end delimiter ahead in the char stream, make
        // sure it is followed by a white space (or the EOF). If we don't do this, then "FOOS" would also
        // be considered the end for the delimiter "FOO"
        int charAfterDelimiter = this._input.LA(delimiter.length() + 1);
    
        return charAfterDelimiter == EOF ||  Character.isWhitespace(charAfterDelimiter);
      }
    }
    
    HEREDOC
     : '<<' '-'? [ \t]* [a-zA-Z_] [a-zA-Z_0-9]* NL ( {!heredocEndAhead(getText())}? . )* [a-zA-Z_] [a-zA-Z_0-9]*
     ;
    
    ANY
     : .
     ;
    
    fragment NL
     : '\r'? '\n'
     | '\r'
     ;
    

    This will tokenize the input:

    cat <<ENDTEXT
    hello world, 
    ENDTEXTS ENDTEXT
    this text may contain 
    any letters, even ' and "
    ENDTEXT
    

    like this:

    ANY      `c`
    ANY      `a`
    ANY      `t`
    ANY      ` `
    HEREDOC  `<<ENDTEXT\nhello world, \nENDTEXTS ENDTEXT\nthis text may contain \nany letters, even ' and "\nENDTEXT`
    EOF      `<EOF>`
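
    The check the predicate performs can also be sketched in plain Java, without ANTLR, to show why `ENDTEXTS` and the mid-line `ENDTEXT` in the input above do not close the heredoc. This is only an illustration under stated assumptions: the class and method names (`HeredocEndCheck`, `endAhead`, `findHeredocEnd`) are hypothetical, and the sketch works on string offsets rather than on the lexer's character stream.

```java
public class HeredocEndCheck {
    // True when `delimiter` starts at offset `pos`, `pos` is the start of a line,
    // and the delimiter is followed by whitespace or the end of the input.
    static boolean endAhead(String input, int pos, String delimiter) {
        boolean atLineStart = pos == 0
                || input.charAt(pos - 1) == '\n'
                || input.charAt(pos - 1) == '\r';
        if (!atLineStart || !input.startsWith(delimiter, pos)) {
            return false;
        }
        int after = pos + delimiter.length();
        return after >= input.length() || Character.isWhitespace(input.charAt(after));
    }

    // Returns the offset just past the closing delimiter, or -1 if the heredoc is never closed.
    static int findHeredocEnd(String input, int bodyStart, String delimiter) {
        for (int pos = bodyStart; pos < input.length(); pos++) {
            if (endAhead(input, pos, delimiter)) {
                return pos + delimiter.length();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String body = "hello world, \nENDTEXTS ENDTEXT\nthis text\nENDTEXT";
        int end = findHeredocEnd(body, 0, "ENDTEXT");
        // Only the final ENDTEXT qualifies, so the match ends at the end of the input.
        System.out.println(end == body.length()); // prints true
    }
}
```

    Like the predicate, this accepts any whitespace after the delimiter, so a line such as `ENDTEXT more` would also end the heredoc; a stricter check could require a line break or the end of the input.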
    

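    [Comments]:

    • I suppose it gets messier if the heredoc start pattern is followed by some other redirection that needs to be parsed as a separate element: cat << ENDTEXT > file.txt
    • Yes, that's right. And don't forget the optional quotes: cat << 'ENDTEXT' > file.txt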
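
    For the quoted form raised in the comments, one possible direction is to extract the delimiter with a regular expression instead of the `replaceAll` call used in the answer. The following is a rough sketch, not a full treatment of bash syntax: the names `HeredocDelimiter` and `delimiterOf` are hypothetical, only bare, single-quoted, and double-quoted delimiters are recognised, and backslash-escaped delimiters such as `<<\EOF` are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeredocDelimiter {
    // "<<" (but not the here-string operator "<<<"), optional "-", optional blanks,
    // then a delimiter that is 'single-quoted', "double-quoted", or a bare word.
    private static final Pattern START = Pattern.compile(
            "(?<!<)<<(?!<)(-?)[ \\t]*(?:'([^'\\n]+)'|\"([^\"\\n]+)\"|([A-Za-z_][A-Za-z_0-9]*))");

    // Returns the delimiter without quotes, or null if the line starts no heredoc.
    static String delimiterOf(String line) {
        Matcher m = START.matcher(line);
        if (!m.find()) {
            return null;
        }
        // Groups 2..4 hold the single-quoted, double-quoted, and bare alternatives.
        for (int g = 2; g <= 4; g++) {
            if (m.group(g) != null) {
                return m.group(g);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(delimiterOf("cat << 'ENDTEXT' > file.txt")); // prints ENDTEXT
        System.out.println(delimiterOf("cat <<-FOO"));                  // prints FOO
    }
}
```

    The rest of the line after the delimiter (here `> file.txt`) would still need to be tokenized separately, which a single HEREDOC token that spans from `<<` to the end delimiter does not allow.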