Boost regex to match IF statements: answers

【Question title】: boost regex to match IF statements
【Posted】: 2020-08-26 20:34:57
【Question】:

I need to write a Boost regex that matches the following string and splits it into three tokens according to the arguments of the IF block:

=IF(ISNUMBER(SEARCH("Windows",GETWORKSPACE(1))),ON.TIME(NOW()+"00:00:02","abcdef"),CLOSE(TRUE))

Ideally, these should come out as

token1 = "ISNUMBER(SEARCH("Windows",GETWORKSPACE(1)))"
token2 = "ON.TIME(NOW()+"00:00:02","abcdef")"
token3 = "CLOSE(TRUE)"

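I initially wrote a simple regex, "(?),(.),(.*)(?=\))", which spits out incorrect tokens because the greedy quantifiers take too much for the first token. I am currently getting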

token1 =     "ISNUMBER(SEARCH("Windows",GETWORKSPACE(1))),ON.TIME(NOW()+"00:00:02""
token2 =     ""abcdef")"
token3 =     "CLOSE(TRUE)"

I also tried "(?<=\\=IF\\()([A-Za-z(),:\"]*?),([A-Za-z(),.:\"]*?),([A-Z(),:\"]*?)(?=\\))", but without success. Can someone suggest a regex?
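
To see why a pattern built from greedy groups over-consumes, the behaviour can be reproduced in a few lines. This is a minimal sketch, not the asker's code: it assumes std::regex as a stand-in for Boost.Regex (so the lookarounds are replaced by literal matches of `=IF(` and the closing `)`), and `naive_if_split` is a hypothetical helper name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: applies a deliberately naive pattern with three greedy
// groups and returns the captures (an empty vector if nothing matches).
std::vector<std::string> naive_if_split(const std::string& formula) {
    // std::regex (ECMAScript grammar) has no lookbehind, so "=IF(" and the
    // final ")" are matched literally instead of through lookarounds.
    static const std::regex pattern(R"(=IF\((.*),(.*),(.*)\))");

    std::smatch m;
    std::vector<std::string> tokens;
    if (std::regex_match(formula, m, pattern)) {
        for (std::size_t i = 1; i < m.size(); ++i) {
            tokens.push_back(m[i].str());
        }
    }
    return tokens;
}
```

For the formula in the question, the three captures come out exactly as the incorrect tokens listed above: each `.*` backtracks only as far as it must, so the first group runs up to the second-to-last comma, regardless of how the parentheses nest.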

【Comments】:

Try "(?<=\\(|,)\\w+(?:\\.\\w+)*(\\((?:[^()]++|(?1))*\\))"
Why do you think regex is the best choice here?
@WiktorStribiżew: as long as the string doesn't contain an (unbalanced) (
@Jarod42 or anything in quoted constructs. See my take
@WiktorStribiżew The regex seems to work partially. However, I get empty values for the second and third tokens. I was expecting token1 = "ISNUMBER(SEARCH("Windows",GETWORKSPACE(1)))" token2 = "ON.TIME(NOW()+"00:00:02","abcdef")" token3 = "CLOSE(TRUE)" but I get token1=(SEARCH("Windows",GETWORKSPACE(1))) token2="" token3 = ""

Tags: c++ regex boost

【Solution 1】:

You need a simple parser.

Here it is with my favorite Boost Swiss-army knife for quick parsers, Spirit X3.

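I created a very flexible "token" grammar that honors (nested) brackets and double-quoted string literals (possibly with embedded escaped quotes and brackets):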

token = raw [ *(
      '(' >> -token_list >> ')'
    | '[' >> -token_list >> ']'
    | '{' >> -token_list >> '}'
    | string_literal
    | lexeme[ + ~char_(")]}([{\"',") ]
    ) ];

where token_list and string_literal are defined as

string_literal = lexeme [
    '"' >> *('\\' >> char_ | ~char_('"')) >> '"'
];

token_list = token % ',';
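
The core idea of the token rule (walk past balanced brackets and quoted strings, and treat only top-level commas as separators) can also be sketched without Spirit. The following is an illustrative standard-library version under simplifying assumptions: `split_top_level` is a hypothetical name, bracket kinds are only counted rather than matched against each other, and blanks are kept as-is rather than skipped.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not part of the answer): splits `args` at commas that
// lie outside every bracket pair and outside every double-quoted string.
// Simplification: '(', '[' and '{' share one depth counter, so mismatched or
// unbalanced brackets are not diagnosed.
std::vector<std::string> split_top_level(std::string_view args) {
    std::vector<std::string> parts;
    std::string current;
    int depth = 0;
    bool in_string = false;

    for (std::size_t i = 0; i < args.size(); ++i) {
        const char c = args[i];
        if (in_string) {
            current += c;
            if (c == '\\' && i + 1 < args.size()) {
                current += args[++i];   // keep the escaped character verbatim
            } else if (c == '"') {
                in_string = false;      // closing quote
            }
        } else if (c == '"') {
            in_string = true;           // opening quote
            current += c;
        } else if (c == '(' || c == '[' || c == '{') {
            ++depth;
            current += c;
        } else if (c == ')' || c == ']' || c == '}') {
            --depth;
            current += c;
        } else if (c == ',' && depth == 0) {
            parts.push_back(current);   // top-level separator ends the token
            current.clear();
        } else {
            current += c;
        }
    }
    parts.push_back(current);           // last token (possibly empty)
    return parts;
}
```

Called on the text between `=IF(` and the final `)`, this yields one string per argument. Brackets and commas inside string literals are ignored because the `in_string` branch is checked first, which mirrors the role of `string_literal` in the grammar.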

Now the parser expression for =IF(condition, true_part, false_part) is simple:

if_expr
    = '=' >> no_case["if"] 
    >> '(' >> token >> ',' >> token >> ',' >> token >> ')';

For fun, I made the IF keyword case-insensitive.
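
For readers who want to see what the `'=' >> no_case["if"] >> '('` prefix amounts to by hand, here is a rough standard-library sketch. It assumes that only spaces and tabs count as skippable blanks and that no blanks may appear inside the keyword itself; `match_if_prefix` is a hypothetical helper, not something Spirit provides.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper: recognises  '='  "if"  '('  with optional blanks
// between the pieces and a case-insensitive keyword. Returns the index just
// past '(' on success, or std::nullopt when the prefix does not match.
std::optional<std::size_t> match_if_prefix(std::string_view s) {
    std::size_t i = 0;

    // True when position i holds `wanted`, compared case-insensitively.
    auto at = [&](char wanted) {
        return i < s.size() &&
               std::tolower(static_cast<unsigned char>(s[i])) == wanted;
    };
    auto skip_blanks = [&] {
        while (i < s.size() && (s[i] == ' ' || s[i] == '\t')) ++i;
    };

    skip_blanks();
    if (!at('=')) return std::nullopt;
    ++i;

    skip_blanks();
    if (!at('i')) return std::nullopt;
    ++i;
    if (!at('f')) return std::nullopt;  // no blanks allowed inside the keyword
    ++i;

    skip_blanks();
    if (!at('(')) return std::nullopt;
    ++i;

    return i;
}
```

The returned index marks where the argument list starts, so whatever splits the arguments can begin there. Accepting `=if(`, `=If(` and ` = IF (` alike is a design choice here, just as `no_case` is in the grammar.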

Demo

Live On Coliru

//#define BOOST_SPIRIT_X3_DEBUG
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/adapted/std_tuple.hpp>
#include <iostream>
#include <iomanip>
#include <string>
#include <tuple>
namespace x3 = boost::spirit::x3;

namespace parser {
    using namespace x3;

    static rule<struct token_, std::string> const token = "token";

    static auto const string_literal = lexeme [
        '"' >> *('\\' >> char_ | ~char_('"')) >> '"'
    ];

    static auto const token_list = token % ',';

    static auto const token_def = raw [ *(
          '(' >> -token_list >> ')'
        | '[' >> -token_list >> ']'
        | '{' >> -token_list >> '}'
        | string_literal
        | +~char_(")]}([{\"',")  // glue together everything else
        ) ];

    BOOST_SPIRIT_DEFINE(token)

    static auto const if_expr
        = '=' >> no_case["if"] 
        >> '(' >> token >> ',' >> token >> ',' >> token >> ')';
}

int main() {
    for (std::string const& input : {
            R"(=IF(ISNUMBER,ON.TIME,CLOSE))",
            R"(=IF(ISNUMBER(SEARCH("Windows")),ON.TIME(NOW()+"00:00:02","abcdef"),CLOSE(TRUE)))",
            R"(=IF(ISNUMBER(SEARCH("Windows",GETWORKSPACE(1))),ON.TIME(NOW()+"00:00:02","abcdef"),CLOSE(TRUE)))",
            " = if( isnumber, on .time, close ) ",
            R"( = if( "foo, bar", if( isnumber, on .time, close ), IF("[ISN(UM}B\"ER")) )",
        })
    {
        auto f = input.begin(), l = input.end();
        std::cout << "=== " << std::quoted(input) << ":\n";

        std::string condition, true_part, false_part;
        auto attr = std::tie(condition, true_part, false_part);

        if (phrase_parse(f, l, parser::if_expr, x3::blank, attr)) {
            std::cout << "Parsed: \n"
               << " - condition: " << std::quoted(condition) << "\n"
               << " - true_part: " << std::quoted(true_part) << "\n"
               << " - false_part: " << std::quoted(false_part) << "\n";
        } else {
            std::cout << "Parse failed\n";
        }

        if (f!=l) {
            std::cout << "Remaining unparsed: " << std::quoted(std::string(f,l)) << "\n";
        }
    }
}

Prints

=== "=IF(ISNUMBER,ON.TIME,CLOSE)":
Parsed: 
 - condition: "ISNUMBER"
 - true_part: "ON.TIME"
 - false_part: "CLOSE"
=== "=IF(ISNUMBER(SEARCH(\"Windows\")),ON.TIME(NOW()+\"00:00:02\",\"abcdef\"),CLOSE(TRUE))":
Parsed: 
 - condition: "ISNUMBER(SEARCH(\"Windows\"))"
 - true_part: "ON.TIME(NOW()+\"00:00:02\",\"abcdef\")"
 - false_part: "CLOSE(TRUE)"
=== "=IF(ISNUMBER(SEARCH(\"Windows\",GETWORKSPACE(1))),ON.TIME(NOW()+\"00:00:02\",\"abcdef\"),CLOSE(TRUE))":
Parsed: 
 - condition: "ISNUMBER(SEARCH(\"Windows\",GETWORKSPACE(1)))"
 - true_part: "ON.TIME(NOW()+\"00:00:02\",\"abcdef\")"
 - false_part: "CLOSE(TRUE)"
=== " = if( isnumber, on .time, close ) ":
Parsed: 
 - condition: "isnumber"
 - true_part: "on .time"
 - false_part: "close "
=== " = if( \"foo, bar\", if( isnumber, on .time, close ), IF(\"[ISN(UM}B\\\"ER\")) ":
Parsed: 
 - condition: "\"foo, bar\""
 - true_part: "if( isnumber, on .time, close )"
 - false_part: "IF(\"[ISN(UM}B\\\"ER\")"

【Discussion】:

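Simplified slightly and added more edge-case examples (highlighting case insensitivity, whitespace tolerance, nested constructs, and embedded (unbalanced) brackets and escapes in string literals).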