Boost Spirit fails with the alternative operator '|' when there are two possible rules: answers

【Question title】: Boost Spirit with alternative operator '|' fails when there are two possible rules to go
【Posted】: 2011-09-18 03:35:20
【Question】:

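I am developing an HTTP parser, and I ran into a problem when parsing with the alternative operator. It is not about the values in the attribute, which I can fix using hold[]. The problem occurs when two rules start in a similar way. Here are some simple rules that demonstrate my problem: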

qi::rule<string_iterator> some_rule(
        (char_('/') >> *char_("0-9")) /*first rule accept  /123..*/
      | (char_('/') >> *char_("a-z")) /*second rule accept /abc..*/
    );

Then I parse with this rule using qi::parse. It fails if the input string looks like "/abcd".

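However, when I put the second rule before the first, the parser returns true. I think the problem is that the parser consumes input with the first rule, then finds that the first rule fails, and does not go back to the second rule, which is the alternative to the first.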

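I tried putting hold[] around the first rule, but that only helps with attribute generation. It does not fix this problem. I have no idea how to fix it, because HTTP has many rules that begin the same way as other rules.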

=========== More info about my code =============================
Here is my function for parsing a string:

typedef std::string::const_iterator string_iterator;
typedef qi::rule<string_iterator, std::string()> rules_t;
void parse_to_string(const std::string& s, rules_t& r, std::string& result)
{
    using namespace rule;
    using qi::parse;

    std::string::const_iterator iter = s.begin();
    std::string::const_iterator end = s.end();

    bool err = parse(iter, end, r, result);

    if ( err && (iter==end) )
    {
           std::cout << "[correct]" << result << std::endl;
    }
    else
    {
          std::cout << "[incorrect]" << s << std::endl;
          std::cout << "[dead with]" << result << std::endl;
    }
}

In main I have this code:

std::string result;
result = "";
str = "/htmlquery?";
qi::rule<string_iterator, std::string()> rule_wo_question( char_('/') >> *char_("a-z"));
qi::rule<string_iterator, std::string()> rule_w_question( char_('/') >> *char_("a-z") >> char_('?'));
qi::rule<string_iterator, std::string()> whatever_rule( rule_wo_question
                                                        | rule_w_question
                                                       );
parse_to_string(str, whatever_rule, result);

I got this result:

[incorrect]/htmlquery?
[dead with]/htmlquery

But when I switch the rules like this (I put "rule_w_question" before "rule_wo_question"):

std::string result;
    result = "";
    str = "/htmlquery?";
    qi::rule<string_iterator, std::string()> rule_wo_question( char_('/') >> *char_("a-z"));
    qi::rule<string_iterator, std::string()> rule_w_question( char_('/') >> *char_("a-z") >> char_('?'));
    qi::rule<string_iterator, std::string()> whatever_rule( rule_w_question
                                                            | rule_wo_question
                                                           );
    parse_to_string(str, whatever_rule, result);

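The output will be: [correct]/htmlquery?

In the first version (the wrong one), the parser seems to consume '/htmlquery' (via "rule_wo_question") and then finds that it cannot consume '?', which makes this rule fail. The parser then does not move on to the alternative rule ("rule_w_question"). In the end the program prints "[incorrect]".

In the second version I put "rule_w_question" before "rule_wo_question". That is why the parser prints "[correct]".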

================================================================
Here is my entire code, built against Boost 1.47 and linked with pthread and boost_filesystem. This is my main.c:

#include <iostream>
#include <string>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/network/protocol.hpp>
#include <boost/spirit/include/phoenix_stl.hpp>
#include <boost/spirit/include/phoenix_fusion.hpp>
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_object.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/io.hpp>
#include <boost/bind.hpp>
#include <boost/spirit/include/qi_uint.hpp>

using namespace boost::spirit::qi;
namespace qi = boost::spirit::qi;

typedef std::string::const_iterator string_iterator;
typedef qi::rule<string_iterator, std::string()> rules_t;
void parse_to_string(const std::string& s, rules_t& r, std::string& result)
{
    using qi::parse;

    std::string::const_iterator iter = s.begin();
    std::string::const_iterator end = s.end();

    bool err = parse(iter, end, r, result);

    if ( err && (iter==end) )
    {
           std::cout << "[correct]" << result << std::endl;
    }
    else
    {
          std::cout << "[incorrect]" << s << std::endl;
          std::cout << "[dead with]" << result << std::endl;
    }
}





int main()
{
    std::string str, result;
    result = "";
    str = "/htmlquery?";
    qi::rule<string_iterator, std::string()> rule_wo_question( char_('/') >> *char_("a-z"));
    qi::rule<string_iterator, std::string()> rule_w_question( char_('/') >> *char_("a-z") >> char_('?'));
    qi::rule<string_iterator, std::string()> whatever_rule( rule_wo_question
                                                           | rule_w_question
                                                           );
    parse_to_string(str, whatever_rule, result);
    return 0;
}

The result is:

[incorrect]/htmlquery?

[dead with]/htmlquery

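【Comments】:

Can you post your grammar in EBNF format? That helps to spot any mistakes and to determine whether your proposed grammar is supported by Spirit (i.e. LL(0)).
Your code snippet above looks fine, so something else must be wrong. Please post a minimal, self-contained code example that exposes your problem.
Can anyone help me? I have posted the full version of my code. I really need help! Thanks a lot.

Tags: c++ boost boost-spirit-qi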

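【Answer 1】:

Spirit tries the given alternatives in the order they are specified and stops after the first one that matches. No exhaustive matching is performed: once one alternative matches, it stops looking. In other words, the order of the alternatives matters. You should always list the "longest" alternatives first.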

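【Comments】:

Longest ("aaab" | "aab" | "aa") or most specific ('?' | +char_)?
Very good question, and I have to admit there is no straightforward answer. Most of the time "most specific" seems to be correct, but in other cases "longest match" seems to be right. To answer it for any particular case, keep in mind how alternatives are evaluated in Spirit: it always evaluates them top-down and stops at the first alternative that matches.
My comment was actually meant as a refinement - the question mark may have been subliminal. Indeed, as in all regular automata, branch ordering matters.
From my example: when the string "/htmlquery?" is parsed with the rule '/' >> *char_("a-z"), that rule "fails". According to your answer, the parser should then use the alternative rule ('/' >> *char_("a-z") >> '?'), but it does not. In the end the parser returns false for the whole thing. I do not know why the parser does not go to the alternative rule, since the first rule failed. Thanks.
As I said above, I need to see the whole example to figure out why it fails - preferably a minimal example that reproduces the problem you are seeing.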

【Answer 2】:

Why don't you do this instead?

some_rule(
     char_('/')
     >> (
         *char_("0-9")  /* first alternative accepts /123.. */
       | *char_("a-z")  /* second alternative accepts /abc.. */
     )
);

Edit: actually, this will match '/' followed by nothing ("0-9" zero times) and will never bother looking for "a-z". Change * to +.

【Comments】:

【Answer 3】:

qi::rule<string_iterator> some_rule(
    (char_('/') >> *char_("0-9")) >> qi::eol /*first rule accept  /123..*/
  | (char_('/') >> *char_("a-z")) >> qi::eol /*second rule accept /abc..*/
);

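You can use ',' or some other terminator instead of eol. The problem is that (char_('/') >> *char_("0-9")) matches '/' followed by 0 or more digits. So "/abcd" matches "/" and then parsing stops. K-ballo's solution is how I would handle this case, but this one is offered as an alternative in case (for some reason) his is not acceptable.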

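【Comments】:

Yes, I tried your solution, but it did not work. I tested it with rule_w_question and rule_wo_question (you can see them in the last part of my question, which I have just posted), appending qi::eol to the end of the rules. It seems that the parser will not call any alternative rule when I use the Kleene star and the rule has already consumed some part of the input. Thanks.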

【Answer 4】:

Because your first rule has a match, and Spirit is greedy.

(char_('/') >> *char_("0-9"))

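Feeding "/abcd" into this rule results in the following logic:

"/abcd" -> Is '/' the next character? Yes. Sub-rule matches. -> "abcd" remains.
"abcd" -> Are there 0 or more digits? Yes, there are 0 digits. Sub-rule matches. -> "abcd" remains.
The first clause of the alternative ('|') statement matches; the remaining alternative clauses are skipped. -> "abcd" remains.
The rule matches with "abcd" remaining. This is probably left unparsed and causes your failure.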
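
The trace above can be reproduced with a small hand-rolled model of ordered choice. This is plain C++ written for illustration, not Spirit's API or implementation: the functions `slash`, `digits`, `letters`, and `alternative` are hypothetical stand-ins, and `min_count` is 0 to model `*` and 1 to model `+`.

```cpp
#include <cstddef>
#include <string>

// Each stand-in returns the position after its match, or npos on failure.
const std::size_t npos = std::string::npos;

// Models char_('/'): matches exactly one '/'.
std::size_t slash(const std::string& s, std::size_t pos)
{
    return (pos < s.size() && s[pos] == '/') ? pos + 1 : npos;
}

// Models *char_("0-9") (min_count 0) or +char_("0-9") (min_count 1).
std::size_t digits(const std::string& s, std::size_t pos, std::size_t min_count)
{
    std::size_t end = pos;
    while (end < s.size() && s[end] >= '0' && s[end] <= '9') ++end;
    return (end - pos >= min_count) ? end : npos;
}

// Models *char_("a-z") (min_count 0) or +char_("a-z") (min_count 1).
std::size_t letters(const std::string& s, std::size_t pos, std::size_t min_count)
{
    std::size_t end = pos;
    while (end < s.size() && s[end] >= 'a' && s[end] <= 'z') ++end;
    return (end - pos >= min_count) ? end : npos;
}

// Models (slash >> digits) | (slash >> letters) as an ordered choice.
std::size_t alternative(const std::string& s, std::size_t min_count)
{
    std::size_t p = slash(s, 0);
    if (p != npos) p = digits(s, p, min_count);
    if (p != npos) return p;   // first clause matched: remaining clauses are skipped

    p = slash(s, 0);           // first clause failed: restart from the beginning
    if (p != npos) p = letters(s, p, min_count);
    return p;
}
```

With `min_count` 0, `alternative("/abcd", 0)` returns 1: the first clause succeeds after the '/' with zero digits, and "abcd" is left over. With `min_count` 1, the first clause fails and the second clause consumes all five characters.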

You could consider changing '*' (meaning "0 or more") to '+' (meaning "1 or more").

【Comments】: