qi % operator consumes (1) delimiter attributes and (2) accepts a trailing delimiter

【Question title】: qi % operator consumes (1) delimiter attributes and (2) accepts a trailing delimiter
【Posted】: 2016-05-28 06:04:18
【Question】:

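While hunting down some strange behaviour in my parser, I eventually found that qi % does not behave quite the way I expected.

First issue: in the verbose documentation, a % b is described as a shortcut for a >> *(b >> a). That is not actually the case. It only holds if you accept that b gets discarded.

Assume simple_id is any parser. Then in fact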

simple_id % lit(";")

is the same as

simple_id % some_sophisticated_attribute_emitting_parser_expression

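because the right-hand expression is discarded in any case (that is, it contributes to no attribute). In detail: the first expression behaves exactly like, for example: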

simple_id % string(";")

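So string() is semantically equivalent to lit() when certain constraints hold, namely when either one appears as the right-hand operand of %. Here is my first question: do you consider this a bug, or is it a feature? I discussed this on the mailing list and was told it is a feature, because the behaviour is documented (if you dig into the details of the docs). If you do dig, you will find they are right.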
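To make the attribute point concrete without pulling in Spirit, here is a minimal standard-library model of a list parser. It is only a sketch under my own assumptions: `parse_ident`, `parse_delim`, and `list_discarding_delims` are hypothetical helpers, not Spirit APIs, and there is no skipper. The delimiter parser reports which character it matched, but the list loop never stores it, which mirrors the documented behaviour that `a % b` exposes only the attributes of `a`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical element parser: one or more letters or underscores.
// On success it advances pos and stores the identifier in out.
bool parse_ident(const std::string& in, std::size_t& pos, std::string& out)
{
    assert(pos <= in.size());
    std::size_t p = pos;
    while (p < in.size() &&
           (std::isalpha(static_cast<unsigned char>(in[p])) || in[p] == '_'))
        ++p;
    if (p == pos) return false;          // needs at least one character
    out.assign(in, pos, p - pos);
    pos = p;
    return true;
}

// Hypothetical delimiter parser: matches one character from `set` and
// reports it, the way an attribute-emitting parser would.
bool parse_delim(const std::string& in, std::size_t& pos,
                 const std::string& set, char& matched)
{
    if (pos >= in.size() || set.find(in[pos]) == std::string::npos)
        return false;
    matched = in[pos++];
    return true;
}

// Model of `ident % delim`: the delimiter must match for the list to
// continue, but its attribute is thrown away.
std::vector<std::string> list_discarding_delims(const std::string& in,
                                                const std::string& set,
                                                std::size_t& pos)
{
    std::vector<std::string> items;
    std::string id;
    if (!parse_ident(in, pos, id)) return items;
    items.push_back(id);

    std::size_t save = pos;
    char ignored = '\0';
    while (parse_delim(in, pos, set, ignored) && parse_ident(in, pos, id))
    {
        items.push_back(id);
        save = pos;                      // commit only after a full "delim ident" pair
    }
    pos = save;                          // give back a delimiter with no element after it
    return items;
}
```

Calling `list_discarding_delims("willy;anton,frank", ";,", pos)` yields the three names; neither `;` nor `,` appears in the result, however much information the delimiter parser produced.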

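I want to be a user of this library. I found qi easy to use at the higher grammar levels. But once you go down to bits and bytes and iterator positions, life gets hard. At some point I decided to stop trusting and to trace into the qi code instead.

It took me only a few minutes to locate the issue inside qi. With the responsible code (list.hpp) on screen, it seemed obvious to me that qi % has another problem. This is the exact semantics of qi %: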

a % b <- a >> *(b >> a) >> -(b)

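In other words: it accepts a trailing b (and consumes it) even if no a follows. This is definitely not documented. Just for fun I looked at the X3 implementation of %. The bug has been migrated and occurs there too.

Here are my questions: Is my analysis correct? If so, which parser library do you use? Can you recommend one? If I am wrong, where did I go wrong?

I am posting these questions because I am not the only one struggling. I hope the information provided here is helpful.

Below is a self-contained working example that demonstrates the problems and a solution for both of them. If you run the example, look at the second test in particular. It shows % consuming the trailing ; (which I believe it should not do).

My environment: MSVC 2015, target: Win32 console, Boost 1.6.1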

///////////////////////////////////////////////////////////////////////////
// This is a self-contained demo which compiles with MSVC 2015 to Win32
// console. Therefore it should compile with any modern compiler. :)
//
//
// This demo implements a new qi operator != which does the same as %
// does but without eating up the delimiters (unless they are non-output
// i.e. lit).
//
// The implementation also shows how to fix a bug which makes the current
// qi % operator eat a trailing b. The current implementation accepts 
// a >> *(b >> a) >> -(b).
//
//
// I utilize the not_equal_to proto::tag for the alternative % operation
// See the simple rules to compare both operators.
///////////////////////////////////////////////////////////////////////////

//#define BOOST_SPIRIT_DEBUG
#include <iostream>
#include <map>
#include <string>
#include <boost/spirit/repository/include/qi_confix.hpp>
#include <boost/spirit/include/qi.hpp>

// Change the result type to test containers etc.
// You may need to provide an << ostream operator to have output work
using result_type = std::string;

using iterator_type = std::string::const_iterator;

namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii; // must be visible before its first use in skipper below
namespace mpl = boost::mpl;
namespace proto = boost::proto;

namespace maxence { namespace parser {
///////////////////////////////////////////////////////////////////////////////
//  The skipper grammar (just skip this section while reading ;)
///////////////////////////////////////////////////////////////////////////////
template <typename Iterator>
struct skipper : qi::grammar<Iterator>
{
    skipper() : skipper::base_type(start)
    {
        qi::char_type char_;
        using boost::spirit::eol;
        using boost::spirit::repository::confix;

        ascii::space_type space;

        start =
            space                               // tab/space/cr/lf
            | confix("/*", "*/")[*(char_ - "*/")] // C-style comments
            | confix("//", eol)[*(char_ - eol)] // C++-style comments
            ;
    }

    qi::rule<Iterator> start;
};
}}

namespace boost { namespace spirit {
        ///////////////////////////////////////////////////////////////////////////
        // Enablers
        ///////////////////////////////////////////////////////////////////////////
        template <>
        struct use_operator<qi::domain, proto::tag::not_equal_to> // enables p != d
            : mpl::true_ {};
}}
namespace ascii = boost::spirit::ascii;

namespace boost { namespace spirit { namespace qi
{
    template <typename Left, typename Right>
    struct list_ex : binary_parser<list_ex<Left, Right> >
    {
        typedef Left left_type;
        typedef Right right_type;

        template <typename Context, typename Iterator>
        struct attribute
        {
            // Build a std::vector from the LHS's attribute. Note
            // that build_std_vector may return unused_type if the
            // subject's attribute is an unused_type.
            typedef typename
                traits::build_std_vector<
                typename traits::
                attribute_of<Left, Context, Iterator>::type
                >::type
                type;
        };

        list_ex(Left const& left_, Right const& right_)
            : left(left_), right(right_) {}


/////////////////////////////////////////////////////////////////////////
// code from qi % operator
//
// Note: The original qi code accepts a >> *(b >> a) >> -(b)
//       That means a trailing delimiter gets consumed
//
//              template <typename F>
//              bool parse_container(F f) const
//              {
//                  // in order to succeed we need to match at least one element 
//                  if (f(left)) return false;
//                  typename F::iterator_type save = f.f.first;
//
//                  // The while clause below is wrong
//                  // To correct that (not eat trailing delimiters) it should read: 
//                  //  while (!(!right.parse(f.f.first, f.f.last, f.f.context, f.f.skipper, unused) && f(left)))
//                  
//                  while (right.parse(f.f.first, f.f.last, f.f.context, f.f.skipper, unused)   <--- issue!
//                      && !f(left))
//                  {
//                      save = f.f.first;
//                  }
// 
//                  f.f.first = save;
//              return true;
//
/////////////////////////////////////////////////////////////////////////


/////////////////////////////////////////////////////////////////////////
// replacement to allow operator not to "eat up" the "delimiter"
//
        template <typename F>
        bool parse_container(F f) const
        {
            // in order to succeed we need to match at least one element 
            if (f(left)) return false;

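            // f(p) returns true when p FAILS to parse. With short-circuiting,
            // this loop keeps going as long as either right or left matches
            // at the current position.
            while (!(f(right) && f(left)));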

            return true;
        }
//
/////////////////////////////////////////////////////////////////////////

        template <typename Iterator, typename Context
            , typename Skipper, typename Attribute>
            bool parse(Iterator& first, Iterator const& last
                , Context& context, Skipper const& skipper
                , Attribute& attr_) const
        {
            typedef detail::fail_function<Iterator, Context, Skipper>
                fail_function;

            // ensure the attribute is actually a container type
            traits::make_container(attr_);

            Iterator iter = first;
            fail_function f(iter, last, context, skipper);
            if (!parse_container(detail::make_pass_container(f, attr_)))
                return false;

            first = f.first;
            return true;
        }

        template <typename Context>
        info what(Context& context) const
        {
            return info("list_ex",
                std::make_pair(left.what(context), right.what(context)));
        }

        Left left;
        Right right;
    };

    ///////////////////////////////////////////////////////////////////////////
    // Parser generators: make_xxx function (objects)
    ///////////////////////////////////////////////////////////////////////////
    template <typename Elements, typename Modifiers>
    struct make_composite<proto::tag::not_equal_to, Elements, Modifiers>
        : make_binary_composite<Elements, list_ex>
    {};
}}}

namespace boost {   namespace spirit {  namespace traits {
    ///////////////////////////////////////////////////////////////////////////
    template <typename Left, typename Right>
    struct has_semantic_action<qi::list_ex<Left, Right> >
        : binary_has_semantic_action<Left, Right> {};

    ///////////////////////////////////////////////////////////////////////////
    template <typename Left, typename Right, typename Attribute
        , typename Context, typename Iterator>
        struct handles_container<qi::list_ex<Left, Right>, Attribute, Context
        , Iterator>
        : mpl::true_ {};
}}}

using rule_type = qi::rule <iterator_type, result_type(), maxence::parser::skipper<iterator_type>>;

namespace maxence { namespace parser {

    template <typename Iterator>
    struct ident : qi::grammar < Iterator, result_type() , skipper<Iterator >>
    {
        ident();
        rule_type not_equal_to, modulus, not_used;
    };

    // we actually don't need the start rule (see below)
    template <typename Iterator>
    ident<Iterator>::ident() : ident::base_type(not_equal_to)
    {
        not_equal_to = (qi::alpha | '_') >> *(qi::alnum | '_') != qi::char_(";");
        modulus = (qi::alpha | '_') >> *(qi::alnum | '_') % qi::char_(";");
        modulus.name("qi modulus operator");

        BOOST_SPIRIT_DEBUG_NODES(
            (not_equal_to)
        )
    }
}}


int main()
{
    namespace parser = maxence::parser;

    using rule_map_type = std::map<std::string, rule_type&>;
    using rule_iterator_type = std::map<std::string, rule_type&>::const_iterator;
    using ss_map_type = std::map<std::string, std::string>;
    using ss_iterator_type = ss_map_type::const_iterator;


    parser::ident<iterator_type> ident;
    parser::skipper<iterator_type> skipper;

    ss_map_type parser_input =
    {
        { "; delimited list without trailing delimiter \n(expected result: success, EOI reached)", "willy; anton" },
        { "; delimited list with trailing delimiter \n(expected result: success, EOI not reached)", "willy; anton;" }
    };
    rule_map_type rules =
    {
        { "E1", ident.not_equal_to },
        { "E2", ident.modulus }
    };

    for (ss_iterator_type input = parser_input.begin(); input != parser_input.end(); input++) {
        for (rule_iterator_type example = rules.begin(); example != rules.end(); example++) {
            std::string to_parse = input->second;
            ::result_type result;
            std::string parser_name = (example->second).name();
            std::cout << "--------------------------------------------" << std::endl;
            std::cout << "Description: " << input->first << std::endl;
            std::cout << "Parser [" << parser_name << "] parsing [" << to_parse << "]" << std::endl;
            // use the rule's iterator type; plain auto would deduce std::string::iterator
            iterator_type b(to_parse.begin()), e(to_parse.end());

            bool success = qi::phrase_parse(b, e, (example)->second, skipper, result);

            // --- test for parser success
            if (success) std::cout << "Parser succeeded. Result: " << result << std::endl;
            else std::cout << " Parser failed. " << std::endl;

            //--- test for EOI
            if (b == e) {
                std::cout << "EOI reached.";
            } else {
                std::cout << "Failure: EOI not reached. Remaining: [";
                while (b != e) std::cout << *b++;
                std::cout << "]";
            }
            std::cout << std::endl << "--------------------------------------------" << std::endl;
        }
    }
    return 0;
}

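Extension: because of the comments, I have extended my post:

My != operator is different from the % operator. The != operator adds every "delimiter" it finds to the result vector (a != qi::char_(";,")). Introducing my proposal into % would throw away a useful feature.

Maybe there is a case for introducing an additional operator. I think I should use a different operator for it; != hurts my eyes. Anyway, the != operator has nice applications as well. For example: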

settings_list = name != expression;

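I was wrong to think that % eats a trailing "delimiter". My code example above seemed to prove that it does. Anyway, I stripped the example down to focus on this issue only. Now I know that the missing ; is sitting happily somewhere in the Caribbean, sipping a Caipirinha. Better than being eaten. :)

The example below eats the trailing "delimiter" because it is not really trailing. The problem was my test string. The Kleene star matches with zero length after the last ;. So the delimiter gets eaten, and that is correct behaviour.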
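The zero-length match can be reproduced without Spirit. The following is a minimal sketch under my own assumptions (`ListResult`, `parse_letters`, and `parse_list` are hypothetical names, and whitespace skipping is left out): the list loop is written as element >> *(';' >> element) and only commits a delimiter once an element has matched after it. Whether the last ; is consumed then depends entirely on whether the element parser may match nothing.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct ListResult
{
    std::vector<std::string> items;  // one entry per matched element
    std::size_t consumed;            // number of input characters consumed
};

// Hypothetical element parser: a run of letters. With allow_empty == true it
// behaves like a Kleene star (zero letters is still a match); with
// allow_empty == false it behaves like a plus (at least one letter).
bool parse_letters(const std::string& in, std::size_t& pos,
                   bool allow_empty, std::string& out)
{
    assert(pos <= in.size());
    std::size_t p = pos;
    while (p < in.size() && std::isalpha(static_cast<unsigned char>(in[p])))
        ++p;
    if (p == pos && !allow_empty) return false;
    out.assign(in, pos, p - pos);
    pos = p;
    return true;
}

// Model of `element % ';'`, written as element >> *(';' >> element).
ListResult parse_list(const std::string& in, bool allow_empty)
{
    ListResult r;
    r.consumed = 0;

    std::size_t pos = 0;
    std::string item;
    if (!parse_letters(in, pos, allow_empty, item)) return r;
    r.items.push_back(item);

    std::size_t save = pos;
    while (pos < in.size() && in[pos] == ';')
    {
        ++pos;                                    // tentatively take the delimiter
        if (!parse_letters(in, pos, allow_empty, item)) break;
        r.items.push_back(item);
        save = pos;                               // commit delimiter + element
    }
    r.consumed = save;                            // an uncommitted delimiter is given back
    return r;
}
```

For the input `"willy;anton;1234"`, `parse_list(input, false)` returns two items and stops before the second `;`, while `parse_list(input, true)` returns a third, empty item and consumes that `;` as well, leaving only `1234`.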

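I learned a lot about qi during this "trip", and not only from the docs. The most important lesson learned: shape your test cases carefully. I did a quick copy and paste from some example without thinking. That is what got me into trouble.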

#include <iostream>
#include <string>
#include <vector>
#include <boost/spirit/include/qi.hpp>


namespace qi = boost::spirit::qi;
using iterator_type = std::string::const_iterator;
using result_type = std::string;

template <typename Parser>
void parse(const std::string message, const std::string& input, const Parser& parser)
{
    iterator_type iter = input.begin(), end = input.end();

    std::vector<result_type> parsed_result;

    std::cout << "-------------------------\n";
    std::cout << message << "\n";
    std::cout << "Parsing: \"" << input << "\"\n";

    bool result = qi::phrase_parse(iter, end, parser, qi::space, parsed_result);
    if (result)
    {
        std::cout << "Parser succeeded.\n";
        std::cout << "Parsed " << parsed_result.size() << " elements:";
        for (const auto& str : parsed_result)
            std::cout << "[" << str << "]";
        std::cout << std::endl;
    }
    else
    {
        std::cout << "Something failed. Unparsed: \"" << std::string(iter, end) << "\"" << std::endl;
    }
    if (iter == end) {
        std::cout << "EOI reached." << std::endl;
    }
    else {
        std::cout << "EOI not reached. Unparsed: \"" << std::string(iter, end) << "\"" << std::endl;
    }
    std::cout << "-------------------------\n";

}

int main()
{

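    // qi::copy stores the sub-parsers by value; a plain `auto` would keep
    // references to temporaries that are gone after the full expression.
    auto r1 = qi::copy((*(qi::alpha | '_')) % qi::char_(";"));
    auto r2 = qi::copy(qi::as_string[*(qi::alpha | '_')] % qi::char_(";"));

    parse("% eating the trailing 'delimiter'",
        "willy; anton; 1234", r1);
    parse("% eating the trailing 'delimiter' (limited as_string edition)",
        "willy; anton; 1234", r2);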

    return 0;
}

【Comments on the question】:

It does not accept a trailing delimiter.
Regarding something I think you didn't understand on the mailing list...
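If you got active help on the mailing list, please at least link to your thread. FWIW I cannot read anything in the last 7 posts you made to the mailing list. Maybe something is off with the format you post in (or it is just my MUA, but that usually works fine).
@jv_: About the example. Yes. Wow. I was better when I was younger. :) You are absolutely right. My proposed change would break this great feature. +1
@jv_: I finally got it. There is a zero-length match after the "trailing delimiter", so it is not actually trailing. Thank you, and I apologize.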

Tags: c++ boost boost-spirit boost-spirit-qi

【Answer 1】:

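Here are the answers to all the questions.

(1) My analysis was incorrect. The % operator does not eat a trailing "delimiter". The real problem was that the element rule was a Kleene star rule. After the last "delimiter" that rule finds no identifier, but it still matches with zero length. So it is perfectly fine for % to consume the "delimiter".

(2) I am not currently looking for an alternative to qi.

(3) The current implementation of % does not simply "drop" the b of a % b. If you actually have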

simple_id % some_sophisticated_attribute_emitting_parser_expression

then the sophisticated part (which may be dynamic, like char_("+-*/")) has to match for % to continue. My proposed change to % would destroy this feature.

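To make %= (see below) behave like %, you have to write (a %= qi::omit[b]). This mimics a % b almost exactly. The difference is that %= deliberately eats the "trailing delimiter". There is an example in the code below. So %= cannot be regarded as a superset of %.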
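As a rough illustration of that difference, here is a standard-library model of how I read `a %= b`: one or more "element delimiter" pairs, with the delimiter kept in the output unless it is omitted. All names (`PairListResult`, `parse_pairs`, `omit_delims`) are hypothetical and there is no skipper. The sketch simplifies one detail on purpose: an element that is not followed by a delimiter is left out of the output as well as unconsumed, and I have not verified that list2 rolls back the attribute in the same way.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct PairListResult
{
    bool ok;                          // false if not even one "element delimiter" pair matched
    std::vector<std::string> tokens;  // elements and, unless omitted, delimiters
    std::size_t consumed;             // number of input characters consumed
};

// Model of `a %= b` as one or more "element delimiter" pairs.
// omit_delims == true plays the role of wrapping b in qi::omit[].
PairListResult parse_pairs(const std::string& in, const std::string& delims,
                           bool omit_delims)
{
    assert(!delims.empty());          // a list without delimiters makes no sense here

    PairListResult r;
    r.ok = false;
    r.consumed = 0;

    std::size_t pos = 0;
    for (;;)
    {
        // element: one or more letters or underscores
        std::size_t p = pos;
        while (p < in.size() &&
               (std::isalpha(static_cast<unsigned char>(in[p])) || in[p] == '_'))
            ++p;
        if (p == pos) break;          // no element at this position

        // delimiter: exactly one character from delims
        if (p >= in.size() || delims.find(in[p]) == std::string::npos)
            break;                    // element without delimiter: nothing is committed

        r.tokens.push_back(in.substr(pos, p - pos));
        if (!omit_delims) r.tokens.push_back(std::string(1, in[p]));
        pos = p + 1;
        r.consumed = pos;             // commit the whole pair, delimiter included
        r.ok = true;
    }
    return r;
}
```

With `"willy;anton,frank"` and the delimiter set `";,"`, the model keeps `willy ; anton ,` and stops before `frank`, because no delimiter follows it. With `omit_delims` set, the tokens are just `willy` and `anton`, but the delimiter after `anton` is still consumed, which is the point where the model departs from the behaviour of `%`.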

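Whether qi should be extended by an operator that provides the functionality I asked for is a discussion I do not want to start. As far as parser functionality goes, qi is easy to extend, so you can build additional parsers to your own liking.

The compilers' allergy to qi 2.x combined with auto is another topic, and a more complex one. I never expected that my MSVC 2015 environment, of all things, would be the one where it never crashes.

Anyway, for having been so stubbornly foolish, I owe you something. The code below provides an implementation of a %= operator (modulus_assign) for qi. It is implemented as list2 and lives in the mxc::qitoo namespace. In case someone finds it valuable and wants to use it, I have marked the start and end of the header.

The main function is a showcase of what the two operators have in common and where they differ. And it shows once more that the Kleene star is a wild beast.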

#include <iostream>
#include <map>
#include <string>
#include <vector>


///////////////////////////
// start: header list2.hpp
///////////////////////////

#pragma once

#include <boost/spirit/include/qi.hpp>

namespace boost {
    namespace spirit {
        ///////////////////////////////////////////////////////////////////////////
        // Enablers
        ///////////////////////////////////////////////////////////////////////////
        template <>
        struct use_operator<qi::domain, proto::tag::modulus_assign> // enables p %= d
            : mpl::true_ {};
    }
}

namespace mxc {
    namespace qitoo {

        namespace spirit = boost::spirit;
        namespace qi = spirit::qi;

        template <typename Left, typename Right>
        struct list2 : qi::binary_parser<list2<Left, Right> >
        {
            typedef Left left_type;
            typedef Right right_type;

            template <typename Context, typename Iterator>
            struct attribute
            {
                // Build a std::vector from the LHS's and RHS's attribute. Note
                // that build_std_vector may return unused_type if the
                // subject's attribute is an unused_type.
                typedef typename
                    spirit::traits::build_std_vector<
                    typename spirit::traits::attribute_of<Left, Context, Iterator>::type>::type type;
            };

            list2(Left const& left_, Right const& right_) : left(left_), right(right_) {}

            template <typename F>
            bool parse_container(F f) const
            {
                typename F::iterator_type save = f.f.first;

                // we need a first left match at least
                if (f(left)) return false;

                // if right does not match rewind iterator and fail
                if (f(right)) {
                    f.f.first = save;
                    return false;
                }

                // easy going
                while (!f(left) && !f(right))
                {
                    save = f.f.first;
                }

                f.f.first = save;
                return true;
            }

            template <typename Iterator, typename Context, typename Skipper, typename Attribute>
            bool parse(Iterator& first, Iterator const& last, Context& context, Skipper const& skipper, Attribute& attr_) const
            {
                typedef qi::detail::fail_function<Iterator, Context, Skipper>
                    fail_function;

                // ensure the attribute is actually a container type
                spirit::traits::make_container(attr_);

                Iterator iter = first;
                fail_function f(iter, last, context, skipper);
                if (!parse_container(qi::detail::make_pass_container(f, attr_)))
                    return false;

                first = f.first;
                return true;
            }

            template <typename Context>
            qi::info what(Context& context) const
            {
                return qi::info("list2",
                    std::make_pair(left.what(context), right.what(context)));
            }

            Left left;
            Right right;
        };
    }
}

namespace boost {
    namespace spirit {
        namespace qi {
            ///////////////////////////////////////////////////////////////////////////
            // Parser generators: make_xxx function (objects)
            ///////////////////////////////////////////////////////////////////////////
            template <typename Elements, typename Modifiers>
            struct make_composite<proto::tag::modulus_assign, Elements, Modifiers>
                : make_binary_composite<Elements, mxc::qitoo::list2>
            {};
        }

        namespace traits
        {
            ///////////////////////////////////////////////////////////////////////////
            template <typename Left, typename Right>
            struct has_semantic_action<mxc::qitoo::list2<Left, Right> >
                : binary_has_semantic_action<Left, Right> {};

            ///////////////////////////////////////////////////////////////////////////
            template <typename Left, typename Right, typename Attribute
                , typename Context, typename Iterator>
                struct handles_container<mxc::qitoo::list2<Left, Right>, Attribute, Context
                , Iterator>
                : mpl::true_ {};
        }
    }
}
///////////////////////////
// end: header list2.hpp
///////////////////////////

namespace qi = boost::spirit::qi;
namespace qitoo = mxc::qitoo;

using iterator_type = std::string::const_iterator;
using result_type = std::string;

template <typename Parser>
void parse(const std::string message, const std::string& input, const std::string& rule, const Parser& parser)
{
    iterator_type iter = input.begin(), end = input.end();

    std::vector<result_type> parsed_result;

    std::cout << "-------------------------\n";
    std::cout << message << "\n";
    std::cout << "Rule: " << rule << std::endl;
    std::cout << "Parsing: \"" << input << "\"\n";

    bool result = qi::phrase_parse(iter, end, parser, qi::space, parsed_result);
    if (result)
    {
        std::cout << "Parser succeeded.\n";
        std::cout << "Parsed " << parsed_result.size() << " elements:";
        for (const auto& str : parsed_result)
            std::cout << "[" << str << "]";
        std::cout << std::endl;
    }
    else
    {
        std::cout << "Parser failed" << std::endl;
    }
    if (iter == end) {
        std::cout << "EOI reached." << std::endl;
    }
    else {
        std::cout << "EOI not reached. Unparsed: \"" << std::string(iter, end) << "\"" << std::endl;
    }
    std::cout << "-------------------------\n";

}

int main()
{
    parse("Modulus Operator (%), list with several different 'delimiters'  "
        , "willy; anton; frank, joel, 1234"
        , "(+(qi::alpha | qi::char_('_'))) % qi::char_(\";,\")"
        , (+(qi::alpha | qi::char_('_'))) % qi::char_(";,"));

    parse("Modulus-Assign Operator (%=), list with several different 'delimiters'  "
        , "willy; anton; frank, joel, 1234"
        , "(+(qi::alpha | qi::char_('_'))) %= qi::char_(\";,\")"
        , (+(qi::alpha | qi::char_('_'))) %= qi::char_(";,"));

    parse("Modulus Operator (%), list with several different 'delimiters'  "
        , "willy; anton; frank, joel, 1234"
        , "((qi::alpha | qi::char_('_')) >> *(qi::alnum | '_')) % qi::char_(\";,\")"
        , ((qi::alpha | qi::char_('_')) >> *(qi::alnum | '_')) % qi::char_(";,"));

    parse("Modulus-Assign Operator (%=), list with several different 'delimiters'  "
        , "willy; anton; frank, joel, 1234"
        , "((qi::alpha | qi::char_('_')) >> *(qi::alnum | '_')) %= qi::char_(\";,\")"
        , ((qi::alpha | qi::char_('_')) >> *(qi::alnum | '_')) %= qi::char_(";,"));

    std::cout << std::endl << "Note that %= exposes the trailing 'delimiter', and it has to in order to enable this usage:" << std::endl;

    parse("Modulus-Assign Operator (%=), list with several different 'delimiters'\n using omit to mimic %"
        , "willy; anton; frank, joel, 1234"
        , "+(qi::alpha | qi::char_('_')) %= qi::omit[qi::char_(\";,\")]"
        , +(qi::alpha | qi::char_('_')) %= qi::omit[qi::char_(";,")]);

    // The label announces the plain % operator, so this case uses % (the
    // following case exercises %= with the same element and delimiter).
    parse("Modulus Operator (%), list of assignments (x = digits;)\nBe careful with the Kleene star, Eugene!"
        , "x = 5; y = 7; z = 10; = 7;"
        , "*(qi::alpha | qi::char_('_')) % (qi::lit(\"=\") >> +qi::digit >> qi::lit(';'))"
        , *(qi::alpha | qi::char_('_')) % (qi::lit("=") >> +qi::digit >> qi::lit(';')));

    parse("Modulus-Assign Operator (%=), list of assignments (*bio hazard edition*)\nBe careful with the Kleene star, Eugene!"
        , "x = 5; y = 7; z = 10; = 7;"
        , "*(qi::alpha | qi::char_('_')) %= (qi::lit(\"=\") >> +qi::digit >> qi::lit(';'))"
        , *(qi::alpha | qi::char_('_')) %= (qi::lit("=") >> +qi::digit >> qi::lit(';')));

    parse("Modulus-Assign Operator (%=), list of assignments (x = digits;)\nBe careful with the Kleene star, Eugene!"
        , "x = 5; y = 7; z = 10; = 7;"
        , "+(qi::alpha | qi::char_('_')) %= (qi::lit(\"=\") >> +qi::digit >> qi::lit(';'))"
        , +(qi::alpha | qi::char_('_')) %= (qi::lit("=") >> +qi::digit >> qi::lit(';')));
    return 0;
}

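【Comments】:

You have done a lot of research on this... nice answer. Just one last little problem.
Thanks. Oh well. '_' yields lit('_'). So I am glad the example strings to parse do not contain a "_". So nobody got eaten. :)
Fixed the example rules.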