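【Question title】: Compiling a simple parser with Boost.Spirit
【Posted】: 2012-02-22 22:57:12
【Question】:

As part of a simple skeleton utility I'm working on, I have a grammar for triggering substitutions in text. I thought it would be a good way to get comfortable with Boost.Spirit, but the template errors are a joy of a unique kind.

Here is the code in its entirety: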

#include <iostream>
#include <iterator>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>

namespace bsq = boost::spirit::qi;

namespace {
template<typename Iterator>
struct skel_grammar : public bsq::grammar<Iterator> {
    skel_grammar();

private:
    bsq::rule<Iterator> macro_b;
    bsq::rule<Iterator> macro_e;
    bsq::rule<Iterator, bsq::ascii::space_type> id;
    bsq::rule<Iterator> macro;
    bsq::rule<Iterator> text;
    bsq::rule<Iterator> start;
};

template<typename Iterator>
skel_grammar<Iterator>::skel_grammar() : skel_grammar::base_type(start)
{
    text = bsq::no_skip[+(bsq::char_ - macro_b)[bsq::_val += bsq::_1]];
    macro_b = bsq::lit("<<");
    macro_e = bsq::lit(">>");
    macro %= macro_b >> id >> macro_e;
    id %= -(bsq::ascii::alpha | bsq::char_('_'))
        >> +(bsq::ascii::alnum | bsq::char_('_'));
    start = *(text | macro);
}
}  // namespace

int main(int argc, char* argv[])
{
    std::string input((std::istreambuf_iterator<char>(std::cin)),
                      std::istreambuf_iterator<char>());
    skel_grammar<std::string::iterator> grammar;
    bool r = bsq::parse(input.begin(), input.end(), grammar);
    std::cout << std::boolalpha << r << '\n';
    return 0;
}

What is wrong with this code?

【Comments】:

    Tags: c++ boost-spirit


    【Solution 1】:

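    Hmm. I feel that we discussed a few more details in chat than are reflected in the question.

    Let me entertain you with my "toy" implementation, complete with test cases, of a grammar that recognizes <<macros>> like this, including nested expansion of the same.

    Notable features:

    1. Expansion is done through a callback (process()), which gives you maximum flexibility: you could use a lookup table, make the parse fail depending on the macro content, or even have side effects that are independent of the output.
    2. The parser is optimized to favour streaming mode. Look at spirit::istream_iterator for how to parse input in streaming mode (Stream-based Parsing Made Easy). This has obvious benefits if your input stream is 10 GB and contains only 4 macros: it is the difference between crawling performance (or running out of memory) and simply scaling.
      • Note that the demo still writes to a string buffer (via oss). However, you could easily hook the output directly to std::cout or a std::ofstream instance.
    3. Expansion is done eagerly, so you can get nifty effects with indirect macros. See the test cases.
    4. I even demonstrate a simple way to support escaping the << and >> delimiters (#define SUPPORT_ESCAPES).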
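
To make the lookup-table idea from feature 1 concrete, here is a minimal sketch of a table-driven callback with the same signature as process(). The names process_with_table and macro_table, and the table contents, are assumptions made up for illustration; they are not part of the answer's code.

```cpp
#include <map>
#include <string>

// Hypothetical table of known macros; the entries are illustrative only.
const std::map<std::string, std::string>& macro_table()
{
    static const std::map<std::string, std::string> table = {
        {"hello", "bye"},
        {"bye", "We meet again"},
    };
    return table;
}

// Same shape as process(): rewrites `macro` in place and returns false
// to signal that the enclosing parse should fail.
bool process_with_table(std::string& macro)
{
    if (macro == "error")
        return false;                      // fail the parse

    const auto& table = macro_table();
    const auto it = table.find(macro);
    if (it != table.end())
        macro = it->second;                // known macro: substitute
    else
        macro = "<<" + macro + ">>";       // unknown macro: leave it unchanged
    return true;
}
```

Because the signature matches, such a function could presumably be bound in the semantic action in place of process(); that substitution is untested here.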

    Without further ado:

    The code

    Note: due to laziness, I require -std=c++0x, but only when SUPPORT_ESCAPES is defined.

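    //#define BOOST_SPIRIT_DEBUG
    #include <algorithm>  // std::reverse, std::remove, std::copy
    #include <iostream>   // std::cout, std::cerr
    #include <iterator>   // std::ostream_iterator, std::begin, std::end
    #include <sstream>    // std::ostringstream
    #include <string>
    #include <boost/spirit/include/qi.hpp>
    #include <boost/spirit/include/phoenix.hpp>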
    
    namespace qi = boost::spirit::qi;
    namespace phx= boost::phoenix;
    namespace fsn= boost::fusion;
    
    namespace
    {
        #define SUPPORT_ESCAPES
    
        static bool process(std::string& macro)
        {
            if (macro == "error") {
                return false; // fail the parse
            }
    
            if (macro == "hello") {
                macro = "bye";
            } else if (macro == "bye") {
                macro = "We meet again";
            } else if (macro == "sideeffect") {
                std::cerr << "this is a side effect while parsing\n";
                macro = "(done)";
            } else if (std::string::npos != macro.find('~')) {
                std::reverse(macro.begin(), macro.end());
                // erase-remove idiom: the two-iterator erase drops every '~', not just one
                macro.erase(std::remove(macro.begin(), macro.end(), '~'), macro.end());
            } else {
                macro = std::string("<<") + macro + ">>"; // this makes the unsupported macros appear unchanged
            }
    
            return true;
        }
    
        template<typename Iterator, typename OutIt>
            struct skel_grammar : public qi::grammar<Iterator>
        {
            struct fastfwd {
                template<typename,typename> struct result { typedef bool type; };
    
                template<typename R, typename O> 
                    bool operator()(const R&r,O& o) const
                {
    #ifndef SUPPORT_ESCAPES
                    o = std::copy(r.begin(),r.end(),o);
    #else
                    auto f = std::begin(r), l = std::end(r);
                    while(f!=l)
                    {
                        if (('\\'==*f) && (l == ++f))
                            break;
                        *o++ = *f++;
                    }
    #endif
                    return true; // false to fail the parse
                }
            } copy;
    
            skel_grammar(OutIt& out) : skel_grammar::base_type(start)
            {
                using namespace qi;
    
    #ifdef SUPPORT_ESCAPES
                rawch = ('\\' >> char_) | char_;
    #else
    #           define rawch qi::char_
    #endif
    
                macro = ("<<" >> (
                               (*(rawch - ">>" - "<<") [ _val += _1 ]) 
                             % macro                   [ _val += _1 ] // allow nests
                          ) >> 
                          ">>")  
                    [ _pass = phx::bind(process, _val) ];
    
                start = 
                    raw [ +(rawch - "<<") ] [ _pass = phx::bind(copy, _1, phx::ref(out)) ] 
                  % macro                   [ _pass = phx::bind(copy, _1, phx::ref(out)) ]
                  ;
    
                BOOST_SPIRIT_DEBUG_NODE(start);
                BOOST_SPIRIT_DEBUG_NODE(macro);
    
    
    #           undef rawch
            }
    
            private:
    #ifdef SUPPORT_ESCAPES
            qi::rule<Iterator, char()> rawch;
    #endif
            qi::rule<Iterator, std::string()> macro;
            qi::rule<Iterator> start;
        };
    }
    
    int main(int argc, char* argv[])
    {
        std::string input = 
            "Greeting is <<hello>> world!\n"
            "Side effects are <<sideeffect>> and <<other>> vars are untouched\n"
            "Empty <<>> macros are ok, as are stray '>>' pairs.\n"
            "<<nested <<macros>> (<<hello>>?) work>>\n"
            "The order of expansion (evaluation) is _eager_: '<<<<hello>>>>' will expand to the same as '<<bye>>'\n"
            "Lastly you can do algorithmic stuff too: <<!esrever ~ni <<hello>>>>\n"
    #ifdef SUPPORT_ESCAPES // bonus: escapes
            "You can escape \\<<hello>> (not expanded to '<<hello>>')\n"
            "Demonstrate how it <<avoids <\\<nesting\\>> macros>>.\n"
    #endif
            ;
    
        std::ostringstream oss;
        std::ostream_iterator<char> out(oss);
    
        skel_grammar<std::string::iterator, std::ostream_iterator<char> > grammar(out);
    
        std::string::iterator f(input.begin()), l(input.end());
        bool r = qi::parse(f, l, grammar);
    
        std::cout << "parse result: " << (r?"success":"failure") << "\n";
        if (f!=l)
            std::cout << "unparsed remaining: '" << std::string(f,l) << "'\n";
    
        std::cout << "Streamed output:\n\n" << oss.str() << '\n';
    
        return 0;
    }
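
The SUPPORT_ESCAPES branch of fastfwd above drops each backslash while copying and keeps the character after it verbatim. The same loop can be looked at in isolation, without Spirit or an output iterator. This is only a sketch: the function name unescape is hypothetical, and it returns a std::string instead of writing through an iterator.

```cpp
#include <string>

// Copies `in` to the result, dropping each backslash and keeping the
// character that follows it. A lone trailing backslash is dropped.
std::string unescape(const std::string& in)
{
    std::string out;
    auto f = in.begin();
    const auto l = in.end();
    while (f != l) {
        // If *f is a backslash, step over it; stop if nothing follows.
        if (*f == '\\' && ++f == l)
            break;
        out += *f++;   // copy the (possibly escaped) character
    }
    return out;
}
```

With this helper, unescape("\\<<hello>>") yields "<<hello>>", which mirrors how the escaped test line ends up in the output with its delimiters intact.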
    

    Test output

    this is a side effect while parsing
    parse result: success
    Streamed output:
    
    Greeting is bye world!
    Side effects are (done) and <<other>> vars are untouched
    Empty <<>> macros are ok, as are stray '>>' pairs.
    <<nested <<macros>> (bye?) work>>
    The order of expansion (evaluation) is _eager_: 'We meet again' will expand to the same as 'We meet again'
    Lastly you can do algorithmic stuff too: eyb in reverse!
    You can escape <<hello>> (not expanded to 'bye')
    Demonstrate how it <<avoids <<nesting>> macros>>.
    

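    There is quite a lot of functionality hidden in there. I suggest you look at the test cases and the process() callback side by side to see what is going on.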
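
To follow the eager, inside-out expansion order without reading the Spirit grammar, the idea can be sketched as a small hand-written recursive expander in plain C++. This is a simplification and not the answer's implementation: it ignores escapes, it accepts adjacent macros, and the names Callback, expand and expand_macro are hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <string>

using Callback = std::function<bool(std::string&)>;

// Expects in[pos..] to start with "<<". Expands nested macros first (eagerly),
// then hands the assembled body to `process`. On success, `result` holds the
// expansion and `pos` points just past the closing ">>".
bool expand_macro(const std::string& in, std::size_t& pos,
                  const Callback& process, std::string& result)
{
    pos += 2;                                   // skip the opening "<<"
    std::string body;
    while (pos < in.size()) {
        if (in.compare(pos, 2, ">>") == 0) {
            pos += 2;                           // skip the closing ">>"
            if (!process(body))
                return false;                   // callback vetoed the parse
            result = body;
            return true;
        }
        if (in.compare(pos, 2, "<<") == 0) {
            std::string inner;
            if (!expand_macro(in, pos, process, inner))
                return false;
            body += inner;                      // inner result feeds the outer macro
        } else {
            body += in[pos++];
        }
    }
    return false;                               // unterminated macro
}

// Copies plain text to `out` and replaces each macro by its expansion.
// A stray ">>" outside a macro is treated as ordinary text.
bool expand(const std::string& in, const Callback& process, std::string& out)
{
    std::size_t pos = 0;
    while (pos < in.size()) {
        if (in.compare(pos, 2, "<<") == 0) {
            std::string expanded;
            if (!expand_macro(in, pos, process, expanded))
                return false;
            out += expanded;
        } else {
            out += in[pos++];
        }
    }
    return true;
}
```

With a callback that maps hello to bye and bye to We meet again, the input <<<<hello>>>> first expands the inner macro to bye, and the outer macro then sees bye as its name, which matches the corresponding line of the test output.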

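    Cheers & HTH :)

    【Comments】:

    • Ah, if you #define rawch, at least #undef it afterwards.
    • @KonradRudolph, really...? Oh well, I guess you're right that it matters. Fixed. Thanks for reading so carefully :)
    • :p Honestly, your answer reads to me like feature-complete code. If I had asked this question, I would probably just copy and paste it into my code (minus the process function, of course). That said, the code raises a question: how inefficient is rule assignment, really? That is, does #define-ing instead of assigning here actually yield a tangible benefit? Unfortunately, I suspect the answer is "yes"...
    • @KonradRudolph: Depends on what you use it for, but in practice, probably yes. I #define it so that anyone who copies and pastes this will do the preprocessing by hand to their liking, and not end up with suboptimal code. Premature optimization, in a way... but fitting for SO, I think.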