【Question title】: Boost Spirit revert parsing
【Posted】: 2014-05-30 11:43:54
【Question】:

I want to parse a file with the following structure:

some
garbage *&%
section1 {
    section_content
}
section2 {
    section_content
}

The rules for parsing section_name1 { ... } section_name2 { ... } are already defined:

section_name_rule = lexeme[+char_("A-Za-z0-9_")];
section = section_name_rule > lit("{") > /*some complicated things*/... > lit("}");
sections %= +section;

So I need to skip any garbage until the sections rule matches. Is there a way to do this? I tried seek[sections], but it does not seem to work.

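Edit: I have tracked down why seek does not work. If I use the sequence ("followed by") operator (>>), it works. If I use the expectation operator (>), it throws an exception. Here is sample code: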

#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <iostream>
#include <string>
#include <vector>

namespace qi = boost::spirit::qi;
using boost::phoenix::push_back;

struct section_t {
    std::string name, contents;
    friend std::ostream& operator<<(std::ostream& os, section_t const& s) { return os << "section_t[" << s.name << "] {" << s.contents << "}"; }
};

BOOST_FUSION_ADAPT_STRUCT(section_t, (std::string, name)(std::string, contents))

typedef std::vector<section_t> sections_t;

template <typename It, typename Skipper = qi::space_type>
struct grammar : qi::grammar<It, sections_t(), Skipper>
{
    grammar() : grammar::base_type(start) {
        using namespace qi;
        using boost::spirit::repository::qi::seek;
        section_name_rule = lexeme[+char_("A-Za-z0-9_")];
        //Replacing '>>'s with '>'s throws an exception, while this works as expected!!
        section = section_name_rule
            >>
            lit("{") >> lexeme[*~char_('}')] >> lit("}");
        start = seek [ hold[section[push_back(qi::_val, qi::_1)]] ]
            >> *(section[push_back(qi::_val, qi::_1)]);
    }
    private:
    qi::rule<It, sections_t(),  Skipper> start;
    qi::rule<It, section_t(),   Skipper> section;
    qi::rule<It, std::string(), Skipper> section_name_rule;
};

int main() {
    typedef std::string::const_iterator iter;
    std::string storage("sdfsdf\n sd:fgdfg section1 {dummy } section2 {dummy  } section3 {dummy  }");
    iter f(storage.begin()), l(storage.end());
    sections_t sections;
    if (qi::phrase_parse(f, l, grammar<iter>(), qi::space, sections))
    {
        for(auto& s : sections)
            std::cout << "Parsed: " << s << "\n";
    }
    if (f != l)
        std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}

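In my real code, the whole grammar is built from expectation operators. Do I have to change all of them to make seek work, or is there another way (say, seek a plain "{" and then step back over one section_name_rule)?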

【Comments】:

  • Then perhaps you should show the code that doesn't work...

Tags: c++ boost boost-spirit boost-spirit-qi boost-phoenix


【Solution 1】:

Here is a demo that takes its inspiration from Hamlet: Live On Coliru

start = *seek [ no_skip[eol] >> hold [section] ];

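Notes:

  • Drop the expectation points (use >> instead of > inside section)
  • Optimize by requiring a line start before each section name (no_skip[eol] matches the line break that precedes it)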

Sample input:

some
garbage *&%
section1 {
   Claudius: ...But now, my cousin Hamlet, and my son —
   Hamlet: A little more than kin, and less than kind.
}
WE CAN DO MOAR GARBAGE
section2 {
   Claudius: How is it that the clouds still hang on you?
   Hamlet: Not so my lord; I am too much i' the sun 
}

Output:

Parsed: section_t[section1] {Claudius: ...But now, my cousin Hamlet, and my son —
   Hamlet: A little more than kin, and less than kind.
}
Parsed: section_t[section2] {Claudius: How is it that the clouds still hang on you?
   Hamlet: Not so my lord; I am too much i' the sun 
}

Reference listing

// #define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <iostream>
#include <string>
#include <vector>

namespace qi = boost::spirit::qi;

struct section_t { 
    std::string name, contents;
    friend std::ostream& operator<<(std::ostream& os, section_t const& s) { return os << "section_t[" << s.name << "] {" << s.contents << "}"; }
};

BOOST_FUSION_ADAPT_STRUCT(section_t, (std::string, name)(std::string, contents))

typedef std::vector<section_t> sections_t;

template <typename It, typename Skipper = qi::space_type>
struct grammar : qi::grammar<It, sections_t(), Skipper>
{
    grammar() : grammar::base_type(start) {
        using namespace qi;
        using boost::spirit::repository::qi::seek;

        section_name_rule = lexeme[+char_("A-Za-z0-9_")];
        section           = section_name_rule >> '{' >> lexeme[*~char_('}')] >> '}';
        start             = *seek [ no_skip[eol] >> hold [section] ];

        BOOST_SPIRIT_DEBUG_NODES((start)(section)(section_name_rule))
    }
  private:
    qi::rule<It, sections_t(),  Skipper> start;
    qi::rule<It, section_t(),   Skipper> section;
    qi::rule<It, std::string(), Skipper> section_name_rule;
};

int main() {
    using It = boost::spirit::istream_iterator;
    It f(std::cin >> std::noskipws), l;

    sections_t sections;
    if (qi::phrase_parse(f, l, grammar<It>(), qi::space, sections))
    {
        for(auto& s : sections)
            std::cout << "Parsed: " << s << "\n";
    }
    if (f != l)
        std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}

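【Discussion】:

  • Thanks for the example. I used it and edited the question accordingly.
  • Right, you can't use expectation points here, because then you can't detect where the pattern matches: a mismatch throws. You could use on_error to override this behavior (effectively "catching" the expectation_failure), but that is unlikely to give you an efficient grammar. Reconsider your options... If you need to scan for recognizable elements, consider writing the necessary parser rules for that purpose.
  • Well, I actually use expectations because I catch the exception to get the iterator position (line number plus column of the error). Is there no other way to do that?
  • Yes. Feel free to search the boost-spirit answers (e.g. for line_pos_iterator and iter_pos)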