【Question title】: C++::Boost::Regex Iterate over the submatches
【Posted】: 2010-04-27 03:55:18
【Question】:

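I am using named capture groups with Boost Regex / Xpressive.

I would like to iterate over all submatches and get both the value and the KEY of each submatch (i.e. the "type" in what["type"]).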

sregex pattern = sregex::compile(  "(?P<type>href|src)=\"(?P<url>[^\"]+)\""    );

sregex_iterator cur( web_buffer.begin(), web_buffer.end(), pattern );
sregex_iterator end;

for( ; cur != end; ++cur ){
    smatch const &what = *cur;

    //I know how to access using a string key: what["type"]
    std::cout << what[0] << " [" << what["type"] << "] [" << what["url"] <<"]"<< std::endl;

    /*I know how to iterate, using an integer key, but I would
      like to also get the original KEY into a variable, i.e.
      in case of what[1], get both the value AND "type"
    */
    for (std::size_t i = 0; i < what.size(); ++i) {
        std::cout << "{} = [" << what[i] << "]" << std::endl;
    }

    std::cout << std::endl;
}

【Comments】:

    Tags: c++ regex boost


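    【Answer 1】:

    With Boost 1.54.0 this is even harder, because the capture names are not stored in the results at all. Instead, Boost only hashes each capture name and stores the hash (an int) together with the associated pointers into the original string.

    I wrote a small class derived from boost::smatch that saves the capture names and provides iterators over them.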

    class namesaving_smatch : public smatch
    {
    public:
        namesaving_smatch(const regex& pattern)
        {
            std::string pattern_str = pattern.str();
            regex capture_pattern("\\?P?<(\\w+)>");
            auto words_begin = sregex_iterator(pattern_str.begin(), pattern_str.end(), capture_pattern);
            auto words_end = sregex_iterator();
    
            for (sregex_iterator i = words_begin; i != words_end; i++)
            {
                std::string name = (*i)[1].str();
                m_names.push_back(name);
            }
        }
    
        ~namesaving_smatch() { }
    
        std::vector<std::string>::const_iterator names_begin() const
        {
            return m_names.begin();
        }
    
        std::vector<std::string>::const_iterator names_end() const
        {
            return m_names.end();
        }
    
    private:
        std::vector<std::string> m_names;
    };
    

    The class takes the regular expression containing the named capture groups in its constructor. Use it like this:

    namesaving_smatch results(re);
    if (regex_search(input, results, re))
        for (auto it = results.names_begin(); it != results.names_end(); ++it)
            cout << *it << ": " << results[*it].str() << '\n';
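
The class above keeps the names but not their group numbers, while the question asks for the name that belongs to what[i]. A minimal sketch of one way to recover that mapping, using only the standard library: it scans the pattern text and numbers capturing groups by their opening parenthesis, left to right. The helper name `group_names` is hypothetical, and the scanner assumes the `(?P<name>...)` / `(?<name>...)` syntax and simple character classes (a literal `]` placed first inside `[...]` is not handled).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of Boost. Returns one entry per capturing
// group in group-number order: element k is the name of group k + 1, or an
// empty string if that group is unnamed.
std::vector<std::string> group_names(const std::string& pattern)
{
    std::vector<std::string> names;
    bool in_class = false;  // true while inside a [...] character class

    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const char c = pattern[i];
        if (c == '\\') { ++i; continue; }  // skip the escaped character
        if (in_class) {
            if (c == ']') in_class = false;
            continue;
        }
        if (c == '[') { in_class = true; continue; }
        if (c != '(') continue;

        // A plain "(" opens an unnamed capturing group.
        if (i + 1 >= pattern.size() || pattern[i + 1] != '?') {
            names.push_back("");
            continue;
        }

        // Accept "(?P<name>" or "(?<name>"; every other "(?" form is
        // treated as non-capturing in this sketch.
        std::size_t j = i + 2;
        if (j < pattern.size() && pattern[j] == 'P') ++j;
        if (j >= pattern.size() || pattern[j] != '<') continue;

        const std::size_t close = pattern.find('>', j + 1);
        if (close == std::string::npos) continue;

        const std::string name = pattern.substr(j + 1, close - j - 1);
        // "(?<=" and "(?<!" are lookbehinds, not named groups.
        if (name.empty() || name[0] == '=' || name[0] == '!') continue;
        names.push_back(name);
    }
    return names;
}
```

For the pattern in the question this should yield "type" and "url", so inside the loop over i the name for what[i] (i >= 1) would be element i - 1 of the returned vector.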
    

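    【Comments】:

      【Answer 2】:

      After looking at this for over an hour, I feel fairly safe in saying "it can't be done, captain". Even inside the Boost code, the lookup iterates over the private named_marks_ vector. It simply is not set up to allow this. I would say the best bet is to iterate over the names you think should be there and catch the exception for the ones that are not found.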

      const_reference at_(char_type const *name) const
      {
          for(std::size_t i = 0; i < this->named_marks_.size(); ++i)
          {
              if(this->named_marks_[i].name_ == name)
              {
                  return this->sub_matches_[ this->named_marks_[i].mark_nbr_ ];
              }
          }
          BOOST_THROW_EXCEPTION(
              regex_error(regex_constants::error_badmark, "invalid named back-reference")
          );
          // Should never execute, but if it does, this returns
          // a "null" sub_match.
          return this->sub_matches_[this->sub_matches_.size()];
      }
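
The "try the names you expect and catch the exception" idea can be sketched as follows. To keep the sketch compilable without Boost, the lookup is passed in as a callable; with Xpressive it would presumably be a lambda wrapping the what[name] lookup shown above. The helper name `collect_named` is hypothetical, and catching std::runtime_error rests on the assumption that the library's regex_error derives from it.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not part of Boost. Tries every candidate name and
// keeps the ones the lookup accepts. `lookup` takes a name and returns the
// matched text; it is expected to throw for names that are not capture
// groups in the pattern.
template <typename Lookup>
std::map<std::string, std::string>
collect_named(Lookup lookup, const std::vector<std::string>& candidates)
{
    std::map<std::string, std::string> found;
    for (const std::string& name : candidates) {
        try {
            // Call the lookup first so a throw leaves the map untouched.
            std::string value = lookup(name);
            found.emplace(name, std::move(value));
        } catch (const std::runtime_error&) {
            // Assumption: an unknown name is reported via an exception
            // derived from std::runtime_error. Skip that name.
        }
    }
    return found;
}
```

The returned map then holds name/value pairs only for the candidates that exist, which gives both the KEY and the value at the cost of listing the candidate names up front.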
      

      【Comments】:
