How to parse a CSV-like escaped string with Boost Spirit? Answer

[Question title]: How to parse a CSV-like escaped string with Boost Spirit?
[Posted]: 2020-03-24 07:30:35
[Question]:

For my expression parser project I want to use CSV-style escaping, where "" escapes ".

Examples:

 "\"hello\"",
 "   \"  hello \"  ",
 "  \"  hello \"\"stranger\"\" \"  ",

Online compile attempt: https://wandbox.org/permlink/5uchQM8guIN1k7aR

My current parsing rule only handles the first 2 tests:

qi::rule<std::string::const_iterator, qi::blank_type, utree()> double_quoted_string
    = '"' >> qi::no_skip[+~qi::char_('"')] >> '"';

I found this Stack Overflow question, which has an answer that uses Spirit:

How can I read and parse CSV files in C++?

start       = field % ',';
field       = escaped | non_escaped;
escaped     = lexeme['"' >> *( char_ -(char_('"') | ',') | COMMA | DDQUOTE)  >> '"'];
non_escaped = lexeme[       *( char_ -(char_('"') | ',')                  )        ];
DDQUOTE     = lit("\"\"")       [_val = '"'];
COMMA       = lit(",")          [_val = ','];

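(I don't know how to link to the answer, so if you're interested, search for "You can feel proud of yourself when you use something as beautiful as boost::spirit".)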

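Sadly, it doesn't compile for me; even years of analyzing C++ error messages did not prepare me for Spirit's flood of error messages :) If I understand correctly, the rules expect a comma as the separator, which may not be right for my expression parser project: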

expression = "strlen( \"hello \"\"you\"\" \" )+1";
expression = "\"hello \"";
expression = "strlen(concat(\"hello\",\"you\")+3";

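Or, in this case, would the rules need to optionally expect both , and )?

I hope I'm not asking too many stupid questions, but an answer would help me get into Spirit. Apart from the string escaping, the expression parsing itself almost works.

Thanks for any help.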

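Update: this seems to work for me. At least it parses the string, but it removes the escaped " characters from the string. Is there a better debug output available for strings? " " " " "h" "e" "l" "l" "o" " " "s" "t" "r" "a" "n" "g" "e" "r" " " is really not very readable.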

qi::rule<std::string::const_iterator, utree()> double_quoted_string
  = qi::lexeme['"' >> *(qi::char_ - (qi::char_('"')) | qi::lit("\"\"")) >> '"'];

[Comments]:

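The rule you're looking for is probably "\"" >> qi::lexeme[*((qi::string("\\\"") | qi::char_) - "\"")] >> "\"", although it won't compile because the attributes of qi::string and qi::char are incompatible.
Thanks for your help. It compiles with qi::lit instead of qi::string, but then it can't even parse the good cases that already worked: every escaped string fails.
Try asking one question at a time. In particular, the whole reference to CSV parsing seems completely off-topic. Out of curiosity, follow some of the links in stackoverflow.com/questions/14992295/…
(Being a Spirit newbie) I don't yet have a real feel for what context is needed to get the right hints/answers. There are so many examples that only work for the neat cases, with lots of tricks etc. I try to keep my examples small, but that still isn't enough to get a feel for Spirit :)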

Tags: c++ boost boost-spirit boost-spirit-qi

[Solution 1]:

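You can boil the problem down to this: how do you make a double-quoted string accept "doubled double quotes" as an escape for an embedded double-quote character?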

A simple string parser without escapes:

qi::rule<It, std::string()> s = '"' >> *~qi::char_('"') >> '"';

Now, to also accept the escaped form "" (yielding a single ") as required, simply add:

s = '"' >> *("\"\"" >> qi::attr('"') | ~qi::char_('"')) >> '"';

Other notes:

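In your online example, the use of no_skip is sloppy: it would parse "foo bar" and " foo bar " both as foo bar (trimming the spaces). Instead, drop the skipper from the rule to make it an implicit lexeme (again).
Your parser doesn't accept empty strings (that may be what you want, but I'm not sure).
Using utree will probably make your life more complicated than you'd like.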

Simplified:

Live On Coliru

#define BOOST_SPIRIT_DEBUG
#include <iostream>
#include <iomanip>
#include <string>
#include <vector>
#include <boost/spirit/include/qi.hpp>

namespace qi = boost::spirit::qi;
namespace fu = boost::fusion;

int main()
{
    auto tests = std::vector<std::string>{
         R"( "hello" )",
         R"(    "  hello " )",
         R"(  "  hello ""escaped"" "  )",
    };
    for (const std::string& str : tests) {
        auto iter = str.begin(), end = str.end();

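        // "" yields a single '"' in the attribute; any other non-quote character is kept as-is.
        // Declaring the rule without a skipper makes it an implicit lexeme.
        qi::rule<std::string::const_iterator, std::string()> double_quoted_string
            = '"' >> *("\"\"" >> qi::attr('"') | ~qi::char_('"')) >> '"';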

        std::string ut;
        bool r = qi::phrase_parse(iter, end, double_quoted_string >> qi::eoi, qi::blank, ut);

        std::cout << str << " ";
        if (r) {
            std::cout << "OK: " << std::quoted(ut, '\'') << "\n";
        }
        else {
            std::cout << "Failed\n";
        }
        if (iter != end) {
            std::cout << "Remaining unparsed: " << std::quoted(std::string(iter, end)) << "\n";
        }
        std::cout << "----\n";
    }
}

Prints

 "hello"  OK: 'hello'
----
    "  hello "  OK: '  hello '
----
  "  hello ""escaped"" "   OK: '  hello "escaped" '
----

[Discussion]: