C++11 regex: extracting text

【Question title】: C++11 regex extracting text
【Posted】: 2015-09-04 20:14:32
【Question】:

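I have been trying to do something that seemed fairly simple... at first.

I am trying to convert text handling that I have been doing with strstr to regular expressions, which seems to be the way to do it now with C++11. Here is a sample test case: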

<!Sometag>
// Lots of code here! (Multiline)
<Sometag!>

<!Sometag2>
// Lots of code here! (Multiline)
<Sometag2!>

Edit: a more explicit example.

/// Comments.

<!Vertex>
#version 150
/// code here!
void main()
{
 /// code here!
}
<Vertex!>

/// Comments.
<!Fragment>
#version 150
/// code here!

void main()
{
/// code here!
}
<Fragment!>

Edit 2: here is a better example of what needs to be done:

regex editor

I have tried many combinations, but the most logical one I came up with is this:

std::smatch u;
std::string s = shader->GetData();
std::regex_match(s, u, std::regex("<.*>(.*)<!.*>"));

I have had no luck so far, and I wonder if anyone knows what the syntax might be?!

Thanks

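【Question comments】:

<, > and ! are special to regex. You need to escape them. The escape character is \, which is also special in C strings, so you need to escape that too. Try "\\]*\\>(.*)\\]\\> " - I have no way to test it right now, though
I guess a * was missing. I tried this as well as your suggestion: "\\]*\\>(.*)\\]*\\>" No luck so far. I will keep digging.
You can use a raw string literal instead of escaping, e.g. std::regex(R"delim(<.*>(.*)<!.*>)delim"). See #6 here or solarianprogrammer.com/2011/10/16/….
Thanks vsoftco, noted.
Are you sure you want regex_match? Maybe regex_search is more appropriate. A match has to match the entire target.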

Tags: c++ regex

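【Solution 1】:

If your tags are tidy, you can try something like this: (?:<[^<>!]*?>\n?)((.|\n)*?)(?:<!.*>(\n|$)?)

It needs to be escaped accordingly for a C string literal, of course: (?:\\<[^<>!]*?\\>\\n?)((.|\\n)*?)(?:\\<!.*\\>(\\n|$)?)

The code itself is capture group $1.

https://regex101.com/r/vC5xD3/2 (it is PHP, but the idea still holds).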

【Comments】:

【Solution 2】:

Consider using regex_search. You can then access each sub-match. Here is an example to get you started...

std::smatch u;
std::string s = "<Sometag>\n// Lots of code here! (Multiline)\n<!Sometag>\n\n<Sometag2>\n// Lots of code here! (Multiline)\n<!Sometag2>\n";
std::regex e("<[^>]*>([^<]*)<[!][^>]*>[^<]*");

// Repeatedly search the remaining text; each hit is one tag pair.
while (!s.empty())
{
    if (!std::regex_search(s.cbegin(), s.cend(), u, e))
        break;
    // u[0] is the whole match, u[1] is the text between the tags
    for (std::smatch::iterator it = u.begin(); it != u.end(); ++it)
    {
        std::cout << *it << std::endl;
    }
    // Continue with whatever follows this match
    s = u.suffix().str();
}

Edit: the expression below is more tolerant of < characters inside the code.

std::regex e("^<([^>]*)>(((.|\\n)*)<[!]\\1>[^<]*)?");

This way there are more sub-matches, but one of them will be the content between the tags.

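Edit 2: based on the better example string you provided, here is another code sample that breaks the target string into sub-matches. The 4th sub-match (counting the full match as the first, so u[3]) is the actual content between the tags.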

std::smatch u;
std::string s = "/// Comments.\r\n\r\n<!Vertex>\r\n#version 150\r\n\r\n//#define DISPLAY_DIFFUSE 0\r\n//#define DISPLAY_BUMP 1\r\n//#define DISPLAY_SPECULAR 2\r\n//#define ... \r\n\r\n<Vertex!>\r\n\r\n/// Comments.\r\n\r\n<!Fragment>\r\n#version 150\r\n\r\n//#define DISPLAY_DIFFUSE 0\r\n//#define DISPLAY_BUMP 1\r\n//#define DISPLAY_SPECULAR 2\r\n//#define ... \r\n\r\n<Fragment!>\r\n\r\n";
std::regex e("<[!]([^>]*)>");  // to skip to the first tag

// search for the first tag
bool result = std::regex_search(s.cbegin(), s.cend(), u, e);
if (result)
{   // skip the characters before the first tag
    s = s.substr(u.prefix().length());

    // regex to find <!tag>...<tag!>... capturing several things;
    // u[3] (the fourth entry, counting the full match) is the content between the tags
    e = std::regex("^<[!]([^>]*)>(((.|[\\n\\r])*)<\\1[!]>[^<]*)");
    while (!s.empty())
    {
        // find a tag and its contents
        result = std::regex_search(s.cbegin(), s.cend(), u, e);
        if (!result)
            break;
        // print only the contents between the tags
        std::cout << u[3] << std::endl;
        // continue with the text after this tag pair
        s = u.suffix().str();
    }
}

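【Comments】:

Seems like a good plan. The data I am parsing has code in it, which is what causes many of the problems I am trying to solve. Still digging.
So basically the code may contain so I changed the tags to: start end There may also be text before the first tag, like this: /// Comments. /// code /// More comments. /// code /// Still comments.
OK, got it. "]*>([^]*[!]>" Thanks everyone for the help. This was more challenging than expected.
Apparently I got lucky, because I have not reached that point yet. That led me to some use of
I don't think you mean [^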