PHP - Complicated Regex extraction: Answers

【Question title】: PHP - Complicated Regex extraction
【Posted】: 2013-02-08 19:57:40
【Question description】:

I have some strings to parse, and it has become a bit complicated.

<?php
$notecomments = '
This is the first of the notes, and so whatever comes later is appended.<br>
(<b>John Smith</b>) at <b class="datetimeGMT">2012-02-07 00:00:20 GMT</b><hr>This is a comment posted<br><br>(<b>Alex Boom</b>) at <b class="datetimeGMT">2013-02-07 00:08:06 GMT</b><hr>And let\'s put some more in here<br />with a new line.';

// Each match starts at "(<b>" and runs up to the next "(<b>" or the end of the string.
if (preg_match_all('/\(<b>(?:(?!\(<b>).)*/s', $notecomments, $matches)) {
    print_r($matches);
}

/* result of code:
Array
(
    [0] => Array
        (
            [0] => (<b>John Smith</b>) at <b class="datetimeGMT">2012-02-07 00:00:20 GMT</b><hr>This is a comment posted<br><br>
            [1] => (<b>Alex Boom</b>) at <b class="datetimeGMT">2013-02-07 00:08:06 GMT</b><hr>And let's put some more in here<br />with a new line.
        )

)
*/
?>

I am able to loop through the "appended" notes because there is an indicator I can use in my preg_match_all regex rule.

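However, many of my notes have text before the first iteration of my preg_match_all. (In this case: "This is the first of the notes, and so whatever comes later is appended.")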

My first goal is accomplished; it is the result of my code above. I am extracting the notes appended to the first note.

My next goal is to detect anything that comes before the first iteration, that is, before the first match of my regex above. This is where I am stuck.

【Question comments】:

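Can you state clearly, in plain language, what your match/extraction criteria are (i.e. "I want to capture everything before the first <br>", or "everything before the first (<b>", or something else)?
Since this is equally messy either way, you might consider a DOM approach: walk the nodes (text, tags) and look for a b, followed by the final br tag or a newline.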
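
As a minimal sketch of the "everything before the first (<b>" criterion raised in the comment above: assuming the leading note is simply whatever precedes the first "(<b>" marker, strpos can split it off before the regex from the question runs on the remainder. The function name splitLeadingNote and the shortened sample string are illustrative assumptions, not part of the original post.

```php
<?php
// Hypothetical helper: returns [leading text, appended comments].
// Assumes the first "(<b>" marks where the appended comments begin.
function splitLeadingNote(string $notes): array
{
    $pos = strpos($notes, '(<b>');
    if ($pos === false) {
        // No appended comments at all: the whole string is the leading note.
        return [$notes, ''];
    }
    return [substr($notes, 0, $pos), substr($notes, $pos)];
}

// Shortened sample in the same layout as the question's data (assumption).
$notes = "First note.<br>\n"
       . "(<b>John Smith</b>) at <b class=\"datetimeGMT\">2012-02-07 00:00:20 GMT</b><hr>A comment<br><br>"
       . "(<b>Alex Boom</b>) at <b class=\"datetimeGMT\">2013-02-07 00:08:06 GMT</b><hr>Another comment";

list($leading, $appended) = splitLeadingNote($notes);
echo trim($leading), "\n"; // prints: First note.<br>

// The question's own pattern still works on the remainder.
preg_match_all('/\(<b>(?:(?!\(<b>).)*/s', $appended, $matches);
echo count($matches[0]), "\n"; // prints: 2
```

If a note has no appended comments, the helper returns the whole string as the leading text and an empty remainder, so the caller does not need a special case.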

Tags: php regex preg-match-all

【Solution 1】:

I would use preg_replace_callback with two regular expressions for this, like so:

 $notecomments = "This is the first of the notes, and so whatever comes later is appended.<br>(<b>John Smith</b>) at <b class=\"datetimeGMT\">2012-02-07 00:00:20 GMT</b><hr>This is a comment posted<br><br>(<b>Alex Boom</b>) at <b class=\"datetimeGMT\">2013-02-07 00:08:06 GMT</b><hr>And let's put some more in here<br />with a new line.";
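 $output = preg_replace_callback(
     // Pattern 1: <b> tags with attributes (the dates). Pattern 2: bare <b> tags (the names).
     array("~<b (.*?)>(.+?)</b>~si", "~<b>(.+?)</b>~si"),
     function ($matches) {
         if (isset($matches[2])) {
             print_r($matches[2] . "\n"); // tag with attributes: text is in group 2
         } else {
             print_r($matches[1] . "\n"); // bare tag: text is in group 1
         }
         return '';
     },
     ' ' . $notecomments . ' '
 );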

Output:

 2012-02-07 00:00:20 GMT
 2013-02-07 00:08:06 GMT
 John Smith
 Alex Boom
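
Note that the callback above prints all dates first and then all names, so the author and timestamp of a given comment are no longer paired. A hedged alternative sketch, assuming every appended comment follows exactly the "(<b>author</b>) at <b class="datetimeGMT">date</b><hr>text" layout of the sample: named capture groups with preg_match_all and PREG_SET_ORDER keep the three pieces of each comment together. parseAppendedComments is a hypothetical helper name, not something from the original answer.

```php
<?php
// Hypothetical helper: returns one array (author, date, body) per appended comment.
// Assumes the exact markup layout of the sample data in the question.
function parseAppendedComments(string $notes): array
{
    $pattern = '~\(<b>(?<author>[^<]+)</b>\) at <b class="datetimeGMT">(?<date>[^<]+)</b><hr>'
             . '(?<body>(?:(?!\(<b>).)*)~s'; // body runs until the next "(<b>" or end of string

    preg_match_all($pattern, $notes, $matches, PREG_SET_ORDER);

    $comments = [];
    foreach ($matches as $m) {
        $comments[] = [
            'author' => $m['author'],
            'date'   => $m['date'],
            'body'   => $m['body'],
        ];
    }
    return $comments;
}

$notecomments = "This is the first of the notes, and so whatever comes later is appended.<br>(<b>John Smith</b>) at <b class=\"datetimeGMT\">2012-02-07 00:00:20 GMT</b><hr>This is a comment posted<br><br>(<b>Alex Boom</b>) at <b class=\"datetimeGMT\">2013-02-07 00:08:06 GMT</b><hr>And let's put some more in here<br />with a new line.";

foreach (parseAppendedComments($notecomments) as $comment) {
    echo $comment['author'], ' | ', $comment['date'], "\n";
}
// prints:
// John Smith | 2012-02-07 00:00:20 GMT
// Alex Boom | 2013-02-07 00:08:06 GMT
```

Text before the first "(<b>" is simply skipped by this pattern, so the leading note still has to be extracted separately.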

【Comments】: