Regex from-until string capture: answers

【Question title】: regex from till string capture
【Posted】: 2011-12-02 08:40:50
【Question】:

<h2 class="element">
name
</h2>
<div class="outerElement">
address
</div>
<h2 class="element">
name
</h2>
<div class="outerElement">
address
</div>

I need a regex that captures everything from one <h2 class="element"> up to the next <h2 class="element">, so I came up with this:

preg_match_all('/div class="outerElement"(.*?)div class="outerElement"/', $content, $elements);

But for some reason it doesn't work. Do I have to escape the double quotes, or is something else wrong?

【Comments】:

You must not parse HTML with regex: stackoverflow.com/a/1732454/298479

Tags: php regex match

【Solution 1】:

Add the "s" modifier to the expression, like this:

 '/div class="outerElement"(.*?)div class="outerElement"/s'

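This is needed so that the dot (.) also matches newline characters (the PCRE_DOTALL modifier). Without it, (.*?) stops at the first line break and can never span the lines between the two tags. Note that this is not the same as multiline mode ("m"), which only changes how ^ and $ behave.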
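To make the difference concrete, here is a minimal sketch that runs the pattern from the question with and without the "s" modifier. The variable names ($pattern, $withoutS, $withS) are illustrative and not from the original post. It also shows that double quotes inside a single-quoted PHP string need no escaping.

```php
<?php
// Sample input modelled on the question: two outerElement divs on separate lines.
$content = '<div class="outerElement">
address
</div>
<h2 class="element">
name
</h2>
<div class="outerElement">
address
</div>';

// Double quotes are literal inside a single-quoted PHP string; no escaping needed.
$pattern = 'div class="outerElement"(.*?)div class="outerElement"';

// Without "s", the dot does not match "\n", so the lazy group cannot cross lines.
$withoutS = preg_match_all('/' . $pattern . '/', $content, $m1);

// With "s" (PCRE_DOTALL), the dot matches newlines too.
$withS = preg_match_all('/' . $pattern . '/s', $content, $m2);

echo "Matches without s: $withoutS\n"; // 0
echo "Matches with s: $withS\n";       // 1
```

preg_match_all returns the number of full pattern matches, which is why the two counts can be compared directly.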

【Comments】:

【Solution 2】:

Don't use regex here. Use PHP's DOM parsing instead; it will make your task easier and less error-prone.

http://www.php.net/manual/en/domdocument.getelementsbytagname.php
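As a rough sketch of what the DOM approach could look like for the sample markup: it assumes each <h2 class="element"> and the content that follows it are siblings, as in the question, and the $sections array layout is a hypothetical choice made for illustration.

```php
<?php
$html = '<h2 class="element">
name
</h2>
<div class="outerElement">
address
</div>
<h2 class="element">
name
</h2>
<div class="outerElement">
address
</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($html);            // the fragment gets wrapped in <html><body>
libxml_clear_errors();

$sections = [];
foreach ($doc->getElementsByTagName('h2') as $h2) {
    if ($h2->getAttribute('class') !== 'element') {
        continue; // only the headings the question asks about
    }
    $content = '';
    // Collect every sibling after this heading until the next h2.
    for ($node = $h2->nextSibling; $node !== null; $node = $node->nextSibling) {
        if ($node->nodeName === 'h2') {
            break;
        }
        $content .= $doc->saveHTML($node);
    }
    $sections[] = [
        'name'    => trim($h2->textContent),
        'content' => trim($content),
    ];
}

print_r($sections);
```

Unlike the regex versions, this also picks up the last block, because the loop simply runs out of siblings instead of requiring a following heading.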

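【Comments】:

【Solution 3】:

The following regex captures all matches in group 1.

As you said, you need to use preg_match_all to iterate over the matches.

For convenience, here is the regex in free-spacing (whitespace) mode.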

(?xs)                       # modes: whitespace, dot matches new line
(?<=<h2[ ]class="element">) # is there an element h2 tag behind us
\W*                         # match any non-word char (greedy)
(\w.*?)                     # capture a word char followed by any char (lazy)
<h2[ ]class="element"       # match the next class element

Here is a sample preg_match_all call that uses this regex and returns the captured groups. I have tested it against your sample string. It works. :)

<?php 
$subject='<h2 class="element">
name
</h2>
<div class="outerElement">
address
</div>
<h2 class="element">
name
</h2>
<div class="outerElement">
address
</div>
';
preg_match_all('/(?xs)       # modes: whitespace, dot matches new line
(?<=<h2[ ]class="element">) # is there an element h2 tag behind us
\W*                         # match any non-word char (greedy)
(\w.*?)                     # capture a word char followed by any char (lazy)
<h2[ ]class="element"       # match the next class element
/s', $subject, $all_matches, PREG_OFFSET_CAPTURE | PREG_PATTERN_ORDER);
$size=count($all_matches[1]);
echo "<br />*****************<br />";
echo "Number of Matches: ".$size."<br />";
echo "*****************<br />";
for ($i = 0; $i < $size; $i++) {
    echo "Match number: ".($i+1)."<br />";
    echo "At position: ".$all_matches[1][$i][1]."<br />";
    echo "Captured text: ".htmlentities($all_matches[1][$i][0])."<br />";
}
echo "End of Matches<br />";
echo "*****************<br /><br />";
?>

Finally, the output is as follows:

*****************
Number of Matches: 1
*****************
Match number: 1
At position: 22
Captured text: name </h2> <div class="outerElement"> address </div>
End of Matches
*****************

If I understood correctly, this is what you were looking for.
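Note that the run above reports a single match because the second block is not followed by another <h2 class="element">. If the last block should be captured as well, one possible variant, offered as a sketch rather than as the answerer's method, ends the capture with a lookahead that accepts either the next heading or the end of the input (\z). The ~ delimiter and the variable names are illustrative choices.

```php
<?php
$subject = '<h2 class="element">
name
</h2>
<div class="outerElement">
address
</div>
<h2 class="element">
name
</h2>
<div class="outerElement">
address
</div>
';

// (.*?) lazily captures everything after an opening heading tag.
// The lookahead stops the capture at the next heading or at end of input,
// without consuming the heading, so the next match can start on it.
$pattern = '~<h2 class="element">(.*?)(?=<h2 class="element">|\z)~s';

$count = preg_match_all($pattern, $subject, $matches);

echo "Number of matches: $count\n"; // 2
foreach ($matches[1] as $i => $block) {
    echo "Match " . ($i + 1) . ": " . trim($block) . "\n";
}
```

Because the lookahead is zero-width, each heading remains available as the starting point of the following match.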

【Comments】: