Regex to count non-nested </p> tags

【Question title】: Regex to count non-nested </p> tags
【Posted】: 2013-04-13 20:35:51
【Question】:

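The function below does pretty much what it says: it inserts a string of HTML into the content after the second paragraph tag it finds.

I need to modify it slightly so that it only counts paragraph tags that are not enclosed inside other tags. In other words, only top-level paragraph tags.

Is there any way to do that with a regex?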

function my_html_insert($content){
    $InsertAfterParagraph = 2;

    if(substr_count(strtolower($content), '</p>') < $InsertAfterParagraph )
    {
        return $content .= myFunction($my_insert=1);
    }
    else
    {
        $replaced_content = preg_replace_callback('#(<p[\s>].*?</p>\n)#s', 'my_p_callback', $content);
    }
    return $replaced_content;
}


function my_p_callback($matches)
{
    static $count = 0;
    $ret = $matches[1];
    $pCount = get_option('my_p_count');

    if (++$count == $pCount){
        $ret .= myFunction($my_insert=1);
    }

    return $ret;
}

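【Comments】:

Why not parse the HTML?
A regex seems simpler and faster (if it is possible).
That is almost never the case. Regular expressions are not powerful enough to parse arbitrary HTML.
That's why I'm not asking it to parse. Just count.
Counting correctly means parsing correctly: the regex above will happily fail on HTML comments, and on a missing </p>, which is optional. That said, most regex engines are powerful enough to match any right-recursive context-free grammar (see (?(DEFINE)(?<rule>pattern)) in PCRE or Perl). Doing it correctly is neither practical nor easy, which is why an off-the-shelf parser is the best way to solve this.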
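
To make the last comment concrete, here is a minimal sketch comparing the naive `</p>` count from the question with a count of top-level paragraphs obtained through a parser. The sample markup in `$html` is made up for illustration and is not from the original post.

```php
<?php
// Made-up sample: two top-level paragraphs, one nested paragraph,
// and one paragraph that is commented out.
$html = '<p>One</p><div><p>Nested</p></div><!-- <p>commented out</p> --><p>Two</p>';

// Naive count, as in the question: every closing tag, wherever it appears.
$naive = substr_count(strtolower($html), '</p>');

// Parsed count: only <p> elements that are direct children of <body>.
// loadHTML() wraps a bare fragment in <html><body> on its own.
$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$topLevel = $xpath->query('/html/body/p')->length;

echo "naive: $naive, top-level: $topLevel\n";
```

The naive count also picks up the nested paragraph and the one inside the comment, while the XPath query only sees the paragraphs directly under the body.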

Tags: php regex

【Solution 1】:

I would still parse it, since that is cleaner and easier to maintain:

<?php

$doc = new DOMDocument();
$doc->loadHTML("
    <!DOCTYPE html>
    <html>
        <body>
            <p>Test 1</p>
            <div>Test <p>2</p></div>
            <p>Test <span>3</span></p>
        </body>
    </html>
");
$xpath = new DOMXPath($doc);

$elements = $xpath->query("/html/body/p");

foreach ($elements as $element) {
    $node = $doc->createDocumentFragment();
    $node->appendXML('<h1>This is a test</h1>');

    if ($element->nextSibling) {
        $element->parentNode->insertBefore($node, $element->nextSibling);
    } else {
        $element->parentNode->appendChild($node);
    }
}

echo $doc->saveHTML();

?>

And the output:

<!DOCTYPE html>
<html>
    <body>
        <p>Test 1</p><h1>This is a test</h1>
        <div>Test <p>2</p></div>
        <p>Test <span>3</span></p><h1>This is a test</h1>
    </body>
</html>

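【Comments】:

Does this really answer the question?
@KennethK.: Well, yes. That's why I posted it as an answer.
When I count something, I get a number, not HTML :\
@KennethK.: The problem isn't about counting. It's about modifying only the top-level paragraphs, which is what the XPath query matches.
Thanks, Blender. I'm trying to test your answer. Inside the loop, I only need to add the "test text" after the second paragraph. Is XPath easier than counting?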
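
One way the answer could be adapted to the last comment is sketched below: it inserts the snippet only after the Nth top-level paragraph and falls back to appending when there are fewer, mirroring the question's function. The name `insert_after_nth_top_level_p` and its parameters are hypothetical, and the sketch assumes the snippet is well-formed XML (which `appendXML()` requires) and that the content is ASCII or otherwise safe for `loadHTML()`'s default encoding handling.

```php
<?php
// Hypothetical helper, not from the original answer.
function insert_after_nth_top_level_p(string $content, string $insert, int $n = 2): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $doc->loadHTML('<html><body>' . $content . '</body></html>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $body = $xpath->query('/html/body')->item(0);
    // Only <p> elements that are direct children of <body> count as top level.
    $paragraphs = $xpath->query('/html/body/p');

    $fragment = $doc->createDocumentFragment();
    $fragment->appendXML($insert);      // assumes $insert is well-formed XML

    if ($paragraphs->length < $n) {
        // Fewer than $n top-level paragraphs: append at the end.
        $body->appendChild($fragment);
    } else {
        $target = $paragraphs->item($n - 1);
        // A null reference node makes insertBefore() append instead.
        $body->insertBefore($fragment, $target->nextSibling);
    }

    // Serialize only the children of <body>, not the wrapper added above.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo insert_after_nth_top_level_p(
    '<p>One</p><div><p>Nested</p></div><p>Two</p><p>Three</p>',
    '<h1>This is a test</h1>'
);
```

Here the XPath query does the counting: `$paragraphs->length` is the number of top-level paragraphs, and `item($n - 1)` picks the one to insert after, so no separate counter or callback is needed. The serializer may add line breaks between block elements, so the output need not match the input byte for byte.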