Using PHP to remove an HTML element from a string: answers

【Question title】: Using PHP to remove a html element from a string
【Posted】: 2024-05-19 07:35:02
【Question】:

I can't work out how to do this. I have a string that looks like this:

    $text = "<p>This is some example text This is some example text This is some example text</p>
             <p><em>This is some example text This is some example text This is some example text</em></p>
             <p>This is some example text This is some example text This is some example text</p>";

I basically want to use something like preg_replace and a regex to remove

<em>This is some example text This is some example text This is some example text</em>

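So I need to write some PHP code that searches for the opening <em> and the closing </em> and deletes all the text in between.

Hopefully someone can help. Thanks.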

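【Comments on the question】:

Will the string always contain only one set of <em></em> tags?
And you'd then be left with an empty <p></p> element?
Yes, they will always be there, and yes, I'll end up with an empty <p></p>
but that isn't a problem

Tags: php regex string preg-replace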

【Solution 1】:

$text = preg_replace('/([\s\S]*)(<em>)([\s\S]*)(<\/em>)([\s\S]*)/', '$1$5', $text);

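【Discussion】:

This is along the lines of what I'm looking for, but I get this warning: preg_replace() [function.preg-replace]: Unknown modifier '>'
Sorry, I forgot to escape the slash in the closing (</em>) group. (</em>) should be: (<\/em>)
Gave it a try, I'm afraid it does nothing
What do you mean it does nothing? I get the following output: "
This is some example text This is some example text This is some example text

This is some example text This is some example text This is some example text
"
You do realize there's no capital V in (<\/em>), right? It's a backslash '\' followed by a forward slash '/'...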

【Solution 2】:

If you're interested in a non-regex solution, this works too:

<?php
    $text = "<p>This is some example text This is some example text This is some example text</p>
             <p><em>This is some example text This is some example text This is some example text</em></p>
             <p>This is some example text This is some example text This is some example text</p>";


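    // strpos() returns false when nothing is found, and 0 is a valid position,
    // so compare against false explicitly instead of relying on truthiness.
    $emStartPos = strpos($text, "<em>");
    $emEndPos = strpos($text, "</em>");

    if ($emStartPos !== false && $emEndPos !== false) {
        $emEndPos += 5; // also remove the closing </em> tag (5 characters)
        $len = $emEndPos - $emStartPos;

        $text = substr_replace($text, '', $emStartPos, $len);
    }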

?>

This removes the <em> tags and everything between them.

【Discussion】:

Nice. If I build on this and add something like preg_replace("<em>", " ", $text) and preg_replace("</em>", " ", $text), would that get rid of the <em> tags as well?
If you want to keep the tags and remove only the text between them, use $emStartPos += 4 instead of $emEndPos += 5 ('</em>' is 5 characters long)
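I can't believe you chose this answer. Too much code, untidy, not optimal..
@KalleH.Väravas I think AdriftUniform decided to use it because it is easier to read than a regex, especially for someone who isn't familiar with regular expressions. I agree with you that the regex solves the problem in a single line of code. Behind the scenes, though, the interpreter still has to analyse the regex before it can operate on the text, so I'm not sure the regex would actually be more optimal in this particular case. Perhaps AdriftUniform could run timing tests on each solution and use the more efficient one, especially if he/she plans to process many blocks of text.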
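
The timing test suggested in the last comment could be sketched roughly as follows. This is only an illustration: the function names (removeWithRegex, removeWithStrpos, timeIt), the sample string and the iteration count are assumptions that do not come from the thread, and hrtime() requires PHP 7.3 or later. Results will vary by machine and PHP version, so no numbers are claimed here.

```php
<?php
// Illustrative timing harness; all function names are hypothetical.

function removeWithRegex(string $html): string
{
    // Lazy match with the s modifier so that . also matches newlines.
    // preg_replace() returns null on error, so fall back to the input.
    return preg_replace('/<em>.*?<\/em>/s', '', $html) ?? $html;
}

function removeWithStrpos(string $html): string
{
    $start = strpos($html, '<em>');
    $end = strpos($html, '</em>');
    if ($start === false || $end === false || $end < $start) {
        return $html; // nothing to remove
    }
    // 5 is the length of the closing </em> tag.
    return substr_replace($html, '', $start, $end + 5 - $start);
}

function timeIt(callable $fn, string $input, int $iterations): float
{
    $begin = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn($input);
    }
    return (hrtime(true) - $begin) / 1e6; // elapsed milliseconds
}

$sample = '<p>some text</p><p><em>emphasised text</em></p><p>more text</p>';
printf("regex:  %.2f ms\n", timeIt('removeWithRegex', $sample, 100000));
printf("strpos: %.2f ms\n", timeIt('removeWithStrpos', $sample, 100000));
```

Note that the two functions only behave identically when the input contains a single <em> element: the regex version removes every pair, the strpos version only the first.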

【Solution 3】:

$text = '<p>This is some example text This is some example text This is some example text</p>
<p><em>This is the em text</em></p>
<p>This is some example text This is some example text This is some example text</p>';

preg_match("#<em>(.+?)</em>#", $text, $output);

echo $output[0]; // This will output it with em style
echo '<br /><br />';
echo $output[1]; // This will output only the text between the em


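To make this example work properly I changed the content of the <em> slightly; otherwise all of your text is identical and you can't really tell whether the script works.

However, if you want to get rid of the <em> rather than fetch its content: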

$text = '<p>This is some example text This is some example text This is some example text</p>
<p><em>This is the em text</em></p>
<p>This is some example text This is some example text This is some example text</p>';

echo preg_replace("/<em>(.+)<\/em>/", "", $text);


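【Discussion】:

Note: this assumes there is only one <em> in your string.
I see, this pulls out the <em> text, but that is the actual text I'm not interested in; I want to remove it from the string and be left with the rest of the text.
@AdriftUniform, my bad, I misread the question a bit. See the edit, it should be what you asked for.
Watch out if you have something like multi-line HTML. By default, .+ does not match newlines. It took me about an hour to finally discover PCRE_DOTALL and the /s modifier.
Very effective... I was going to use a PHP HTML DOM class, but this is much simpler, and I even needed to target the element by id... e.g.: echo preg_replace('/(.+)<\/em>/', "", $text);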
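
The two caveats raised in this discussion (more than one <em> in the string, and content that spans several lines) can be handled by making the quantifier lazy and adding the s modifier. The following is a minimal sketch under those assumptions; the helper name removeEmElements and the sample string are made up for illustration and are not part of the original answer.

```php
<?php
// Hypothetical helper name; not taken from the thread.
function removeEmElements(string $html): string
{
    // .*? is lazy, so each <em>...</em> pair is matched on its own
    // instead of everything from the first <em> to the last </em>.
    // The s modifier (PCRE_DOTALL) lets . match newlines as well.
    $result = preg_replace('/<em>.*?<\/em>/s', '', $html);

    // preg_replace() returns null on error; keep the input in that case.
    return $result ?? $html;
}

$text = "<p>first</p>\n<p><em>line one\nline two</em></p>\n<p>keep <em>drop</em> this</p>";
echo removeEmElements($text);
```

This still treats the HTML as plain text, so it does not cope with nested <em> elements or with <em> appearing inside comments or attribute values.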

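【Solution 4】:

Use strpos to find the first element and strrpos to find the last one. Use substr to get that part of the string. Then replace the substring with an empty string in the original string.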
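
The steps described in this answer could look roughly like the sketch below. It assumes the string contains at most one <em> element, and it uses substr_replace as a shortcut in place of the separate substr and replace steps the answer mentions. The helper name removeBetween and its parameters are hypothetical.

```php
<?php
// Hypothetical helper; assumes at most one such element in the string.
function removeBetween(string $html, string $open, string $close): string
{
    $start = strpos($html, $open);   // first occurrence of the opening tag
    $end = strrpos($html, $close);   // last occurrence of the closing tag

    // Both functions return false when nothing is found, and 0 is a
    // valid position, so compare against false explicitly.
    if ($start === false || $end === false || $end < $start) {
        return $html;
    }

    // Length covers the opening tag, the content and the closing tag.
    $length = $end + strlen($close) - $start;
    return substr_replace($html, '', $start, $length);
}

echo removeBetween('<p>keep</p><p><em>drop</em></p>', '<em>', '</em>');
```

Because strrpos looks for the last closing tag, a string with several <em> elements would lose everything from the first <em> to the last </em>, including any text between them.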

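【Discussion】:

If you can do the whole thing with one function, one line, one match, why make it so complicated?!
@Kalle Regular expressions are complicated too. They can just be written very concisely. But the interpreter still has to parse and translate them. You just don't see the complexity because it happens behind the scenes.
You can't parse HTML with regular expressions. What about comments or quoted strings that contain the characters <em>? Or ……….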
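
For the cases raised in the last comment, an HTML parser is one alternative to text matching. The sketch below uses PHP's bundled DOM extension (DOMDocument); nobody in the thread posted this, so treat it as one possible approach and not as an accepted answer. The helper name removeElementsByTag and the wrapper <div> are assumptions made for illustration, and the sketch assumes ASCII input because loadHTML needs extra care with other encodings.

```php
<?php
// Hypothetical helper; requires the DOM extension that ships with PHP.
function removeElementsByTag(string $htmlFragment, string $tagName): string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a <div> so there is one known container
    // to read the result back from.
    $doc->loadHTML('<div>' . $htmlFragment . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list first: removing nodes while iterating
    // over the live list would skip elements.
    $nodes = iterator_to_array($doc->getElementsByTagName($tagName));
    foreach ($nodes as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    // The wrapper is the first <div> in document order.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo removeElementsByTag('<p>keep</p><p><em>drop</em></p>', 'em');
```

Since the parser knows what is a comment and what is an element, an <em> that appears inside an HTML comment is left alone, which a plain regex cannot guarantee.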

【Solution 5】:

$text = str_replace('<em>','',$text);
$text = str_replace('</em>','',$text);

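【Discussion】:

The OP doesn't want to strip all the tags, only the <em> tag and its content.
As stated above, this isn't what I want. I only want to remove the <em>, and I want to remove its text as well, which strip_tags won't do.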