PHP htmlentities with tag exceptions: leave only certain tags working

【Question title】: php htmlentities tags exceptions leave working only certain tags
【Posted】: 2014-08-26 00:37:22
【Question description】:

I have no trouble disabling all HTML tags; this code works fine:

while($row = $result->fetch_array()){
        echo "<span class='names'>".htmlentities($row['username'])."</span>:<span class='messages'>".htmlentities($row['msg'])."</span><br>";  
}

But what if I want to allow a few tags as exceptions?

The result I want is to disable every tag except <p>, <b> and <h2>.

Example (allow <b>, disallow <div>):

<b>sometext</b><div>sometext</div>

Expected result (the first "sometext" rendered in bold, the <div> tags shown literally):

sometext <div>sometext</div>


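【Comments on the question】:

What happens if you run strip_tags before calling htmlentities / specialchars_decode?
strip_tags simply has no effect... I haven't figured out how to use it correctly yet.
Please add some sample output for each step. If you use strip_tags as the first step, it should work as advertised.
Try $text = htmlentities($text, ENT_QUOTES, "UTF-8", true); first. This will make sure you only encode the HTML once.
Start with some basic debugging: output your $text before and after each stage so you can see what is happening.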
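
The fourth argument in the htmlentities call suggested above is double_encode. The sketch below uses an illustrative input string that is not from the thread to show what each value does. Note that true is the PHP default and re-encodes entities that are already present, while false leaves them alone:

```php
<?php
// Illustrative input (an assumption, not from the thread): it already
// contains one encoded entity, &amp;, plus a raw tag.
$text = 'Tom &amp; Jerry <b>';

// double_encode = true (the default): the existing entity is encoded again.
$twice = htmlentities($text, ENT_QUOTES, 'UTF-8', true);

// double_encode = false: existing entities are left untouched.
$once = htmlentities($text, ENT_QUOTES, 'UTF-8', false);

echo $twice, "\n"; // Tom &amp;amp; Jerry &lt;b&gt;
echo $once, "\n";  // Tom &amp; Jerry &lt;b&gt;
```

So if the goal is to avoid encoding a string twice, false is probably the value to try.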

Tags: php html-entities strip-tags

【Solution 1】:

Here is your result:

Note where the allowed tags are set, near the bottom of the function:

function strip_html_tags( $text )
{
    $text = preg_replace(
        array(
          // Remove invisible content
            '@<b[^>]*?>.*?</b>@siu',   // the tag to handle specially, matched with its content (returned unchanged by "\$0" below)
            '@<head[^>]*?>.*?</head>@siu',
            '@<style[^>]*?>.*?</style>@siu',
            '@<script[^>]*?.*?</script>@siu',
            '@<object[^>]*?.*?</object>@siu',
            '@<embed[^>]*?.*?</embed>@siu',
            '@<applet[^>]*?.*?</applet>@siu',
            '@<noframes[^>]*?.*?</noframes>@siu',
            '@<noscript[^>]*?.*?</noscript>@siu',
            '@<noembed[^>]*?.*?</noembed>@siu',
          // Block-level tags: matched and re-emitted unchanged ("\$0" below)
            '@</?((address)|(blockquote)|(center)|(del))@iu',
            '@</?((h[1-9])|(ins)|(isindex)|(p)|(pre))@iu',
            '@</?((dir)|(dl)|(dt)|(dd)|(li)|(menu)|(ol)|(ul))@iu',
            '@</?((table)|(th)|(td)|(caption))@iu',
            '@</?((form)|(button)|(fieldset)|(legend)|(input))@iu',
            '@</?((label)|(select)|(optgroup)|(option)|(textarea))@iu',
            '@</?((frameset)|(frame)|(iframe))@iu',
        ),
        array(
            "\$0", // <b>...</b> is returned unchanged ($0 is the whole match)
            ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', // invisible content becomes a space
            "\$0", "\$0", "\$0", "\$0", "\$0", "\$0", "\$0", // one per block-level pattern
        ),
        $text );
    $to_strip =  strip_tags( $text, '<b>' );            // remove every tag except <b>
    // to keep another tag, add it to the allowed list here, add a pattern for it
    // at the top of the first array, and a matching "\$0" to the second array
    return $to_strip;
}

      $e = '<b>from_bold_text</b><div>from_div_text</div>';

      echo strip_html_tags($e);

Result (raw output; strip_tags keeps <b> and removes the <div> tags but not their text):

   <b>from_bold_text</b>from_div_text

shell:~$ php ar.php 
<b>sometext</b>sometext

shell:~$ cat ar.php 
<?php

$t ="<b>sometext</b><div>sometext</div>";

$text = htmlentities($t, ENT_QUOTES, "UTF-8");
$text = htmlspecialchars_decode($text);
$text = strip_tags($text, "<p><b><h2>");

echo $text;

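Note: strip_tags does not remove the content inside the tags, only the tags themselves.

$text = '<b>sometext</b><div>sometext</div>';
$text2 = strip_tags($text, '<b>');
var_dump($text2); // shows the allowed tags and the text content

To remove the content as well, use a regular expression or another function, such as strip_tags_content from the user notes in the PHP manual: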

<?php
function strip_tags_content($text, $tags = '', $invert = FALSE) {

  preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($tags), $tags);
  $tags = array_unique($tags[1]);

  if(is_array($tags) AND count($tags) > 0) {
    if($invert == FALSE) {
      return preg_replace('@<(?!(?:'. implode('|', $tags) .')\b)(\w+)\b.*?>.*?</\1>@si', '', $text);
    }
    else {
      return preg_replace('@<('. implode('|', $tags) .')\b.*?>.*?</\1>@si', '', $text);
    }
  }
  elseif($invert == FALSE) {
    return preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $text);
  }
  return $text;
}
?>

Sample text:
$text = '<b>sample</b> text with <div>tags</div>';

Result for strip_tags($text):
sample text with tags

Result for strip_tags_content($text):
text with

Result for strip_tags_content($text, '<b>'):
<b>sample</b> text with

Result for strip_tags_content($text, '<b>', TRUE):
text with <div>tags</div>

What you expect:

$text = '<b>sometext_from_bold</b><div>sometext_from_div</div>';

// the function strip_tags_content($text, $tags = '', $invert = FALSE) { .... } goes here

// your result

echo strip_tags_content($text, '<b>', FALSE);

Result:

<b>sometext_from_bold</b>

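【Comments】:

I need this to disallow any HTML tags in a chat and allow only the bold tag... do you see what I mean? Your code is fine, but I don't need the disallowed tags removed: if a user writes one, it should appear in the chat exactly as typed! No need to remove the tags, just show the raw code!
See the image. Could you post an updated working example? That would be great. Thanks in advance.
Hey, your result is already there. When you want to filter a string, put its pattern at the top of the regex list so that the content is kept and returned, then strip whatever tags you want afterwards. Note that \$0 is the return value for the filtered string, which is why I added it at the start of the array() passed as the second argument of preg_replace(). That is the returned statement.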
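
What the asker describes (show every tag literally except a short allow list) can also be sketched by escaping everything first and then un-escaping only the allowed tags. This is a minimal sketch rather than code from either answer: the helper name escape_except is hypothetical, it assumes PHP 7 or later, and it only restores tags written without attributes.

```php
<?php
// Hypothetical helper (not from the thread): escape the whole string,
// then turn the escaped form of attribute-less allowed tags back into HTML.
function escape_except(string $text, array $allowed): string
{
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    foreach ($allowed as $tag) {
        $name = preg_quote($tag, '@');
        // Matches only the escaped opening or closing tag with no attributes,
        // so a tag written with attributes stays escaped.
        $escaped = preg_replace(
            '@&lt;(/?)' . $name . '&gt;@i',
            '<${1}' . $tag . '>',
            $escaped
        );
    }
    return $escaped;
}

echo escape_except('<b>sometext</b><div>sometext</div>', ['p', 'b', 'h2']);
// <b>sometext</b>&lt;div&gt;sometext&lt;/div&gt;
```

Because the allowed tags are restored one by one, nothing checks that they are balanced: an opening <b> with attributes stays escaped while its closing </b> is restored. A DOM-based approach avoids that at the cost of more code.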

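【Solution 2】:

This code does the job, using DOMDocument to parse the HTML. It seems more reliable than regular expressions (what happens if a user puts attributes in a forbidden tag, possibly containing < or >?), especially after reading this question; but it takes more work and is not necessarily faster.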

<?php

$allowed = ['strong'];  // your allowed tags
$text = "<div>\n" .
        "    <div style=\"color: #F00;\">\n" .
        "       Your <strong>User Text</strong> with DIVs.\n".
        "   </div>\n" .
        "   more <strong>text</strong>\n" .
        "</div>\n";

echo selective_escape($text, $allowed);
/* outputs:

&lt;div&gt;
    &lt;div style="color: #F00;"&gt;
       Your <strong>User Text</strong> with DIVs.
   &lt;/div&gt;
   more <strong>text</strong>
&lt;/div&gt;

*/





/** Escapes HTML entities everywhere but in the allowed tags.
 */
function selective_escape($text, $allowed_tags) {

    $doc = new DOMDocument();

    /* DOMDocument normalizes the document structure when loading,
       adding a bunch of <p> around text where needed. We don't need
       this as we're working only on small pieces of HTML.
       So we pretend this is a piece of XML code.
       */
    // $doc->loadHTML($text);
    $doc->loadXML("<?xml version=\"1.0\"?><body>" . $text . "</body>\n");

    // find the body
    $body = $doc->getElementsByTagName("body")->item(0);

    // do stuff
    $child = $body->firstChild;
    while ($child != NULL) {
        $child = selective_escape_node($child, $allowed_tags);
    }

    // output the innerHTML. need to loop again
    $retval = "";

    $child = $body->firstChild;
    while ($child != NULL) {
        $retval .= $doc->saveHTML($child);
        $child = $child->nextSibling;
    }

    return $retval;
}






/** Escapes HTML for tags that are not in $allowed_tags for a DOM tree.
 *  @returns the next sibling to process, or NULL if we reached the last child.
 *
 *  The function replaced a forbidden tag with two text nodes wrapping the
 *  children of the old node.
 */
function selective_escape_node($node, $allowed_tags) {

    // preprocess children
    if ($node->hasChildNodes()) {
        $child = $node->firstChild;
        while ($child != NULL) {

            $child = selective_escape_node($child, $allowed_tags);

        }
    }

    // check if there is anything to do on $node as well
    if ($node->nodeType == XML_ELEMENT_NODE) {
        if (!in_array($node->nodeName, $allowed_tags)) {

            // move children right before $node
            $firstChild = NULL;
            while ($node->hasChildNodes()) {
                $child = $node->firstChild;

                if ($firstChild == NULL) $firstChild = $child;
                $node->removeChild($child);

                $node->parentNode->insertBefore($child, $node);
            }

            // now $node has no children.
            $outer_html = $node->ownerDocument->saveHTML($node);

            // two cases. either ends in "/>", or in "</TAGNAME>".
            if (substr($outer_html, -2) === "/>") {

                // strip off "/>"
                $outer_html = substr($outer_html, 0, strlen($outer_html) - 2);

            } else {

                // find the closing tag
                $close_tag = strpos($outer_html, "></" . $node->nodeName . ">");

                if ($close_tag === false) {

                    // uh-oh. something wrong
                    return NULL;

                } else {

                    // strip "></TAGNAME>"
                    $outer_html = substr($outer_html, 0, $close_tag);

                }

            }

            // put a textnode before the first child
            $txt1 = $node->ownerDocument->createTextNode($outer_html . ">");
            // and another before $node
            $txt2 = $node->ownerDocument->createTextNode("</" . $node->nodeName . ">");

            // note that createTextNode automatically escapes "<>".
            $node->parentNode->insertBefore($txt1, $firstChild);
            $node->parentNode->insertBefore($txt2, $node);

            // pick the next node to process
            $next = $node->nextSibling;
            // remove node
            $node->parentNode->removeChild($node);

            return $next;
        }
    }

    // go to next sibling
    return $node->nextSibling;

}

?>

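【Comments】:

I pasted your code as-is, but I get this error: "Parse error: syntax error, unexpected '[' in /path.php on line 3"
@user3746998 You are using a PHP version older than 5.4, which does not support the short array syntax. Replace ["strong"] with array("strong"). Be careful anyway: if you are on a very old PHP version, you may not have all the DOM classes used in the code.
I included your code in a .php file with require_once and everything works. But if I use this combination (please see This Example), everything in the output is wrapped in <p> tags... wtf?! Please look at the Result Image. What did I do wrong? Please help!
@user3746998 The problem is that DOMDocument knows the rules of HTML: for example, HTML 4.01 Strict does not allow bare text directly inside <body>. So it wraps your text in <p>...</p> to conform to the schema. A <!DOCTYPE> declaration might allow text in the body, but PHP could still apply other normalization rules to the document. So I updated the code to load the HTML as XML, which avoids the normalization issues altogether.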
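
One consequence of loading the text as XML is that loadXML only accepts well-formed input, which user-typed chat text may not be. The sketch below shows one way to detect that case before calling selective_escape. The wrapper name load_fragment is hypothetical and not part of the answer, and PHP 7.1 or later with the DOM extension is assumed.

```php
<?php
// Hypothetical wrapper (not part of the answer): returns a DOMDocument,
// or null when the fragment is not well-formed XML.
function load_fragment(string $text): ?DOMDocument
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $ok = $doc->loadXML('<?xml version="1.0"?><body>' . $text . '</body>');

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok ? $doc : null;
}

var_dump(load_fragment('more <strong>text</strong>') !== null); // bool(true)
var_dump(load_fragment('<b>unclosed') !== null);                // bool(false)
```

When the wrapper returns null, a reasonable fallback might be to escape the whole message with htmlentities, as in the original question.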