PHP DOMDocument: replacing a DOMElement's children with an HTML string (answers)

【Question title】: PHP DOMDocument replace DOMElement child with HTML string
【Posted】: 2011-01-15 01:49:51
【Question description】:

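Using PHP, I am trying to take an HTML string passed from a WYSIWYG editor and replace the children of an element in a preloaded HTML document with the new HTML.

So far I load the document and identify the element to change by its ID, but the process of converting the HTML into something that can be placed inside a DOMElement is eluding me.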

libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

$element = $doc->getElementById($item_id);
if(isset($element)){
    //Remove the old children from the element
    while($element->childNodes->length){
        $element->removeChild($element->firstChild);
    }

    //Need to build the new children from $html_string and append to $element
}

【Discussion】:

Tags: php dom domdocument

【Solution 1】:

I know this is old, but none of the current answers shows a minimal working example of how to replace DOMNode(s) in a DOMDocument with HTML stored in a string.

// the HTML fragment we want to use as the replacement
$htmlReplace = '<div><strong>foo</strong></div>';
// the HTML of the original document
$htmlHaystack = '<p><a id="tag">bar</a></p>';

// load the HTML replacement fragment
$domDocumentReplace = new \DOMDocument;
$domDocumentReplace->loadHTML($htmlReplace, LIBXML_HTML_NOIMPLIED);

// load the HTML of the document
$domDocumentHaystack = new \DOMDocument;
$domDocumentHaystack->loadHTML($htmlHaystack, LIBXML_HTML_NOIMPLIED);

// import the replacement node into the document
$htmlReplaceNode = $domDocumentHaystack->importNode($domDocumentReplace->documentElement, true);

// find the DOMNode(s) we want to replace - in this case #tag (to keep the example simple)
$domNodeTag = $domDocumentHaystack->getElementById('tag');

// replace the node
$domNodeTag->parentNode->replaceChild($htmlReplaceNode, $domNodeTag);

// output the new HTML of the document
echo $domDocumentHaystack->saveHTML($domDocumentHaystack->documentElement);
// <p><div><strong>foo</strong></div></p>

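【Discussion】:

【Solution 2】:

The currently accepted answer suggests using appendXML(), but acknowledges that it cannot handle complex HTML such as what the WYSIWYG editor in the original question returns. As suggested, loadHTML() can solve this, but nobody has shown how yet.

I believe this is the best/correct answer to the original question: it addresses the encoding problems, the "Document Fragment is empty" warning, and the "Wrong Document Error" error that people are likely to run into when writing this from scratch. I know I ran into them while following the hints in the earlier replies.

This is code from a site I support; it inserts WordPress sidebar content into a post's $content. It assumes $doc is a valid DOMDocument, set up the way $doc is defined in the original question. It also assumes $element is the tag after which you want to insert the sidebar content (or whatever else).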

            // NOTE: Cannot use a document fragment here as the AMP html is too complex for the appendXML function to accept.
            // Instead create it as a document element and insert that way.
            $node = new DOMDocument();
            // Note that we must encode it correctly or strange characters may appear.
            $node->loadHTML( mb_convert_encoding( $sidebarContent, 'HTML-ENTITIES', 'UTF-8') );
            // Now we need to move this document element into the scope of the content document 
            // created above or the insert/append will be rejected.
            $node = $doc->importNode( $node->documentElement, true );
            // If there is a next sibling, insert before it.
            // If not, just add it at the end of the element we did find.
            if (  $element->nextSibling ) {
                $element->parentNode->insertBefore( $node, $element->nextSibling );
            } else {
                $element->parentNode->appendChild($node);
            }
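
A note on the encoding step: as of PHP 8.2 the 'HTML-ENTITIES' target of mb_convert_encoding() is deprecated. One possible substitute, sketched here on the assumption that the input is UTF-8 and the mbstring extension is loaded, is mb_encode_numericentity(). The helper name and the sample string are made up for illustration.

```php
<?php
// Hypothetical helper: turn every non-ASCII character of a UTF-8 string into a
// numeric entity so that loadHTML() cannot misread the encoding.
function encodeNonAsciiAsEntities(string $html): string
{
    // Map format: [first code point, last code point, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($html, $map, 'UTF-8');
}

$sidebarContent = "<p>caf\u{e9}</p>"; // sample input, not from the original site

$node = new DOMDocument();
$node->loadHTML(encodeNonAsciiAsEntities($sidebarContent));

// Strings read back from the DOM are UTF-8.
$text = $node->getElementsByTagName('p')->item(0)->textContent;
```

The rest of the snippet above (importNode() followed by insertBefore() or appendChild()) stays the same.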

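Once all of that is done, if you do not want the source of a full HTML document with a body tag and so on, you can produce more localized HTML with the following code: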

    // Now because we have moved the post content into a full document, we need to get rid of the 
    // extra elements that make it a document and not a fragment
    $body = $doc->getElementsByTagName( 'body' );
    $body = $body->item(0);

    // If you need an element with a body tag, you can do this.
    // return $doc->savehtml( $body );

    // Extract the html from the body tag piece by piece to ensure valid html syntax in destination document
    $bodyContent = ''; 
    foreach( $body->childNodes as $node ) { 
            $bodyContent .= $body->ownerDocument->saveHTML( $node ); 
    } 
    // Now return the full content with the new content added. 
    return $bodyContent;

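【Discussion】:

@Damneddani Note that savehtml( $body ) ends up returning HTML that includes the body tag. If you insert that HTML into another page, the result is invalid HTML. Try something like this: $rootContent = ''; foreach( $rootNode->childNodes as $node ){ $rootContent .= $rootNode->ownerDocument->saveHTML( $node ); } // Now return the full content with the sidebar content added. return $rootContent;

【Solution 3】:

If the HTML string can be parsed as XML, you can do this (after clearing all child nodes from the element):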

$fragment = $doc->createDocumentFragment();
$fragment->appendXML($html_string);
$element->appendChild($fragment);

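If $html_string cannot be parsed as XML, this fails. In that case you have to use loadHTML(), which is less strict, but it adds elements around the fragment that you then have to strip.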
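
To make the strictness concrete, here is a small sketch (the sample markup and variable names are my own) that feeds appendXML() one well-formed string and one typical HTML string with an unclosed <br>. It relies on appendXML() returning false when parsing fails.

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<div id="target">old</div>');
$element = $doc->getElementById('target');

// Collect parser errors internally so a failed appendXML() does not emit warnings.
$previous = libxml_use_internal_errors(true);

// Well-formed XML: every tag is closed, so appendXML() succeeds.
$good = $doc->createDocumentFragment();
$okGood = $good->appendXML('<p>one<br/>two</p>');

// Typical HTML with an unclosed <br>: not well-formed XML, so appendXML() fails.
$bad = $doc->createDocumentFragment();
$okBad = $bad->appendXML('<p>one<br>two</p>');

libxml_clear_errors();
libxml_use_internal_errors($previous);

if ($okGood) {
    // Clear the old children, then attach the parsed fragment.
    while ($element->firstChild) {
        $element->removeChild($element->firstChild);
    }
    $element->appendChild($good);
}

$result = $doc->saveHTML($element);
echo $result, "\n";
```

Checking the return value before appending also avoids attaching an empty fragment.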

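Unlike PHP, JavaScript has the innerHTML property, which makes this easy. I needed something similar for a project, so I extended PHP's DOMElement to include JavaScript-like innerHTML access.

With it you can read and change the innerHTML property just as you would in JavaScript: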

echo $element->innerHTML;
$element->innerHTML = '<a href="http://example.org">example</a>';
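
The extension itself is not reproduced in this answer. As a rough idea of how such a property could be built, here is a minimal sketch: the class name InnerHtmlElement is hypothetical, this is not the code from the linked post, and it assumes the new markup is ASCII or already entity-encoded.

```php
<?php
// Hypothetical class: exposes an innerHTML-like property through __get()/__set().
class InnerHtmlElement extends DOMElement
{
    public function __get($name)
    {
        if ($name !== 'innerHTML') {
            return null;
        }
        // Serialize each child node and concatenate the results.
        $html = '';
        foreach ($this->childNodes as $child) {
            $html .= $this->ownerDocument->saveHTML($child);
        }
        return $html;
    }

    public function __set($name, $value)
    {
        if ($name !== 'innerHTML') {
            return;
        }
        // Drop the current children.
        while ($this->firstChild) {
            $this->removeChild($this->firstChild);
        }
        // Parse the new markup inside a wrapper <div> in a scratch document.
        $scratch = new DOMDocument();
        $previous = libxml_use_internal_errors(true);
        $scratch->loadHTML('<div>' . $value . '</div>');
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
        $wrapper = $scratch->getElementsByTagName('div')->item(0);
        // Copy the wrapper's children into this element's document.
        foreach ($wrapper->childNodes as $child) {
            $this->appendChild($this->ownerDocument->importNode($child, true));
        }
    }
}

$doc = new DOMDocument();
// Make the document hand out InnerHtmlElement objects instead of DOMElement.
$doc->registerNodeClass('DOMElement', 'InnerHtmlElement');
$doc->loadHTML('<div id="box"><em>old</em></div>');

$element = $doc->getElementById('box');
$element->innerHTML = '<a href="http://example.org">example</a>';
$inner = $element->innerHTML;
echo $inner, "\n";
```

registerNodeClass() has to be called before the elements are fetched, otherwise they come back as plain DOMElement objects.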

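Source: http://www.keyvan.net/2012/11/php-domdocument-replace-domelement-child-with-html-string/

【Discussion】:

@Greg, shouldn't I be the one to decide where my contributions go? Since when do you speak for the world? After some of my contributions were deleted and hidden on StackOverflow, I decided to move them to my own blog. I would like to keep it that way, so please revert the change.
Links to potential solutions are always welcome, but please add context around the link so your fellow users have some idea what it is and why it is there. Always quote the most relevant part of an important link, in case the target site is unreachable or goes permanently offline. Source: How to answer
@Greg, I know the guidelines. I originally posted the answer here and then moved it to my own site because of how my other contributions on this site were handled: as I mentioned above, they were deleted and hidden from me. Why you are so opposed to a link is beyond me. Some food for thought from one of the site's creators, codinghorror.com/blog/2009/08/…: "Can your contributions be revoked, deleted, or taken permanently offline without your consent?" On Stackoverflow: yes. On my own site: no.
I did not remove the link, I only expanded the content, so we get the best of both worlds.
@Keyvan: But you published it on Stackoverflow first. If you later delete it and other users decide to keep the content, there is nothing wrong with keeping it.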

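【Solution 4】:

I know this is an old thread (but I am replying because I was also looking for a solution). I wrote a simple method that replaces content with a single line at the call site. To make the method easier to understand, I also added some context-named functions.

This is now part of my library, which is why all the function names start with the prefix "su".

It is very easy to use and powerful (and very little code).

Here is the code: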

function suSetHtmlElementById( &$oDoc, &$s, $sId, $sHtml, $bAppend = false, $bInsert = false, $bAddToOuter = false )
 {
    if( suIsValidString( $s ) && suIsValidString( $sId ))
    {
     $bCreate = true;
     if( is_object( $oDoc ))
     {
       if( !( $oDoc instanceof DOMDocument ))
        { return false; }
       $bCreate = false;
     }

     if( $bCreate )
      { $oDoc = new DOMDocument(); }

     libxml_use_internal_errors(true);
     $oDoc->loadHTML($s);
     libxml_use_internal_errors(false);
     $oNode = $oDoc->getElementById( $sId );

     if( is_object( $oNode ))
     { 
       $bReplaceOuter = ( !$bAppend && !$bInsert );

       $sId = uniqid('SHEBI-');
       $aId = array( "<!-- $sId -->", "<!--$sId-->" );

       if( $bReplaceOuter )
       {
         if( suIsValidString( $sHtml ) )
         {
             $oNode->parentNode->replaceChild( $oDoc->createComment( $sId ), $oNode );
             $s = str_replace( $aId, $sHtml, $oDoc->saveHtml());
         }
         else { $oNode->parentNode->removeChild( $oNode ); 
                $s = $oDoc->saveHtml();
              }
         return true;
       }

       $bReplaceInner = ( $bAppend && $bInsert );
       $sThis = null;

       if( !$bReplaceInner )
       {
         $sThis = $oDoc->saveHTML( $oNode );
         $sThis = ($bInsert?$sHtml:'').($bAddToOuter?$sThis:(substr($sThis,strpos($sThis,'>')+1,-(strlen($oNode->nodeName)+3)))).($bAppend?$sHtml:''); 
       }

       if( !$bReplaceInner && $bAddToOuter )
       { 
          $oNode->parentNode->replaceChild( $oDoc->createComment( $sId ), $oNode );
          $sId = &$aId;
       }
       else { $oNode->nodeValue = $sId; }

       $s = str_replace( $sId, $bReplaceInner?$sHtml:$sThis, $oDoc->saveHtml());
       return true;
     }
    } 
    return false; 
 }

// A function of my library used in the function above:
function suIsValidString( &$s, &$iLen = null, $minLen = null, $maxLen = null )
{
  if( !is_string( $s ) || !isset( $s[0] ))
   { return false; }

  if( $iLen !== null )
   { $iLen = strlen( $s ); }

  // Both optional bounds must hold: at least $minLen and at most $maxLen bytes.
  return ( $minLen === null || ( $minLen > 0 && isset( $s[$minLen - 1] ))) &&
         ( $maxLen === null || ( $maxLen >= $minLen && !isset( $s[$maxLen] )));
}

Some context functions:

 function suAppendHtmlById( &$s, $sId, $sHtml, &$oDoc = null )
 { return suSetHtmlElementById( $oDoc, $s, $sId, $sHtml, true, false ); }

 function suInsertHtmlById( &$s, $sId, $sHtml, &$oDoc = null )
 { return suSetHtmlElementById( $oDoc, $s, $sId, $sHtml, false, true ); }

 function suAddHtmlBeforeById( &$s, $sId, $sHtml, &$oDoc = null )
 { return suSetHtmlElementById( $oDoc, $s, $sId, $sHtml, false, true, true ); }

 function suAddHtmlAfterById( &$s, $sId, $sHtml, &$oDoc = null )
 { return suSetHtmlElementById( $oDoc, $s, $sId, $sHtml, true, false, true ); }

 function suSetHtmlById( &$s, $sId, $sHtml, &$oDoc = null )
 { return suSetHtmlElementById( $oDoc, $s, $sId, $sHtml, true, true ); }

 function suReplaceHtmlElementById( &$s, $sId, $sHtml, &$oDoc = null )
 { return suSetHtmlElementById( $oDoc, $s, $sId, $sHtml, false, false ); }

 function suRemoveHtmlElementById( &$s, $sId, &$oDoc = null )
 { return suSetHtmlElementById( $oDoc, $s, $sId, null, false, false ); }

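How to use:

In the following examples I assume the content has already been loaded into a variable named $sMyHtml and that the variable $sMyNewContent contains some new HTML. $sMyHtml contains an element whose name/id is "example_id".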

// Example 1: Append new content to the innerHTML of an element (bottom of element):
if( suAppendHtmlById( $sMyHtml, 'example_id', $sMyNewContent ))
 { echo $sMyHtml; }
 else { echo 'Element not found?'; }

// Example 2: Insert new content to the innerHTML of an element (top of element):
suInsertHtmlById( $sMyHtml, 'example_id', $sMyNewContent );    

// Example 3: Add new content ABOVE element:
suAddHtmlBeforeById( $sMyHtml, 'example_id', $sMyNewContent );    

// Example 4: Add new content BELOW/NEXT TO element:
suAddHtmlAfterById( $sMyHtml, 'example_id', $sMyNewContent );

// Example 5: SET new innerHTML content of element:
suSetHtmlById( $sMyHtml, 'example_id', $sMyNewContent );

// Example 6: Replace entire element with new content:
suReplaceHtmlElementById( $sMyHtml, 'example_id', $sMyNewContent );

// Example 7: Remove entire element:
suRemoveHtmlElementById( $sMyHtml, 'example_id' );
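
For readers who only want the core trick behind suSetHtmlElementById(), here is a stripped-down sketch of the "replace entire element" case: swap the element for a uniquely named comment, serialize the document, then substitute the raw HTML for that comment with str_replace(). The function name and sample markup are hypothetical and not part of the library above.

```php
<?php
// Hypothetical helper illustrating the placeholder-comment technique.
// Returns the new document HTML, or null when no element has the given id.
function replaceElementWithRawHtml(string $html, string $id, string $newHtml): ?string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = $doc->getElementById($id);
    if ($node === null) {
        return null;
    }

    // Put a unique comment where the element used to be.
    $token = uniqid('placeholder-');
    $node->parentNode->replaceChild($doc->createComment($token), $node);

    // The raw HTML is spliced into the serialized output, so it is never parsed by DOM.
    return str_replace('<!--' . $token . '-->', $newHtml, $doc->saveHTML());
}

$result = replaceElementWithRawHtml(
    '<div><p id="example_id">old</p></div>',
    'example_id',
    '<em>new</em>'
);
echo $result;
```

Because the new HTML never goes through the parser, it is inserted verbatim; the trade-off is that it is not validated either.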

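【Discussion】:

Nice work! Super useful. What is $oDoc for: is it for passing in an existing DOMDocument object? I think you should turn this into a library and document it.
@Miro It has been a long time, thanks mate! $oDoc is a parameter you can use to pass in a DOMDocument instance, so the function does not need to create one on every call. So if you want to perform many operations on the same document, you are better off creating a DOMDocument instance yourself first, to reduce overhead and time.

【Solution 5】:

You can use loadHTML() on the snippet and then append the resulting nodes to the original DOM tree.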
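
A minimal sketch of that idea, applied to the original question (clear the element, then append the parsed nodes). The helper name is hypothetical; it assumes the snippet is ASCII or entity-encoded and that loadHTML() wraps it in html/body elements, which it does when called without extra flags.

```php
<?php
// Hypothetical helper: replace all children of $element with nodes parsed from $html.
function replaceChildrenWithHtml(DOMElement $element, string $html): void
{
    // Remove the old children from the element.
    while ($element->firstChild) {
        $element->removeChild($element->firstChild);
    }
    if ($html === '') {
        return; // loadHTML() rejects an empty string
    }

    // Parse the snippet in a separate scratch document.
    $scratch = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $scratch->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $scratch->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return;
    }
    // importNode() copies each node into the target document; appending a node
    // that still belongs to the scratch document would raise a Wrong Document Error.
    foreach ($body->childNodes as $child) {
        $element->appendChild($element->ownerDocument->importNode($child, true));
    }
}

$doc = new DOMDocument();
$doc->loadHTML('<div id="content"><p>old</p></div>');
$element = $doc->getElementById('content');

replaceChildrenWithHtml($element, '<p>first</p><p>second</p>');
$result = $doc->saveHTML($element);
echo $result, "\n";
```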

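【Discussion】:

Are you suggesting creating a new DOMDocument with loadHTML, then taking the children of the new document's body tag and appending them to the original DOM? Or is there another loadHTML() function I am missing?
I really hate how html and body tags get added automatically when you do things like saveHTML() or loadHTML(). Is there a simple workaround besides writing a wrapper that strips them?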
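
On the last point, one workaround that may be enough is to pass the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD options to loadHTML(), as Solution 1 does with the first of them. A short sketch with sample markup of my own; it assumes a reasonably recent libxml and a snippet with a single root element, since behaviour with several top-level nodes is less predictable.

```php
<?php
$doc = new DOMDocument();

// LIBXML_HTML_NOIMPLIED: do not add implied <html>/<body> elements.
// LIBXML_HTML_NODEFDTD: do not add a default doctype.
$doc->loadHTML('<div><p>Hello</p></div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// saveHTML() on the whole document now yields only the snippet itself.
$out = trim($doc->saveHTML());
echo $out, "\n";
```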