【Posted】: 2021-01-01 15:46:24
【Question】:
What is the difference between
$domd=new DOMDocument();
$domd->loadHTML($html, LIBXML_NOBLANKS);
and
$domd=new DOMDocument();
$domd->loadHTML($html, 0);
?
Edit: in case anyone wants to remove all empty and whitespace-only text nodes (which is not exactly what LIBXML_NOBLANKS does), here is a function that does that:
$removeAnnoyingWhitespaceTextNodes = function (\DOMNode $node) use (&$removeAnnoyingWhitespaceTextNodes): void {
    if ($node->hasChildNodes()) {
        // Warning: iterate backwards. childNodes is a live DOMNodeList, so removing
        // a child while iterating forwards re-indexes the list and nodes can be skipped.
        // That is why foreach() is not used here; don't change it unless you know what you're doing.
        for ($i = $node->childNodes->length - 1; $i >= 0; --$i) {
            $removeAnnoyingWhitespaceTextNodes($node->childNodes->item($i));
        }
    }
    // Remove text nodes that are empty or contain only whitespace.
    if ($node->nodeType === XML_TEXT_NODE && !$node->hasChildNodes() && !$node->hasAttributes() && (strlen(trim($node->textContent)) === 0)) {
        $node->parentNode->removeChild($node);
    }
};
$dom=new DOMDocument();
$dom->loadHTML($html);
$removeAnnoyingWhitespaceTextNodes($dom);
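One way to see how the two loadHTML() calls differ on a given libxml2 build is to count the whitespace-only text nodes each one leaves behind, and compare that with a document cleaned by hand. The following is only a minimal sketch: `countBlankTextNodes`, `removeBlankTextNodes`, and the sample `$html` are hypothetical names that do not appear in the post, and the removal helper uses an XPath query instead of the recursive closure above. The counts depend on the libxml2 version, so no expected output is shown.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: counts text nodes that are empty or whitespace-only.
function countBlankTextNodes(DOMDocument $dom): int
{
    $count = 0;
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//text()') as $textNode) {
        if ($textNode->nodeType === XML_TEXT_NODE && trim($textNode->textContent) === '') {
            ++$count;
        }
    }
    return $count;
}

// Hypothetical helper: removes every empty or whitespace-only text node.
// The matches are copied into an array first, so removing nodes cannot
// affect the iteration.
function removeBlankTextNodes(DOMDocument $dom): void
{
    $xpath = new DOMXPath($dom);
    foreach (iterator_to_array($xpath->query('//text()')) as $textNode) {
        $isBlank = $textNode->nodeType === XML_TEXT_NODE && trim($textNode->textContent) === '';
        if ($isBlank && $textNode->parentNode !== null) {
            $textNode->parentNode->removeChild($textNode);
        }
    }
}

// Sample input (an assumption for illustration only).
$html = "<div>\n  <p>Hello</p>\n  <p>World</p>\n</div>";

$default = new DOMDocument();
$default->loadHTML($html, 0);

$noBlanks = new DOMDocument();
$noBlanks->loadHTML($html, LIBXML_NOBLANKS);

$cleaned = new DOMDocument();
$cleaned->loadHTML($html, 0);
removeBlankTextNodes($cleaned);

printf("flags = 0:              %d blank text node(s)\n", countBlankTextNodes($default));
printf("flags = LIBXML_NOBLANKS: %d blank text node(s)\n", countBlankTextNodes($noBlanks));
printf("after manual removal:   %d blank text node(s)\n", countBlankTextNodes($cleaned));
```

If the second count is not zero, the parser kept some blank nodes even with LIBXML_NOBLANKS, which is the case where explicit removal is still needed.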
【Discussion】:
Tags: php xml-parsing html-parsing libxml2