【Question title】: Removing DOMDocument warnings while parsing page content
【Posted】: 2013-09-27 10:42:53
【Question description】:

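I am trying to extract the text content of an arbitrary URL; the output should not contain any HTML code. This works, but a bunch of warnings are emitted while the content at the given URL is being read. How can I suppress these warnings?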

<?php
$url= 'http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page';
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
$xpath = new DOMXPath($doc);
foreach($xpath->query("//script") as $script) {
    $script->parentNode->removeChild($script);
}
$textContent = $doc->textContent; //inherited from DOMNode
echo $textContent;
?>

Warnings:

content-from-a-web-page, line: 255 in /opt/lampp/htdocs/FB/ec2/test.php on line 13

Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 255 in /opt/lampp/htdocs/FB/ec2/test.php on line 13

Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 273 in /opt/lampp/htdocs/FB/ec2/test.php on line 13

Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 273 in /opt/lampp/htdocs/FB/ec2/test.php on line 13

Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 412 in /opt/lampp/htdocs/FB/ec2/test.php on line 13

Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 412 in /opt/lampp/htdocs/FB/ec2/test.php on line 13

Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 551 in /opt/lampp/htdocs/FB/ec2/test.php on line 13

Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 551 in /opt/lampp/htdocs/FB/ec2/test.php on line 13

Warning: DOMDocument::loadHTMLFile(): ID display-name already defined in http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 731 in /opt/lampp/htdocs/FB/ec2/test.php on line 13

【Question comments】:

Tags: php dom domdocument


【Solution 1】:

You can use libxml_use_internal_errors() to keep libxml from emitting these warnings, like this:

libxml_use_internal_errors(true);
$doc->loadHTMLFile($url);
libxml_clear_errors();

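As PeeHaa points out in the comments below, it is a good idea to restore the previous error-handling state afterwards. You can do that like this: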

$errors = libxml_use_internal_errors(true); //store
$doc->loadHTMLFile($url);
libxml_clear_errors();
libxml_use_internal_errors($errors); //reset back to previous state
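
The store-and-restore pattern above can also be wrapped so that the reset happens even if loading throws. The following is only a minimal sketch: the helper name loadHtmlQuietly is hypothetical (it is not part of the original answer), and it parses an inline, deliberately malformed HTML string with loadHTML() instead of fetching a URL so that it runs without network access.

```php
<?php
// Hypothetical helper (an assumption for illustration): parse an HTML string
// without emitting warnings and always restore the caller's libxml setting.
function loadHtmlQuietly(string $html): DOMDocument
{
    $doc = new DOMDocument();
    // Collect parse errors internally and remember the previous setting.
    $previous = libxml_use_internal_errors(true);
    try {
        $doc->loadHTML($html);
    } finally {
        libxml_clear_errors();                 // discard the collected errors
        libxml_use_internal_errors($previous); // restore the previous setting
    }
    return $doc;
}

// Malformed on purpose: a bare "&" and a duplicate id.
$html = '<html><body><p id="a">Tom & Jerry</p><p id="a">again</p></body></html>';
$doc = loadHtmlQuietly($html);
echo trim($doc->textContent), "\n";
```

The same helper shape should work with loadHTMLFile($url) in place of loadHTML($html) if the content comes from a remote page, as in the question.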

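Here is how it works: libxml_use_internal_errors(true) tells libxml to collect parse errors in an internal buffer instead of reporting them as PHP warnings, and it returns the previous setting. libxml_clear_errors() empties that buffer once parsing is done, and passing the stored value back to libxml_use_internal_errors() restores whatever error handling was active before. If the errors are of interest, they can be read with libxml_get_errors() before the buffer is cleared.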
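
To see what is being collected rather than just discarding it, the buffered errors can be read with libxml_get_errors() before they are cleared. This is a rough sketch under a few assumptions: the sample markup is made up for illustration, it is parsed from a string so the script runs offline, and which messages (if any) are reported depends on the installed libxml2 version.

```php
<?php
// Inline sample markup (an assumption for illustration) instead of a remote URL.
// The unescaped "&" inside the href is the kind of input that can trigger
// entity-reference complaints from the HTML parser.
$html = '<html><body><a href="?a=1&b=2">link</a><p>text</p></body></html>';

$doc = new DOMDocument();
$previous = libxml_use_internal_errors(true); // buffer errors, keep old setting
$doc->loadHTML($html);

// Each buffered entry is a LibXMLError object (level, code, message, line, ...).
$errors = libxml_get_errors();
foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

libxml_clear_errors();                  // empty the buffer
libxml_use_internal_errors($previous);  // restore the previous setting
```

Logging these entries instead of printing them is one way to keep the output clean while still noticing badly broken pages.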

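【Comments】:

  • Note that it is considered good practice to store the current state of libxml_use_internal_errors and reset it afterwards.
  • @PeeHaa: Good idea. I've added it to the answer :)
  • @AmalMurali: Thanks a lot. Can you explain the difference between the two code snippets?
  • @Karimkhan: Added an explanation to my answer.
  • If you call libxml_use_internal_errors again afterwards, you don't need to call libxml_clear_errors.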