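【Posted】:2018-09-11 18:36:39
【Problem description】:
I am writing code that extracts the plain-text value of every tag in an HTML file. If a tag contains nested tags, the code should descend into the children and return the values of the tags that have no child tags of their own. I tried the following, but something is missing.
PHP code: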
$dochtml = new DOMDocument();
$dochtml->loadHTMLFile("index2.html");
$nodes = $dochtml->getElementsByTagName("a");
gettagsvalue($nodes);

function gettagsvalue($nodes){
    if($nodes->length != 0){
        for ($i=0;$i<$nodes->length;$i++){
            foreach ($tags=["h1","h2","h3","h4","h5","h6","h7","a","img","li","span","p","pre","i","strong","div","ul"] as $tag){
                if($nodes->item($i)->getElementsByTagName($tag)->length != 0){
                    if ($nodes->item($i)->getElementsByTagName($tag)->length == 1){
                        echo "here"."<br><br><br> $tag";
                        echo "<pre>" ;print_r($nodes->item($i)->getElementsByTagName($tag)->item(0));echo "</pre>" ;
                    }else{
                        echo "there"."<br><br><br> $tag";
                        gettagsvalue($nodes->item($i)->getElementsByTagName($tag));
                        // echo "$tag <br><br><br>";
                    }
                    // print_r($nodes->item($i)->getElementsByTagName($tag));echo "<br>";
                }
            }
        }
    }
}
I expect to get:
"Green" "Valley"
HTML:
<a href="index.html" id="aaaaaaaaaaaa2015284957">
<img src="images/logo.png" width="50px" height="50px" id="imgaaaaaaaaaaimg732756221">
<span>Green</span>
<span id="spanaaaaaaaaaaspan1106733773">Valley</span>
</a>
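For illustration, the expected output can be reproduced with a minimal sketch that walks `childNodes` recursively instead of looping over a fixed list of tag names. This is only one possible approach, not necessarily what the asker ended up using: `collectTextValues` is a hypothetical helper name, and the markup is loaded from an inline string with `loadHTML` rather than from `index2.html` so that the example is self-contained.

```php
<?php
// Hypothetical helper (not part of the original post): returns the trimmed
// text of every non-empty text node found anywhere below $node.
function collectTextValues(DOMNode $node): array
{
    $values = [];
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            // Skip whitespace-only text such as the line breaks between tags.
            $text = trim($child->nodeValue);
            if ($text !== '') {
                $values[] = $text;
            }
        } elseif ($child->nodeType === XML_ELEMENT_NODE) {
            // Descend into nested tags of any name.
            $values = array_merge($values, collectTextValues($child));
        }
    }
    return $values;
}

// Same markup as in the question, inlined here as an assumption for the demo.
$html = '<a href="index.html" id="aaaaaaaaaaaa2015284957">
<img src="images/logo.png" width="50px" height="50px" id="imgaaaaaaaaaaimg732756221">
<span>Green</span>
<span id="spanaaaaaaaaaaspan1106733773">Valley</span>
</a>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$result = [];
foreach ($doc->getElementsByTagName('a') as $anchor) {
    $result = array_merge($result, collectTextValues($anchor));
}

print_r($result);
```

Because the helper inspects node types rather than tag names, it does not need the hard-coded tag list, and elements without text (such as `img`) simply contribute nothing.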
【Discussion】: