【Question title】:How to fetch the innerHTML of HTML table cells using PHP
【Posted】:2020-01-19 09:15:33
【Question】:

I can parse the HTML page correctly, but the parser only returns the text data, whereas I want the full HTML markup inside each <tr> and <td>. Here is my PHP code:

<?php
$dom = new DOMDocument();

// load the HTML
$html = $dom->loadHTMLFile("hydrocarbon.htm");

// discard white space
//$dom->preserveWhiteSpace = false;

// the table by its tag name
$tables = $dom->getElementsByTagName('table');

// get all rows from the table
$rows = $tables->item(0)->getElementsByTagName('tr');
// get each header cell of the first row by tag name
$cols = $rows->item(0)->getElementsByTagName('th');
$row_headers = NULL;
foreach ($cols as $node) {
    //print $node->nodeValue."\n";
    $row_headers[] = $node->nodeValue;
}

$table = array();
// get all rows from the table
$rows = $tables->item(0)->getElementsByTagName('tr');
foreach ($rows as $row) {
    // get each column by tag name
    $cols = $row->getElementsByTagName('td');
    $row = array();
    $i = 0;
    foreach ($cols as $node) {
        //print $node->nodeValue."\n";
        if ($row_headers == NULL)
            $row[] = $node->nodeValue;
        else
            $row[$row_headers[$i]] = $node->nodeValue;
        $i++;
    }
    $table[] = $row;
}

//var_dump($table);
print("<pre>".print_r($table,true)."</pre>");
?>

This is my result:

This is my HTML code:

<table>
<thead>
<tr><th>Column 1</th><th>Column 2</th><th>Column 3</th></tr>
</thead>
<tbody>
<tr> <td><b>Q</b></td><td>Desc.</td> </tr>
<tr> <td>Type</td><td>Multiple choice</td> </tr>
<tr><td>Option</td><td>image #####2</td><td>incorrect</td></tr>
<tr><td>Option</td><td>image #####2</td><td>incorrect</td></tr>
<tr><td>Option</td><td>image #####2</td><td>incorrect</td></tr>
<tr><td>Option</td><td>image #####2</td><td>incorrect</td></tr>

<tr><td>Solution</td><td>Some text / image</td></tr>
<tr><td>Marks</td><td>4</td><td>1</td></tr>
</tbody>
</table>

It returns Q instead of <b>Q</b>. How can I get the markup as well?

Edit 1: the original table that your solution should work with

<table class=MsoNormalTable border=1 cellspacing=0 cellpadding=0 width=610 style='width:457.25pt;margin-left:10.8pt;background:#CED7E7;border-collapse:
 collapse;border:none'>
    <tr style='height:30.35pt'>
        <td width=112 valign=top style='width:84.0pt;border:solid black 1.0pt;
  background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;height:30.35pt'>
            <p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
  style='border:none'>Question<span style='border:none'> </span></span>
                </span>
            </p>
        </td>
        <td width=498 colspan=2 valign=top style='width:373.25pt;border:solid black 1.0pt;
  border-left:none;background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;
  height:30.35pt'>
            <p class=MsoNormal style='margin-top:0cm;margin-right:-48.45pt;margin-bottom:
  0cm;margin-left:18.0pt;margin-bottom:.0001pt;line-height:115%;border:none'><b><span
  lang=EN-US style='font-family:"Garamond","serif";border:none'><span
  style='border:none'>Consider the following reaction,</span></span></b>
            </p>
            <p class=MsoNormal style='margin-top:0cm;margin-right:-48.45pt;margin-bottom:
  0cm;margin-left:18.0pt;margin-bottom:.0001pt;line-height:115%'><b><span
  lang=EN-US style='font-family:"Garamond","serif";border:none'><span
  style='border:none'>H</span></span></b><b><sub><span lang=EN-US
  style='font-family:"Garamond","serif";border:none'><span style='border:none'>3</span></span></sub></b><b><span
  lang=EN-US style='font-family:"Garamond","serif";border:none'><span
  style='border:none'>C – CH – CH – CH</span></span></b><b><sub><span
  lang=EN-US style='font-family:"Garamond","serif";border:none'><span
  style='border:none'>3</span></span></sub></b><b><span lang=EN-US
  style='font-family:"Garamond","serif";border:none'><span style='border:none'>
  + </span></span></b><b><span lang=EN-US style='font-family:"Garamond","serif";
  position:relative;top:2.0pt;border:none'><img width=26 height=29
  src="hydrocarbon2_files/image001.png"></span></b><b><span lang=EN-US
  style='font-family:"Garamond","serif";border:none'><span style='border:none'> &#8594;
  ‘X’  + HBr                                                   </span></span></b>
            </p>
            <p class=MsoNormal style='margin-top:0cm;margin-right:-48.45pt;margin-bottom:
  0cm;margin-left:18.0pt;margin-bottom:.0001pt;line-height:115%'><b><span
  lang=EN-US style='font-family:"Garamond","serif";border:none'><span
  style='border:none'>            |        |</span></span></b>
            </p>
            <p class=MsoNormal style='margin-top:0cm;margin-right:-48.45pt;margin-bottom:
  0cm;margin-left:18.0pt;margin-bottom:.0001pt;line-height:115%'><b><span
  lang=EN-US style='font-family:"Garamond","serif";border:none'><span
  style='border:none'>            D       CH</span></span></b><b><sub><span
  lang=EN-US style='font-family:"Garamond","serif";border:none'><span
  style='border:none'>3</span></span></sub></b>
            </p>
            <p class=MsoNoSpacing style='margin-top:0cm;margin-right:-48.45pt;margin-bottom:
  0cm;margin-left:.3pt;margin-bottom:.0001pt;text-align:justify;text-indent:
  -.3pt'><b><span lang=EN-GB style='font-size:16.0pt;font-family:"Chaparral Pro","serif"'>&nbsp;</span></b>
            </p>
        </td>
    </tr>
    <tr style='height:15.0pt'>
        <td width=112 valign=top style='width:84.0pt;border:solid black 1.0pt;
  border-top:none;background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;
  height:15.0pt'>
            <p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
  style='border:none'>Type</span></span>
            </p>
        </td>
        <td width=498 colspan=2 valign=top style='width:373.25pt;border-top:none;
  border-left:none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;
  background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;height:15.0pt'>
            <p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
  style='border:none'>multiple_choice</span></span>
            </p>
        </td>
    </tr>
    <tr style='height:15.0pt'>
        <td width=112 valign=top style='width:84.0pt;border:solid black 1.0pt;
  border-top:none;background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;
  height:15.0pt'>
            <p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
  style='border:none'>Option</span></span>
            </p>
        </td>
        <td width=219 valign=top style='width:164.25pt;border-top:none;border-left:
  none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;
  background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;height:15.0pt'>
            <p class=BodyA><span style='font-size:16.0pt;color:black;border:none'><img
  width=205 height=93 src="hydrocarbon2_files/image002.jpg"></span>
            </p>
        </td>
        <td width=279 valign=top style='width:209.0pt;border-top:none;border-left:
  none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;
  background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;height:15.0pt'>
            <p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
  style='border:none'>I</span></span><span lang=EN-US style='font-size:16.0pt;
  border:none'><span style='border:none'>n<span style='border:none'>correct</span></span>
                </span>
            </p>
        </td>
    </tr>
    <tr style='height:15.0pt'>
        <td width=112 valign=top style='width:84.0pt;border:solid black 1.0pt;
  border-top:none;background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;
  height:15.0pt'>
            <p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
  style='border:none'>Option</span></span>
            </p>
        </td>
        <td width=219 valign=top style='width:164.25pt;border-top:none;border-left:
  none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;
  background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;height:15.0pt'>
            <p class=BodyA><span style='font-size:16.0pt;border:none'><img width=205
  height=102 id="Picture 13" src="hydrocarbon2_files/image003.jpg"></span>
            </p>
        </td>
        <td width=279 valign=top style='width:209.0pt;border-top:none;border-left:
  none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;
  background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;height:15.0pt'>
            <p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
  style='border:none'>C</span></span><span lang=EN-US style='font-size:16.0pt;
  border:none'><span style='border:none'>orrect</span></span>
            </p>
        </td>
    </tr>
    <tr style='height:15.0pt'>
        <td width=112 valign=top style='width:84.0pt;border:solid black 1.0pt;
  border-top:none;background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;
  height:15.0pt'>
            <p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
  style='border:none'>Option</span></span>
            </p>
        </td>
        <td width=219 valign=top style='width:164.25pt;border-top:none;border-left:
  none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;
  background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;height:15.0pt'>
            <p class=BodyA><span style='font-size:16.0pt;border:none'><img width=205
  height=107 id="Picture 16" src="hydrocarbon2_files/image004.jpg"></span>
            </p>
        </td>
        <td width=279 valign=top style='width:209.0pt;border-top:none;border-left:
  none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;
  background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;height:15.0pt'>
            <p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
  style='border:none'>Incorrect</span></span>
            </p>
        </td>
    </tr>
    <tr style='height:15.0pt'>
        <td width=112 valign=top style='width:84.0pt;border:solid black 1.0pt;
  border-top:none;background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;
  height:15.0pt'>
            <p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
  style='border:none'>Option</span></span>
            </p>
        </td>
        <td width=219 valign=top style='width:164.25pt;border-top:none;border-left:
  none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;
  background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;height:15.0pt'>
            <p class=BodyA><span style='font-size:16.0pt;border:none'><img width=205
  height=112 id="Picture 19" src="hydrocarbon2_files/image005.jpg"></span>
            </p>
        </td>
        <td width=279 valign=top style='width:209.0pt;border-top:none;border-left:
  none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;
  background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;height:15.0pt'>
            <p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
  style='border:none'>Incorrect</span></span>
            </p>
        </td>
    </tr>
    <tr style='height:15.0pt'>
        <td width=112 valign=top style='width:84.0pt;border:solid black 1.0pt;
  border-top:none;background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;
  height:15.0pt'>
            <p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
  style='border:none'>Solution</span></span>
            </p>
        </td>
        <td width=498 colspan=2 valign=top style='width:373.25pt;border-top:none;
  border-left:none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;
  background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;height:15.0pt'>
            <p class=MsoNormal style='margin-left:27.0pt;text-align:justify;text-indent:
  -27.0pt;line-height:115%'><span style='font-family:"Garamond","serif";
  border:none'><img width=398 height=92 id="Picture 10"
  src="hydrocarbon2_files/image006.jpg"></span>
            </p>
        </td>
    </tr>
    <tr style='height:15.0pt'>
        <td width=112 valign=top style='width:84.0pt;border:solid black 1.0pt;
  border-top:none;background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;
  height:15.0pt'>
            <p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
  style='border:none'>Marks</span></span>
            </p>
        </td>
        <td width=219 valign=top style='width:164.25pt;border-top:none;border-left:
  none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;
  background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;height:15.0pt'>
            <p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
  style='border:none'>4</span></span>
            </p>
        </td>
        <td width=279 valign=top style='width:209.0pt;border-top:none;border-left:
  none;border-bottom:solid black 1.0pt;border-right:solid black 1.0pt;
  background:transparent;padding:4.0pt 4.0pt 4.0pt 4.0pt;height:15.0pt'>
            <p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span
  style='border:none'>1</span></span>
            </p>
        </td>
    </tr>
</table>

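【Comments on the question】:

  • This answer should help you: stackoverflow.com/a/17613826/12232340
  • That is not what I need. The answer below works for the sample HTML, but when I feed it my actual HTML (which I appended at the end of the question), the solution does not work.

Tags: php html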


【Solution 1】:

In your second foreach loop:

foreach ($rows as $row) {
    // get each column by tag name
    $cols = $row->getElementsByTagName('td');
    $row = array();
    $i = 0;
    foreach ($cols as $node) {
        if ($row_headers == NULL)
            $row[] = $node->nodeValue;
        else
            // serialize the cell's first child node as HTML instead of reading its text
            $row[$row_headers[$i]] = $node->firstChild->ownerDocument->saveHTML($node->firstChild);
        $i++;
    }
    $table[] = $row;
}

The output will be:

[1] => Array
(
    [Column 1] => <b>Q</b>
    [Column 2] => Desc.
)
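
The loop above serializes only `$node->firstChild`. If a cell starts with a whitespace text node or holds several child nodes, that first child is probably not the whole cell content. A minimal sketch, assuming a hypothetical helper named `innerHTML` (it is not part of `DOMDocument` and not part of the answer above), concatenates `saveHTML()` for every child node instead. The inline HTML string is made up for illustration.

```php
<?php
// Hypothetical helper: return the markup of all children of $node,
// which is roughly what innerHTML gives you in a browser.
function innerHTML(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument serializes just that node
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom->loadHTML(
    '<table><tr>'
    . '<td> <p class="BodyA"><span>Type</span></p> </td>'
    . '<td><b>Q</b> text</td>'
    . '</tr></table>'
);
libxml_clear_errors();

$cells = [];
foreach ($dom->getElementsByTagName('td') as $td) {
    // trim() drops the whitespace that surrounds the markup inside the cell
    $cells[] = trim(innerHTML($td));
}
print_r($cells);
```

In the question's loop, the assignment would then become `$row[$row_headers[$i]] = innerHTML($node);`, under the same assumption that the helper is defined earlier in the script.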

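【Discussion】:

  • That is the sample HTML, and your code is written for that specific sample. Could we chat? Your solution works there, but not on my original HTML.
  • I have updated the question and added the original table. I would like your answer to work with it too, but it is not compatible. Please suggest a solution based on that table.
  • Yes, my solution also works here and gives you <p class=BodyA><span lang=EN-US style='font-size:16.0pt;border:none'><span style='border:none'>Incorrect</span></span> </p>
  • It does not work properly; the array comes out empty. Can I show you over a remote desktop session?
  • I am waiting for your reply.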
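
The table under Edit 1 has no <th> cells, so `$row_headers` stays NULL and the loop in Solution 1 still takes the `nodeValue` branch for every cell. That is one plausible reason the answer behaves differently on the original HTML. The following is a sketch only, not the answerer's code: `cellMarkup` and `tableToArray` are hypothetical names, and the two inline tables are shortened stand-ins for the sample and the Word-exported markup. It serializes every cell in both cases and falls back to numeric keys when no headers exist.

```php
<?php
// Hypothetical helper: markup of all child nodes of a cell.
function cellMarkup(DOMNode $cell): string
{
    $html = '';
    foreach ($cell->childNodes as $child) {
        $html .= $cell->ownerDocument->saveHTML($child);
    }
    return trim($html);
}

// Hypothetical helper: rows of the first <table> in $source, one array per row.
function tableToArray(string $source): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // exported markup often triggers warnings
    $dom->loadHTML($source);
    libxml_clear_errors();

    $tableNode = $dom->getElementsByTagName('table')->item(0);
    if ($tableNode === null) {
        return [];
    }
    $rows = $tableNode->getElementsByTagName('tr');

    // Header texts come from <th> cells of the first row, if there are any.
    $headers = [];
    if ($rows->length > 0) {
        foreach ($rows->item(0)->getElementsByTagName('th') as $th) {
            $headers[] = trim($th->nodeValue);
        }
    }

    $table = [];
    foreach ($rows as $tr) {
        $cells = $tr->getElementsByTagName('td');
        if ($cells->length === 0) {
            continue;                   // skip rows that contain only headers
        }
        $row = [];
        $i = 0;
        foreach ($cells as $td) {
            // Use the header as key when one exists, else the column index.
            $key = $headers[$i] ?? $i;
            $row[$key] = cellMarkup($td);
            $i++;
        }
        $table[] = $row;
    }
    return $table;
}

$withHeaders = '<table><thead><tr><th>Column 1</th><th>Column 2</th></tr></thead>'
    . '<tbody><tr><td><b>Q</b></td><td>Desc.</td></tr></tbody></table>';
$noHeaders = "<table><tr><td>\n <p class=BodyA><span>Type</span></p>\n</td>"
    . "<td><p class=BodyA><span>multiple_choice</span></p></td></tr></table>";

print_r(tableToArray($withHeaders));
print_r(tableToArray($noHeaders));
```

Note that `getElementsByTagName('td')` also returns cells of nested tables, which this sketch does not handle. Unlike the original loop, it skips the header row instead of adding an empty array for it.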