【Posted】:2014-11-15 15:48:56
【Question】:
Here is a simple, valid HTML table (live demo):
<!DOCTYPE html>
<html id="world_presidents" xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<meta charset='utf-8' />
<style>
table, td, th {
border: 1px solid grey;
}
</style>
</head>
<body>
<table>
<tr>
<th>Name</th>
<th>Took office</th>
<th>Resources</th>
</tr>
<tr>
<td><a href="http://en.wikipedia.org/wiki/George_Washington">George Washington</a></td>
<td>1789</td>
<td><a href="http://books.google.co.uk/books?id=t1pQ4YG-TDIC&pg=PA148&dq=#v=onepage&q=&f=false" title="encyclopedia">Encyclopedia</a>(<a href="http://constitutioncenter.org/media/audio/ron_chernow_10-18-10_(64).mp3" title="Subresource">Sub Resource</a>)<br>
<a href="http://en.wikipedia.org/wiki/George_Washington#CITEREFParry1991" title="parry">Parry 1991</a></td>
</tr>
<tr>
<td>John Adams</td>
<td>1797</td>
<td><a href="http://www.adherents.com/people/pa/John_Adams.html" title="Adherents.com">Adherents.com</a></td>
</tr>
<tr>
<td>Thomas Jefferson</td>
<td>1801</td>
<td><a href="http://books.google.co.uk/books?id=qkTPAAAAMAAJ&redir_esc=y" title="Government Printing Office">Government Printing Office</a></td>
</tr>
<tr>
<td>James Madison</td>
<td>1809</td>
<td><a href="http://www.loa.org/volume.jsp?RequestID=16&section=toc" title="Library of America">Library of America</a><br>
<a href="http://quod.lib.umich.edu/cgi/t/text/text-idx?c=acls;cc=acls;view=toc;idno=HEB00509.0001.001" title="Federal Republic">Federal Republic</a></td>
</tr>
<tr>
<td>James Monroe</td>
<td>1817</td>
<td><a href="https://rads.stackoverflow.com/amzn/click/com/0813912660" rel="nofollow noreferrer" title="scholarly biography">scholarly biography</a></td>
</tr>
<tr>
<td>John Quincy Adams</td>
<td>1825</td>
<td><a href="http://www.common-place.org/vol-09/no-01/adams/" title="Common-Place">Common-Place</a> (<a href="http://dx.doi.org/10.1111%2F1467-7709.00049" title="Diplomatic History">Aditional - Diplomatic History</a>)</td>
</tr>
<tr>
<td>Andrew Jackson</td>
<td>1829</td>
<td><a href="http://statelibrary.dcr.state.nc.us/nc/bio/public/jackson.htm" title="Information Services Branch">Information Services Branch</a> (<a href="http://www.discovernorthernireland.com/product.aspx?ProductID=2801" title="Tourist Board">Tourist Board</a>)</td>
</tr>
<tr>
<td>Martin Van Buren</td>
<td>1837</td>
<td><a href="http://en.wikipedia.org/wiki/Holmes_Alexander" title="The American Talleyrand">The American Talleyrand</a></td>
</tr>
</table>
</body>
</html>
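I am trying to extract all the data from the columns so that I can insert it into a database. I am stuck on the third column, where I need every link from the "href" attributes. The code below works for the first and second columns, but I cannot find a way to change it so that it outputs all the links in the third column.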
<?php
require_once 'simple_html_dom.php';
$html = new simple_html_dom();
$html = file_get_html('table.html');
//engine
//go through table and find href attributes
echo"<p>Presidents</p>";
foreach($html->find('//*[@id="world_presidents"]/body/table/tbody/tr/') as $row) {
$presidentsLink = $row->find('a', 2);
if(!empty($presidentsLink)){
echo $presidentsLink->href . "<br>";
}
}
?>
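Right now it outputs only one link instead of 13 (live demo).
In short:
- I am using Simple HTML DOM Parser to read content from an HTML table
- I cannot change the HTML table
- My problem is getting all the href attributes from the third column and displaying them
Any help would be appreciated.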
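For reference, here is a minimal sketch of the behaviour being asked for: collect every href inside the third cell of each row. It deliberately uses PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM, so it runs without extra files. The function name thirdColumnLinks, the trimmed sample table, and the example.com URLs are all assumptions made for illustration. As far as I can tell, find('a', 2) in the snippet above requests only the third anchor of each row, and only the first data row has that many anchors, which would explain the single result.

```php
<?php
// Hypothetical sample: a trimmed-down table with the same shape as the one
// in the question (placeholder example.com URLs, not the real links).
$sample = <<<HTML
<table>
<tr><th>Name</th><th>Took office</th><th>Resources</th></tr>
<tr>
<td><a href="http://en.wikipedia.org/wiki/George_Washington">George Washington</a></td>
<td>1789</td>
<td><a href="http://example.com/encyclopedia">Encyclopedia</a>
(<a href="http://example.com/sub">Sub Resource</a>)<br>
<a href="http://example.com/parry">Parry 1991</a></td>
</tr>
<tr>
<td>John Adams</td>
<td>1797</td>
<td><a href="http://example.com/adherents">Adherents.com</a></td>
</tr>
</table>
HTML;

// Returns the href of every anchor found in the third <td> of each row.
function thirdColumnLinks(string $markup): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($markup);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $links = [];
    // td[3] selects the third cell of a row; //a matches all anchors in it.
    // Header rows contain <th> cells only, so they are skipped.
    foreach ($xpath->query('//table//tr/td[3]//a') as $anchor) {
        $links[] = $anchor->getAttribute('href');
    }
    return $links;
}

$links = thirdColumnLinks($sample);
foreach ($links as $href) {
    echo $href, "\n";
}
```

With Simple HTML DOM the same two-step idea should carry over: take the third cell of each row with $row->find('td', 2), then loop over $cell->find('a') and read ->href from each element, rather than asking for a single anchor by index.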
【Comments】:
Tags: php html web-scraping simple-html-dom