
【Question title】: Grab link titles from site [closed]
【Posted】: 2014-09-01 23:00:00
【Question description】:

Hi, is it possible to export all the addresses from this site to a .txt file?

http://bitinfocharts.com/top-100-richest-bitcoin-addresses.html

The output should look like this:

1BPqtqBKoUjEq8STWmJxhPqtsf3BKp5UyE
1i7cZdoE9NcHSdAL5eGjmTJbBVqeQDwgw
etc...

I wrote this code:

<?php
$html = file_get_contents('http://bitinfocharts.com/top-100-richest-bitcoin-addresses-5.html');
//Create a new DOM document
$dom = new DOMDocument;

//Parse the HTML. The @ is used to suppress any parsing errors
//that will be thrown if the $html string isn't valid XHTML.
@$dom->loadHTML($html);

//Get all links. You could also use any other tag name here,
//like 'img' or 'table', to extract other tags.
$links = $dom->getElementsByTagName('a');

//Iterate over the extracted links and display their URLs
foreach ($links as $link){
    //Extract and show the "href" attribute. 
    echo $link->getAttribute('href'), '<br>';
}
?>

But it prints every link on the page, and I only need the addresses...
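A minimal sketch of how the DOM code above could be narrowed to address links only: instead of printing every href, keep only the anchors whose href contains `/bitcoin/address/`. That marker is borrowed from Solution 1's split string and is an assumption about the page's markup, which may differ or have changed; the function name `extract_address_links` and the sample HTML are hypothetical.

```php
<?php
// Sketch: keep only <a> elements whose href points at an address page.
// ASSUMPTION: address links contain "/bitcoin/address/" in their href and the
// link text is the address itself; the real page markup may differ.
function extract_address_links(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect HTML parse warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();

    $addresses = array();
    foreach ($dom->getElementsByTagName('a') as $link) {
        if (strpos($link->getAttribute('href'), '/bitcoin/address/') !== false) {
            $addresses[] = trim($link->textContent); // the link text is the address
        }
    }
    return $addresses;
}

// Hypothetical sample markup standing in for the downloaded page.
$sample = '<html><body>'
    . '<a href="./index.html">Home</a>'
    . '<a href="./bitcoin/address/1BPqtqBKoUjEq8STWmJxhPqtsf3BKp5UyE">1BPqtqBKoUjEq8STWmJxhPqtsf3BKp5UyE</a>'
    . '<a href="./bitcoin/address/1i7cZdoE9NcHSdAL5eGjmTJbBVqeQDwgw">1i7cZdoE9NcHSdAL5eGjmTJbBVqeQDwgw</a>'
    . '</body></html>';

$addresses = extract_address_links($sample);
echo implode("\n", $addresses), "\n";
```

The "Home" link is skipped because its href lacks the marker, so only the two addresses are printed, one per line.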

【Question discussion】:

Tags: php html web-scraping domdocument

【Solution 1】:

This can be done much more simply with plain string manipulation:

// get page
$html = file_get_contents('http://bitinfocharts.com/top-100-richest-bitcoin-addresses.html');
// split on the text just in front of each address
$parts = explode('./bitcoin/address/', $html);
// drop the first part (everything before the first address)
array_shift($parts);
// take each address: everything up to the closing quote of the href
$addresses = array();
foreach ($parts as $part) {
    $addresses[] = substr($part, 0, strpos($part, '"'));
}
// show result
echo implode('<br>', $addresses);

The comments explain the code. I'll admit, though, that using the DOM has a certain elegance.
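The same split-on-a-marker idea can be sketched as a single `preg_match_all()` call. This is a swapped-in variant, not the answerer's code: like the `explode()` version, it assumes every address sits in an href starting with `./bitcoin/address/` and ends at the next double quote, and it additionally drops duplicate addresses with `array_unique()`. The function name and sample markup are hypothetical.

```php
<?php
// Sketch: Solution 1's marker-based extraction as one regular expression.
// ASSUMPTION: addresses appear as href="./bitcoin/address/<address>".
function extract_addresses_regex(string $html): array
{
    // Capture everything after the marker up to the closing quote.
    preg_match_all('~\./bitcoin/address/([^"]+)"~', $html, $matches);
    // $matches[1] holds the captured groups; remove duplicates and reindex.
    return array_values(array_unique($matches[1]));
}

// Hypothetical sample markup; the second address is linked twice.
$sample = '<td><a href="./bitcoin/address/1KcRjW2roV8dtZoBMPD83nsFburPCY7RfR">x</a></td>'
    . '<td><a href="./bitcoin/address/1LovisaJ31py5rr37y5xpt3MzSjErpoeLr">y</a></td>'
    . '<td><a href="./bitcoin/address/1LovisaJ31py5rr37y5xpt3MzSjErpoeLr">z</a></td>';

$addresses = extract_addresses_regex($sample);
echo implode('<br>', $addresses);
```

If the marker never occurs, `$matches[1]` is an empty array, so the function returns an empty list instead of failing.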

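【Discussion】:

Thanks a lot, friend :)
You're welcome. But I think Ghost's solution is also very good; honestly, it is the better answer to your question. I gave it an upvote.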

【Solution 2】:

What I would do is target each table row, then the anchor link inside it. Example:

$html = file_get_contents('http://bitinfocharts.com/top-100-richest-bitcoin-addresses-5.html');
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$data = array();
// rows of the table that follows the "Top 100 Richest Addresses Bitcoin" heading
$table_rows = $xpath->query('//h1[contains(text(), "Top 100 Richest Addresses Bitcoin")]/following-sibling::div[2]/table/tr');
foreach ($table_rows as $row) {
    // the address is the link in the row's second cell
    $cell = $xpath->query('./td[2]/a', $row);
    if ($cell->length > 0) {
        $data[] = trim($cell->item(0)->nodeValue);
    }
}

echo '<pre>';
print_r($data);
echo '</pre>';

//file_put_contents('your_file.txt', implode("\n", $data));

$data looks like this (partial):

Array
(
    [0] => 1KcRjW2roV8dtZoBMPD83nsFburPCY7RfR
    [1] => 1LovisaJ31py5rr37y5xpt3MzSjErpoeLr
    [2] => 1BE1ttHnrJ7YKkLgKpiNrp8uT3kM6Y1xfg
    [3] => 1Czx5RKaDkiE56RwdeLXRYL57ZxxdFxwhb
    [4] => 1BhQDdQgVyAekFZjT1nW8PB5XRt9VJhRs5
    [5] => 1JsSF3YLF4v9Fasfu6pqevwWc5Mtyf76M3
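Here is a self-contained sketch of the row-then-anchor approach that also writes the result to a text file, as the question asked. The generic `//table//tr` selector stands in for the heading-anchored XPath only so the example runs against a small hypothetical table; against the real page the more specific query would be kept. The function name `extract_addresses_xpath` and the `addresses.txt` filename are illustrative assumptions.

```php
<?php
// Sketch: row-then-anchor XPath extraction plus a .txt export.
// ASSUMPTIONS: the address is the link text in each row's second cell;
// '//table//tr' and 'addresses.txt' are placeholders, not the real page/file.
function extract_addresses_xpath(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // suppress warnings about non-XHTML markup
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $data = array();
    foreach ($xpath->query('//table//tr') as $row) {
        // Relative query: the <a> inside this row's second <td>.
        $cell = $xpath->query('./td[2]/a', $row);
        if ($cell->length > 0) {
            $data[] = trim($cell->item(0)->nodeValue);
        }
    }
    return $data;
}

$sample = '<table>'
    . '<tr><th>#</th><th>Address</th></tr>' // header row has no <td>, so it is skipped
    . '<tr><td>1</td><td><a href="#">1KcRjW2roV8dtZoBMPD83nsFburPCY7RfR</a></td></tr>'
    . '<tr><td>2</td><td><a href="#">1LovisaJ31py5rr37y5xpt3MzSjErpoeLr</a></td></tr>'
    . '</table>';

$data = extract_addresses_xpath($sample);

// One address per line, matching the output format the question asked for.
$file = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'addresses.txt';
file_put_contents($file, implode("\n", $data));
echo file_get_contents($file), "\n";
```

Anchoring the XPath on table structure instead of on href text is what makes this approach robust to other address-like links elsewhere on the page.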

【Discussion】: