【Question title】: Grab link titles from site [closed]
【Posted】: 2014-09-01 23:00:00
【Question】:

Hello, is it possible to export all the addresses from this site to a txt file?

http://bitinfocharts.com/top-100-richest-bitcoin-addresses.html

For example:

1BPqtqBKoUjEq8STWmJxhPqtsf3BKp5UyE
1i7cZdoE9NcHSdAL5eGjmTJbBVqeQDwgw
etc...

I wrote this code:

<?php
$html = file_get_contents('http://bitinfocharts.com/top-100-richest-bitcoin-addresses-5.html');
//Create a new DOM document
$dom = new DOMDocument;

//Parse the HTML. The @ is used to suppress any parsing errors
//that will be thrown if the $html string isn't valid XHTML.
@$dom->loadHTML($html);

//Get all links. You could also use any other tag name here,
//like 'img' or 'table', to extract other tags.
$links = $dom->getElementsByTagName('a');

//Iterate over the extracted links and display their URLs
foreach ($links as $link){
    //Extract and show the "href" attribute. 
    echo $link->getAttribute('href'), '<br>';
}
?>

But it prints every link on the page, and I only need the addresses...

【Comments】:

    Tags: php html web-scraping domdocument


    【Solution 1】:

    This can be much simpler if you just use plain text manipulation:

    // get page
    $html = file_get_contents('http://bitinfocharts.com/top-100-richest-bitcoin-addresses.html');
    // split on bit just in front of address
    $parts = explode('./bitcoin/address/',$html);
    // dump the first part
    array_shift($parts);
    // get addresses from all subsequent parts
    $addresses = array(); // start from an empty list so implode() still works when nothing matches
    foreach ($parts as $part) $addresses[] = substr($part,0,strpos($part,'"'));
    // show result
    echo implode('<br>',$addresses);
    

    The comments explain the code. Still, I admit that using the DOM has a certain elegance.
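
A minimal sketch of the same splitting idea, wrapped in a function so it can be tried offline. The function name `extract_addresses_by_split` and the sample HTML fragment are made up for illustration. The `./bitcoin/address/` marker is taken from the snippet above and is assumed to match how the page links its addresses. Unlike the one-liner, it skips any chunk that has no closing quote.

```php
<?php
// Hypothetical helper for illustration; not part of the original answer.
function extract_addresses_by_split(string $html, string $marker = './bitcoin/address/'): array
{
    // Split on the text that sits just in front of every address.
    $parts = explode($marker, $html);
    // The text before the first marker holds no address, so drop it.
    array_shift($parts);

    $addresses = [];
    foreach ($parts as $part) {
        // The address runs up to the quote that closes the href attribute.
        $end = strpos($part, '"');
        if ($end === false) {
            continue; // no closing quote in this chunk, skip it
        }
        $addresses[] = substr($part, 0, $end);
    }
    return $addresses;
}

// Made-up fragment that imitates the assumed link markup.
$sample = '<td><a href="./bitcoin/address/1BPqtqBKoUjEq8STWmJxhPqtsf3BKp5UyE">x</a></td>'
        . '<td><a href="./bitcoin/address/1i7cZdoE9NcHSdAL5eGjmTJbBVqeQDwgw">y</a></td>'
        . '<a href="./about.html">About</a>';

$addresses = extract_addresses_by_split($sample);
echo implode("\n", $addresses), "\n";
```

On the live page, `$sample` would be replaced by the result of `file_get_contents()`, which returns `false` on failure and is worth checking before splitting.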

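    【Comments】:

    • Thank you very much, friend :)
    • You're welcome, and the same to you. But I think Ghost's solution is also very good and, honestly, a better answer to your question. I gave it a ^.
    【Solution 2】: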

    What I would do is target each table row, and then target the anchor link inside it. Example:

    $html = file_get_contents('http://bitinfocharts.com/top-100-richest-bitcoin-addresses-5.html');
    $dom = new DOMDocument;
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);
    
    $data = array();
    $table_rows = $xpath->query('//h1[contains(text(), "Top 100 Richest Addresses Bitcoin")]/following-sibling::div[2]/table/tr');
    foreach($table_rows as $row) {
        $cell = $xpath->query('./td[2]/a', $row);
        if($cell->length > 0) {
            $data[] = $cell->item(0)->nodeValue;
    
        }
    }
    
    echo '<pre>';
    print_r($data);
    
    //file_put_contents('your_file.txt', implode("\n", $data));
    

    $data looks like this (partial):

    Array
    (
        [0] => 1KcRjW2roV8dtZoBMPD83nsFburPCY7RfR
        [1] => 1LovisaJ31py5rr37y5xpt3MzSjErpoeLr
        [2] => 1BE1ttHnrJ7YKkLgKpiNrp8uT3kM6Y1xfg
        [3] => 1Czx5RKaDkiE56RwdeLXRYL57ZxxdFxwhb
        [4] => 1BhQDdQgVyAekFZjT1nW8PB5XRt9VJhRs5
        [5] => 1JsSF3YLF4v9Fasfu6pqevwWc5Mtyf76M3
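
As a variation on the row-based query, here is a rough sketch that selects anchors by their `href` instead of by their position under the heading, and then writes the result to a text file as the question asked. It assumes the address links contain `/bitcoin/address/` (as in Solution 1) and that the address is the last path segment of the `href`. The function name `extract_addresses_by_href`, the sample markup, and the temporary file are all made up for illustration.

```php
<?php
// Hypothetical helper for illustration; not part of the original answer.
function extract_addresses_by_href(string $html, string $marker = '/bitcoin/address/'): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string on PHP 8
    }

    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect parse warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $addresses = [];
    // Keep only the anchors whose href contains the marker.
    // The marker is assumed not to contain a double quote.
    foreach ($xpath->query('//a[contains(@href, "' . $marker . '")]') as $link) {
        // Assumption: the address is the last path segment of the href.
        $addresses[] = basename($link->getAttribute('href'));
    }
    // Drop duplicates in case the page links the same address twice.
    return array_values(array_unique($addresses));
}

// Made-up fragment that imitates the assumed table markup.
$sample = '<html><body><table>'
        . '<tr><td>1</td><td><a href="./bitcoin/address/1KcRjW2roV8dtZoBMPD83nsFburPCY7RfR">1KcRjW2roV8dtZoBMPD83nsFburPCY7RfR</a></td></tr>'
        . '<tr><td>2</td><td><a href="./bitcoin/address/1LovisaJ31py5rr37y5xpt3MzSjErpoeLr">1LovisaJ31py5rr37y5xpt3MzSjErpoeLr</a></td></tr>'
        . '</table><a href="./about.html">About</a></body></html>';

$addresses = extract_addresses_by_href($sample);

// Write one address per line to a temporary text file.
$file = tempnam(sys_get_temp_dir(), 'addr');
file_put_contents($file, implode("\n", $addresses) . "\n");
echo file_get_contents($file);
```

Filtering on the `href` does not depend on the exact heading text or on the table's position in the page, so it may survive layout changes better. The trade-off is that it relies on the link format staying the same.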
    

    【Comments】:
