【Question title】: Parsing a web page and extracting data
【Posted】: 2014-11-22 17:32:56
【Question description】:

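I've been tasked with creating a web scraper (or screen scraper, however you want to look at it). I've found HtmlAgilityPack, but given the following HTML sample, how would I go about extracting things like the phone number?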

<td valign="top" class="clsContent" style="width: 250px; padding-right: 21px">
    <span class=clsLabelB>Web: </span><a href='http://www.marriott.com/hotels/travel/sandm-san-diego-marriott-del-mar/' target=_blank>http://www.marriott.com/hotels/travel/sandm-san-diego-marriott-del-mar/</a><br />
    <div style='padding-top:7px'>
        <table cellpadding=0 cellspacing=0>
            <tr>
                <td valign=top class=clsLabelB nowrap>Phone:&nbsp;&nbsp;</td>
                <td valign=top>+1 858-523-1700</td>
            </tr>
            <tr>
                <td valign=top class=clsLabelB nowrap>Fax:&nbsp;&nbsp;</td>
                <td valign=top>+1 858-523-1355</td>
            </tr>
            <tr>
                <td valign=top class=clsLabelB nowrap>Toll Free:&nbsp;&nbsp;</td><td valign=top>800-228-9290</td>
            </tr>
        </table>
    </div>
    <p><span class=clsLabelB>Chain: </span><a href='/Hotels/Companies/Marriott-International'>Marriott International</a><br />
    <span class=clsLabelB>Chain Website: </span><a href='http://www.marriott.com' target=_blank>http://www.marriott.com</a>
    <p><span class=clsLabelB>Description: </span>Contemporary high-rise hotel - Convenient to area companies, beaches, golf, shopping, San Diego Zoo and Sea World.<br />
    <div style='padding-top:7px'>
        <table cellpadding=0 cellspacing=0>
            <tr>
                <td valign=top class=clsLabelB width=170px nowrap>Year Renovated:&nbsp;&nbsp;</td>
                <td valign=top>2003</td>
            </tr>
        </table>
    </div>
    <div style='padding-top:7px'>
        <table cellpadding=0 cellspacing=0>
            <tr>
                <td valign=top class=clsLabelB width=170px nowrap>Check in Time:&nbsp;&nbsp;</td>
                <td valign=top>4:00 PM</td>
            </tr>
            <tr>
                <td valign=top class=clsLabelB width=170px nowrap>Check out Time:&nbsp;&nbsp;</td>  
                <td valign=top>12:00 PM</td>
            </tr>
            <tr>
                <td valign=top class=clsLabelB width=170px nowrap>Number of Floors:&nbsp;&nbsp;</td>
                <td valign=top>11</td>
            </tr>
            <tr>
                <td valign=top class=clsLabelB width=170px nowrap>Total Number of Rooms:&nbsp;&nbsp;</td>
                <td valign=top>284</td>
            </tr>
        </table>
    </div>
</td>

I currently have no sample code to show because I'm completely stuck on this. Any help would be greatly appreciated.

【Question discussion】:

    Tags: c# web web-scraping


    【Solution 1】:

    Try it like this; here is some sample code:

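        using HtmlAgilityPack;
        HtmlDocument doc = new HtmlDocument();
        doc.Load("file.html");
        // Locate the label cell containing "Phone", then take the first <td> that follows it.
        // SelectSingleNode returns null when nothing matches, hence the ?. operator.
        string phone_number = doc.DocumentNode.SelectSingleNode("//td[contains(text(), 'Phone')]/following-sibling::td[1]")?.InnerText;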
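
    The same label-then-sibling idea can be generalized to pull every field out of the sample markup, not just the phone number. The following is only a minimal sketch: the class and method names (HotelFieldExtractor, ExtractFields) are hypothetical, and it assumes that every label is a td cell with class clsLabelB whose value sits in the td immediately after it, as in the HTML from the question.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class HotelFieldExtractor // hypothetical helper name
{
    // Returns label -> value pairs, keyed by the label text without its colon.
    public static Dictionary<string, string> ExtractFields(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var fields = new Dictionary<string, string>();

        // Assumption: every label is a <td class=clsLabelB> cell.
        var labelCells = doc.DocumentNode.SelectNodes("//td[@class='clsLabelB']");
        if (labelCells == null)
        {
            return fields; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode labelCell in labelCells)
        {
            // The value is assumed to be the first <td> after the label cell.
            HtmlNode valueCell = labelCell.SelectSingleNode("following-sibling::td[1]");
            if (valueCell == null)
            {
                continue;
            }

            // Decode entities such as &nbsp;, normalize non-breaking spaces,
            // then strip surrounding whitespace and the trailing colon.
            string label = HtmlEntity.DeEntitize(labelCell.InnerText)
                .Replace('\u00A0', ' ')
                .Trim()
                .TrimEnd(':');
            string value = HtmlEntity.DeEntitize(valueCell.InnerText).Trim();

            fields[label] = value;
        }

        return fields;
    }
}
```

    Calling HotelFieldExtractor.ExtractFields(System.IO.File.ReadAllText("file.html")) and reading fields["Phone"] or fields["Fax"] should then give the corresponding cell text. If the site changes its class names or table layout, the XPath expressions would need adjusting.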
    

    【Discussion】:

    • Thanks @Tasawer, exactly what I was looking for.