Question title: How to get specific table data with HTML Agility Pack
Posted: 2021-04-26 05:23:10
Question:

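I am building a web scraper that extracts stock information and saves it to a database. My plan is to get only the company name and the prices (latest price, closing price, YCP, etc.) and store them as objects.

URL = view-source:https://www.dsebd.org/latest_share_price_scroll_l.php (if needed, start from line 5460 of the page source)

Here I need to iterate over each tr first and then pull td[3] through td[7] from each row.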

    <div class="table-responsive inner-scroll">
        <table class='table table-bordered background-white shares-table fixedHeader'>
            <thead>
                <tr>
                    <th width="4%">#</th>
                    <th width="12%">TRADING CODE</th>
                    <th width="12%">LTP*</th>
                    <th width="12%">HIGH</th>
                    <th width="12%">LOW</th>
                    <th width="12%">CLOSEP*</th>
                    <th width="12%">YCP*</th>
                    <th width="12%">CHANGE</th>
                    <th width="12%">TRADE</th>
                    <th width="12%">VALUE (mn)</th>
                    <th width="12%">VOLUME</th>
                </tr>
            </thead>
            <tbody>
                <tr>
                    <td width="4%">1</td>
                    <td width="15%">
                        <a href="displayCompany.php?name=1JANATAMF" class='ab1'>
                            1JANATAMF
                        </a>
                    </td>
                    <td width="10%">6.3</td>
                    <td width="10%">6.7</td>
                    <td width="12%">6.3</td>
                    <td width="11%">6.5</td>
                    <td width="12%">6.6</td>
                    <td width="12%" style="color: red">-0.3</td>
                    <td width="11%">218</td>
                    <td width="11%">11.593</td>
                    <td width="11%">1,771,986</td>
                </tr>
            </tbody>
                <tr>
                    <td width="4%">2</td>
                    <td width="15%">
                        <a href="displayCompany.php?name=1STPRIMFMF" class='ab1'>
                            1STPRIMFMF
                        </a>
                    </td>
                    <td width="10%">20.2</td>
                    <td width="10%">21.9</td>
                    <td width="12%">20</td>
                    <td width="11%">20.2</td>
                    <td width="12%">21.3</td>
                    <td width="12%" style="color: red">-1.1</td>
                    <td width="11%">420</td>
                    <td width="11%">16.914</td>
                    <td width="11%">815,552</td>
                </tr>
            </tbody>
            ... more stocks
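Since the goal is to store the company name and prices as objects, one way to model a row is a small record built from the cell texts. This is only a sketch under assumptions: StockPrice and FromCells are hypothetical names, the cell order is taken from the markup above (td[2] = TRADING CODE, td[3] = LTP*, td[6] = CLOSEP*, td[7] = YCP*), and numbers are assumed to use '.' as the decimal separator.

```csharp
using System;
using System.Globalization;

// Hypothetical model for one row of the DSE price table, keeping only the
// company code and the price columns the question is interested in.
public sealed record StockPrice(
    string TradingCode,
    decimal LastTradedPrice,
    decimal High,
    decimal Low,
    decimal ClosePrice,
    decimal YesterdayClosePrice)
{
    // cells[i] holds the InnerText of the (i + 1)-th <td> of a row, so
    // cells[1] is XPath td[2] (TRADING CODE) and cells[6] is td[7] (YCP*).
    public static StockPrice FromCells(string[] cells)
    {
        if (cells.Length < 7)
            throw new ArgumentException("Expected at least 7 cells per row.", nameof(cells));

        // Parse with the invariant culture so '.' is always the decimal point,
        // regardless of the culture of the machine running the scraper.
        static decimal Num(string s) =>
            decimal.Parse(s.Trim(), NumberStyles.Number, CultureInfo.InvariantCulture);

        return new StockPrice(
            TradingCode: cells[1].Trim(), // InnerText keeps the newlines/spaces around the <a> text
            LastTradedPrice: Num(cells[2]),
            High: Num(cells[3]),
            Low: Num(cells[4]),
            ClosePrice: Num(cells[5]),
            YesterdayClosePrice: Num(cells[6]));
    }
}
```

Keeping the parsing in one factory method means the HTML-walking code only has to collect raw strings, and the conversion rules (trimming, culture) live in a single place.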

Here is my code:

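    using HtmlAgilityPack;
    using Microsoft.Extensions.Hosting;
    using Microsoft.Extensions.Logging;
    using System.Threading;
    using System.Threading.Tasks;

    public class Worker : BackgroundService
    {
        private readonly ILogger<Worker> _logger;
        private readonly IParseService _parseService;
        private readonly string _url;

        public Worker(ILogger<Worker> logger, IParseService parseService)
        {
            _logger = logger;
            _parseService = parseService;
            _url = "https://www.dsebd.org/latest_share_price_scroll_l.php";
        }

        protected override async Task ExecuteAsync(CancellationToken stoppingToken)
        {
            while (!stoppingToken.IsCancellationRequested)
            {
                // GetHtml (not shown) downloads the page and returns an HtmlDocument
                var HtmlDoc = GetHtml(_url);
                var mainNode = HtmlDoc.DocumentNode.SelectSingleNode("//div[@class='table-responsive inner-scroll']/table[contains(@class, 'table table-bordered background-white shares-table fixedHeader')]").ChildNodes;

                foreach (var nodes in mainNode)
                {
                    //Code to get the info
                }
            }
        }
    }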

Thanks for reading my question; any help is much appreciated.

Tags: c# web-scraping html-table html-parsing html-agility-pack


Answer 1:
    using System;
    using HtmlAgilityPack;

    // mainNode must be the <table> node itself: drop ".ChildNodes" from the question's
    // SelectSingleNode call, because HtmlNodeCollection has no SelectNodes method.
    // ".//tr" searches only inside the table; "//tr" would search the whole document.
    foreach (HtmlNode row in mainNode.SelectNodes(".//tr"))
    {
        // Header rows contain only <th>, so SelectNodes("td") returns null for them
        var cells = row.SelectNodes("td");
        if (cells == null || cells.Count < 11)
            continue;

        // XPath positions are 1-based, collection indexes are 0-based: td[2] is cells[1].
        // InnerText keeps the surrounding whitespace, so trim it.
        string Cell(int i) => HtmlEntity.DeEntitize(cells[i].InnerText).Trim();

        var tradingCode    = Cell(1);  // TRADING CODE
        var latestPrice    = Cell(2);  // LTP*
        var highestPrice   = Cell(3);  // HIGH
        var lowestPrice    = Cell(4);  // LOW
        var closingPrice   = Cell(5);  // CLOSEP*
        var yesterdayPrice = Cell(6);  // YCP*
        var change         = Cell(7);  // CHANGE
        var trade          = Cell(8);  // TRADE
        var value          = Cell(9);  // VALUE (mn)
        var volume         = Cell(10); // VOLUME

        // One placeholder per argument (the original had 9 placeholders for 8 values)
        Console.WriteLine("{0} {1} {2} {3} {4} {5} {6} {7} {8} {9}",
            tradingCode, latestPrice, highestPrice, lowestPrice, closingPrice,
            yesterdayPrice, change, trade, value, volume);
    }
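The same approach can be tried offline against a string of HTML instead of the live page. This is a self-contained sketch, not the answerer's code: ShareTableScraper and ReadRows are hypothetical names, and the XPath assumes the table keeps the shares-table class seen in the question's markup. It returns the raw, trimmed texts of TRADING CODE, LTP*, CLOSEP* and YCP* for each data row.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ShareTableScraper
{
    // Returns (code, LTP, CLOSEP, YCP) for every data row of the shares table.
    public static List<(string Code, string Ltp, string CloseP, string Ycp)> ReadRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = new List<(string Code, string Ltp, string CloseP, string Ycp)>();
        var table = doc.DocumentNode.SelectSingleNode("//table[contains(@class, 'shares-table')]");
        if (table == null)
            return rows; // page layout changed or content not loaded

        // Relative XPath keeps the search inside this table
        var trs = table.SelectNodes(".//tr");
        if (trs == null)
            return rows;

        foreach (var tr in trs)
        {
            // The header row has only <th> cells, so this is null and the row is skipped
            var tds = tr.SelectNodes("td");
            if (tds == null || tds.Count < 7)
                continue;

            string Text(int i) => HtmlEntity.DeEntitize(tds[i].InnerText).Trim();

            // 0-based indexes 1, 2, 5, 6 correspond to XPath td[2], td[3], td[6], td[7]
            rows.Add((Text(1), Text(2), Text(5), Text(6)));
        }
        return rows;
    }
}
```

Returning plain strings keeps the scraping step separate from number parsing, so the results can be fed into whatever object model the database layer uses.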
    
