【Question title】: Scraping a multi-page web table with VBA
【Posted】: 2020-02-18 20:53:32
【Question】:

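I am trying to use VBA to save the second page of a table to Excel, but I cannot get the Click method to work. Could you help me? I have searched all over the web without finding a solution. Thanks.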

Sub BrowseSiteTableObjectX()
    Dim IE As New SHDocVw.InternetExplorer
    Dim Docm As MSHTML.HTMLDocument
    Dim HTMLAtab As MSHTML.IHTMLElement
    Dim HTMLArow As MSHTML.IHTMLElement
    Dim iRow As Long

    With IE
        .navigate "https://www.nasdaq.com/market-activity/stocks/screener"
        Do While .Busy Or .readyState <> 4
           DoEvents
        Loop
    End With

    Set Docm = IE.document

    Docm.getElementsByClassName("symbol-screener__pagination")(0).getElementsByClassName("next")(0).Click

    Set Docm = IE.document

    Set HTMLAtab = Docm.getElementsByClassName("symbol-screener__table")(0)

    For Each HTMLArow In HTMLAtab.getElementsByClassName("symbol-screener__row")
        iRow = iRow + 1
        Cells(iRow, 1) = HTMLArow.getElementsByClassName("symbol-screener__cell symbol-screener__cell--ticker")(0).innerText
        Cells(iRow, 2) = HTMLArow.getElementsByClassName("symbol-screener__cell symbol-screener__cell--company")(0).innerText
        DoEvents
    Next HTMLArow

    IE.Quit
    Set IE = Nothing
    Set Docm = Nothing
End Sub

【Comments】:

  • In other words, the line "Docm.getElementsByClassName("symbol-screener__pagination")(0).getElementsByClassName("next")(0).Click" does not go to the second page of the table.
  • Please remove the java and javascript tags.

Tags: html excel vba


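【Answer 1】:

See if this helps. In the post below I outline several ways of working with elements on a page. You need to consider what actually happens when the Next button is "clicked": it might submit a form, or it might run JavaScript on the page.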

Excel VBA Submitting data via IE on an online MS Forms not working

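See if those ideas help. You may find that ExecScript works best, because the Next button probably calls back into a script on the page to load the next set of data. Check the Chrome developer tools to see what happens when you click it.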
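
A minimal sketch of the ExecScript idea, not a tested solution for this page. It assumes an already loaded InternetExplorer object is passed in, that the next link is an anchor inside the li with class "next" (the selector is a guess based on the class names in the question), and that execScript and querySelector are available in the document mode IE uses for the page. The name ClickNextViaScript is made up for illustration.

```vba
' Hypothetical helper: asks the page's own script engine to click the
' "next" link instead of calling .Click from VBA.
Sub ClickNextViaScript(ByVal IE As Object)
    Dim js As String

    ' Assumed selector: an <a> inside li.next inside the pagination block
    js = "var a = document.querySelector('.symbol-screener__pagination .next a');" & _
         "if (a) { a.click(); }"

    ' execScript runs the code in the context of the loaded page
    IE.document.parentWindow.execScript js, "JavaScript"
End Sub
```

It would be called after the page has finished loading, for example ClickNextViaScript IE.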

Good luck!

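【Comments】:

  • Also look at this: the "next" li contains a link element. So you may need to get the li with getElementsByClassName("next"), then call getElementsByTagName("a") on it, loop over the result to get the HTML anchor element, and perform the click on that.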
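
The steps in this comment can be sketched roughly as follows. This is only an illustration: it assumes the document has finished loading and that the first element with class "next" is the pagination li. The name ClickNextAnchor is hypothetical.

```vba
' Hypothetical helper: finds the "next" li, then clicks the anchor inside it.
Sub ClickNextAnchor(ByVal Docm As Object)
    Dim nextLi As Object
    Dim anchors As Object

    ' Take the first element that carries the class "next"
    Set nextLi = Docm.getElementsByClassName("next")(0)
    If nextLi Is Nothing Then Exit Sub

    ' The clickable element is the <a> inside the li, not the li itself
    Set anchors = nextLi.getElementsByTagName("a")
    If anchors.Length > 0 Then anchors(0).Click
End Sub
```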
【Answer 2】:

Pagination can be tricky, but on this page it is easy. I also fixed a few other problems. Please read the comments in the code:

Sub BrowseSiteTableObjectX()
  Dim IE As New SHDocVw.InternetExplorer
  Dim Docm As MSHTML.HTMLDocument
  Dim HTMLAtab As MSHTML.IHTMLElement
  Dim HTMLArow As MSHTML.IHTMLElement
  Dim nodePagiantionNext As Object 'I always use late binding for things like this
  Dim iRow As Long
  Dim lastPage As Boolean

  With IE
    'Set the following line to 'False' to make IE invisible
    'You can also set IE to full screen, scroll to the page
    'count and watch it advance. I give each page 5 seconds
    'to load. From what I have seen, this is partly necessary
    .Visible = True
    .navigate "https://www.nasdaq.com/market-activity/stocks/screener"
    Do While .Busy Or .readyState <> 4: DoEvents: Loop
  End With
  'The page loads its data after IE reports that it is ready, so you need a manual pause of a few seconds
  'Application.Wait (Now + TimeSerial(pause_hours, pause_minutes, pause_seconds))
  Application.Wait (Now + TimeSerial(0, 0, 5))

  Set Docm = IE.document

  'You need a loop to go through all pages
  '(IE is a diva. You may have to restart it on every loop iteration. But for the given URL it
  'worked for 312 pages with the 5 second pause)
  Do
    'If you click the 'next' link here, you are on the second page before you read out any data
    'You must do the click after reading data from the first page
    '
    'Give some seconds after the click to load the new page
    Application.Wait (Now + TimeSerial(0, 0, 5))

    Set Docm = IE.document

    Set HTMLAtab = Docm.getElementsByClassName("symbol-screener__table")(0)

    For Each HTMLArow In HTMLAtab.getElementsByClassName("symbol-screener__row")
      iRow = iRow + 1
      Cells(iRow, 1) = HTMLArow.getElementsByClassName("symbol-screener__cell symbol-screener__cell--ticker")(0).innerText
      Cells(iRow, 2) = HTMLArow.getElementsByClassName("symbol-screener__cell symbol-screener__cell--company")(0).innerText
      'DoEvents 'Why?
    Next HTMLArow

    'You can't click the li tag. You must click the link, which is the first child of the li tag.
    'You must also know when the last page is reached. That is when the CSS class changes to "next disabled"
    Set nodePagiantionNext = Docm.getElementsByClassName("symbol-screener__pagination")(0).getElementsByClassName("next")(0)
    '
    'Check whether the class attribute has changed to "next disabled".
    'Short explanation: getElementsByClassName("next") still finds the node then,
    'because the attribute holds two space-separated classes, "next" and "disabled",
    'and the method returns every element that carries the class "next",
    'whatever other classes it has
    If nodePagiantionNext.getAttribute("class") = "next disabled" Then
      'If last page end loop
      lastPage = True
    Else
      'If not the last page, click for next page
      nodePagiantionNext.FirstChild.Click
    End If
  Loop Until lastPage

  IE.Quit
  Set IE = Nothing
  Set Docm = Nothing
End Sub
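
As a possible alternative to the fixed 5 second pause, the code could poll until the table content changes. The sketch below is a swapped-in technique, not part of the answer above. The helper name WaitForTableChange, the 10 second default timeout, and the idea of comparing the first ticker cell are all assumptions.

```vba
' Hypothetical helper: waits until the first ticker cell differs from
' prevTicker, or until timeoutSeconds have passed.
' Returns True if the table changed in time, otherwise False.
Function WaitForTableChange(ByVal Docm As Object, ByVal prevTicker As String, _
                            Optional ByVal timeoutSeconds As Long = 10) As Boolean
    Dim deadline As Date
    Dim tickerCells As Object

    deadline = DateAdd("s", timeoutSeconds, Now)

    Do While Now < deadline
        Set tickerCells = Docm.getElementsByClassName("symbol-screener__cell--ticker")
        If tickerCells.Length > 0 Then
            If tickerCells(0).innerText <> prevTicker Then
                WaitForTableChange = True
                Exit Function
            End If
        End If
        DoEvents 'Keep Excel and IE responsive while polling
    Loop
End Function
```

Inside the loop, one could store the first ticker of the current page before the click and then call WaitForTableChange(IE.document, firstTicker) in place of Application.Wait. If it returns False, the page did not change within the timeout.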

【Comments】:
