【Question title】: VBA web scraping returns nothing to Excel
【Posted】: 2019-07-28 04:22:26
【Question】:

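As my previous question shows, I have been trying to scrape data from a website.
Thanks to the community I was able to work out what my problem was, but now I am facing another one.
This time I do not get any error, but the program does not export any values to Excel and my sheet stays blank.
On the other website I scraped, the HTML elements were divs; now they are spans. Is that the reason?
Here is my code: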

Option Explicit
Public Sub Loiça()
    Dim data As Object, i As Long, html As HTMLDocument, r As Long, c As Long, item As Object, div As Object
    Set html = New HTMLDocument                  '<== VBE > Tools > References > Microsoft HTML Object Library
    
    Dim IE As New InternetExplorer
    Dim numPages As Long
    numPages = GetNumberOfPages

With CreateObject("MSXML2.XMLHTTP")
       ' numResults = arr(UBound(arr))
       ' numPages = 1

        For i = 1 To numPages
             If i > 1 Then
                .Open "GET", Replace$("https://mediamarkt.pt/pages/search-results-page?q=maquina+roupa&page=1", "page=1", "page=" & i), False
                .setRequestHeader "User-Agent", "Mozilla/5.0"
                .send
                 html.body.innerHTML = .responseText
            End If
            Set data = html.getElementsByClassName("snize-title")
            For Each item In data
                r = r + 1: c = 1
                For Each div In item.getElementsByTagName("span")
                    With ThisWorkbook.Worksheets("Loiça")
                        .Cells(r, c) = div.innerText
                    End With
                    c = c + 1
                Next
            Next
        Next
    End With
    '----------------------------------------------------------------------------------------------------------------------------------------------------------------------'
End Sub
Public Function GetNumberOfPages() As Long
    Dim IE As New InternetExplorer
    With IE
        .Visible = False
        .Navigate2 "https://mediamarkt.pt/pages/search-results-page?q=maquina+roupa&page=1"

        While .Busy Or .readyState < 4: DoEvents: Wend

        Dim numPages As Long, numResults As Long, arr() As String
        arr = Split(.document.querySelector(".snize-search-results-header").innerText, Chr$(32))
        numResults = arr(LBound(arr))
        GetNumberOfPages = numResults
        .Quit
    End With
End Function

【Comments】:

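  • If you view the page source, the snize-title class does not exist at load time, which is why you get no results.
  • You cannot solve this with With CreateObject("MSXML2.XMLHTTP"). The site you are trying to access uses JavaScript, so you have to build a solution that uses only Dim IE As New InternetExplorer. For this particular page, you should learn how to automate IE from VBA.
  • @Mig Yes, it exists after the page has loaded, but the way you are scraping the page does not run the JS, so it does not exist for your scraper.
  • Here you go - the data is actually loaded by a call to a URL: json link
  • You could probably get away with just scraping the JSON URL.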
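
The last two comments suggest requesting the JSON endpoint directly instead of the rendered page. A minimal sketch of that idea follows. The endpoint URL is not given in this thread, so it is passed in as an argument and has to be copied from the browser's network tab; FetchText is a hypothetical helper name, not something from the original code.

```vba
Option Explicit

' Hypothetical helper: returns the raw response body of a GET request,
' or an empty string if the server does not answer with HTTP 200.
Public Function FetchText(ByVal url As String) As String
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False                         ' False = synchronous request
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send
        If .Status = 200 Then FetchText = .responseText
    End With
End Function
```

Calling Debug.Print Left$(FetchText(jsonUrl), 200), where jsonUrl holds the copied endpoint, is a quick way to check that the response contains the product data. VBA has no built-in JSON parser, so the returned text still needs a third-party parser or string handling before it can be written to the sheet.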

Tags: html excel vba web-scraping screen-scraping


【Solution 1】:

The information is loaded dynamically, so you need to use IE throughout. Also, change your CSS selector.

Option Explicit

Public Sub WriterResults()
    Dim IE As New InternetExplorer, i As Long, data As Object, span As Object, item As Object, r As Long, c As Long
    With IE
        .Visible = True
        .Navigate2 "https://mediamarkt.pt/pages/search-results-page?q=maquina+roupa&page=1"

        While .Busy Or .readyState < 4: DoEvents: Wend

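        ' The code assumes the header text starts with the total result count
        Dim numPages As Long, numResults As Long, arr() As String
        arr = Split(.document.querySelector(".snize-search-results-header").innerText, Chr$(32))
        numResults = CLng(arr(LBound(arr)))
        ' The number of items rendered on page 1 gives the page size
        Dim resultsPerPage As Long
        resultsPerPage = .document.querySelectorAll(".snize-overhidden").Length
        If resultsPerPage = 0 Then .Quit: Exit Sub   ' nothing rendered yet; avoid division by zero
        numPages = Application.RoundUp(numResults / resultsPerPage, 0)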
        For i = 1 To numPages
            If i > 1 Then
                .Navigate2 Replace$("https://mediamarkt.pt/pages/search-results-page?q=maquina+roupa&page=1", "page=1", "page=" & i)
                While .Busy Or .readyState < 4: DoEvents: Wend
            End If
            Set data = .document.getElementsByClassName("snize-overhidden")
            For Each item In data
                r = r + 1: c = 1
                For Each span In item.getElementsByTagName("span")
                    With ThisWorkbook.Worksheets("Loiça")
                        .Cells(r, c) = span.innerText
                    End With
                    c = c + 1
                Next
            Next
        Next
        .Quit
    End With
End Sub
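
Because the results are injected by JavaScript, they may not be in the DOM yet when readyState reaches 4. One way to handle that, sketched below as an assumption rather than part of the answer, is to poll for the selector with a timeout. WaitForSelector and its parameters are hypothetical names.

```vba
Option Explicit

' Hypothetical helper: polls the document until the CSS selector matches
' an element or the timeout elapses. Returns True if an element was found.
Public Function WaitForSelector(ByVal doc As Object, ByVal selector As String, _
                                Optional ByVal timeoutSeconds As Double = 10) As Boolean
    Dim startTime As Double, elapsed As Double
    startTime = Timer
    Do
        DoEvents                                        ' keep Excel responsive
        If Not doc.querySelector(selector) Is Nothing Then
            WaitForSelector = True
            Exit Function
        End If
        elapsed = Timer - startTime
        If elapsed < 0 Then elapsed = elapsed + 86400   ' Timer resets at midnight
    Loop While elapsed < timeoutSeconds
End Function
```

Inside the With IE block it could be called after each readyState loop, for example If Not WaitForSelector(.document, ".snize-overhidden") Then Exit For, so that a page that never renders its results does not leave the macro reading an empty collection.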

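【Comments】:

  • Oh... so I always need to use the same method, depending on the page? If I start with IE, do I have to finish the whole project with IE as well? Thank you very much!
  • Probably, unless there is an API available.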