[Posted]: 2019-07-28 04:22:26
[Question]:
As my previous question indicated, I have been trying to scrape data from a website.
Thanks to the community I was able to figure out what my original problem was, but now I am facing another one.
This time I don't get any errors, but the program doesn't export any values to Excel and my sheet stays blank.
On another site I scraped, the HTML elements were divs; here they are spans. Could that be the reason?
Here is my code:
Option Explicit

Public Sub Loiça()
    Dim data As Object, i As Long, html As HTMLDocument, r As Long, c As Long, item As Object, div As Object
    Set html = New HTMLDocument '<== VBE > Tools > References > Microsoft HTML Object Library
    Dim IE As New InternetExplorer
    Dim numPages As Long
    numPages = GetNumberOfPages
    With CreateObject("MSXML2.XMLHTTP")
        ' numResults = arr(UBound(arr))
        ' numPages = 1
        For i = 1 To numPages
            If i > 1 Then
                .Open "GET", Replace$("https://mediamarkt.pt/pages/search-results-page?q=maquina+roupa&page=1", "page=1", "page=" & i), False
                .setRequestHeader "User-Agent", "Mozilla/5.0"
                .send
                html.body.innerHTML = .responseText
            End If
            Set data = html.getElementsByClassName("snize-title")
            For Each item In data
                r = r + 1: c = 1
                For Each div In item.getElementsByTagName("span")
                    With ThisWorkbook.Worksheets("Loiça")
                        .Cells(r, c) = div.innerText
                    End With
                    c = c + 1
                Next
            Next
        Next
    End With
End Sub

Public Function GetNumberOfPages() As Long
    Dim IE As New InternetExplorer
    With IE
        .Visible = False
        .Navigate2 "https://mediamarkt.pt/pages/search-results-page?q=maquina+roupa&page=1"
        While .Busy Or .readyState < 4: DoEvents: Wend
        Dim numPages As Long, numResults As Long, arr() As String
        arr = Split(.document.querySelector(".snize-search-results-header").innerText, Chr$(32))
        numResults = arr(LBound(arr))
        GetNumberOfPages = numResults
        .Quit
    End With
End Function
[Discussion]:
- If you look at the page source, the snize-title class does not exist at load time; that is why you are not getting any results.
- You cannot solve this with With CreateObject("MSXML2.XMLHTTP"). The site you are trying to access uses javascript, so you would have to build a solution that uses only Dim IE As New InternetExplorer. For this particular page, you should learn how to automate IE from VBA.
- @Mig Yes, it exists after the page loads, but the way you fetch the page does not run the js, so it does not exist for your scraper.
- Here you go - the data is actually loaded by a call to a url: json link
- You can probably get away with just scraping the json url.
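The last two comments suggest requesting the JSON endpoint directly instead of the rendered HTML page. A minimal VBA sketch of that idea follows; note that the endpoint URL below is a placeholder (the actual "json link" from the comment was not preserved), so you would first need to find the real request in the browser's developer tools Network tab:

```vba
Option Explicit

Public Sub ScrapeJsonEndpoint()
    ' Sketch only: fetch the search results as JSON instead of HTML.
    ' The URL below is HYPOTHETICAL - replace it with the request you see
    ' in the Network tab (F12) when the results page loads.
    Dim http As Object, jsonText As String
    Set http = CreateObject("MSXML2.XMLHTTP")

    http.Open "GET", "https://example.com/search-results?q=maquina+roupa&page=1", False
    http.setRequestHeader "User-Agent", "Mozilla/5.0"
    http.send

    jsonText = http.responseText
    ' Parse with a JSON library such as VBA-JSON (JsonConverter.ParseJson),
    ' then write the fields you need to the worksheet. For a quick check:
    Debug.Print Left$(jsonText, 200)
End Sub
```

Because the JSON response is built server-side, this avoids the javascript-rendering problem entirely; no InternetExplorer automation is needed.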
Tags: html excel vba web-scraping screen-scraping