【Posted】: 2021-11-18 08:52:14
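【Question】:
I'm trying to scrape certain information from a webpage using XMLHTTP requests. The information I'm interested in is JavaScript-encrypted and loaded dynamically. However, it is available in the page source (CTRL + U).
When I use a regular expression to extract that portion from the page source and then process it with JsonConverter, I get the following error: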
Run-time error '10001':
Error parsing JSON:
"text":{"payload":{"
I've tried:
Sub GrabRedfinInfo()
    Const siteLink$ = "https://www.redfin.com/TX/Austin/604-Amesbury-Ln-78752/unit-2/home/171045975"
    Dim HTML As HTMLDocument, Http As Object
    Dim jsonObject As Object, jsonStr As Object
    Dim itemStr As Variant, sResp As String

    Set HTML = New HTMLDocument
    Set Http = CreateObject("MSXML2.XMLHTTP")

    With Http
        .Open "Get", siteLink, False
        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36"
        .send
        HTML.body.innerHTML = .responseText
        sResp = .responseText
    End With

    With CreateObject("VBScript.RegExp")
        .Global = True
        .Pattern = "reactServerState\.InitialContext = (.*);"
        .MultiLine = True
        Set jsonStr = .Execute(sResp)
    End With

    itemStr = jsonStr(0).submatches(0)
    Set jsonObject = JsonConverter.ParseJson(Replace(itemStr, "\", ""))
    MsgBox jsonObject("ReactServerAgent.cache")("dataCache")("/stingray/api/home/details/belowTheFold")("res")
End Sub
Expected output:
Active Under Contract
Active
Pending - Taking Backups
Active
The image below shows where they are located:
【Discussion】:
- The following two lines are the fix. First, replace your existing parsing line with this one:
    Set jsonObject = JsonConverter.ParseJson(itemStr)
  Then append ("text") to get the string:
    MsgBox jsonObject("ReactServerAgent.cache")("dataCache")("/stingray/api/home/details/belowTheFold")("res")("text")
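Why the fix likely works can be shown on a small scale. The sketch below assumes that the "text" value is itself a JSON document stored as a string, which is what the escaped quotes in the page source suggest. The `sample` string and its "status" key are hand-made for illustration and are not Redfin's actual payload. It relies on the same VBA-JSON `JsonConverter` module used in the question.

```vba
' Requires the VBA-JSON module (JsonConverter), as in the question.
Sub NestedJsonDemo()
    ' Hypothetical sample: "text" holds JSON encoded as a string,
    ' so its inner quotes are escaped with backslashes.
    Const sample$ = "{""res"":{""text"":""{\""payload\"":{\""status\"":\""Active\""}}""}}"

    Dim outer As Object, inner As Object

    ' Parse the raw string as-is; the parser unescapes \" itself.
    Set outer = JsonConverter.ParseJson(sample)
    Debug.Print outer("res")("text")            ' the inner JSON, still a String

    ' The inner string can then be parsed a second time.
    Set inner = JsonConverter.ParseJson(outer("res")("text"))
    Debug.Print inner("payload")("status")

    ' Stripping the backslashes first turns "text":"{\"payload\"...
    ' into "text":"{"payload"..., which ends the string early,
    ' so the parser is expected to raise an error here.
    On Error Resume Next
    Set outer = JsonConverter.ParseJson(Replace(sample, "\", ""))
    If Err.Number <> 0 Then Debug.Print "Parse failed, error " & Err.Number
    On Error GoTo 0
End Sub
```

If the real "text" string needs further processing before a second parse (for example, a prefix in front of the JSON), that step would have to be added; the sketch does not cover it.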
Tags: json vba web-scraping xmlhttprequest