【问题标题】:Macro throws a weird error while parsing content from json宏从 json 解析内容时抛出一个奇怪的错误
【发布时间】:2021-11-18 08:52:14
【问题描述】:

我正在尝试使用 xmlhttp 请求从 webpage 中抓取某些信息。我感兴趣的信息是 javascript 加密和动态加载的。但是,它们在页面源代码中可用 (CTRL + U)。

当我使用正则表达式从页面源中提取该部分并使用JsonConverter 处理相同的部分时,我收到以下错误:

Run-time error `10001`:
Error parsing JSON:
"text":{"payload":{"

我试过了:

Sub GrabRedfinInfo()
    Const siteLink$ = "https://www.redfin.com/TX/Austin/604-Amesbury-Ln-78752/unit-2/home/171045975"
    Dim HTML As HTMLDocument, Http As Object
    Dim jsonObject As Object, jsonStr As Object
    Dim itemStr As Variant, sResp As String

    Set HTML = New HTMLDocument
    Set Http = CreateObject("MSXML2.XMLHTTP")

    With Http
        .Open "Get", siteLink, False
        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36"
        .send
        HTML.body.innerHTML = .responseText
        sResp = .responseText
    End With

    With CreateObject("VBScript.RegExp")
        .Global = True
        .Pattern = "reactServerState\.InitialContext = (.*);"
        .MultiLine = True
        Set jsonStr = .Execute(sResp)
    End With
    
    itemStr = jsonStr(0).submatches(0)
    
    Set jsonObject = JsonConverter.ParseJson(Replace(itemStr, "\", ""))
    MsgBox jsonObject("ReactServerAgent.cache")("dataCache")("/stingray/api/home/details/belowTheFold")("res")
End Sub

预期输出:

Active Under Contract
Active
Pending - Taking Backups
Active

下图显示了他们的行踪:

https://imgur.com/qcksyZ4

【问题讨论】:

  • 以下两行是修复。首先用这个Set jsonObject = JsonConverter.ParseJson(itemStr)替换你现有的行,然后添加("text")得到字符串MsgBox jsonObject("ReactServerAgent.cache")("dataCache")("/stingray/api/home/details/belowTheFold")("res")("text")

标签: json vba web-scraping xmlhttprequest


【解决方案1】:

我会改为将正则表达式更改为更具限制性,并仅针对控制字符串的事件。我还会更改字符串替换,以确保将\"" 进行交换。

然后您将事件时间线作为数组/集合结束。见here


示例:


代码:

Option Explicit

Public Sub GrabRedfinInfo()
    Const siteLink$ = "https://www.redfin.com/TX/Austin/604-Amesbury-Ln-78752/unit-2/home/171045975"
    Dim HTML As HTMLDocument, Http As Object
    Dim jsonObject As Object, jsonStr As Object
    Dim itemStr As Variant, sResp As String

    Set HTML = New HTMLDocument
    Set Http = CreateObject("MSXML2.XMLHTTP")

    With Http
        .Open "Get", siteLink, False
        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36"
        .send
        HTML.body.innerHTML = .responseText
        sResp = .responseText
    End With

    With CreateObject("VBScript.RegExp")
        .Global = True
        .Pattern = """events\\"".(\[.*?\])"
        .MultiLine = True
        Set jsonStr = .Execute(sResp)
    End With
    
    itemStr = jsonStr(0).SubMatches(0)
    
    Set jsonObject = JsonConverter.ParseJson(Replace$(itemStr, "\" & Chr$(34), Chr$(34))) 'Array (collection)
    
    Dim evt As Object
    
    For Each evt In jsonObject
        Debug.Print evt("mlsDescription")
    Next

End Sub
  

【讨论】:

  • 您的解决方案运行良好。您能否解释一下为什么当我以自己的方式尝试时会出现错误?谢谢。
  • 我假设您的替换实际上并没有修复 json。可能您正在通过替换引入不同的失效。在稍后要解析为 json 的字符串中正确转义是很棘手的。
猜你喜欢
  • 2012-04-22
  • 1970-01-01
  • 1970-01-01
  • 2017-07-15
  • 2019-06-28
  • 2016-06-14
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
相关资源
最近更新 更多