【Question title】: WinHttpRequest gzip response parsing
【Posted】: 2017-05-26 04:13:20
【Question description】:

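I use MSXML2.XMLHTTP60 for HTTP requests in my VBA project. The problem is that MSXML2.XMLHTTP60 is limited to four concurrent requests.

I am trying to switch to WinHttp.WinHttpRequest.5.1 instead, but that raises another problem. MSXML2.XMLHTTP60 decompresses gzip responses automatically, whereas reading WinHttpRequest.responseText fails with the error:

No mapping for the Unicode character exists in the target multi-byte code page.

How can I decode this response using only standard Windows libraries?

Code examples:
The MSXML2.XMLHTTP60 limit: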

Public req1 As MSXML2.XMLHTTP60
Public req2 As MSXML2.XMLHTTP60
Public req3 As MSXML2.XMLHTTP60
Public req4 As MSXML2.XMLHTTP60
Public req5 As MSXML2.XMLHTTP60

Private Const url As String = "http://speedtest.tele2.net/100MB.zip"


Public Sub ConcurrentIssue()
    Set req1 = New MSXML2.XMLHTTP60
    req1.Open "get", url, True

    Set req2 = New MSXML2.XMLHTTP60
    req2.Open "get", url, True

    Set req3 = New MSXML2.XMLHTTP60
    req3.Open "get", url, True

    Set req4 = New MSXML2.XMLHTTP60
    req4.Open "get", url, True

    Set req5 = New MSXML2.XMLHTTP60
    req5.Open "get", url, True

    req1.send
    req2.send
    req3.send
    req4.send

    ' This fifth request has to wait
    req5.send

End Sub

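The problem is that WinHttp.WinHttpRequest.5.1 does not support decompression (reference: https://msdn.microsoft.com/ru-ru/library/windows/desktop/hh227298(v=vs.85).aspx).
I have to decompress the response myself.

Example of the decompression problem: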

Public Sub DecompressOk()
    Set req1 = New MSXML2.XMLHTTP60
    req1.Open "get", "http://www.google.ru", False
    req1.setRequestHeader "User-Agent", "Fiddler"
    req1.setRequestHeader "Accept-Encoding", "gzip, deflate"
    req1.send

    Debug.Print req1.responseText
End Sub

Public Sub WithoutDecompress()
    Dim req As WinHttp.WinHttpRequest
    Set req = New WinHttp.WinHttpRequest

    req.Open "get", "http://www.google.ru", False
    req.setRequestHeader "User-Agent", "Fiddler"
    req.setRequestHeader "Accept-Encoding", "gzip, deflate"
    req.send

    Debug.Print req.responseText
End Sub

I tried the following trick, without success:

Public Sub DecompressIssue()
    Dim req As WinHttp.WinHttpRequest
    Set req = New WinHttp.WinHttpRequest

    req.Open "get", "http://www.google.ru", False
    req.setRequestHeader "User-Agent", "Fiddler"
    req.setRequestHeader "Accept-Encoding", "gzip, deflate"
    req.send

    SaveBinaryToFile req.responseBody, "C:\test.zip"

    Dim xmlReq As MSXML2.XMLHTTP60
    Set xmlReq = New MSXML2.XMLHTTP60

    xmlReq.Open "get", "C:\test.zip", False
    xmlReq.setRequestHeader "Accept-Encoding", "gzip, deflate"
    xmlReq.setRequestHeader "Content-Type", "text/html; charset=windows-1251"
    xmlReq.send

    Debug.Print xmlReq.responseBody
End Sub

Sub SaveBinaryToFile(arrBytes() As Byte, strPath As String)
    With CreateObject("ADODB.Stream")
        .Type = 1 ' adTypeBinary
        .Open
        .Write arrBytes
        .SaveToFile strPath, 2 ' adSaveCreateOverWrite
        .Close
    End With
End Sub

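【Comments】:

  • Please edit the question and add minimal code. The question should be complete, so that others can reproduce both the limit of 4 concurrent requests in MSXML2.XMLHTTP60 and the "no mapping" error in WinHttp.WinHttpRequest.5.1.
  • Some web pages may contain unreadable characters. Try getting the binary content from WinHttpRequest.responseBody and then converting it to text via ADODB.Stream, as in this answer.
  • Try sending the XHR with .setRequestHeader "Accept-Encoding", "identity" to force the web server to send an uncompressed response.
  • I tried that, but the server froze. I cannot influence it.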
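
The ADODB.Stream conversion suggested in the second comment could look roughly like the sketch below. The function name BytesToText and its charsetName parameter are hypothetical, chosen only for illustration. Note that this only decodes bytes into text with an explicit charset; it does not decompress a gzip body, so it helps only when the response arrives uncompressed.

```vba
' Hypothetical helper: converts a byte array (e.g. WinHttpRequest.responseBody)
' to a string using an explicitly named charset such as "utf-8" or "windows-1251".
Function BytesToText(body As Variant, charsetName As String) As String
    With CreateObject("ADODB.Stream")
        .Type = 1               ' adTypeBinary: write the raw bytes first
        .Open
        .Write body
        .Position = 0           ' rewind; Type can only be changed at position 0
        .Type = 2               ' adTypeText: reinterpret the same bytes as text
        .Charset = charsetName
        BytesToText = .ReadText ' reads the whole stream by default
        .Close
    End With
End Function
```

A call such as Debug.Print BytesToText(req.responseBody, "windows-1251") would then replace the failing req.responseText read, assuming the charset name matches what the server actually sent.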

Tags: vba web-scraping xmlhttprequest winhttp


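【Solution 1】:

This answer confirms the comment made by omegastripes.

.setRequestHeader "Accept-Encoding", "identity" is the correct answer!

I spent several days looking for a way to decode gzip responses, without realizing that we can simply tell the server not to compress the response.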
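
As a minimal sketch of that approach, the request from the question can be sent with the "identity" encoding. The Sub name RequestUncompressed is hypothetical, and late binding via CreateObject is an assumption made here so that no type library reference is needed. A server is not obliged to honor the header, so the sketch also checks the response headers before reading the text.

```vba
' Hypothetical example: ask the server for an uncompressed response.
Public Sub RequestUncompressed()
    Dim req As Object
    Set req = CreateObject("WinHttp.WinHttpRequest.5.1")

    req.Open "GET", "http://www.google.ru", False
    req.SetRequestHeader "User-Agent", "Fiddler"
    ' "identity" requests that no content coding (gzip, deflate) be applied
    req.SetRequestHeader "Accept-Encoding", "identity"
    req.Send

    ' Guard against a server that compresses anyway
    If InStr(1, req.GetAllResponseHeaders, "Content-Encoding: gzip", vbTextCompare) > 0 Then
        Debug.Print "Server ignored the header; the body is still gzip-compressed."
    Else
        Debug.Print req.ResponseText
    End If
End Sub
```

The trade-off is bandwidth: the response travels uncompressed, which is usually acceptable for small pages but may matter for large downloads.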
