Posted: 2017-05-26 04:13:20
Question:
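I use MSXML2.XMLHTTP60 to make HTTP requests in my VBA project. The problem is that MSXML2.XMLHTTP60 is limited to four concurrent requests.
I'm trying to use WinHttp.WinHttpRequest.5.1 instead, but that has a different problem. MSXML2.XMLHTTP60 decompresses gzip responses automatically, but
reading the WinHttpRequest.responseText property fails with the error:
No mapping for the Unicode character exists in the target multi-byte code page.
How can I decode this response using standard Windows libraries?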
Code examples:
The MSXML2.XMLHTTP60 limit:
Public req1 As MSXML2.XMLHTTP60
Public req2 As MSXML2.XMLHTTP60
Public req3 As MSXML2.XMLHTTP60
Public req4 As MSXML2.XMLHTTP60
Public req5 As MSXML2.XMLHTTP60
Private Const url As String = "http://speedtest.tele2.net/100MB.zip"

Public Sub ConcurrentIssue()
    Set req1 = New MSXML2.XMLHTTP60
    req1.Open "get", url, True
    Set req2 = New MSXML2.XMLHTTP60
    req2.Open "get", url, True
    Set req3 = New MSXML2.XMLHTTP60
    req3.Open "get", url, True
    Set req4 = New MSXML2.XMLHTTP60
    req4.Open "get", url, True
    Set req5 = New MSXML2.XMLHTTP60
    req5.Open "get", url, True
    req1.send
    req2.send
    req3.send
    req4.send
    ' This fifth request waits until one of the first four finishes
    req5.send
End Sub
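For comparison, a minimal sketch of the same five downloads through WinHttp.WinHttpRequest. It assumes a project reference to "Microsoft WinHTTP Services, version 5.1" and that WinHTTP's own per-server connection limit is not the bottleneck in your environment. Each request is opened asynchronously, all are sent first, and then each is collected with WaitForResponse. The function name and the 600-second timeout are illustrative choices, not part of the original code.

```vba
' Sketch: five concurrent asynchronous GETs with WinHttp.WinHttpRequest.
' Assumes a reference to "Microsoft WinHTTP Services, version 5.1".
' ConcurrentWinHttp and the 600-second timeout are hypothetical choices.
' Returns how many of the five requests produced a response in time.
Public Function ConcurrentWinHttp(ByVal targetUrl As String) As Long
    Dim reqs(1 To 5) As WinHttp.WinHttpRequest
    Dim i As Long, done As Long

    ' Start all five requests before waiting on any of them.
    For i = 1 To 5
        Set reqs(i) = New WinHttp.WinHttpRequest
        reqs(i).Open "GET", targetUrl, True   ' True = asynchronous
        reqs(i).Send
    Next i

    ' Collect the responses; WaitForResponse returns False on timeout.
    For i = 1 To 5
        If reqs(i).WaitForResponse(600) Then
            Debug.Print "request"; i; "status"; reqs(i).Status
            done = done + 1
        Else
            Debug.Print "request"; i; "timed out"
        End If
    Next i

    ConcurrentWinHttp = done
End Function
```

With the 100 MB URL above, keep in mind that every response body is buffered in memory, so five parallel downloads can need several hundred MB in the Office process.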
The problem is that WinHttp.WinHttpRequest.5.1 does not support decompression (see https://msdn.microsoft.com/ru-ru/library/windows/desktop/hh227298(v=vs.85).aspx).
I need to decompress the response myself.
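One way to decompress it with components that ship with Windows, sketched here under the assumption that PowerShell 3.0 or later is installed (for .NET 4's Stream.CopyTo): write the compressed bytes to a temp file, let PowerShell run System.IO.Compression.GZipStream over it, and read the result back with ADODB.Stream in a chosen charset. GunzipToText is a hypothetical helper, not a Windows API, and launching a process per response makes it much slower than an in-process decoder.

```vba
' Hypothetical helper: gunzip a byte array via PowerShell + .NET GZipStream.
' Assumes PowerShell 3.0+ (Stream.CopyTo) and temp paths without single quotes.
' Returns the decompressed data decoded as text in the given charset.
Public Function GunzipToText(ByVal gzBytes As Variant, ByVal charset As String) As String
    Dim fso As Object, stm As Object
    Dim inPath As String, outPath As String, cmd As String

    Set fso = CreateObject("Scripting.FileSystemObject")
    inPath = fso.BuildPath(Environ$("TEMP"), fso.GetTempName())
    outPath = fso.BuildPath(Environ$("TEMP"), fso.GetTempName())

    ' 1. Save the still-compressed bytes to a temp file.
    Set stm = CreateObject("ADODB.Stream")
    stm.Type = 1                      ' adTypeBinary
    stm.Open
    stm.Write gzBytes
    stm.SaveToFile inPath, 2          ' adSaveCreateOverWrite
    stm.Close

    ' 2. Decompress with GZipStream. ErrorActionPreference=Stop makes any
    '    failure end PowerShell with a non-zero exit code, which Run returns.
    cmd = "powershell -NoProfile -Command ""$ErrorActionPreference='Stop';" & _
          "$i=[IO.File]::OpenRead('" & inPath & "');" & _
          "$g=New-Object IO.Compression.GZipStream($i,[IO.Compression.CompressionMode]::Decompress);" & _
          "$o=[IO.File]::Create('" & outPath & "');$g.CopyTo($o);$o.Close();$g.Close()"""
    If CreateObject("WScript.Shell").Run(cmd, 0, True) <> 0 Then
        fso.DeleteFile inPath
        If fso.FileExists(outPath) Then fso.DeleteFile outPath
        Err.Raise vbObjectError + 513, "GunzipToText", "PowerShell could not decompress the data"
    End If

    ' 3. Read the decompressed file back as text in the requested charset.
    Set stm = CreateObject("ADODB.Stream")
    stm.Type = 2                      ' adTypeText
    stm.Charset = charset
    stm.Open
    stm.LoadFromFile outPath
    GunzipToText = stm.ReadText
    stm.Close

    fso.DeleteFile inPath
    fso.DeleteFile outPath
End Function
```

Usage would look like Debug.Print GunzipToText(req.responseBody, "windows-1251") after req.send, with the charset taken from the response's Content-Type.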
Example of the decompression problem:
Public Sub DecompressOk()
    ' XMLHTTP60 decompresses the gzip response transparently
    Set req1 = New MSXML2.XMLHTTP60
    req1.Open "get", "http://www.google.ru", False
    req1.setRequestHeader "User-Agent", "Fiddler"
    req1.setRequestHeader "Accept-Encoding", "gzip, deflate"
    req1.send
    Debug.Print req1.responseText
End Sub

Public Sub WithoutDecompress()
    Dim req As WinHttp.WinHttpRequest
    Set req = New WinHttp.WinHttpRequest
    req.Open "get", "http://www.google.ru", False
    req.setRequestHeader "User-Agent", "Fiddler"
    req.setRequestHeader "Accept-Encoding", "gzip, deflate"
    req.send
    ' Fails with the "No mapping for the Unicode character" error
    Debug.Print req.responseText
End Sub
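A quick way to confirm that responseBody really is gzip data (so responseText is trying to decode compressed bytes as text) is to check for the two-byte gzip signature 1F 8B defined in RFC 1952. IsGzip is a hypothetical helper name; it only inspects the first two bytes, does not validate the rest of the stream, and will not recognize "deflate"-encoded bodies.

```vba
' Hypothetical helper: True when a byte array starts with the gzip magic
' number 1F 8B (RFC 1952). Only the first two bytes are checked.
Public Function IsGzip(ByVal body As Variant) As Boolean
    If Not IsArray(body) Then Exit Function
    If UBound(body) - LBound(body) < 1 Then Exit Function   ' fewer than 2 bytes
    IsGzip = (body(LBound(body)) = &H1F) And (body(LBound(body) + 1) = &H8B)
End Function
```

In WithoutDecompress, Debug.Print IsGzip(req.responseBody) placed before the responseText line should print True when the server honored "Accept-Encoding: gzip".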
I tried the following workaround, without success:
Public Sub DecompressIssue()
    Dim req As WinHttp.WinHttpRequest
    Set req = New WinHttp.WinHttpRequest
    req.Open "get", "http://www.google.ru", False
    req.setRequestHeader "User-Agent", "Fiddler"
    req.setRequestHeader "Accept-Encoding", "gzip, deflate"
    req.send
    ' Save the still-compressed (gzip) response body to disk
    SaveBinaryToFile req.responseBody, "C:\test.zip"
    ' Read the saved file back through XMLHTTP60, hoping it decompresses it
    Dim xmlReq As MSXML2.XMLHTTP60
    Set xmlReq = New MSXML2.XMLHTTP60
    xmlReq.Open "get", "C:\test.zip", False
    xmlReq.setRequestHeader "Accept-Encoding", "gzip, deflate"
    xmlReq.setRequestHeader "Content-Type", "text/html; charset=windows-1251"
    xmlReq.send
    Debug.Print xmlReq.responseBody
End Sub
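Sub SaveBinaryToFile(ByVal arrBytes As Variant, ByVal strPath As String)
    ' arrBytes is a Variant so req.responseBody (a Variant holding a Byte array) can be passed directly
    With CreateObject("ADODB.Stream")
        .Type = 1 ' adTypeBinary
        .Open
        .Write arrBytes
        .SaveToFile strPath, 2 ' adSaveCreateOverWrite
        .Close
    End With
End Sub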
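For a body that is not compressed (or after it has been decompressed), the byte-to-text step that responseText performs can instead be done explicitly with ADODB.Stream, as the comments suggest, so that the charset is chosen rather than guessed. BytesToText is a hypothetical helper name; the charset would come from the response's Content-Type, for example "utf-8" or "windows-1251".

```vba
' Hypothetical helper: decode a byte array (e.g. WinHttpRequest.responseBody)
' into a String using an explicit charset such as "utf-8" or "windows-1251".
Public Function BytesToText(ByVal bytes As Variant, ByVal charset As String) As String
    Dim stm As Object
    Set stm = CreateObject("ADODB.Stream")
    stm.Type = 1                ' adTypeBinary: load the raw bytes first
    stm.Open
    stm.Write bytes
    stm.Position = 0            ' Type can only be changed at position 0
    stm.Type = 2                ' adTypeText
    stm.Charset = charset
    BytesToText = stm.ReadText  ' read the whole stream as text
    stm.Close
End Function
```

This avoids the code-page guess, but it does not decompress anything: passing gzip bytes through it still yields unreadable text.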
Comments:
-
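Please edit the question and add minimal code. The question should be complete, so that others can reproduce both the four-concurrent-request limit of
MSXML2.XMLHTTP60 and the "No mapping" error from WinHttp.WinHttpRequest.5.1. -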
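Some web pages may contain characters that cannot be decoded. Try getting the binary content from
WinHttpRequest.responseBody and then converting it to text via ADODB.Stream, as in this answer. -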
Try sending the XHR with
.setRequestHeader "Accept-Encoding", "identity" to force the web server to send an uncompressed response. -
I tried that, but then the server hangs. I have no control over the server.
Tags: vba web-scraping xmlhttprequest winhttp