[Posted]: 2020-07-08 18:19:32
[Question]:
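I'm trying to scrape my employer's website to bulk-extract the images from their blog posts. I've started building a scraper in Excel using VBA.
(We don't have access to the SQL database.)
I've set up a worksheet with a list of post identifiers in column A and the URLs of the posts in column B.
So far, my VBA script runs through the list of URLs in column B, uses getElementById to pull the HTML from an element on the page, and pastes the resulting output as a string into column C.
I'm now trying to work out how to extract the src attribute from each image in the resulting output and paste it into the relevant column. I can't for the life of me come up with a simple solution. I'm not very familiar with RegEx, and I'm struggling with Excel's built-in string functions.
The end goal is to have the macro run through each image URL and save the image to disk with a file name in a format like "{Event No.}-{Image Number}.jpg".
Any help would be greatly appreciated.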
Sub Get_Image_SRC()
    Dim sht As Worksheet
    Dim LastRow As Long
    Dim i As Long          ' Long rather than Integer, so large sheets do not overflow
    Dim url As String
    Dim IE As Object

    Set sht = ThisWorkbook.Worksheets("Sheet1")
    ' Last used row in column A (same idea as Ctrl + Shift + End)
    LastRow = sht.Cells(sht.Rows.Count, "A").End(xlUp).Row

    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True

    For i = 2 To LastRow
        ' This version reads the URL from column C and the post type from column B
        url = sht.Cells(i, "C").Value
        Debug.Print url    ' log to the Immediate window instead of a blocking MsgBox
        IE.navigate url
        Application.StatusBar = url & " is loading..."
        ' Wait for navigation to start, then for the page to finish loading
        Do While IE.readyState = 4: DoEvents: Loop
        Do Until IE.readyState = 4: DoEvents: Loop
        Application.StatusBar = url & " Loaded"

        ' Store the outer HTML of the relevant container in column D
        If sht.Cells(i, "B").Value = "WEBNEWS" Then
            sht.Cells(i, "D").Value = IE.document.getElementById("NewsDetail").outerHTML
        Else
            sht.Cells(i, "D").Value = IE.document.getElementById("ReviewContainer").outerHTML
        End If
    Next i

    ' Close the browser and restore the status bar
    IE.Quit
    Set IE = Nothing
    Application.StatusBar = False
End Sub
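For the src extraction asked about above, one possible approach is to skip string parsing entirely and read the img elements from the DOM while the page is still loaded. The following is only a sketch: WriteImageSources is a hypothetical helper (not part of the original macro), and the choice of output column is an assumption.

```vba
' Hypothetical helper: writes the src of every <img> inside container
' into consecutive cells of rowIndex, starting at column firstCol.
' Returns the number of images found (0 if container is Nothing).
Public Function WriteImageSources(ByVal container As Object, ByVal ws As Worksheet, _
                                  ByVal rowIndex As Long, ByVal firstCol As Long) As Long
    Dim imgs As Object
    Dim n As Long

    ' getElementById returns Nothing when the id is not on the page
    If container Is Nothing Then Exit Function

    Set imgs = container.getElementsByTagName("img")
    For n = 0 To imgs.Length - 1
        ' On a live page, .src normally yields the resolved (absolute) URL
        ws.Cells(rowIndex, firstCol + n).Value = imgs.Item(n).src
    Next n

    WriteImageSources = imgs.Length
End Function
```

Inside the loop this could be called as, for example, WriteImageSources IE.document.getElementById("NewsDetail"), sht, i, 5 to fill column E onward. Because only img elements are collected, the src on the embed element in a post would be ignored.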
Sample of the resulting HTML:
<div id=""NewsDetail""><div class=""NewsDetailTitle"">Video: Race Face Behind the Scenes Tour</div><div class=""NewsDetailImage""><img alt=""HeadlinesThumbnail.jpg"" src=""/ImageHandler/6190/515/1000/0/""></div> <div class=""NewsDetailBody"">Pinkbike posted this video a while ago, if you missed it, its' definitely worth a watch.
Ken from Camp of Champions took a look at their New Westminster factory last year which gives a look at the production, people and culture of Race Face. The staff at Race Face are truly their greatest asset they had, best wishes to everyone!
<p><center><object width=""500"" height=""281""><param name=""allowFullScreen"" value=""true""><param name=""AllowScriptAccess"" value=""always""><param name=""movie"" value=""http://www.pinkbike.com/v/188244""><embed width=""500"" height=""281"" src=""http://www.pinkbike.com/v/188244"" type=""application/x-shockwave-flash"" allowscriptaccess=""always"" allowfullscreen=""true""></object></center><p></p>
</div><div class=""NewsDate"">Published Friday, 25 November 2011</div></div>"
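For the final step described in the question (saving each image as "{Event No.}-{Image Number}.jpg"), a minimal sketch using the Windows URLDownloadToFile API might look like the following. It assumes Windows, VBA7 (Office 2010 or later), and that an absolute image URL is already available. BuildImagePath and SaveImage are hypothetical helper names, and whether the site's image handler actually returns JPEG data is not something the question confirms.

```vba
Option Explicit

' Windows-only API from urlmon.dll; PtrSafe/LongPtr assume VBA7.
Private Declare PtrSafe Function URLDownloadToFile Lib "urlmon" _
    Alias "URLDownloadToFileA" ( _
    ByVal pCaller As LongPtr, _
    ByVal szURL As String, _
    ByVal szFileName As String, _
    ByVal dwReserved As Long, _
    ByVal lpfnCB As LongPtr) As Long

' Hypothetical helper: builds "<folder>\<eventNo>-<imageNumber>.jpg".
Public Function BuildImagePath(ByVal folder As String, ByVal eventNo As String, _
                               ByVal imageNumber As Long) As String
    If Right$(folder, 1) <> "\" Then folder = folder & "\"
    BuildImagePath = folder & eventNo & "-" & imageNumber & ".jpg"
End Function

' Hypothetical helper: returns True when the API reports success (0 = S_OK).
Public Function SaveImage(ByVal imageUrl As String, ByVal targetPath As String) As Boolean
    SaveImage = (URLDownloadToFile(0, imageUrl, targetPath, 0, 0) = 0)
End Function
```

A call such as SaveImage(imageUrl, BuildImagePath("C:\Images", "1234", 1)) would then attempt to write C:\Images\1234-1.jpg. The folder and event number here are placeholders, and the target folder is assumed to exist already.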
[Discussion]:
Tags: excel vba web-scraping