【Question title】: Can't fetch the titles from a webpage
【Posted】: 2019-01-05 20:52:06
【Question】:

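I've written a script in VBA, in combination with IE, to fetch the titles of the different charts from a webpage, but I can't get it to work. I appear to have used the right class name and tag name to reach the content, but no dice. It doesn't throw any error either.

This is my approach so far: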

Sub GetTitle()
    Const Url As String = "https://www.fbatoolkit.com/"
    Dim IE As New InternetExplorer, Html As HTMLDocument
    Dim post As Object

    With IE
        .Visible = True
        .navigate Url
        While .Busy = True Or .readyState < 4: DoEvents: Wend
        Set Html = .document
    End With

    Application.Wait Now + TimeValue("00:00:05")

    For Each post In Html.getElementsByClassName("chart")
        With post.getElementsByTagName("text")
          If .Length Then R = R + 1: Cells(R, 1) = .item(0).innerText
        End With
    Next post
End Sub

The titles look like the following and are visible above each chart:

Toys & Games
Health & Household

I don't want any Selenium-related solution. Thanks.

【Discussion】:

    Tags: vba excel web-scraping internet-explorer-11


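    【Solution 1】:

    To be honest, this is a bit of a cheat. Treat it as a placeholder until I find a better way, as I'm guessing you specifically want to access those titles. It reads the titles out of the chart element ids with a regular expression instead of reading the rendered chart text.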

    Option Explicit
    Public Sub GetInfo()
        Dim ie As New InternetExplorer, html As HTMLDocument, titles(), i As Long
        With ie
            .Visible = True
            .navigate "https://www.fbatoolkit.com/"
            While .Busy Or .readyState < 4: DoEvents: Wend
            Set html = .document
            titles = GetTitles(html.body.innerHTML, "id=""visualization([^""]*)")
            For i = LBound(titles) To UBound(titles)
                Debug.Print titles(i)
            Next
            .Quit '<== Remember to quit application
        End With
    End Sub
    
    ' Returns the first capture group of every match of sPattern in inputString,
    ' with "_" turned into a space and a double space turned into " & ".
    Public Function GetTitles(ByVal inputString As String, ByVal sPattern As String) As Variant
        Dim Matches As Object, iMatch As Object, arrMatches(), i As Long
        arrMatches = Array() ' empty array, so the caller's LBound/UBound loop is safe when nothing matches
        With CreateObject("vbscript.regexp")
            .Global = True
            .MultiLine = True
            .IgnoreCase = True
            .Pattern = sPattern
            If .test(inputString) Then
                Set Matches = .Execute(inputString)
                For Each iMatch In Matches
                    If iMatch.SubMatches(0) <> vbNullString Then
                        ReDim Preserve arrMatches(i)
                        ' Trim$ drops the leading space left by the underscore that follows "visualization"
                        arrMatches(i) = Trim$(Replace$(Replace$(iMatch.SubMatches(0), "_", " "), "  ", " & "))
                        i = i + 1
                    End If
                Next iMatch
            End If
        End With
        GetTitles = arrMatches
    End Function
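
The string handling above can be tried without opening IE at all. The following is a minimal sketch, assuming the chart ids look like `visualization_toys__games` (inferred from the replacements in the answers here, not confirmed against the live page). `IdToTitle` and `DemoIdToTitle` are hypothetical names introduced only for illustration.

```vba
Option Explicit

' Hypothetical helper: turns a chart id into a display title.
' Assumes ids start with "visualization_", use "__" for " & " and "_" for a space.
Public Function IdToTitle(ByVal chartId As String) As String
    Const PREFIX As String = "visualization_"
    Dim s As String

    ' Anything that does not start with the prefix yields an empty string
    If InStr(1, chartId, PREFIX, vbTextCompare) <> 1 Then Exit Function

    s = Mid$(chartId, Len(PREFIX) + 1)  ' drop the prefix
    s = Replace$(s, "__", " & ")        ' double underscore first
    s = Replace$(s, "_", " ")           ' then any remaining single underscore
    IdToTitle = StrConv(s, vbProperCase)
End Function

Public Sub DemoIdToTitle()
    Debug.Print IdToTitle("visualization_toys__games")  ' prints: Toys & Games
End Sub
```

Replacing the double underscore before the single one is what keeps the ampersand; doing it the other way round would collapse both into plain spaces.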
    

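    【Comments】:

    • Thanks for your suggestion, @QHarr. I hadn't noticed that I could tap into the ids to get the titles. How did I miss that!!
    • OK, now we can delete the comments. If you like, I'll paste a link here showing how I optimized it. I + lrow did the trick once again.
    【Solution 2】:

    Although this answer is entirely influenced by QHarr, I wanted to post it for future readers. Using the ids is the best strategy here. The following solution produces titles that are almost identical to the category names.

    Here it is: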

    Sub GetChartInfo()
        Const Url As String = "https://www.fbatoolkit.com/"
        Dim IE As New InternetExplorer, Html As HTMLDocument
        Dim itemvisibility As Object, otitle As Object, I&
    
        With IE
            .Visible = False
            .navigate Url
            While .Busy = True Or .readyState < 4: DoEvents: Wend
            Set Html = .document
        End With
    
        ' Poll until at least one chart container exists (Length is never negative, so "<= -1" never waited)
        Do: Set itemvisibility = Html.querySelectorAll("div[class='chart-container']"): DoEvents: Loop While itemvisibility.Length = 0
    
        With Html.querySelectorAll("div[class='chart-container']")
            For I = 0 To .Length - 1
                Do: Set otitle = .Item(I).querySelector(".chart"): DoEvents: Loop While otitle Is Nothing
                Cells(I + 1, 1) = Application.WorksheetFunction.Proper(Replace(Replace(Split(otitle.getAttribute("id"), "visualization_")(1), "__", " "), "_", " "))
            Next I
        End With
    End Sub
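
The `Do ... Loop` polling in this answer has no time limit, so it can spin forever if the page never renders the expected elements. One possible way to bound it is sketched below. `WaitForSelector` and its `timeoutSeconds` parameter are hypothetical names, not part of the original answer, and the sketch assumes the same reference to the Microsoft HTML Object Library that the code above already needs for `HTMLDocument`.

```vba
Option Explicit

' Hypothetical helper: polls doc for elements matching selector.
' Returns the matching node list, or Nothing if timeoutSeconds elapse first.
Public Function WaitForSelector(ByVal doc As HTMLDocument, ByVal selector As String, _
                                Optional ByVal timeoutSeconds As Double = 10) As Object
    Dim started As Double, nodes As Object

    started = Timer ' seconds since midnight; this simple check does not handle a run that crosses midnight
    Do
        Set nodes = doc.querySelectorAll(selector)
        If nodes.Length > 0 Then
            Set WaitForSelector = nodes
            Exit Function
        End If
        DoEvents ' keep Excel responsive while waiting
    Loop While Timer - started < timeoutSeconds
    ' Falling through here means the wait timed out and the result stays Nothing
End Function
```

A caller would then test the result before looping, for example `Set charts = WaitForSelector(Html, "div[class='chart-container']")` followed by `If charts Is Nothing Then Exit Sub`.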
    

    【Comments】:

    • Nice approach.