【Question title】: VBA dynamic webpage scrape Excel
【Posted】: 2018-05-31 08:57:33
【Question description】:

I have a question about how to scrape data from this web page:

http://tvc4.forexpros.com/init.php?family_prefix=tvc4&carrier=64694b96ed4909e815f1d10605ae4e83&time=1513525898&domain_ID=70&lang_ID=70&timezone_ID=31&pair_ID=171&interval=86400&refresh=4&session=session&client=1&user=200743128&width=650&height=750&init_page=instrument&m_pids=&watchlist=&site=https://au.investing.com&version=1.11.2

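The content all seems to be held in an iframe, with a bunch of JavaScript showing up on the page.

When I try to collect the span, div, or tr elements that sit under the iframe, I can't get any of the data inside them.

My goal is the inner text of the element with class="pane-legend-item-value pane-legend-line main".

Obviously the inner text changes depending on where the cursor is on the screen at a given moment. So what I tried was to set up an IE window with the page already loaded and the cursor in the right place, at the end of the chart (which gives me the last data point); the cursor can then be moved off the screen. I then wrote some simple code to grab that IE window and tried GetElements on it, at which point I could not get any data.

This is my code so far. It is pretty rough, because I kept editing it as I read about more options, without any success :( ... Any ideas or help would be greatly appreciated! (A screenshot is also at the bottom.)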

Sub InvestingCom()

    Dim IE As InternetExplorer
    Dim htmldoc As MSHTML.IHTMLDocument 'Document object
    Dim eleColth As MSHTML.IHTMLElementCollection 'Element collection for th tags
    Dim eleColtr As MSHTML.IHTMLElementCollection 'Element collection for tr tags
    Dim eleColtd As MSHTML.IHTMLElementCollection 'Element collection for td tags
    Dim eleRow As MSHTML.IHTMLElement 'Row elements
    Dim eleCol As MSHTML.IHTMLElement 'Column elements
    Dim elehr As MSHTML.IHTMLElement 'Header Element
    Dim iframeDoc As MSHTML.HTMLDocument
    Dim frame As HTMLIFrame
    Dim ieURL As String 'URL

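    'Take control of an IE window that is already open
    Dim objShell As Object, x As Long, marker As Long
    Dim my_url As String, my_title As String

    marker = 0
    Set objShell = CreateObject("Shell.Application")
    For x = 0 To objShell.Windows.Count - 1
        'Reset each pass so a failed read cannot reuse the previous window's values
        my_url = vbNullString
        my_title = vbNullString
        On Error Resume Next 'File Explorer windows expose no HTML document
        my_url = objShell.Windows(x).document.Location
        my_title = objShell.Windows(x).document.Title
        On Error GoTo 0

        'Compare titles to find the desired page; as written, "*" & "*" matches any non-empty title
        If Len(my_title) > 0 And my_title Like "*" & "*" Then
            Set IE = objShell.Windows(x)
            marker = 1
            Exit For
        End If
    Next x

    If marker = 0 Then
        MsgBox "No open IE window with a matching title was found."
        Exit Sub
    End If

    'Extract data
    Set htmldoc = IE.document 'Document of the web page

    ' I have tried span, tr, td etc. tags and various other options
    ' I have never actually managed to collect an HTMLFrame; I googled it but was unsuccessful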
End Sub

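Screenshot: the existing IE window that Excel can find and talk to, with Excel and the VB editor open on another screen, and the data I want to scrape.

【Question comments】:

    Tags: javascript excel vba iframe web-scraping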


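    【Solution 1】:

    I had a really hard time dealing with the two nested iframes on that page to collect the desired content. But anyway, I finally got it working. Run the following code to get the content you requested: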

    'Requires references to "Microsoft Internet Controls" and "Microsoft HTML Object Library"
    Sub forexpros()
        Dim IE As New InternetExplorer, html As HTMLDocument
        Dim frm As Object, frmano As Object, post As Object
    
        With IE
            .Visible = True
            .navigate "http://tvc4.forexpros.com/init.php?family_prefix=tvc4&carrier=64694b96ed4909e815f1d10605ae4e83&time=1513525898&domain_ID=70&lang_ID=70&timezone_ID=31&pair_ID=171&interval=86400&refresh=4&session=session&client=1&user=200743128&width=650&height=750&init_page=instrument&m_pids=&watchlist=&site=https://au.investing.com&version=1.11.2"
            'DoEvents keeps Excel responsive while the page loads
            Do While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Loop
            Application.Wait Now + TimeValue("0:00:05") 'give the page scripts time to build the frames
            Set frm = .document.getElementsByClassName("abs") 'this is the first iframe
            .navigate frm(0).src 'load the first iframe's page directly
            Do While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Loop
            Application.Wait Now + TimeValue("0:00:05")
            Set html = .document
        End With
    
        'this is the second iframe, nested inside the first
        Set frmano = html.getElementsByTagName("iframe")(0).contentWindow.document
    
        For Each post In frmano.getElementsByClassName("pane-legend-item-value pane-legend-line main")
            Debug.Print post.innerText
        Next post
        IE.Quit
    End Sub
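
The fixed five-second waits in the code above are a guess at how long the page scripts need. A minimal sketch of an alternative, assuming the target elements do eventually appear in the document: poll until they exist or a timeout passes. `WaitForClass` and its parameters are hypothetical names introduced here for illustration; they are not part of the original answer or of any library.

```vba
' Hypothetical helper (not from the original answer): poll a document for
' elements of a given class instead of sleeping for a fixed interval.
' Returns the element collection, or Nothing if the timeout passes first.
Function WaitForClass(ByVal doc As Object, ByVal className As String, _
                      Optional ByVal timeoutSeconds As Double = 10) As Object
    Dim deadline As Date
    Dim found As Object

    deadline = Now + timeoutSeconds / 86400#    ' seconds as a fraction of a day

    Do
        Set found = doc.getElementsByClassName(className)
        If found.Length > 0 Then
            Set WaitForClass = found
            Exit Function
        End If
        DoEvents                                ' let IE keep running its scripts
    Loop While Now < deadline

    Set WaitForClass = Nothing                  ' timed out without a match
End Function
```

In `forexpros`, a call such as `Set frm = WaitForClass(.document, "abs")` could stand in for the first `Application.Wait`, followed by an `If frm Is Nothing` check before reading `frm(0).src`. Note that `Now` has one-second resolution, so the timeout is only approximate.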
    

    【Comments】:
