【Question Title】: Import data from a complex website via Excel VBA
【Posted】: 2018-10-28 04:36:13
【Question】:

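I'm still a beginner, but I can read simple HTML structures.

However, for https://stockrow.com/AAPL/financials/income/annual I tried to pull the data into Excel with XMLHttpRequest, and the source I get back is missing the important table that holds all the key figures. When I inspect the page in the browser, I can see the entire HTML structure.

This is the source I get back: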

<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="apple-touch-icon-precomposed" sizes="57x57" 
href="/favicons/apple-touch-icon-57x57.png" />
<link rel="apple-touch-icon-precomposed" sizes="114x114" 
href="/favicons/apple-touch-icon-114x114.png" />
<link rel="apple-touch-icon-precomposed" sizes="72x72" 
href="/favicons/apple-touch-icon-72x72.png" />
<link rel="apple-touch-icon-precomposed" sizes="144x144" 
href="/favicons/apple-touch-icon-144x144.png" />
<link rel="apple-touch-icon-precomposed" sizes="60x60" 
href="/favicons/apple-touch-icon-60x60.png" />
<link rel="apple-touch-icon-precomposed" sizes="120x120" 
href="/favicons/apple-touch-icon-120x120.png" />
<link rel="apple-touch-icon-precomposed" sizes="76x76" 
href="/favicons/apple-touch-icon-76x76.png" />
<link rel="apple-touch-icon-precomposed" sizes="152x152" 
href="/favicons/apple-touch-icon-152x152.png" />
<link rel="icon" type="image/png" href="/favicons/favicon-196x196.png" 
sizes="196x196" />
<link rel="icon" type="image/png" href="/favicons/favicon-96x96.png" 
sizes="96x96" />
<link rel="icon" type="image/png" href="/favicons/favicon-32x32.png" 
sizes="32x32" />
<link rel="icon" type="image/png" href="/favicons/favicon-16x16.png" 
sizes="16x16" />
<link rel="icon" type="image/png" href="/favicons/favicon-128.png" 
sizes="128x128" />
<meta name="application-name" content="stockrow.com"/>
<meta name="msapplication-TileColor" content="#FFFFFF" />
<meta name="msapplication-TileImage" content="/favicons/mstile-144x144.png" 
/>

<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />

<link href="https://code.cdn.mozilla.net/fonts/fira.css" rel="stylesheet" type="text/css" />

<script src="https://www.google.com/recaptcha/api.js"></script>

  <script src="https://cdn.ravenjs.com/3.15.0/raven.min.js"></script>
  <script>Raven.config('https://3ce523a8252c436f83c6fc423b340c0a@sentry.io/144901').install()</script>

<meta name="csrf-param" content="authenticity_token" />

<link rel="stylesheet" media="screen" href="/packs/stockrow-aa9c6f09f554179248530de2e33baa9b.css" />
<script src="/packs/stockrow-a35b20c51d525016f7c7.js"></script>
<script async id="_ck_381101" src="https://forms.convertkit.com/381101?v=7"></script>

I don't know how to solve this, so I thought I'd try Stack Overflow.

【Comments】:

  • The site's content is probably loaded mostly by JavaScript, so it is not in the initial HTML. Read up on web scraping tools.
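
One way to confirm this from VBA is to fetch the raw response and look for table markup in it. The following is a minimal sketch, not a definitive test: the procedure name `CheckStaticHtmlForTable` is made up for illustration, and it assumes the figures would appear inside an ordinary `<table>` tag if they were part of the static HTML.

```vba
Option Explicit

' Sketch: fetch the raw HTML and report whether it contains any <table> markup.
' Assumption: the data would sit in a <table> element if it were served statically.
Public Sub CheckStaticHtmlForTable()
    Const URL As String = "https://stockrow.com/AAPL/financials/income/annual"
    Dim http As Object
    Dim html As String

    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", URL, False   ' False = synchronous request
    http.send

    html = http.responseText
    Debug.Print "HTTP status: " & http.Status
    Debug.Print "Response length: " & Len(html)

    If InStr(1, html, "<table", vbTextCompare) > 0 Then
        Debug.Print "A <table> tag is present in the static HTML."
    Else
        Debug.Print "No <table> tag in the static HTML; it is probably built by script."
    End If
End Sub
```

If the second message is printed, the request itself worked and the table is simply not part of what the server sends before scripts run.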

Tags: html xml excel vba


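【Solution 1】:

If you only need the data the website displays, you can use VBA to open an Internet Explorer instance and let IE fetch and render the page for you. It's a bit hacky, but it gets the job done.

Basically, inspect the website in a browser to see which elements contain the data you want. Then, in your VBA script, read the contents of those elements from the rendered document.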
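
A rough sketch of that approach is shown here. It assumes Internet Explorer is still installed on the machine and that the rendered page exposes the figures in a plain `<table>` element; neither is confirmed by the answer, so replace the element lookup with whatever the browser inspector actually shows. The procedure name `ScrapeRenderedTable` and the 30-second timeout are illustrative choices.

```vba
Option Explicit

' Sketch only. Assumptions: IE is available, and the data is rendered
' into the first <table> element on the page.
Public Sub ScrapeRenderedTable()
    Const URL As String = "https://stockrow.com/AAPL/financials/income/annual"
    Const READYSTATE_COMPLETE As Long = 4

    Dim ie As Object, tables As Object, tbl As Object
    Dim r As Long, c As Long
    Dim deadline As Date

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = False
    ie.Navigate URL

    ' Wait for the initial document to finish loading.
    Do While ie.Busy Or ie.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop

    ' The table is built by script, so poll for it for up to 30 seconds.
    deadline = DateAdd("s", 30, Now)
    Do
        DoEvents
        Set tables = ie.document.getElementsByTagName("table")
        If tables.Length > 0 Then Exit Do
    Loop While Now < deadline

    If tables.Length = 0 Then
        ie.Quit
        MsgBox "No table appeared before the timeout."
        Exit Sub
    End If

    ' Copy every cell of the first table into the active sheet.
    Set tbl = tables.Item(0)
    For r = 0 To tbl.Rows.Length - 1
        For c = 0 To tbl.Rows.Item(r).Cells.Length - 1
            ActiveSheet.Cells(r + 1, c + 1).Value = _
                tbl.Rows.Item(r).Cells.Item(c).innerText
        Next c
    Next r

    ie.Quit
End Sub
```

Late binding (`CreateObject`) is used so the sketch runs without adding a reference to the Microsoft Internet Controls library.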

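【Discussion】:

    【Solution 2】:

    A closer look at the page HTML shows that you can download an xlsx file. In fact, you can simply copy the URL from the link element's href and pass it to URLMon to download the file directly.

    Snippet: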

     <a class="button hollow expanded" href="/api/companies/AAPL/financials.xlsx?dimension=MRY&amp;section=Income Statement" target="_blank">Export to Excel (.xlsx)</a>
        

    The href is relative, so you need to prepend the host domain.
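
    Turning the href from the page source into a usable URL could look like the sketch below. `BuildDownloadUrl` is a hypothetical helper, not part of the answer. It assumes the href was copied from the raw HTML, where `&` is written as `&amp;`, and it percent-encodes the space in the query string.

    ```vba
    Option Explicit

    ' Hypothetical helper: joins the host and a relative href copied from
    ' the HTML source, undoing the HTML escaping and encoding spaces.
    Public Function BuildDownloadUrl(ByVal hostUrl As String, ByVal href As String) As String
        Dim path As String
        path = Replace(href, "&amp;", "&")   ' HTML entity back to a literal ampersand
        path = Replace(path, " ", "%20")     ' spaces are not valid in a URL
        BuildDownloadUrl = hostUrl & path
    End Function

    Public Sub DemoBuildDownloadUrl()
        Debug.Print BuildDownloadUrl("https://stockrow.com", _
            "/api/companies/AAPL/financials.xlsx?dimension=MRY&amp;section=Income Statement")
        ' Prints:
        ' https://stockrow.com/api/companies/AAPL/financials.xlsx?dimension=MRY&section=Income%20Statement
    End Sub
    ```

    If the href is read from a live DOM rather than from the source text, the `&amp;` entity is normally already decoded and the first `Replace` simply does nothing.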


    VBA:

    Option Explicit
    
    #If VBA7 And Win64 Then
        Private Declare PtrSafe Function URLDownloadToFile Lib "urlmon" _
        Alias "URLDownloadToFileA" ( _
        ByVal pCaller As LongPtr, _
        ByVal szURL As String, _
        ByVal szFileName As String, _
        ByVal dwReserved As LongPtr, _
        ByVal lpfnCB As LongPtr _
        ) As Long
    
    #Else
        Private Declare Function URLDownloadToFile Lib "urlmon" _
                                 Alias "URLDownloadToFileA" ( _
                                 ByVal pCaller As Long, _
                                 ByVal szURL As String, _
                                 ByVal szFileName As String, _
                                 ByVal dwReserved As Long, _
                                 ByVal lpfnCB As Long _
                                 ) As Long
    
    #End If
    
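    ' Full path of the file to write, including the file name.
    Public Const folderName As String = "C:\Users\HarrisQ\Desktop\info.xlsx" '<=Change as required
    
    Public Sub downloadXlsx()
        Dim ret As Long
        ' The href in the page source is HTML-escaped: "&amp;" must be a plain "&" in the
        ' real URL, and the space in "Income Statement" is percent-encoded as %20.
        ' The fourth argument (dwReserved) is documented as reserved and must be 0.
        ret = URLDownloadToFile(0, "https://stockrow.com/api/companies/AAPL/financials.xlsx?dimension=MRY&section=Income%20Statement", folderName, 0, 0)
    
        ' URLDownloadToFile returns 0 (S_OK) on success.
        If ret <> 0 Then MsgBox "Download failed, HRESULT: &H" & Hex$(ret)
    End Sub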
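
    Once the file is on disk, the figures still have to be brought into the working workbook. A minimal sketch follows, assuming the same output path as above and that the first worksheet of the downloaded file holds the statement (not verified here). `ImportDownloadedWorkbook` is an illustrative name.

    ```vba
    Option Explicit

    ' Sketch: open the downloaded workbook read-only and copy its first
    ' sheet into the first sheet of the workbook that contains this code.
    Public Sub ImportDownloadedWorkbook()
        ' Assumption: same path the download wrote to; change as required.
        Const FILE_PATH As String = "C:\Users\HarrisQ\Desktop\info.xlsx"
        Dim wb As Workbook

        If Dir(FILE_PATH) = vbNullString Then
            MsgBox "File not found: " & FILE_PATH
            Exit Sub
        End If

        Set wb = Workbooks.Open(Filename:=FILE_PATH, ReadOnly:=True)
        wb.Worksheets(1).UsedRange.Copy _
            Destination:=ThisWorkbook.Worksheets(1).Range("A1")
        wb.Close SaveChanges:=False
    End Sub
    ```

    Note that the copy overwrites whatever is already in the destination range starting at A1.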
    

    【Discussion】:
