【问题标题】:Get headers of table when using XMLHTTP approach使用 XMLHTTP 方法时获取表头
【发布时间】:2020-08-14 16:25:54
【问题描述】:

我有一个从这个 url 抓取表格的代码

https://www.reuters.com/companies/AAPL.OQ/financials/income-statement-annual

代码没问题,除了一点之外没有任何问题。代码获取表格但没有获取表头

    With http
    .Open "Get", sURL, False
    .send
    html.body.innerHTML = .responseText
End With

   Set tbl = html.getElementsByTagName("Table")(0)

        For Each rw In tbl.Rows
            r = r + 1: c = 1
            For Each cl In rw.Cells
                ws.Cells(r, c).Value = cl.innerText
                c = c + 1
            Next cl
    Next rw

在检查 URL 时,我发现 API URL 支持

https://www.reuters.com/companies/api/getFetchCompanyFinancials/AAPL.OQ

如何从 JSON 响应中提取“收入”所需的“年度”数据?

我试图参考我想要的部分,但出错了

Const strUrl As String = "https://www.reuters.com/companies/api/getFetchCompanyFinancials/AAPL.OQ"

Sub Test()
Dim a, json As Object, colData As Collection, sFile As String, i As Long

With CreateObject("MSXML2.ServerXMLHTTP.6.0")
    .Open "GET", strUrl
    .send
    Set json = JSONConverter.ParseJson(.responseText)
End With


Set colData = json("market_data")("financial_statements")

Stop
End Sub

【问题讨论】:

  • 错误是什么?有许多收入线 - 您需要哪些?
  • 它是嵌套字典,路径为 json►market_data►financial_statements►income►annual
  • 事实上,我迷失了那些嵌套字典。
  • 您遇到了什么错误?您上面的代码看起来不错,除了您是否需要用户代理以及是否从标头刷新并且您缺少来自 .Open 的 FALSE 参数
  • 这也是我脑子里做的恐怕pastebin.com/JSVDViNu

标签: excel vba web-scraping xmlhttprequest


【解决方案1】:

类似的逻辑应该可以在 vba 中工作:

Dim data As Scripting.Dictionary, key As Variant, block As Collection, r As Long, item As Object

Set data = json("market_data")("financial_statements")("financial_statements")("income")("annual") ' dict of collections

r = 1

For Each key In data.keys
    Set block = data(key)  'each block (section of info) is a row
    r = r + 1: c= 2
    For each item In block 'loop columns in block         
        With Activesheet
            If r = 2 then 'write out headers to row 1,starting col2 and then values to row 2 starting from col 2, and key goes in row , col 1
                .Cells(1,c) = item("date")
            End If
            .Cells(r,1) = Key
            .Cells(r,c) = item("value")
        End With
        c = c + 1
    Next
Next

我无法在 VBA 中进行测试,但如果我编写 python(长手)等效项,我会得到同一张表:

import requests
import pandas as pd

json = requests.get('https://www.reuters.com/companies/api/getFetchCompanyFinancials/AAPL.OQ').json()
data = json["market_data"]["financial_statements"]["income"]["annual"]
rows = len(data.keys()) + 1
columns = len(data["Revenue"]) + 1
r = 0
df = pd.DataFrame(["" for c in range(columns)] for r in range(rows))

for key in data.keys():
    block = data[key]
    r+=1 ; c = 1
    for item in block:
        if r == 1:
            df.iloc[0 , c] = item["date"]
        df.iloc[r,c] = item["value"]
        df.iloc[r,0] = key
        c+=1
print(df)

【讨论】:

  • 非常感谢我的导师。请注意项目“Total Extraordinary Items”,值 -5151 在 28-09-2019 中,而在 JSON 响应中它是 2018-09-29。假定每个值到相关日期。可能有超过 6 个日期。
  • 是吗?在 python 中,它位于正确的位置。你能截屏 vba 的输出吗?
  • 没错。这就是它在网页上的显示方式
  • 你把你的解决方案放在哪里了?此外,工作代码可能是代码审查网站的候选者。至于不匹配那不是我能回答的。也许 json 响应显示参考日期,并且默认为第一列?也许有人搞砸了网站。除了大约 20 年前读过一本关于企业金融的书外,我真的不知道也不了解企业金融。
  • 您应该将其作为答案发布,而不是将其编辑到您的帖子中。另外,删除图像。它使帖子保持整洁,让人们更容易关注。
【解决方案2】:

经过这么多小时,我可以这样调整它

Const strUrl As String = "https://www.reuters.com/companies/api/getFetchCompanyFinancials/"

Sub GetData()
    Dim ws As Worksheet, sSection As String

    For Each ws In ThisWorkbook.Worksheets(Array("IS", "BS", "CF"))
        Select Case ws.Name
            Case "IS": sSection = "income"
            Case "BS": sSection = "balance_sheet"
            Case "CF": sSection = "cash_flow"
        End Select

        GetReuters ws, "tbl" & ws.Name, Sheets("Data").Range("B1").Value, sSection, Sheets("Data").Range("B2").Value
    Next ws
End Sub

Sub GetReuters(ByVal ws As Worksheet, ByVal tblName As String, ByVal sTicker As String, ByVal sSection As String, ByVal sTime As String)
    Dim a, ky, col As Collection, json As Object, data As Object, dic As Object, rng As Range, i As Long, k As Long, c As Long

    With CreateObject("MSXML2.ServerXMLHTTP.6.0")
        .Open "GET", strUrl & sTicker
        .send
        Set json = JSONConverter.ParseJson(.responseText)
    End With

    ReDim b(1 To 10000, 1 To 7)
    c = 1: b(1, c) = "Dates"

    Set data = json("market_data")("financial_statements")(sSection)(sTime)
    Set dic = CreateObject("Scripting.Dictionary")
    dic.CompareMode = 1

    For Each ky In data.Keys
        Set col = data(ky)
        a = CollectionToArray(col)
        k = k + 1
        b(k + 1, 1) = ky

        For i = LBound(a) To UBound(a)
            If Not dic.Exists(CStr(a(i, 1))) Then
                dic(CStr(a(i, 1))) = c
                c = c + 1

                b(1, c) = CStr(a(i, 1))
                b(k + 1, c) = a(i, 2)

            Else
                b(k + 1, dic.item(CStr(a(i, 1))) + 1) = a(i, 2)
            End If
        Next i

        Erase a
    Next ky

    Application.ScreenUpdating = False
        With ws
            On Error Resume Next
                .ListObjects(tblName).Delete
            On Error GoTo 0
            .Range("A1").Resize(k + 1, UBound(b, 2)).Value = b
            With .Range("A1").CurrentRegion
                Set rng = .Offset(1, 1).Resize(.Rows.Count - 1, .Columns.Count - 1)
                rng.NumberFormat = "#,##0.00;(#,##0.00)"
                rng.Rows(1).Offset(-1).NumberFormat = "dd-mmm-yy"
                .Columns.AutoFit
            End With

            .ListObjects.Add(xlSrcRange, .Range("A1").CurrentRegion, , xlYes).Name = tblName
        End With
    Application.ScreenUpdating = True
End Sub

Function CollectionToArray(ByVal c As Collection) As Variant()
    Dim a(), i As Long
    ReDim a(1 To c.Count, 1 To 2)

    For i = 1 To c.Count
        a(i, 1) = c.item(i)("date")
        a(i, 2) = c.item(i)("value")
    Next i

    CollectionToArray = a
End Function

【讨论】:

  • 是的,在我的 PC 中它是完美的,但是当在这里复制和粘贴时,我遇到了这一点。直到现在我不知道如何发布代码,因为它在我的一侧。唯一完美的就是发布为 HTML 代码
  • 我复制了您的 pastebin 代码,并突出显示所有代码,按 Ctrl + K 将代码块缩进代码插入所需的 4 个空格。
  • 很高兴。顺便说一句,这是一篇非常有用的帖子:stackoverflow.com/a/41813615/6241235 用于 ByRef/ByVal,而stackoverflow.com/help/formatting 用于格式化。
  • 你的意思是Sub GetReuters(ByVal ws As Worksheet, ByVal tblName As String, ByVal sTicker As String, ByVal sSection As String, ByVal sTime As String)
  • 我已根据您的指示更新了代码..。我非常感谢。
猜你喜欢
  • 1970-01-01
  • 1970-01-01
  • 2021-05-08
  • 2020-07-13
  • 2022-12-01
  • 2020-02-03
  • 2019-12-10
  • 2021-05-21
  • 1970-01-01
相关资源
最近更新 更多