【Question title】: Excel VBA code - Amend to pick up other tables in different page?
【Posted】: 2018-09-19 10:39:22
【Question】:

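A quick question about amending a piece of code to pick up the contents of tables on another web page. I received some great guidance here on how to pick up certain table contents based on their headings, and it works well - thanks again to 'QHarr', who was very helpful.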

The URL I want to pick up the table details from is:

https://finance.yahoo.com/quote/AAPL/?p=AAPL

This is the piece of code that picks up the data I want:

Sub GetYahooInfo100()
Dim tickers(), ticker As Long, lastRow As Long, headers()
Dim wsSource As Worksheet, http As clsHTTP, html As HTMLDocument

Application.ScreenUpdating = False

Set wsSource = ThisWorkbook.Worksheets("100")
Set http = New clsHTTP

headers = Array("Ticker", "Previous Close", "Open", "Bid", "Ask", "Day's Range", "52 Week Range", "Volume", "Avg. Volume", "Market Cap", "Beta", "PE Ratio (TTM)", "EPS (TTM)", _
                "Earnings Date", "Forward Dividend & Yield", "Ex-Dividend Date", "1y Target Est")

With wsSource
    lastRow = GetLastRow(wsSource, 1)
    Select Case lastRow
    Case Is < 3
        Exit Sub
    Case 3
        ReDim tickers(1 To 1, 1 To 1): tickers(1, 1) = .Range("A3").Value   ' 1-based, to match the array returned by Range.Value
    Case Is > 3
        tickers = .Range("A3:A" & lastRow).Value
    End Select

    ReDim Results(0 To UBound(tickers, 1) - 1)
    Dim i As Long, endPoint As Long
    endPoint = UBound(headers)

    For ticker = LBound(tickers, 1) To UBound(tickers, 1)
        If Not IsEmpty(tickers(ticker, 1)) Then
            Set html = http.GetHTMLDoc("https://finance.yahoo.com/quote/" & tickers(ticker, 1) & "/?p=" & tickers(ticker, 1))
            Results(ticker - 1) = http.GetInfo(html, endPoint)
            On Error Resume Next
            Set html = Nothing
        Else
            Results(ticker) = vbNullString
        End If
    Next

    .Cells(2, 1).Resize(1, UBound(headers) + 1) = headers
    For i = LBound(Results) To UBound(Results)
        .Cells(3 + i, 2).Resize(1, endPoint - 1) = Results(i)
    Next
End With
Application.ScreenUpdating = True
End Sub

There is also this helper function below it:

Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
With ws
    GetLastRow = .Cells(.Rows.Count, columnNumber).End(xlUp).Row
End With

End Function

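As I said, one of the members here was very helpful in working out how to do this. I have tried amending this code to pick up another set of data from the statistics page, which is: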

https://finance.yahoo.com/quote/AAPL/key-statistics?p=AAPL

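But I must be missing something. Unless my reference to the table is incorrect, I am at a loss. I would like to capture all the data fields in each table, not just the one table I may have got wrong.

Hope someone can help.

Many thanks.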

Tags: html vba excel web-scraping


【Solution 1】:

Try the following:

Option Explicit

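' Requires a reference to "Microsoft HTML Object Library" (VBE > Tools > References) for HTMLDocument
Public Sub GetYahooInfo()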
    Dim tickers(), ticker As Long, lastRow As Long, headers()
    Dim wsSource As Worksheet, http As Object, html As New HTMLDocument
    headers = Array("Previous Close", "Open", "Bid", "Ask", "Day's Range", "52 Week Range", "Volume", "Avg. Volume", "Market Cap", _
                    "Beta", "PE Ratio (TTM)", "EPS (TTM)", "Earnings Date", "Forward Dividend & Yield", "Ex-Dividend Date", "1y Target Est", _
                    "Market Cap (intraday)", "Enterprise Value", "Trailing P/E", "Forward P/E", "PEG Ratio (5 yr expected)", "Price/Sales (ttm)", _
                    "Price/Book (mrq)", "Enterprise Value/Revenue", "Enterprise Value/EBITDA", "Fiscal Year Ends", "Most Recent Quarter (mrq)", _
                    "Profit Margin", "Operating Margin (ttm)", "Return on Assets (ttm)", "Return on Equity (ttm)", "Revenue (ttm)", "Revenue Per Share (ttm)", _
                    "Quarterly Revenue Growth (yoy)", "Gross Profit (ttm)", "EBITDA", "Net Income Avi to Common (ttm)", "Diluted EPS (ttm)", _
                    "Quarterly Earnings Growth (yoy)", "Total Cash (mrq)", "Total Cash Per Share (mrq)", "Total Debt (mrq)", _
                    "Total Debt/Equity (mrq)", "Current Ratio (mrq)", "Book Value Per Share (mrq)", "Operating Cash Flow (ttm)", _
                    "Levered Free Cash Flow (ttm)", "Beta", "52-Week Change", "S&P500 52-Week Change", "52 Week High", "52 Week Low", _
                    "50-Day Moving Average", "200-Day Moving Average", "Avg Vol (3 month)", "Avg Vol (10 day)", _
                    "Shares Outstanding", "Float", "% Held by Insiders", "% Held by Institutions", "Shares Short (Aug 31, 2018)", _
                    "Short Ratio (Aug 31, 2018)", "Short % of Float (Aug 31, 2018)", "Short % of Shares Outstanding (Aug 31, 2018)", _
                    "Shares Short (prior month Jul 31, 2018)", "Forward Annual Dividend Rate", "Forward Annual Dividend Yield", _
                    "Trailing Annual Dividend Rate", "Trailing Annual Dividend Yield", "5 Year Average Dividend Yield", "Payout Ratio", _
                    "Dividend Date", "Ex-Dividend Date", "Last Split Factor (new per old)", "Last Split Date")
    Application.ScreenUpdating = False

    Set wsSource = ThisWorkbook.Worksheets("Sheet1") '<== Change as appropriate to sheet containing the tickers

    With wsSource
        lastRow = GetLastRow(wsSource, 1)
        Select Case lastRow
        Case Is < 3
            Exit Sub
        Case 3
            ReDim tickers(1 To 1, 1 To 1): tickers(1, 1) = .Range("A3").Value   ' 1-based, to match the array returned by Range.Value
        Case Is > 3
            tickers = .Range("A3:A" & lastRow).Value
        End Select

        Dim i As Long, sResponse As String
        Set http = CreateObject("MSXML2.XMLHTTP")

        For ticker = LBound(tickers, 1) To UBound(tickers, 1)
            With wsSource   ' write to the same sheet the tickers were read from
                If Not IsEmpty(tickers(ticker, 1)) Then
                    With http
                        .Open "GET", "https://finance.yahoo.com/quote/" & tickers(ticker, 1) & "/key-statistics?p=" & tickers(ticker, 1), False
                        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
                        .send
                        sResponse = StrConv(.responseBody, vbUnicode)
                    End With

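                    ' Drop anything before the doctype; keep the whole response if the marker is missing (Mid$ fails on a start of 0)
                    If InStr(1, sResponse, "<!DOCTYPE ") > 0 Then sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
                    html.body.innerHTML = sResponse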

                    Dim tables As Object, destLastRow As Long
                    Dim counter As Long: counter = 2

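                    ' All table cells in document order; the loop below assumes labels at even indices and values at odd indices
                    Set tables = html.querySelectorAll("tbody td")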
                    .Cells(2, 2).Resize(1, UBound(headers) + 1) = headers

                    For i = 1 To tables.Length - 1 Step 2
                        .Cells(ticker + 2, counter) = tables(i).innerText
                        If InStr(tables(i).innerText, "Last Split Date") > 0 Or InStr(tables(i + 1).innerText, "Last Split Date") > 0 Then
                            Exit For
                        End If
                        counter = counter + 1
                    Next
                    Set html = Nothing: Set tables = Nothing
                Else
                    .Cells(2 + ticker, 2) = "N/A"
                End If
            End With
        Next
    End With
    Application.ScreenUpdating = True
End Sub

Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
    With ws
        GetLastRow = .Cells(.Rows.Count, columnNumber).End(xlUp).Row
    End With
End Function

Sample results: [image not preserved]
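The loop in the answer pairs labels and values by their position in one flat list of cells. As a possible alternative, here is a minimal sketch that reads each table row on its own and stores label/value pairs in a dictionary, so a value can be looked up by its label rather than by position. `ReadLabelValuePairs` and `DemoReadLabelValuePairs` are hypothetical names that do not appear in the answer, the demo markup is made-up placeholder HTML rather than real Yahoo output, and the sketch assumes every data row holds exactly a label cell followed by a value cell.

```vba
Option Explicit

' Hypothetical helper (not part of the original answer):
' returns a Scripting.Dictionary mapping the first cell of each table row to the second.
Public Function ReadLabelValuePairs(ByVal html As Object) As Object
    Dim pairs As Object, tr As Object, tds As Object
    Set pairs = CreateObject("Scripting.Dictionary")

    For Each tr In html.getElementsByTagName("tr")
        Set tds = tr.getElementsByTagName("td")
        ' Assumption: a data row has at least a label cell and a value cell
        If tds.Length >= 2 Then
            ' A repeated label overwrites the earlier entry
            pairs(Trim$(tds.Item(0).innerText)) = Trim$(tds.Item(1).innerText)
        End If
    Next tr

    Set ReadLabelValuePairs = pairs
End Function

' Demo with placeholder markup standing in for a downloaded page
Public Sub DemoReadLabelValuePairs()
    Dim html As Object, pairs As Object, key As Variant

    Set html = CreateObject("htmlfile")   ' late-bound, so no library reference is needed
    html.body.innerHTML = "<table><tbody>" & _
        "<tr><td>Label A</td><td>1.23</td></tr>" & _
        "<tr><td>Label B</td><td>4.56</td></tr>" & _
        "</tbody></table>"

    Set pairs = ReadLabelValuePairs(html)
    For Each key In pairs.Keys
        Debug.Print key & " = " & pairs(key)
    Next key
End Sub
```

With pairs keyed by label, each header could be looked up with `pairs.Exists(header)` before writing, which would avoid relying on the page keeping its cells in a fixed order. Labels that carry dates or footnote markers would still need to be matched loosely.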

【Comments】:

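  • Hey mate - that's great! It creates a list in the column below B3, unlike the original, which filled row 3 for the ticker in cell A3, then took the next value from A4, and so on. Is that what your code was meant to do - or is that what I asked for! I am looking for something like the original code, where the values for each ticker in the list fill the corresponding row. If that is too much for all of this information, could I create a separate worksheet for each ticker and populate it accordingly? Is the previous approach still an option at all? Thanks.
  • The "a bit tidier" version looks great. It puts all the data on another sheet. I have filled cells A3-A9 with tickers, and Sheet2 contains the data for all 6. 75 rows each, 450 rows in total. I just need to find a way to isolate the data for each ticker, or create a new worksheet for each ticker with the ticker as the sheet name. I had been using the previous method of collecting the information until Yahoo withdrew it.
  • I tried replacing the code and ran it against the 6 tickers in column A. It puts the headers in row 2 but fails to collect the data for each ticker's row. Maybe I am doing something wrong - do I simply copy the code and voilà? Thanks.
  • The six I am currently using as a test are: AAPL AAT AAV AB ABB ABBV
  • This is truly remarkable. Much appreciated. Have a great day.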
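
One of the comments above asks about giving each ticker its own worksheet, named after the ticker. A minimal sketch of one way that could be done follows. `GetOrAddSheet` is a hypothetical helper name that does not appear in the answer, and the sketch assumes every ticker is a valid sheet name (at most 31 characters, none of the characters : \ / ? * [ ]).

```vba
Option Explicit

' Hypothetical helper (not part of the original answer):
' returns the worksheet with the given name, creating it at the end of the workbook if it is missing.
Public Function GetOrAddSheet(ByVal wb As Workbook, ByVal sheetName As String) As Worksheet
    Dim ws As Worksheet

    ' Looking up a missing sheet raises an error, so suppress it for this one statement
    On Error Resume Next
    Set ws = wb.Worksheets(sheetName)
    On Error GoTo 0

    If ws Is Nothing Then
        Set ws = wb.Worksheets.Add(After:=wb.Worksheets(wb.Worksheets.Count))
        ws.Name = sheetName   ' assumption: sheetName is a valid worksheet name
    End If

    Set GetOrAddSheet = ws
End Function
```

Inside the ticker loop, something like `Set ws = GetOrAddSheet(ThisWorkbook, CStr(tickers(ticker, 1)))` would then give a per-ticker destination sheet to write the values to.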