【Question Title】: Getting text from a particular div class on a webpage HTML using VBA in Excel
【Posted】: 2021-11-05 06:33:17
【Question】:

Edit: Thanks for the solutions, guys.

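The problem with the code below is that it fails to get the text under the div with class "col-xs-12 col-sm-12 col-md-6 col-lg-5 col-md-pull-6 col-lg-pull-7 p-main-title-wrapper" from the webpage "https://www.racingandsports.com/thoroughbred/jockey/jake-bayliss/27461" and print it on an Excel worksheet. The only text that needs to be extracted is "JAKE BAYLISS", nothing else.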

Sub Horse6()

Dim ws As Worksheet
Dim r As Integer
Dim c As Integer
Dim http As New XMLHTTP60
Dim html As New HTMLDocument
Dim node As HTMLHtmlElement
Dim nodeDiv As HTMLHtmlElement

  Set ws = ThisWorkbook.Worksheets("Sheet1")
  r = 2
  c = 12
  
    With http
    .Open "GET", "https://www.racingandsports.com/thoroughbred/jockey/jake-bayliss/27461", False
    .send
    html.body.innerHTML = .responseText
    End With
  
'Problems with the elements are here

    With html.getElementsByClassName("np mainparent")
        For Each node In html.getElementsByClassName("col-xs-12 col-sm-12 col-md-6 col-lg-5  col-md-pull-6 col-lg-pull-7  p-main-title-wrapper")
            For Each nodeDiv In node.getElementsByTagName("div")
              ws.Cells(r, c) = .Item(0).innerText
            Next
        Next
    End With
  
    MsgBox "Data input complete"

End Sub

【Comments】:

    Tags: html excel vba


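    【Solution 1】:

    Have you tried selecting only the "p-main-title-wrapper" class in the For Each selector? Also, instead of selecting "div" in the second For Each, try selecting the "h1" element tag, since it is declared as its own element, separate from the div above.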
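
    The suggestion above could be sketched roughly as follows. This is only a sketch: it assumes the same library references as the question (Microsoft XML, v6.0 and Microsoft HTML Object Library) and that the page still places an h1 inside the "p-main-title-wrapper" div. The Sub name Horse6ByClass is hypothetical; the sheet name and target cell are taken from the question.

    ```vba
    Sub Horse6ByClass()   ' hypothetical name, not from the question

        Dim ws As Worksheet
        Dim http As New XMLHTTP60
        Dim html As New HTMLDocument
        Dim wrappers As Object
        Dim headings As Object

        Set ws = ThisWorkbook.Worksheets("Sheet1")

        With http
            .Open "GET", "https://www.racingandsports.com/thoroughbred/jockey/jake-bayliss/27461", False
            .send
            html.body.innerHTML = .responseText
        End With

        ' Match on the one distinctive class instead of the full class list
        Set wrappers = html.getElementsByClassName("p-main-title-wrapper")
        If wrappers.Length > 0 Then
            ' Look for the h1 inside the wrapper rather than a nested div
            Set headings = wrappers.Item(0).getElementsByTagName("h1")
            If headings.Length > 0 Then
                ' Row 2, column 12, as in the question
                ws.Cells(2, 12) = headings.Item(0).innerText
            End If
        End If

    End Sub
    ```

    Matching on a single class avoids depending on the exact spacing and order of the full class attribute.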

    【Discussion】:

      【Solution 2】:

      The part of the page's HTML you are referring to looks like this:

      <div class="col-xs-12 col-sm-12 col-md-6 col-lg-5 col-md-pull-6 col-lg-pull-7 p-main-title-wrapper">
          <h1 style="display:inline !important">JAKE BAYLISS</h1>
      </div>
      

      So you need to target the <h1> tag inside it, rather than the <div> you tried. See below:

      Dim nodeH1 As HTMLHtmlElement
      With html.getElementsByClassName("np mainparent")
          For Each node In html.getElementsByClassName("col-xs-12 col-sm-12 col-md-6 col-lg-5  col-md-pull-6 col-lg-pull-7  p-main-title-wrapper")
              For Each nodeH1 In node.getElementsByTagName("h1")
                ws.Cells(r, c) = nodeH1.innerText
              Next
          Next
      End With
      

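      【Discussion】:

        【Solution 3】:

        There is also the getElementsByTagName() method. The text you want is in an h1 tag, and it is the first h1 on the page. The only line you need to access it directly is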
        html.getElementsByTagName("h1")(0).innertext

        I have left out everything that isn't needed to read the value directly. As you can see, the whole loop construct is unnecessary.

        Sub Horse6()
        
        Dim http As New XMLHTTP60
        Dim html As New HTMLDocument
          
          With http
            .Open "GET", "https://www.racingandsports.com/thoroughbred/jockey/jake-bayliss/27461", False
            .send
            html.body.innerHTML = .responseText
          End With
          
          MsgBox html.getElementsByTagName("h1")(0).innertext
        End Sub
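
        The question asked for the value to be written to the worksheet rather than shown in a message box. A minimal sketch of that variant, with a couple of defensive checks, might look like the following. The Sub name Horse6ToSheet and the status and length checks are additions for illustration, not part of the answer above; the sheet name and target cell are taken from the question.

        ```vba
        Sub Horse6ToSheet()   ' hypothetical name, not from the answer

            Dim http As New XMLHTTP60
            Dim html As New HTMLDocument
            Dim headings As Object

            With http
                .Open "GET", "https://www.racingandsports.com/thoroughbred/jockey/jake-bayliss/27461", False
                .send
                ' Stop early if the server did not return the page
                If .Status <> 200 Then
                    MsgBox "Request failed with HTTP status " & .Status
                    Exit Sub
                End If
                html.body.innerHTML = .responseText
            End With

            ' Guard against a page layout with no h1 at all
            Set headings = html.getElementsByTagName("h1")
            If headings.Length = 0 Then
                MsgBox "No h1 element found"
                Exit Sub
            End If

            ' Sheet1, row 2, column 12, as in the question
            ThisWorkbook.Worksheets("Sheet1").Cells(2, 12) = headings.Item(0).innerText

        End Sub
        ```

        Without the length check, indexing an empty collection with (0) would raise a run-time error if the site changes its markup.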
        

        【Discussion】:
