【问题标题】:Can't write data in an excel file in a customized way无法以自定义方式将数据写入 excel 文件
【发布时间】:2025-12-30 22:20:08
【问题描述】:

我正在尝试创建一个宏来从网页中获取一些内容并以自定义方式将其写入 excel 文件中。我使用了来自同一个网站的两个相同的链接。这是其中的one。我对NameRecipeIngredients 三个字段感兴趣。

我创建的脚本可以相应地解析数据。但是,我想将它们排列在 this 之类的 excel 文件中。

到目前为止我已经写了(工作完美):

Sub GetAndArrangeData()
    Dim HTML As New HTMLDocument, oPost As Object
    Dim HTMLDoc As New HTMLDocument, ws As Worksheet
    Dim oTitle As Object, oPosts As Object
    Dim linklist As Variant, url As Variant
    
    linklist = Array( _
        "https://www.chelseasmessyapron.com/avocado-chicken-salad-2/", _
        "https://www.chelseasmessyapron.com/caprese-quinoa-salad/" _
    )
    
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    
    For Each url In linklist
        With CreateObject("MSXML2.XMLHTTP")
            .Open "GET", url, False
            .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"
            .send
            HTML.body.innerHTML = .responseText
        End With
        
        Set oTitle = HTML.querySelector("h1.entry-title")
        Debug.Print oTitle.innerText

        Set oPost = HTML.querySelectorAll(".cma-recipe-nutrition > .wprm-nutrition-label-container > span[class*='nutrition-container']")
        For I = 0 To oPost.Length - 1
            HTMLDoc.body.innerHTML = oPost(I).outerHTML
            Debug.Print HTMLDoc.querySelector("span.wprm-nutrition-label-text-nutrition-label").innerText
            Debug.Print HTMLDoc.querySelector("span[class*='nutrition-value']").innerText
        Next I

        Set oPosts = HTML.querySelectorAll(".wprm-recipe-block-container")
        For I = 0 To oPosts.Length - 1
            HTMLDoc.body.innerHTML = oPosts(I).outerHTML
            On Error Resume Next
            Debug.Print HTMLDoc.querySelector("span.wprm-recipe-details-label").innerText
            Debug.Print HTMLDoc.querySelector("span.wprm-recipe-details").innerText
            On Error GoTo 0
        Next I
    Next url
End Sub

如何按照上图所示的方式将数据写入 excel 文件?

顺便说一句,这是我在即时窗口中得到的结果:

Avocado Chicken Salad
Calories: 
542
Carbohydrates: 
30
Protein: 
11
Fat: 
45
Saturated Fat: 
7
Cholesterol: 
16
Sodium: 
285
Potassium: 
687
Fiber: 
8
Sugar: 
9
Vitamin A: 
945
Vitamin C: 
19
Calcium: 
36
Iron: 
1
Course 
Cuisine 
Keyword 
Prep Time 
20
Cook Time 
15
Total Time 
35
Servings 
2
Calories 
542
Cost 
$6.82
Caprese Quinoa Salad
Calories: 
375
Carbohydrates: 
30
Protein: 
11
Fat: 
26
Saturated Fat: 
4
Cholesterol: 
7
Sodium: 
73
Potassium: 
996
Fiber: 
9
Sugar: 
7
Vitamin A: 
17616
Vitamin C: 
32
Calcium: 
168
Iron: 
4
Course 
Cuisine 
Keyword 
Prep Time 
35
Cook Time 
25
Chilling Time (Quinoa) 
1
Total Time 
2
Servings 
6
Calories 
375
Cost 
$6.84

【问题讨论】:

  • “完美工作”是什么意思?我尝试对其进行测试,两行都试图定义oPosts对象,返回'Null. I mean Set oPosts = ...`。需要注册吗?
  • 查看我在执行宏@FaneDuru 时在即时窗口中得到的输出。不,不需要任何东西来获取结果。只需按原样运行即可。谢谢。
  • 我照原样复制它,只尝试运行。它返回一个正确的.responseText,正确的oTitle.innerText,但它总是以Null 的形式返回Set oPost = HTML.querySelectorAll(".... ...没有oPost.Length,也没有任何迭代。如果我能获得您显示的“结果”(在即时窗口中),我可以向您展示如何根据需要将其安排在表格中。
  • 您真的想要关键字、课程和美食吗?
  • 不。我真的想把他们踢出去@QHarr。

标签: excel vba web-scraping


【解决方案1】:

基本上,您只需要跟踪写入数据的位置即可。我定义了一个变量row,它设置为您要放入数据的第一行。在每个配方之后,写入的行数被添加到row。为了跟踪行数,我使用了两个单独的变量oPostNut 和 oPostsRecipe(instead of only oneoPosts`) 并添加较大列表的条目数 - 基本上就是这样。

(...)
Dim row As Long
row = 1         ' Change to whatever row you want to start
For Each url In linklist
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False
        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"
        .send
        HTML.body.innerHTML = .responseText
    End With
    
    Set oTitle = HTML.querySelector("h1.entry-title")
    ws.Cells(row, 1) = oTitle.innerText

    Dim i As long
    Dim oPostsNut As Object
    Set oPostsNut = HTML.querySelectorAll(".cma-recipe-nutrition > .wprm-nutrition-label-container > span[class*='nutrition-container']")        
    For i = 0 To oPostsNut.Length - 1
        HTMLDoc.body.innerHTML = oPostsNut(i).outerHTML
        ws.Cells(row + i, 2) = HTMLDoc.querySelector("span.wprm-nutrition-label-text-nutrition-label").innerText
        ws.Cells(row + i, 3) = HTMLDoc.querySelector("span[class*='nutrition-value']").innerText
    Next i

    Dim oPostsRecipe As Object
    Set oPostsRecipe = HTML.querySelectorAll(".wprm-recipe-block-container")
    For i = 0 To oPostsRecipe.Length - 1
        HTMLDoc.body.innerHTML = oPostsRecipe(i).outerHTML
        On Error Resume Next
        ws.Cells(row + i, 4) = HTMLDoc.querySelector("span.wprm-recipe-details-label").innerText
        ws.Cells(row + i, 5) = HTMLDoc.querySelector("span.wprm-recipe-details").innerText
        On Error GoTo 0
    Next i
    
    row = row + IIf(oPostsNut.Length > oPostsRecipe.Length, oPostsNut.Length, oPostsRecipe.Length)
Next url

【讨论】:

    【解决方案2】:

    我认为我们可以做得更好。如果我们使用更具选择性的 css 选择器,我们可以摆脱我在其他答案 (12/02/21) 和您最初的尝试中看到的附加信息。使用下面的选择器,我删除了该附加信息并仅返回所需的信息。我使用数组比一直写入工作表要快。我不再需要重新创建 xmlhttp 对象和额外的 HTMLDocument。

    Option Explicit
    
    Public Sub GetAndArrangeData()
        Dim html As New MSHTML.HTMLDocument, xhr As Object, ws As Worksheet
        Dim linklist As Variant, url As Variant, totalRows
        
        linklist = Array( _
            "https://www.chelseasmessyapron.com/avocado-chicken-salad-2/", _
            "https://www.chelseasmessyapron.com/caprese-quinoa-salad/" _
        )
        
        Set ws = ThisWorkbook.Worksheets("Sheet1")
        Set xhr = CreateObject("MSXML2.XMLHTTP")
        
        totalRows = 1
        
        For Each url In linklist
           
           With xhr
                .Open "GET", url, False
                .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"
                .send
                html.body.innerHTML = .responseText
            End With
            
            Dim title As String
            
            title = html.querySelector("h1.entry-title").innerText
        
            Dim nutritionRows As Object, timesOtherRows As Object, maxRows As Long
            
            Set nutritionRows = html.querySelectorAll(".wprm-nutrition-label-container .wprm-nutrition-label-text-nutrition-container")
            Set timesOtherRows = html.querySelectorAll(".cma-recipe-mobile .wprm-recipe-times-container .wprm-recipe-block-container-columns, .wprm-recipe-meta-container ~ .wprm-recipe-block-container-columns")
    
            maxRows = IIf(nutritionRows.Length > timesOtherRows.Length, nutritionRows.Length, timesOtherRows.Length) - 1
    
            Dim recipeInfo(), i As Long
            ReDim recipeInfo(1 To maxRows, 1 To 5)
            
            On Error Resume Next
            For i = 0 To maxRows
                recipeInfo(i + 1, 1) = IIf(i = 0, title, vbNullString)
                recipeInfo(i + 1, 2) = nutritionRows.Item(i).Children(0).innerText
                recipeInfo(i + 1, 3) = nutritionRows.Item(i).Children(1).innerText
                recipeInfo(i + 1, 4) = timesOtherRows.Item(i).Children(1).innerText
                recipeInfo(i + 1, 5) = timesOtherRows.Item(i).Children(2).innerText
            Next
            On Error GoTo 0
           
            ws.Cells(totalRows, 1).Resize(UBound(recipeInfo, 1), UBound(recipeInfo, 2)) = recipeInfo
            totalRows = totalRows + maxRows
        Next url
    End Sub
    

    JSON:

    也许更容易的是从响应的 HEAD 部分中的 script 标记中获取所有信息作为 json。您需要将响应包装在 body 标记中,以防止 HTML 解析器在您将其添加到 MSHTML.HTMLDocument 对象的 body.innerHTML 时剥离此内容。

    我不打算展示 json 解析,因为有很多示例,但会展示提取它。

    Option Explicit
    
    Public Sub GetAndArrangeData()
        Dim html As New MSHTML.HTMLDocument, xhr As Object
        Dim linklist As Variant, url As Variant
        
        linklist = Array( _
            "https://www.chelseasmessyapron.com/avocado-chicken-salad-2/", _
            "https://www.chelseasmessyapron.com/caprese-quinoa-salad/" _
        )
        
        Set xhr = CreateObject("MSXML2.XMLHTTP")
    
        For Each url In linklist
           
           With xhr
                .Open "GET", url, False
                .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"
                .send
                html.body.innerHTML = "<body>" & .responseText & "</body>"
            End With
            
            Debug.Print html.querySelector(".yoast-schema-graph").innerHTML
            
        Next url
    End Sub
    

    【讨论】: