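【Question title】: Webscraping, appending multiple values to a single row in a list
【Posted】: 2021-11-23 11:18:35
【Question description】:

I'm trying to figure out how to properly append multiple values to a list. The page I'm scraping is a food blog. I want to retrieve each recipe's title along with all the recipe keys associated with that particular recipe (gluten free, vegan, dairy free, vegetarian, etc.). I can retrieve the information from the page, but the problem I'm having is appending several recipe keys to a single row of the list. For example, if the first recipe on the page is both dairy free and gluten free, I'm not able to append them so that they line up with the row for the corresponding recipe. I'm sharing a section of my code so you can see what I'm working with. Thanks in advance for your help.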

recipe = []
key = []


for page in pages:
    page = requests.get('https://www.skinnytaste.com/page/'+str(page)+'/')
    soup = BeautifulSoup(page.text, 'html.parser')
    recipes = soup.find_all('article', class_='post teaser-post odd')
    recipes.extend(soup.find_all('article', class_='post teaser-post even'))
    sleep(randint(2, 8))

    for r in recipes:

        titles = r.h2.text
        recipe.append(titles)
        print(titles)


        post_meta = r.find('div', class_='post-meta')
        icons = post_meta.find('div', class_='icons')
        if not (post_meta.find('div', class_='icons') is None):
            keys = icons.find_all('span')
            for k in keys:
                recipe_key = k.find('a').find('img').get('alt')
                key.append(recipe_key)
                print(recipe_key)

【Question discussion】:

    Tags: web-scraping beautifulsoup


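    【Solution 1】:

    Initialize an empty list called rows. Then create a dictionary for each row and update it dynamically, since some recipes will have more "keys" than others. Append that row dictionary to your rows list; pandas can then use the list to build the table.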

    import requests
    import pandas as pd
    from bs4 import BeautifulSoup
    from time import sleep
    from random import randint
    
    
    headers = {'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Mobile Safari/537.36'}
    
    rows = []
    pages = range(1,5)
    for page in pages:
        response = requests.get('https://www.skinnytaste.com/page/'+str(page)+'/', headers=headers) 
        soup = BeautifulSoup(response.text, 'html.parser')
        recipes = soup.find_all('article', class_='post teaser-post odd')
        recipes.extend(soup.find_all('article', class_='post teaser-post even'))
        sleep(randint(2, 8)) 
        
        for r in recipes:
            
            titles = r.h2.text
    
            print(titles)
            row = {'Title':titles}
            
            
            # Some teasers have no icons block, so check before iterating.
            post_meta = r.find('div', class_='post-meta')
            icons = post_meta.find('div', class_='icons')
            if icons is not None:
                keys = icons.find_all('span')
                # Number the columns key_01, key_02, ... in page order.
                for count, k in enumerate(keys, start=1):
                    recipe_key = k.find('a').find('img').get('alt')
                    row['key_%.2d' % count] = recipe_key
                    print(recipe_key)
                    
            rows.append(row)
            
    results = pd.DataFrame(rows)
    

    Output:

    print(results.to_string())
                                                                    Title            key_01             key_02            key_03                   key_04               key_05 key_06            key_07
    0   Baked Pumpkin Pasta with Pancetta, Gruyere, Kale, and White Beans       Gluten Free                NaN               NaN                      NaN                  NaN    NaN               NaN
    1                                        Mom’s Stuffing, Lightened Up               NaN                NaN               NaN                      NaN                  NaN    NaN               NaN
    2                         Roasted Green Beans with Caramelized Onions        Dairy Free        Gluten Free  Vegetarian Meals         Whole 30 Recipes                  NaN    NaN               NaN
    3                            7 Day Healthy Meal Plan (November 22-28)               NaN                NaN               NaN                      NaN                  NaN    NaN               NaN
    4                                             Makeover Spinach Gratin       Gluten Free       Kid Friendly          Low Carb         Vegetarian Meals                  NaN    NaN               NaN
    5                            Turkey Pot Pie with Sweet Potato Topping       Gluten Free       Kid Friendly               NaN                      NaN                  NaN    NaN               NaN
    6                     Sautéed Shredded Brussels Sprouts with Pancetta        Dairy Free        Gluten Free      Keto Recipes             Kid Friendly             Low Carb  Paleo  Under 30 Minutes
    7                    Baked Brie Phyllo Cups with Craisins and Walnuts  Under 30 Minutes   Vegetarian Meals               NaN                      NaN                  NaN    NaN               NaN
    8                      Chicken Cassoulet with Sausage and Swiss Chard        Dairy Free      Freezer Meals       Gluten Free                      NaN                  NaN    NaN               NaN
    9                                   Drunken Style Noodles with Shrimp        Dairy Free        Gluten Free               NaN                      NaN                  NaN    NaN               NaN
    10                              Chicken and Broccoli Noodle Casserole      Kid Friendly                NaN               NaN                      NaN                  NaN    NaN               NaN
    11               Arugula Salmon Salad with Capers and Shaved Parmesan       Gluten Free       Keto Recipes          Low Carb         Under 30 Minutes                  NaN    NaN               NaN
    12                              Roasted Acorn Squash with Brown Sugar        Dairy Free        Gluten Free  Vegetarian Meals                      NaN                  NaN    NaN               NaN
    13                                 Turkey Cutlets with Parmesan Crust      Kid Friendly   Under 30 Minutes               NaN                      NaN                  NaN    NaN               NaN
    14                          Butternut Squash Ravioli with Sage Butter  Vegetarian Meals                NaN               NaN                      NaN                  NaN    NaN               NaN
    15                Air Fryer Chicken Milanese with Mediterranean Salad         Air Fryer        Gluten Free  Under 30 Minutes                      NaN                  NaN    NaN               NaN
    16                                Salisbury Steak with Mushroom Gravy        Dairy Free      Freezer Meals      Kid Friendly                 Low Carb     Under 30 Minutes    NaN               NaN
    17                                                   Huevos Rancheros       Gluten Free   Under 30 Minutes  Vegetarian Meals                      NaN                  NaN    NaN               NaN
    18                Easy Black Bean Vegetarian Chili with Spiced Yogurt       Gluten Free       Kid Friendly  Under 30 Minutes         Vegetarian Meals                  NaN    NaN               NaN
    19                                                      Apple Cobbler  Vegetarian Meals                NaN               NaN                      NaN                  NaN    NaN               NaN
    20                Tofu Stir Fry with Vegetables in a Soy Sesame Sauce        Dairy Free        Gluten Free  Under 30 Minutes         Vegetarian Meals                  NaN    NaN               NaN
    21                        Autumn Apple and Grape Medley (Fruit Salad)       Gluten Free       Kid Friendly  Under 30 Minutes         Vegetarian Meals                  NaN    NaN               NaN
    22                                       Chicken Cutlet Caprese Salad       Gluten Free  Meal Prep Recipes               NaN                      NaN                  NaN    NaN               NaN
    23                                             Beef Stew with Pumpkin        Dairy Free      Freezer Meals      Kid Friendly  Pressure Cooker Recipes  Slow Cooker Recipes    NaN               NaN
    24                                       Pumpkin Cream Cheese Muffins     Freezer Meals       Kid Friendly  Vegetarian Meals                      NaN                  NaN    NaN               NaN
    25                                         Pumpkin Pie Overnight Oats        Dairy Free        Gluten Free      Kid Friendly         Vegetarian Meals                  NaN    NaN               NaN
    26                                          Strawberry Cheesecake Dip       Gluten Free       Kid Friendly  Under 30 Minutes                      NaN                  NaN    NaN               NaN
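
If you want to try the row-building idea without hitting the site, here is a minimal offline sketch. The `scraped` list and its recipe names are hypothetical stand-ins for what the scraper collects, and the single `Keys` column is just one alternative layout, not something the code above produces. It shows that pandas fills the missing `key_NN` cells with NaN when the row dictionaries have different sets of keys.

```python
import pandas as pd

# Hypothetical, hard-coded stand-ins for (title, recipe keys) pairs
# that the scraper would collect from each article.
scraped = [
    ("Recipe A", ["Dairy Free", "Gluten Free"]),
    ("Recipe B", []),
    ("Recipe C", ["Kid Friendly"]),
]

wide_rows = []    # one column per key: key_01, key_02, ...
single_rows = []  # all keys joined into one cell

for title, keys in scraped:
    row = {"Title": title}
    # Recipes with fewer keys simply leave the later columns out.
    for count, recipe_key in enumerate(keys, start=1):
        row["key_%.2d" % count] = recipe_key
    wide_rows.append(row)

    # Alternative layout: keep every key for a recipe in a single cell.
    single_rows.append({"Title": title, "Keys": ", ".join(keys)})

wide = pd.DataFrame(wide_rows)      # missing cells become NaN
single = pd.DataFrame(single_rows)

print(wide.to_string())
print(single.to_string())
```

The wide layout is easier to filter column by column; the single-cell layout keeps exactly one row and one keys field per recipe, which may be closer to what the question describes.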
    

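    【Discussion】:

    • Thanks fam! Works like a charm, you're the best!
    • No worries. In the future, be careful with your variables. You use page as the loop variable, but then you also store your requests response in page. That can get confusing when debugging, and depending on how you set up your loops, you may inadvertently start running into errors.
    • Noted, thanks!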
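
To illustrate the variable-naming point from the comments, here is a small sketch that runs without network access. `fake_get` is a hypothetical stand-in for `requests.get` and only echoes the URL back; the point is what `page` refers to after each assignment.

```python
def fake_get(url):
    """Hypothetical stand-in for requests.get; returns a string built from the URL."""
    return "response for " + url


base = "https://www.skinnytaste.com/page/"

# Shadowing: after the assignment, `page` is the response, not the page number.
shadowed_types = []
for page in range(1, 3):
    page = fake_get(base + str(page) + "/")
    shadowed_types.append(type(page).__name__)

# Separate names: `page` stays an int and `response` holds the result.
fetched = []
for page in range(1, 3):
    response = fake_get(base + str(page) + "/")
    fetched.append((page, response))

print(shadowed_types)
print(fetched)
```

With separate names, the page number is still available later in the loop body, for example for logging or for storing it alongside each row.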