【Question title】: Getting incomplete output when web scraping categories
【Posted】: 2021-08-16 22:12:54
【Question】:

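Hey guys, I want to scrape the categories and subcategories of this website, https://www.materiel-velo.com/c110100-roues-velo-route.html, with Beautiful Soup, but all I get is the first category and its subcategories. Thanks for your help. Expected output: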

[{'name': 'ROUES VÉLO ROUTE',
  'url': 'https://www.materiel-velo.com/c110100-roues-velo-route.html',
  'sub_categories': [{'name': 'Roue vélo carbone',
                      'url': 'https://www.materiel-velo.com/c110101-roue-velo-carbone.html'},
                     {'name': 'Roue polyvalente',
                      'url': 'https://www.materiel-velo.com/c110102-roue-velo-polyvalente.html'}]
}] etc.

Here is my code:

import requests
from bs4 import BeautifulSoup as soup

SITE_URL = "https://www.materiel-velo.com/c110000-route.html"
HEADERS = {
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36',
    'accept': 'application/json, text/javascript, */*; q=0.01',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://www.materiel-velo.com/',
    'accept-language': 'en-US,en;q=0.9,fr;q=0.8',
    'cookie': '_ga_5P0CXB07LK=GS1.1.1629112243.1.1.1629115778.0',
}
response = requests.get(SITE_URL, headers=HEADERS)
soupe = soup(response.content, 'html5lib')
categories = []
lvl1_1 = soupe.find_all(class_="subcategory")
lvl11 = []
for item in lvl1_1:
    lvl1 = {
        "name": item.text.strip(),
        "url": item.a['href'],
        "sub_categories": []
    }
    lvl11.append(lvl1)
    lvl2 = item.find(class_="subsubcategories").find_all('a')
    for elt in lvl2:
        lvl2 = {
            "name": elt.text.strip(),
            "url": elt.a['href']
        }
    lvl1['sub_categories'].append(lvl2)
categories.append(lvl1)
print(categories)

【Comments】:

    Tags: python web-scraping beautifulsoup python-requests


    【Solution 1】:

    To get all categories/subcategories and their URLs, you can use the following example:

    import requests
    from bs4 import BeautifulSoup
    
    url = "https://www.materiel-velo.com/c110100-roues-velo-route.html"
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    
    for a in soup.select(".categories .subsubcategories a"):
        main_category = a.find_previous(class_="category-title").a
    
        print(
            "{:<25} {:<45} {:<80} {:<60}".format(
                main_category.get_text(strip=True),
                a.get_text(strip=True),
                main_category["href"],
                "https://www.materiel-velo.com/" + a["href"],
            )
        )
    

    Prints:

    Roue vélo carbone         Roue vélo carbone, Alchemist                  https://www.materiel-velo.com/c110101-roue-velo-carbone.html                     https://www.materiel-velo.com/s/7336/roue-velo-carbone-alchemist
    Roue vélo carbone         Asterion                                      https://www.materiel-velo.com/c110101-roue-velo-carbone.html                     https://www.materiel-velo.com/s/2841/roue-velo-carbone-asterion
    Roue vélo carbone         Black Inc                                     https://www.materiel-velo.com/c110101-roue-velo-carbone.html                     https://www.materiel-velo.com/s/5628/roue-velo-carbone-black-inc
    Roue vélo carbone         Bontrager                                     https://www.materiel-velo.com/c110101-roue-velo-carbone.html                     https://www.materiel-velo.com/s/5493/roue-velo-carbone-bontrager
    Roue vélo carbone         Campagnolo                                    https://www.materiel-velo.com/c110101-roue-velo-carbone.html                     https://www.materiel-velo.com/s/2842/roue-velo-carbone-campagnolo
    Roue vélo carbone         DT Swiss                                      https://www.materiel-velo.com/c110101-roue-velo-carbone.html                     https://www.materiel-velo.com/s/2844/roue-velo-carbone-dt-swiss
    Roue vélo carbone         Roue vélo carbone, FFWD                       https://www.materiel-velo.com/c110101-roue-velo-carbone.html                     https://www.materiel-velo.com/s/7452/roue-velo-carbone-ffwd 
    Roue vélo carbone         Fulcrum                                       https://www.materiel-velo.com/c110101-roue-velo-carbone.html                     https://www.materiel-velo.com/s/2847/roue-velo-carbone-fulcrum
    Roue vélo carbone         Lightweight                                   https://www.materiel-velo.com/c110101-roue-velo-carbone.html                     https://www.materiel-velo.com/s/5820/roue-velo-carbone-lightweight
    Roue vélo carbone         Mavic                                         https://www.materiel-velo.com/c110101-roue-velo-carbone.html                     https://www.materiel-velo.com/s/2848/roue-velo-carbone-mavic
    Roue vélo carbone         Roue vélo carbone, Most                       https://www.materiel-velo.com/c110101-roue-velo-carbone.html                     https://www.materiel-velo.com/s/7770/roue-velo-carbone-most 
    Roue vélo carbone         Nix                                           https://www.materiel-velo.com/c110101-roue-velo-carbone.html                     https://www.materiel-velo.com/s/5546/roue-velo-carbone-nix  
    Roue vélo carbone         Progress Cycles                               https://www.materiel-velo.com/c110101-roue-velo-carbone.html                     https://www.materiel-velo.com/s/6799/roue-velo-carbone-progress-cycles
    Roue vélo carbone         Shimano                                       https://www.materiel-velo.com/c110101-roue-velo-carbone.html                     https://www.materiel-velo.com/s/2849/roue-velo-carbone-shimano
    Roue vélo carbone         Vision                                        https://www.materiel-velo.com/c110101-roue-velo-carbone.html                     https://www.materiel-velo.com/s/2851/roue-velo-carbone-vision
    Roue vélo carbone         Roue vélo carbone, Wilier Triestina           https://www.materiel-velo.com/c110101-roue-velo-carbone.html                     https://www.materiel-velo.com/s/7912/roue-velo-carbone-wilier-triestina
    Roue vélo carbone         Zipp                                          https://www.materiel-velo.com/c110101-roue-velo-carbone.html                     https://www.materiel-velo.com/s/2852/roue-velo-carbone-zipp 
    Roue polyvalente          Bontrager                                     https://www.materiel-velo.com/c110102-roue-velo-polyvalente.html                 https://www.materiel-velo.com/s/5887/roue-polyvalente-bontrager
    Roue polyvalente          Campagnolo                                    https://www.materiel-velo.com/c110102-roue-velo-polyvalente.html                 https://www.materiel-velo.com/s/2824/roue-polyvalente-campagnolo
    Roue polyvalente          DT Swiss                                      https://www.materiel-velo.com/c110102-roue-velo-polyvalente.html                 https://www.materiel-velo.com/s/2825/roue-polyvalente-dt-swiss
    Roue polyvalente          Roue polyvalente, FFWD                        https://www.materiel-velo.com/c110102-roue-velo-polyvalente.html                 https://www.materiel-velo.com/s/7453/roue-polyvalente-ffwd  
    Roue polyvalente          Fulcrum                                       https://www.materiel-velo.com/c110102-roue-velo-polyvalente.html                 https://www.materiel-velo.com/s/2826/roue-polyvalente-fulcrum
    Roue polyvalente          Mavic                                         https://www.materiel-velo.com/c110102-roue-velo-polyvalente.html                 https://www.materiel-velo.com/s/2827/roue-polyvalente-mavic 
    Roue polyvalente          Progress Cycles                               https://www.materiel-velo.com/c110102-roue-velo-polyvalente.html                 https://www.materiel-velo.com/s/6712/roue-polyvalente-progress-cycles
    Roue polyvalente          Shimano                                       https://www.materiel-velo.com/c110102-roue-velo-polyvalente.html                 https://www.materiel-velo.com/s/2828/roue-polyvalente-shimano
    Roue polyvalente          Vision                                        https://www.materiel-velo.com/c110102-roue-velo-polyvalente.html                 https://www.materiel-velo.com/s/2829/roue-polyvalente-vision
    Roue polyvalente          Roue polyvalente, Zipp                        https://www.materiel-velo.com/c110102-roue-velo-polyvalente.html                 https://www.materiel-velo.com/s/2830/roue-polyvalente-zipp  
    Blocage rapide roue       BBB                                           https://www.materiel-velo.com/c110108-blocage-rapide-roue-velo-route.html        https://www.materiel-velo.com/s/2280/blocage-rapide-roue-bbb
    Blocage rapide roue       Blocage rapide roue, Blackbearing             https://www.materiel-velo.com/c110108-blocage-rapide-roue-velo-route.html        https://www.materiel-velo.com/s/7918/blocage-rapide-roue-blackbearing
    Blocage rapide roue       BMC                                           https://www.materiel-velo.com/c110108-blocage-rapide-roue-velo-route.html        https://www.materiel-velo.com/s/6318/blocage-rapide-roue-bmc
    Blocage rapide roue       Campagnolo                                    https://www.materiel-velo.com/c110108-blocage-rapide-roue-velo-route.html        https://www.materiel-velo.com/s/2281/blocage-rapide-roue-campagnolo
    
    
    ...and so on.
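
    The pairing above relies on `find_previous`, which walks backwards through the document from each sub-category link until it meets the nearest element with the `category-title` class. A minimal offline sketch of that idea follows. The HTML snippet, the `example.com` URLs, and the helper name `pair_links_with_titles` are assumptions made for illustration; the real page's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure the selectors rely on;
# the real page may be laid out differently.
SAMPLE_HTML = """
<div class="categories">
  <div class="category-title"><a href="https://example.com/c1-carbon.html">Carbon wheels</a></div>
  <ul class="subsubcategories">
    <li><a href="s/1/carbon-mavic">Mavic</a></li>
    <li><a href="s/2/carbon-zipp">Zipp</a></li>
  </ul>
  <div class="category-title"><a href="https://example.com/c2-allround.html">All-round wheels</a></div>
  <ul class="subsubcategories">
    <li><a href="s/3/allround-shimano">Shimano</a></li>
  </ul>
</div>
"""


def pair_links_with_titles(html):
    """Return (category title, sub-category name) pairs found in html."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for a in soup.select(".categories .subsubcategories a"):
        # Nearest preceding element with class "category-title", then its link.
        title_link = a.find_previous(class_="category-title").a
        pairs.append((title_link.get_text(strip=True), a.get_text(strip=True)))
    return pairs


pairs = pair_links_with_titles(SAMPLE_HTML)
for title, name in pairs:
    print(title, "->", name)
```

    Because the lookup is positional, it works even when the title and the list of links are siblings rather than parent and child.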
    

    EDIT: To get the output as a list of dictionaries:

    out = {}
    for a in soup.select(".categories .subsubcategories a"):
        main_category = a.find_previous(class_="category-title").a
        out.setdefault(main_category, []).append(a)
    
    out = [
        {
            "name": k.get_text(strip=True),
            "url": k["href"],
            "sub_categories": [
                {
                    "name": vv.get_text(strip=True),
                    "url": "https://www.materiel-velo.com/" + vv["href"],
                }
                for vv in v
            ],
        }
        for k, v in out.items()
    ]
    
    print(out)
    

    Prints:

    [
        {
            "name": "Roue vélo carbone",
            "url": "https://www.materiel-velo.com/c110101-roue-velo-carbone.html",
            "sub_categories": [
                {
                    "name": "Roue vélo carbone, Alchemist",
                    "url": "https://www.materiel-velo.com/s/7336/roue-velo-carbone-alchemist",
                },
                {
                    "name": "Asterion",
                    "url": "https://www.materiel-velo.com/s/2841/roue-velo-carbone-asterion",
                },
                {
                    "name": "Black Inc",
                    "url": "https://www.materiel-velo.com/s/5628/roue-velo-carbone-black-inc",
                },
    
    ...
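
    For comparison, the nested-loop approach from the question can also produce this shape. In the code as posted, the two `append` calls appear to sit outside the loops they belong to, and `elt.a['href']` is called on an element that is already an `<a>` tag. The sketch below moves each `append` into its loop and reads `href` from the link itself. The sample HTML is an assumption (only the class names come from the question), and `build_categories` and `BASE_URL` are hypothetical names; `urljoin` is used instead of string concatenation so that both relative and absolute `href` values resolve correctly.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.materiel-velo.com/"

# Hypothetical markup: the class names come from the question's code,
# but the nesting is an assumption about the real page.
SAMPLE_HTML = """
<div class="subcategory">
  <div class="category-title"><a href="c110101-roue-velo-carbone.html">Roue vélo carbone</a></div>
  <ul class="subsubcategories">
    <li><a href="s/2848/roue-velo-carbone-mavic">Mavic</a></li>
    <li><a href="s/2852/roue-velo-carbone-zipp">Zipp</a></li>
  </ul>
</div>
<div class="subcategory">
  <div class="category-title"><a href="c110102-roue-velo-polyvalente.html">Roue polyvalente</a></div>
</div>
"""


def build_categories(html, base_url=BASE_URL):
    """Build a list of category dicts, each with its own sub_categories list."""
    soup = BeautifulSoup(html, "html.parser")
    categories = []
    for block in soup.find_all(class_="subcategory"):
        title_link = block.find("a")  # first link in the block is the title
        category = {
            "name": title_link.get_text(strip=True),
            "url": urljoin(base_url, title_link["href"]),
            "sub_categories": [],
        }
        # A block may have no sub-list, so guard against find() returning None.
        sub_list = block.find(class_="subsubcategories")
        for link in sub_list.find_all("a") if sub_list else []:
            # Append inside the inner loop so every link is kept.
            category["sub_categories"].append(
                {
                    "name": link.get_text(strip=True),
                    "url": urljoin(base_url, link["href"]),
                }
            )
        # Append inside the outer loop so every category is kept.
        categories.append(category)
    return categories


categories = build_categories(SAMPLE_HTML)
print(categories)
```

    To run this against the live site, pass `requests.get(url).content` instead of `SAMPLE_HTML`, after checking that the class names still match the page.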
    

    【Comments】:

    • Thanks for your reply, but the output I want looks like 'sub_categories': [{'name': 'Freinage Etriers de frein Freins à disque vélo route Disque vélo route Patins de frein vélo Porte-patins Plaquettes de freins route Kit de purge route/cx/gravel', 'url': 'materiel-velo.com/c400213-freinage.html', 'sub_categories': [{'name': 'Kit de purge route/cx/gravel'}]} ]}. Is that possible?
    • @taniiit Could you edit your question and put the expected output there (properly formatted)?