【Question title】: How to access the attributes after scraping a website using BeautifulSoup
【Posted】: 2021-03-11 20:58:51
【Question】:
import requests,json
from bs4 import BeautifulSoup
from flask import Flask
from flask import request, jsonify
import os
from selenium import webdriver

def checkPriceMyntra(URL):
    headers = {'User-Agent' : 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'}
    a = requests.Session()
    res = a.get(URL, headers=headers, verify=False)
    soup = BeautifulSoup(res.text,features="html.parser")
    script = None
    #d =  soup.find_all("script")
    for s in soup.find_all("script"):
        print(s)

checkPriceMyntra("https://www.myntra.com/leggings/tag-7/tag-7-women-pack-of-2-solid-ankle-length-straight-fit-leggings/12335860/buy")

Here is part of the output of soup.find_all("script"):

    <script type="application/ld+json">
    {
        "@context" : "https://schema.org",
        "@type" : "Product",
        "name" : "TAG 7 Women Pack Of 2 Solid Ankle-Length Straight-Fit Leggings",
        "image" : "https://assets.myntassets.com/h_1440,q_100,w_1080/v1/assets/images/productimage/2020/8/22/be9d5664-5467-475b-b4ea-470a5d64a5481598047122543-1.jpg",
        "sku" : "12335860",
        "mpn" : "12335860",
        "description" : "TAG 7 Women Pack Of 2 Solid Ankle-Length Straight-Fit Leggings",
        "offers": {
            "@type": "Offer",
            "priceCurrency": "INR",
            "availability": "InStock",
            "price" : "899",
            "url": "https://www.myntra.com/leggings/tag-7/tag-7-women-pack-of-2-solid-ankle-length-straight-fit-leggings/12335860/buy"
        },
        "brand" : {
            "@type" : "Thing",
            "name" : "TAG 7"
        }
    }
    </script>

I want to get the product's price from this script. How can I do that? I tried s.get("price"), s.price and s["price"], but none of them worked.
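For context on why those three attempts fail: on a BeautifulSoup Tag, s.get("price") and s["price"] read the tag's own HTML attributes (such as type), and s.price searches for a child tag named <price>. None of them look inside the JSON text between <script> and </script>. The sketch below reproduces this offline with a trimmed, hard-coded copy of the tag from the question; the variable names are illustrative only.

```python
import json

from bs4 import BeautifulSoup

# Trimmed copy of the <script> tag from the question (the live page has more fields).
html = """
<script type="application/ld+json">
{"@type": "Product", "offers": {"@type": "Offer", "priceCurrency": "INR", "price": "899"}}
</script>
"""

soup = BeautifulSoup(html, "html.parser")
s = soup.find("script")

# Tag.get() reads the tag's own HTML attributes, such as type="application/ld+json".
print(s.get("type"))   # application/ld+json
print(s.get("price"))  # None: <script> has no attribute named "price"

# Dot access searches for a child tag named <price>; there is none.
print(s.price)         # None

# Subscript access also reads HTML attributes, but raises KeyError when one is missing.
try:
    s["price"]
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'price'

# The JSON is only the tag's text content, so it must be parsed before indexing into it.
data = json.loads(s.string)
print(data["offers"]["price"])  # 899
```

In other words, the price is plain text from BeautifulSoup's point of view; it only becomes addressable by key after json.loads() turns it into a dict.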

【Discussion】:

    Tags: python json web-scraping beautifulsoup


    【Solution 1】:

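    The value of the price key ("price" : "899") is inside the second script tag, so try selecting that tag with the CSS selector script:nth-of-type(2) (a <script> that is the second <script> among its siblings) and converting its text into a Python dictionary with the json module.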

    import json
    import requests
    from bs4 import BeautifulSoup
    
    
    def checkPriceMyntra(url):
        headers = {
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"
        }
    
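        # Fetch the page and parse its HTML (verify=False, kept from the question, skips TLS certificate checks).
        soup = BeautifulSoup(
            requests.get(url, headers=headers, verify=False).content, "html.parser"
        )
    
        # select_one() returns the first <script> that is the 2nd <script> among its siblings;
        # .string is that tag's raw JSON text, which json.loads() turns into a dict.
        json_data = json.loads(soup.select_one("script:nth-of-type(2)").string)
        # Walk the nested dict: "offers" -> "price".
        print(json_data["offers"]["price"])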
    
    
    checkPriceMyntra(
        "https://www.myntra.com/leggings/tag-7/tag-7-women-pack-of-2-solid-ankle-length-straight-fit-leggings/12335860/buy"
    )
    

    Output:

    899
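Because script:nth-of-type(2) depends on where the tag happens to sit in the page, a somewhat more defensive variant is to select every script whose type attribute is application/ld+json, parse each one, and keep the block whose @type is Product. The sketch below assumes the page still embeds a schema.org Product block like the one shown in the question; extract_product_price and the sample HTML (including the breadcrumb block) are hypothetical, and it parses a hard-coded string so it runs without network access.

```python
import json

from bs4 import BeautifulSoup


def extract_product_price(html):
    """Return the offer price of the first schema.org Product JSON-LD block that has one, else None."""
    soup = BeautifulSoup(html, "html.parser")
    # Select JSON-LD blocks by their type attribute rather than by position on the page.
    for script in soup.select('script[type="application/ld+json"]'):
        text = script.string
        if not text:
            continue  # empty tag, or content that is not a single text node
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            continue  # not valid JSON; try the next block
        if not isinstance(data, dict) or data.get("@type") != "Product":
            continue  # e.g. a BreadcrumbList or Organization block
        offers = data.get("offers")
        # schema.org allows "offers" to be a single Offer or a list of them.
        if isinstance(offers, list):
            offers = offers[0] if offers else None
        if isinstance(offers, dict):
            return offers.get("price")
    return None


# Hard-coded sample: a hypothetical breadcrumb block first, then the Product data from the question.
sample = """
<html><head>
<script type="application/ld+json">{"@type": "BreadcrumbList", "itemListElement": []}</script>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "sku": "12335860",
 "offers": {"@type": "Offer", "priceCurrency": "INR", "price": "899"}}
</script>
</head></html>
"""

print(extract_product_price(sample))  # 899
```

Against the live page you would pass requests.get(url, headers=headers).text from the answer's code instead of sample; the site's markup may well have changed since 2021, in which case the function simply returns None rather than raising.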
    

    【Comments】:

    • @shivamagarwal Glad it worked! If this or any answer solved your problem, please consider marking it as accepted.
    • Marked it, but I have less than 15 reputation!