【Question title】: Web scraping content of ::before ::after using Python returns [ ]
【Posted】: 2021-07-28 07:18:13
【Question description】:

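I am very new to Python, and to coding in general. I have used this code to scrape about 10 websites successfully, but it does not work on this one. I want to extract every div that carries the product class, but those divs do not appear in page_soup. I read that ::before and ::after can be a problem, but I could not find a working solution. There may be several problems in my code that I cannot spot. I have had two weeks of sleepless nights over this. Please help.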

from urllib.request import  urlopen as uReq
from bs4 import BeautifulSoup as soup

url = "https://www.woolworths.co.za/cat?No=60&Nrpp=60&Ns=p_pl30|0"

uClient=uReq(url)
page_html=uClient.read()
uClient.close()

page_soup=soup(page_html,"html.parser")
containers=page_soup.findAll("div",{"class":"product-list__item"})
print(containers)
        
quit()

The result I get is []

Let me know if I need to provide more information.

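【Comments】:

  • Your problem has nothing to do with ::before / ::after. You get no results because the page loads its content dynamically, so your divs are not in the HTML at the time you fetch it.
  • I don't know how to do that. I tried using time() to wait for the page to load. Any idea how to get the divs?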
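
To illustrate the point made in the first comment: urlopen only downloads the HTML the server sends and never runs the page's JavaScript, so waiting with time does not make the divs appear. A quick check is to count how many elements in the raw HTML carry the class you are looking for. The sketch below uses only the standard library; the two HTML strings are hypothetical stand-ins (not the real site's markup), and ClassCounter and count_class are names made up for this example.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class(html, class_name):
    """Return how many elements in the given HTML string carry class_name."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    return parser.count


# Hypothetical examples: a server-rendered page versus a JavaScript shell.
static_page = (
    '<div class="product-list__item">A</div>'
    '<div class="product-list__item">B</div>'
)
js_shell = '<div id="app"></div><script src="bundle.js"></script>'

print(count_class(static_page, "product-list__item"))  # 2
print(count_class(js_shell, "product-list__item"))     # 0
```

If the count for the downloaded page is 0 while the browser shows products, the data is most likely fetched by a separate request, which you can look for in the Network tab of the browser's developer tools.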

Tags: python css web-scraping beautifulsoup pseudo-element


【Solution 1】:

By sending a GET request with the right headers to:

https://www.woolworths.co.za/server/searchCategory?pageURL=%2Fcat&No=60&Nrpp=60&Ns=p_pl30%7C0

you can get the data using the requests module alone, without BeautifulSoup:

import requests


# Headers sent along with the request: the category page as Referer and a browser user-agent.
headers = {
    "Referer": "https://www.woolworths.co.za/cat?No=60&Nrpp=60&Ns=p_pl30|0",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
}

URL = "https://www.woolworths.co.za/server/searchCategory?pageURL=%2Fcat&No=60&Nrpp=60&Ns=p_pl30%7C0"
# The endpoint returns JSON, so parse the body straight into dicts and lists.
response = requests.get(URL, headers=headers).json()
fmt_string = "{:<70} {:<15} {}"

print(fmt_string.format("Brand", "Price", "Image"))
print("-" * 200)

# Each content block holds a list of product records.
for d in response["contents"][0]["mainContent"][0]["contents"]:
    for dd in d["records"]:
        print(
            fmt_string.format(
                dd["attributes"]["p_displayName"],
                dd["startingPrice"]["p_pl30"],
                "https://images.woolworthsstatic.co.za/"
                + dd["attributes"]["p_externalImageReference"],
            )
        )

Output (truncated):

Brand                                                                  Price           Image
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Soda Water Sparkling Drink 200 ml                                      9.49            https://images.woolworthsstatic.co.za/Soda-Water-Sparkling-Drink-200-ml-6001009034250.jpg?V=50$J&o=eucyUmAbqcqMSs0IjPaS4WA$mzoj&
Salt & Vinegar Flavoured Potato Crisps 36 g                            9.49            https://images.woolworthsstatic.co.za/Salt-Vinegar-Flavoured-Potato-Crisps-36-g-6009175413541.jpg?V=6Pfl&o=Tyz@wbWHKvnW@Kc69RTJYM7WBUQj&
Salted Farmer's Crisps 36 g                                            9.49            https://images.woolworthsstatic.co.za/Salted-Farmer-s-Crisps-36-g-6009217630752.jpg?V=TM2n&o=eR0n3eqV0@15TKcRSRz1RzzVPW8j&
Lemonade Sugar Free Sparkling Flavoured Drink 200 ml                   9.49            https://images.woolworthsstatic.co.za/Lemonade-Sugar-Free-Sparkling-Flavoured-Drink-200-ml-6001009014238.jpg?V=9Skb&o=hqcUN6THi9J8YRixCQEica2ftcMj&
Ginger Ale Sugar Free Sparkling Flavoured Drink 200 ml                 9.49            https://images.woolworthsstatic.co.za/Ginger-Ale-Sugar-Free-Sparkling-Flavoured-Drink-200-ml-6001009014245.jpg?V=x9Pr&o=C7oKpoX27D3z2vf11X7bKRmUJEsj&
Cheddar Flavoured Crisps 36 g                                          9.49            https://images.woolworthsstatic.co.za/Cheddar-Flavoured-Crisps-36-g-6009217630776.jpg?V=zCPn&o=Ufn3jhhUzUGckf72QHLRaa64g20j&
...
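
If you want the name, price, and image as data rather than printed text, the nested loops above can be wrapped in a function that returns a list of dicts. This is only a sketch: extract_products and IMAGE_BASE are names made up here, the sample payload is invented and merely mirrors the key layout the answer relies on, and the use of .get assumes that some records might lack a field, which the thread does not confirm.

```python
IMAGE_BASE = "https://images.woolworthsstatic.co.za/"


def extract_products(payload):
    """Return a list of product dicts from a searchCategory-style payload."""
    products = []
    for block in payload["contents"][0]["mainContent"][0]["contents"]:
        for record in block.get("records", []):
            attrs = record.get("attributes", {})
            image_ref = attrs.get("p_externalImageReference")
            products.append(
                {
                    "name": attrs.get("p_displayName"),
                    "price": record.get("startingPrice", {}).get("p_pl30"),
                    # Leave the image as None when no reference is present.
                    "image": IMAGE_BASE + image_ref if image_ref else None,
                }
            )
    return products


# Invented sample data; only the nesting of keys follows the answer's code.
sample = {
    "contents": [
        {
            "mainContent": [
                {
                    "contents": [
                        {
                            "records": [
                                {
                                    "attributes": {
                                        "p_displayName": "Example Product 100 g",
                                        "p_externalImageReference": "example.jpg",
                                    },
                                    "startingPrice": {"p_pl30": 1.0},
                                },
                                {
                                    "attributes": {"p_displayName": "No Image Product"},
                                    "startingPrice": {"p_pl30": 2.5},
                                },
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}

for product in extract_products(sample):
    print(product)
```

With the real response you would call extract_products(response), after which the rows can be written to a CSV file or loaded into a DataFrame.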

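【Comments】:

  • Thanks, this is great. I also managed to get the brand, but how do I get the price and the image?
  • Thank you very much, this works well. You guys are the best.
  • @Vally Glad to help. If this or any other answer solved your problem, please consider marking it as accepted.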