Scraping specific data from HTML using BeautifulSoup (answers)

【Question Title】: Scraping specific data from HTML using BeautifulSoup
【Posted】: 2017-12-30 03:44:30
【Question】:

I am trying to get the top search-result positions of a few products at the following link:

https://www.purplle.com/search?q=hair%20fall%20shamboo

I used the following to get the HTML of the page:

from selenium import webdriver
driver = webdriver.Chrome()
driver.get("https://www.purplle.com/search?q=hair%20fall%20shamboo")
from bs4 import BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

Now I am confused about how to get the product name and its position from this HTML (to find which products rank highest in the search).

I used the following to get the product details, but the output also contains a lot of unwanted content.

details = soup.find('div', attrs={'class': 'pr'})

Any idea how to solve this?

【Question Comments】:

Tags: html python-3.x selenium web-scraping beautifulsoup

【Solution 1】:

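I'm not sure what you mean by position. However, the following script fetches the titles of the different products on that page along with what appear to be their positions: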

from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.get("https://www.purplle.com/search?q=hair%20fall%20shamboo")

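# Parse the HTML that the browser has rendered so far
soup = BeautifulSoup(driver.page_source, 'html.parser')
# Match elements whose class attribute is exactly "prd-lstng pr"
for item in soup.find_all(class_="prd-lstng pr"):
    name = item.find_all(class_="pro-name el2")[0].text
    position = item.find_all(class_="mrl5 tx-std30")[0].text
    print(name, position)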

driver.quit()
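
If "position" simply means the rank of a product in the result list, one option is to number the product elements in the order they appear in the HTML. The sketch below is only an illustration under that assumption: the sample markup and the function name ranked_names are made up, and only the class names come from the script above. It uses CSS selectors, which match each class separately, so it does not depend on the exact order of the classes in the attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the class names are taken from the script above.
SAMPLE_HTML = """
<div class="prd-lstng pr"><span class="pro-name el2"> Product A </span></div>
<div class="prd-lstng pr"><span class="pro-name el2">Product B</span></div>
<div class="prd-lstng pr"><span class="other">no name here</span></div>
"""


def ranked_names(html):
    """Return (rank, name) pairs in the order the product elements appear."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # enumerate(..., start=1) gives a 1-based rank in document order
    for rank, card in enumerate(soup.select(".prd-lstng.pr"), start=1):
        name_tag = card.select_one(".pro-name")
        if name_tag is None:
            # Skip elements without a product name instead of raising
            continue
        results.append((rank, name_tag.get_text(strip=True)))
    return results


for rank, name in ranked_names(SAMPLE_HTML):
    print(rank, name)
```

With Selenium, ranked_names(driver.page_source) could be called before driver.quit().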

【Discussion】:

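A shopper typically searches for products on purplle.com; say they search for "hair fall shampoo". The search results contain products from different brands such as Tresemme, Loreal and Dove. I am trying to find out which of these brands rank highest. For example, at purplle.com/search?q=hair%20fall%20shamboo Good Vibes ranks first, Michel Mercier second, and Livon Serum third.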
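When I compare my results with what that web page shows, I don't see any difference. However, the page only displays its full content once it is scrolled all the way to the bottom. Perhaps the product names you are talking about are in the last part of the page, or somewhere I can't see from my end without scrolling. What you asked for was how to loop over and parse the products on that page, not how to handle lazy loading.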
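
A common way to deal with content that only appears after scrolling is to keep scrolling until the page height stops growing. The helper below is a rough sketch of that idea, not something tested against purplle.com: the name scroll_to_bottom and its pause and max_rounds parameters are made up for illustration, and it assumes the page loads more items on scroll rather than through a button that has to be clicked.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll down until the page height stops growing or max_rounds is hit.

    `driver` is expected to be a Selenium WebDriver; only execute_script
    is used. Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded
        last_height = new_height
    return rounds
```

It would be called after driver.get(...) and before reading driver.page_source, so that BeautifulSoup sees whatever was loaded by then.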
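The actual search keyword is "hair fall shampoo", so the URL becomes "purplle.com/search?q=hair%20fall%20shampoo". It seems the driver does not download the data for the whole page; is it possible to get the first 100 items, including those behind the "load more" option? Also, when I inspect an element, I notice an "item_position" parameter that shows the position of the product item in the search. Can we extract that as well?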
Follow this LINK for your additional requirement. That thread deals with the exact same web page, and the question is somewhat similar.
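
As a rough illustration of the "item_position" idea, the sketch below assumes that item_position is an HTML attribute on each product element. The thread does not confirm this, so the sample markup, the function name top_positions and the limit parameter are all hypothetical. If the value lives somewhere else (for example in a script or a data attribute), the lookup would need to change.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the attribute name follows the comment above, but
# where it actually lives in the real page is an assumption.
SAMPLE_HTML = """
<div class="prd-lstng pr" item_position="2"><span class="pro-name el2">Product B</span></div>
<div class="prd-lstng pr" item_position="1"><span class="pro-name el2">Product A</span></div>
<div class="prd-lstng pr"><span class="pro-name el2">No position</span></div>
"""


def top_positions(html, limit=100):
    """Return up to `limit` (position, name) pairs sorted by position."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # attrs={"item_position": True} matches any tag that has the attribute
    for card in soup.find_all(attrs={"item_position": True}):
        raw = card["item_position"].strip()
        if not raw.isdigit():
            continue  # ignore values that are not plain numbers
        name_tag = card.find(class_="pro-name")
        name = name_tag.get_text(strip=True) if name_tag else ""
        rows.append((int(raw), name))
    rows.sort()
    return rows[:limit]


for position, name in top_positions(SAMPLE_HTML):
    print(position, name)
```

The default limit of 100 mirrors the "first 100 items" asked about above; elements without the attribute are simply left out.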