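Scraping hidden product details on a webpage using Selenium: answers

【Question title】: Scraping hidden product details on a webpage using Selenium
【Posted】: 2017-03-26 09:25:56
【Question】:

Sorry, I'm a Selenium newbie. I've done a lot of reading but still can't get the product price (£0.55) from this page: https://groceries.asda.com/product/spaghetti-tagliatelle/asda-spaghetti/36628. When I parse the HTML with bs4, the product details aren't visible. With Selenium I can get the whole page as a string, and I can see the price in it (using the code below). I could presumably extract the price from that string somehow, but I'd prefer a less hacky solution.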

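from selenium import webdriver

# Launch Firefox through geckodriver and load the product page
browser = webdriver.Firefox(executable_path=r'C:\Users\Paul\geckodriver.exe')
browser.get('https://groceries.asda.com/product/tinned-tomatoes/asda-smart-price-chopped-tomatoes-in-tomato-juice/19560')
content = browser.page_source  # full rendered HTML as one string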

If I run something like this:

elem = browser.find_element_by_id("bodyContainerTemplate")
print(elem)

it just returns: selenium.webdriver.firefox.webelement.FirefoxWebElement (session="df23fae6-e99c-403c-a992-a1adf1cb8010", element="6d9aac0b-2e98-4bb5-b8af-fcbe443af906")

The price is the text associated with this element: p class="prod-price", but I can't seem to get it to work. How should I get this text (the product price)?

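【Question comments】:

Some information is missing from the page because part of it is rendered with JavaScript. See this for a less "hacky" solution.
You can get the product data from the API: https://groceries.asda.com/api/items/view?itemid=36628&responsegroup=extended&cacheable=true&storeid=4565&shipdate=currentDate&requestorigin=gi. There's no need for Selenium. python-requests all the way!
Use print(elem.text) instead.
Thanks for all the suggestions. The API really does make things simpler!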
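
The API route suggested in the comments could look roughly like the sketch below. It rebuilds the URL quoted above from its query parameters and decodes the response as JSON. The function names are hypothetical, the endpoint may no longer be available, and the shape of the response is not described in this thread, so the sketch returns the decoded data without indexing into it. It uses the standard library's urllib so that it has no third-party dependency; requests.get(url).json() would be the python-requests equivalent.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint quoted in the comment above; it may have changed since 2017.
API_BASE = "https://groceries.asda.com/api/items/view"


def build_item_url(item_id, store_id="4565"):
    """Build the item-view URL with the query parameters from the comment."""
    params = {
        "itemid": item_id,
        "responsegroup": "extended",
        "cacheable": "true",
        "storeid": store_id,
        "shipdate": "currentDate",
        "requestorigin": "gi",
    }
    return API_BASE + "?" + urlencode(params)


def fetch_item(item_id, timeout=10):
    """Fetch the item data, assuming the endpoint answers with JSON."""
    with urlopen(build_item_url(item_id), timeout=timeout) as response:
        return json.load(response)


# Example (needs network access):
#     data = fetch_item("36628")
#     print(json.dumps(data, indent=2)[:500])  # inspect the layout first
```

Printing the start of the decoded response is a reasonable first step, since the key that holds the price has to be found by inspection.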

Tags: python selenium web-scraping

【Solution 1】:

Try this solution; it works with selenium and beautifulsoup:

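from bs4 import BeautifulSoup
from selenium import webdriver

url = 'https://groceries.asda.com/product/spaghetti-tagliatelle/asda-spaghetti/36628'

# PhantomJS support was deprecated and later removed from Selenium;
# a headless Firefox or Chrome driver is the usual replacement.
driver = webdriver.PhantomJS()
driver.get(url)

# Hand the JavaScript-rendered HTML to BeautifulSoup
data = driver.page_source
soup = BeautifulSoup(data, 'html.parser')

# The price sits in <span class="prod-price-inner">
ele = soup.find('span', {'class': 'prod-price-inner'})
print(ele.text)

driver.quit()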

It will print:

£0.55
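
If the price is needed as a number rather than as display text, the string can be converted afterwards. The helper below is a hypothetical sketch, not part of the original answer. It assumes the text is a pound amount such as "£0.55" and would need adjusting for other formats (for example, prices shown in pence).

```python
import re
from decimal import Decimal


def parse_price(text):
    """Return the first decimal number in a price string as a Decimal."""
    # Drop thousands separators, then look for digits with an optional fraction.
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    if match is None:
        raise ValueError("no numeric price found in %r" % text)
    return Decimal(match.group())


print(parse_price("£0.55"))
```

Decimal is used instead of float so that currency amounts are not subject to binary rounding.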

【Comments】:

【Solution 2】:

elem is a WebElement. To extract the text value of a web element, you can use the following code:

elem = driver.find_element_by_class_name("prod-price-inner")
print(elem.text)

【Comments】: