How to get the style value in a div tag in Python? Answers

【Question title】: How to get the style value in a div tag in Python?
【Posted】: 2021-11-10 13:41:52
【Question description】:

I want to scrape the images on a single web page. Each image URL sits inside a div tag, embedded in the value of its style attribute, like this:

<div class="v-image__image v-image__image--cover" style="background-image: url(&quot;https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/f3ea4910e239eb704af755c65f548e35_car.png&quot;); background-position: center center;"></div>

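I want to extract this: https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/f3ea4910e239eb704af755c65f548e35_car.png

But when I try to find the element with the Chrome driver or with soup.find, they return an empty list, because there is no text between the opening and closing div tags.

I am looking for a way to read what is inside the div tag itself (its attributes), rather than what lies between the tags.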
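The difference between a tag's text and its attributes can be illustrated on the snippet from the question. This is only a minimal sketch: it uses the standard library's html.parser instead of BeautifulSoup or Selenium so that it runs without extra packages, and the class name StyleCollector is a hypothetical helper, not part of any library.

```python
from html.parser import HTMLParser

# The div from the question: it has no text content, only attributes.
HTML = (
    '<div class="v-image__image v-image__image--cover" '
    'style="background-image: url(&quot;https://mashinbank.com/api/parse/files/'
    '7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/f3ea4910e239eb704af755c65f548e35_car.png&quot;); '
    'background-position: center center;"></div>'
)


class StyleCollector(HTMLParser):
    """Hypothetical helper: collects the style attribute of every div."""

    def __init__(self):
        super().__init__()
        self.styles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; entities such as &quot;
        # are already decoded in the values.
        style = dict(attrs).get("style")
        if tag == "div" and style:
            self.styles.append(style)


collector = StyleCollector()
collector.feed(HTML)
print(collector.styles[0])
```

The printed value is the raw style string, so the URL still has to be cut out of the url(...) part afterwards.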

【Question discussion】:

Does this answer your question? Python: BeautifulSoup - Get an attribute value from the name of a class
@mx0 It is the same question, but nothing gets printed.

Tags: python html beautifulsoup selenium-chromedriver web-crawler

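【Solution 1】:

To get all the images, wait for the presence of all the matching elements.

Once you have the list in Python, e.g. all_images (see below), you can strip the parentheses () and the double quotes "" from each style value as shown below.

Sample code: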

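# driver_path is assumed to point to a local chromedriver binary (Selenium 3 style call).
driver = webdriver.Chrome(driver_path)
driver.maximize_window()
wait = WebDriverWait(driver, 20)
links = []
driver.get("https://mashinbank.com/ad/GkbI20tzp3/%D8%AE%D8%B1%DB%8C%D8%AF-%D9%BE%D8%B1%D8%A7%DB%8C%D8%AF-111-SE-1397")
# Wait (up to 20 s) until every matching div is present in the DOM.
all_images = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//div[contains(@class,'v-image__image--cover')]")))
for image in all_images:
    # The style value looks like: background-image: url("https://..."); background-position: ...
    style = image.get_attribute('style')
    # Keep the text between the parentheses and drop the double quotes.
    url = style.split("(")[1].split(")")[0].replace('"', '')
    links.append(url)

print(links)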

Imports:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Output:

['https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/cabdf9f3f379e5b839300f89a90ab27e_car.png', 'https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/e1c6c75dda980a6b4b4a83932ed49832_car.png', 'https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/81ef7c57ca349485a9ba78bf0e42e13f_car.png', 'https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/02bd13f2c5ce936ec3db10706c03854d_car.png', 'https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/cabdf9f3f379e5b839300f89a90ab27e_car.png', 'https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/e1c6c75dda980a6b4b4a83932ed49832_car.png', 'https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/81ef7c57ca349485a9ba78bf0e42e13f_car.png', 'https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/02bd13f2c5ce936ec3db10706c03854d_car.png']
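Each URL appears twice in the output above, presumably because the page renders every image in more than one matching div. If only distinct links are needed, the list can be de-duplicated while keeping the original order. A minimal sketch; unique_in_order is a hypothetical helper name and the file names are shortened placeholders, not real paths:

```python
def unique_in_order(items):
    """Drop repeated entries while preserving first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(items))


# Shortened placeholder values standing in for the scraped URLs.
links = ["a_car.png", "b_car.png", "a_car.png", "b_car.png"]
print(unique_in_order(links))
```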

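【Discussion】:

【Solution 2】:

The Selenium approach is as follows: you should probably wait for the element to become visible and only then extract its attribute.
Then split the whole style attribute to get the url value.
Like this: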

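wait = WebDriverWait(driver, 20)
# Wait until a div whose style contains the image host path is visible.
element = wait.until(EC.visibility_of_element_located((By.XPATH, "//div[contains(@style,'https://mashinbank.com/api/parse/files')]")))
style_content = element.get_attribute("style")
# style_content looks like: background-image: url("https://..."); background-position: center center;
# Take the text between the parentheses and strip the surrounding quotes.
url = style_content.split("(")[1].split(")")[0].strip("\"'")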
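Chained split calls fail with an IndexError when a style value has no url(...) part. A regular expression is one possible alternative for pulling the address out of the style string. This is a sketch, not part of the original answer; extract_background_url and URL_IN_STYLE are hypothetical names.

```python
import re

# Matches url(...) with optional single or double quotes around the address.
URL_IN_STYLE = re.compile(r"""url\(\s*['"]?(.*?)['"]?\s*\)""")


def extract_background_url(style):
    """Return the first url(...) target in a CSS style string, or None."""
    match = URL_IN_STYLE.search(style)
    return match.group(1) if match else None


style = (
    'background-image: url("https://mashinbank.com/api/parse/files/'
    '7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/f3ea4910e239eb704af755c65f548e35_car.png"); '
    'background-position: center center;'
)
print(extract_background_url(style))
```

With Selenium, the same function could be applied to the string returned by element.get_attribute("style").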

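【Discussion】:

Thanks, one more thing: the page contains many links, depending on the number of images. This code gives me 1 link and then raises an error: in until raise TimeoutException(message, screen, stacktrace) selenium.common.exceptions.TimeoutException: Message:
To be clearer: the goal is to download all the images on this web page.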
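Once all the image URLs have been collected into a list, saving them to disk could look roughly like the sketch below. It relies only on the standard library; filename_from_url, download_all and the "images" folder are hypothetical names, and it assumes the server hands out the files to a plain HTTP GET without cookies or special headers, which may not hold for this site.

```python
import os
import posixpath
from urllib.parse import urlsplit
from urllib.request import urlretrieve


def filename_from_url(url):
    """Use the last path segment of the URL as the local file name."""
    return posixpath.basename(urlsplit(url).path)


def download_all(urls, dest_dir="images"):
    """Download every distinct URL into dest_dir and return the saved paths."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    # dict.fromkeys drops repeated URLs while keeping their order.
    for url in dict.fromkeys(urls):
        target = os.path.join(dest_dir, filename_from_url(url))
        urlretrieve(url, target)  # plain GET; assumes no authentication is needed
        saved.append(target)
    return saved
```

Calling download_all(links) with the list built in Solution 1 would then fetch each distinct file once.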