【Posted】: 2016-04-05 20:24:40
【Question】:
Input HTML:
<div style="display: flex">
<div class="half" style="font-size: 0.8em;width: 33%;"> apple </div>
<div class="half" style="font-size: 0.8em;text-align: center;width: 28%;"> peach </div>
<div class="half" style="font-size: 0.8em;text-align: right;width: 33%;" title="nofruit"> cucumber </div>
</div>
Desired output: all the div elements directly under <div style="display: flex">.
I am trying to locate the parent div with a CSS selector:
div[style="display: flex"]
This raises an error:
>>> soup.select('div[style="display: flex"]')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/bs4/element.py", line 1400, in select
    'Only the following pseudo-classes are implemented: nth-of-type.')
NotImplementedError: Only the following pseudo-classes are implemented: nth-of-type.
It looks like BeautifulSoup tries to interpret the colon as pseudo-class syntax.
I have tried the suggestions from "Handling a colon in an element ID in a CSS selector", but they still raise errors:
>>> soup.select('div[style="display\: flex"]')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/bs4/element.py", line 1400, in select
    'Only the following pseudo-classes are implemented: nth-of-type.')
NotImplementedError: Only the following pseudo-classes are implemented: nth-of-type.
>>> soup.select('div[style="display\3A flex"]')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/bs4/element.py", line 1426, in select
    'Unsupported or invalid CSS selector: "%s"' % token)
ValueError: Unsupported or invalid CSS selector: "div[style="displayA"
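A side note on the last traceback: the `displayA` in the error message suggests that `\3A` never reached the selector parser. In a regular (non-raw) Python string literal, `\3` is an octal escape for the control character U+0003, so the selector contains that invisible character followed by `A`. The sketch below uses only the standard library. The whitespace split is an assumption inferred from where the token in the error message ends, not something confirmed from the bs4 source.

```python
# Regular literal: "\3" is consumed by Python as an octal escape (U+0003).
regular = 'div[style="display\3A flex"]'
# Raw literal: the backslash survives and the CSS escape "\3A " stays intact.
raw = r'div[style="display\3A flex"]'

print(repr(regular))  # 'div[style="display\x03A flex"]'
print(repr(raw))      # 'div[style="display\\3A flex"]'

# Assumption: the old selector code splits on whitespace, which would
# produce the truncated token shown in the ValueError above.
first_token = regular.split()[0]
print(repr(first_token))  # 'div[style="display\x03A'
```

So the `\3A` attempt would have to be written as a raw string (or with a doubled backslash) before it even tests the CSS escape. Whether the parser then accepts it is a separate question.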
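Question:
What is the correct way to use or escape a colon inside an attribute value in a BeautifulSoup CSS selector?
Note that I can work around it with a partial attribute match: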
soup.select("div[style$=flex]")
Or by using find_all():
soup.find_all("div", style="display: flex")
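For completeness, here is a minimal sketch of both workarounds applied to the sample HTML, followed by one way to collect the direct child divs that make up the desired output. The use of `html.parser` and of `find_all(..., recursive=False)` for the children are choices made for this example, not something the workarounds require.

```python
from bs4 import BeautifulSoup

html = """
<div style="display: flex">
<div class="half" style="font-size: 0.8em;width: 33%;"> apple </div>
<div class="half" style="font-size: 0.8em;text-align: center;width: 28%;"> peach </div>
<div class="half" style="font-size: 0.8em;text-align: right;width: 33%;" title="nofruit"> cucumber </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Workaround 1: exact attribute match via find(), which does not go
# through the CSS selector parser at all.
parent = soup.find("div", style="display: flex")

# Workaround 2: suffix match, so the selector itself contains no colon.
parent_by_select = soup.select("div[style$=flex]")[0]

# Direct children only: recursive=False stops the search one level down.
children = parent.find_all("div", recursive=False)
print([child.get_text(strip=True) for child in children])
# ['apple', 'peach', 'cucumber']
```

Both lookups return the same parent tag here, but the suffix match would also pick up any other div whose style happens to end in `flex`.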
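Also note: I understand that locating elements by their style attribute is far from a good technique, but the question itself is generic and the HTML above is just an example.
【Comments】: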
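-
I assume you've also tried two backslashes?
soup.select('div[style="display\\: flex"]')
-
@JoshCrozier Happy New Year. Yes, along with different combinations of raw and regular strings. Thanks. I still think I'm just missing something.
-
... wat. Talk about broken.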
Tags: python html css-selectors beautifulsoup html-parsing