How to extract tags with ::marker from HTML using Beautiful Soup

【Question title】: How do I extract tags with ::marker from HTML using Beautiful Soup
【Posted】: 2021-12-24 09:14:37
【Question description】:

I am trying to use BeautifulSoup to find li elements that have ::marker, as shown below.

I tried using cssutils, without success (possibly I used it incorrectly).

Pseudocode:

lis = soup_obj.find_all("li")
for li in lis:
    if li (has ::marker):  # This is what I would like to do
        print(li)

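Thanks in advance for your help.

【Question discussion】:

Is there an actual URL for this? Also, please edit the question and use the snippet tool to insert the actual HTML rather than an image.

Tags: python html beautifulsoup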

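【Solution 1】:

AFAIK you will not be able to use cssutils, because the pseudo-element ::marker is not part of the syntax it supports. According to https://pythonhosted.org/cssutils/README.html#overview, under

Selectors

the selector syntax defined here (and not in CSS 2.1) should be parsable with cssutils (should mind though ;) )

The pseudo-elements you would expect to be supported are (in section 7):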

7.1. The ::first-line pseudo-element
7.2. The ::first-letter pseudo-element
7.4. The ::before and ::after pseudo-elements

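Also, no pseudo-elements are supported by Soup Sieve:

Soup Sieve aims to allow users to target XML/HTML elements using CSS selectors. It implements many pseudo-classes, but it does not currently implement any pseudo-elements and has no plans to do so.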
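To see the Soup Sieve limitation in practice, here is a minimal sketch. The sample HTML and the helper name try_select are made up for illustration. The exact exception type and message are an assumption that may differ between soupsieve versions, so the helper catches broadly and only reports what it saw.

```python
from bs4 import BeautifulSoup

# Tiny made-up document, just for the demonstration.
html = "<ul><li>one</li><li>two</li></ul>"
soup = BeautifulSoup(html, "html.parser")


def try_select(selector):
    """Return the matched tags, or None if the selector is rejected."""
    try:
        return soup.select(selector)
    except Exception as exc:  # exact exception type may vary by soupsieve version
        print(f"{selector!r} rejected: {type(exc).__name__}: {exc}")
        return None


plain = try_select("li")           # ordinary type selector: matches both items
marker = try_select("li::marker")  # pseudo-element: not implemented by Soup Sieve
```

The point is only that select() cannot answer "does this li have a ::marker", since the marker is created by the browser at render time and is not a node in the parsed tree.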

Possibly relevant: you could try using JavaScript through Selenium automation. Note the discussion in the comments here:

How to select and manipulate specific <li> elements present with pseudo-elements such as ::marker, ::before using javascript. That discussion is about getComputedStyle, though, which differs from your end goal.

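An example page, in case anyone else wants to experiment, is the bulleted list at the bottom of this page:

https://webdesign.tutsplus.com/tutorials/next-level-list-bullets-with-css-marker--cms-37212

For as long as that link remains valid.

You may be able to target the desired nodes using another pattern. So, if the page is public and freely accessible, sharing the link would help.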
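As one possible "other pattern", assuming the page styles its markers through inline <style> rules: read the selectors that end in ::marker, drop the pseudo-element, and pass the remaining base selector to select(). The function names below are hypothetical, and the regex is a crude approximation that ignores comments, external stylesheets, and other CSS subtleties.

```python
import re


def marker_base_selectors(css_text):
    """Return selectors that carry a ::marker rule, with '::marker' removed."""
    bases = []
    # Capture the selector list in front of each '{' (rough, not a CSS parser).
    for selector_list in re.findall(r"([^{}]+)\{", css_text):
        for selector in selector_list.split(","):
            selector = selector.strip()
            if selector.endswith("::marker"):
                base = selector[: -len("::marker")].strip()
                # A bare '::marker' applies to any element, so fall back to '*'.
                bases.append(base or "*")
    return bases


def find_marker_styled_items(html):
    """Return <li> tags matched by the base of any ::marker rule in <style>."""
    from bs4 import BeautifulSoup  # imported here so the helper above needs only re

    soup = BeautifulSoup(html, "html.parser")
    css_text = "\n".join(style.string or "" for style in soup.find_all("style"))
    items, seen = [], set()
    for base in marker_base_selectors(css_text):
        for tag in soup.select(base):
            # Deduplicate by identity; bs4 tags compare equal by content.
            if tag.name == "li" and id(tag) not in seen:
                seen.add(id(tag))
                items.append(tag)
    return items
```

For example, marker_base_selectors("ul.fancy li::marker { color: red; }") yields ["ul.fancy li"], which is an ordinary selector that Soup Sieve can handle. This only finds list items whose marker is explicitly styled. It says nothing about items that merely get the browser's default bullet.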

【Discussion】: