【Posted】:2020-08-13 03:21:17
【Question】:
Suppose we have a web page containing
<div class="specific-row" data-id="101736782"></div>
<div class="yellow-box-row" data-id="112376244"></div>
<div class="specific-row" data-id="179218312"></div>
<div class="vip-row" data-id="123749014"></div>
How can I get all of the data-id values?
Something like ['101736782', '112376244', '179218312', '123749014']
I tried using tree.xpath:
import requests
from lxml import html
r = requests.get(url)
tree = html.fromstring(r.content)
tree.xpath("//div@data-id=['any']")  # my attempt; not a valid XPath expression
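One way to collect the values with lxml is to select the attribute nodes directly with `//div/@data-id`. The following is a minimal sketch, assuming the page matches the sample markup above; the `page` string and the `data_ids` name are illustrative stand-ins (in the original code the input would be `r.content`).

```python
from lxml import html

# Sample markup copied from the question; stands in for r.content
page = """
<html><body>
<div class="specific-row" data-id="101736782"></div>
<div class="yellow-box-row" data-id="112376244"></div>
<div class="specific-row" data-id="179218312"></div>
<div class="vip-row" data-id="123749014"></div>
</body></html>
"""

tree = html.fromstring(page)

# "/@data-id" selects the attribute itself, so xpath() returns
# a list of strings in document order, one per matching <div>
data_ids = tree.xpath("//div/@data-id")
print(data_ids)
```

Divs without a data-id attribute are simply skipped, so no extra filtering should be needed.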
【Comments】:
-
An XPath 2.0 solution would be
string-join(//@data-id,";") combined with the Saxon/C processor for Python. Output: 101736782;112376244;179218312;123749014
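lxml is built on libxml2, which implements XPath 1.0, and XPath 1.0 has no string-join function. A rough equivalent without Saxon/C is to select the attributes with XPath and join them in Python. This is only a sketch under the assumption that the sample markup above is the input; `fragment` and `joined` are hypothetical names.

```python
from lxml import html

# Sample markup from the question, wrapped in a single parent element
fragment = """
<div>
<div class="specific-row" data-id="101736782"></div>
<div class="yellow-box-row" data-id="112376244"></div>
<div class="specific-row" data-id="179218312"></div>
<div class="vip-row" data-id="123749014"></div>
</div>
"""

tree = html.fromstring(fragment)

# "//@data-id" matches the attribute on any element, not just <div>;
# the join happens in Python because XPath 1.0 lacks string-join()
joined = ";".join(tree.xpath("//@data-id"))
print(joined)
```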
Tags: python html css xpath web-scraping