Unable to extract text from href tag using XPath

【Question title】: Unable to extract text from href tag using XPath
【Posted】: 2020-12-17 15:20:57
【Question description】:

I am trying to extract the trend names from this page (https://twitter-trends.iamrohit.in/) using the following XPath:

//div[@class ='table-responsive']/table[@class = 'table table-striped table-hover dataTable no-footer']/tbody/tr/th/a/text()

Trying it in a web browser gives 50 results. But with the following code:

import requests
import lxml.html

html = requests.get('https://twitter-trends.iamrohit.in/')
doc = lxml.html.fromstring(html.content)
trends_name = doc.xpath("//div[@class = 'table-responsive']/table[@class = 'table table-striped table-hover dataTable no-footer']/tbody/tr/th/a/text()")

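I get nothing in the trends_name variable. I tried printing html.content and it shows the raw HTML. I also tried the same XPath in an online XPath tester against the source of the same page, and it returned 50 trends. I'm not sure what I'm doing wrong in my code, because I have used the same approach with different XPaths on other sites and it worked. Please help. Thanks.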
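A quick way to narrow this kind of problem down, sketched here under the assumption that the downloaded HTML is available as a string, is to print the class attribute of the table the script actually receives, rather than the one shown in the browser's inspector. The `SAMPLE` markup and the `table_classes` helper are hypothetical stand-ins; with the real page you would pass `requests.get(...).text` instead.

```python
import lxml.html


def table_classes(html_text):
    """Return the class attribute of every <table> directly under div.table-responsive.

    Hypothetical debugging helper: it shows the class string exactly as it
    appears in the HTML that was parsed, not as the browser renders it.
    """
    doc = lxml.html.fromstring(html_text)
    return [t.get("class") for t in doc.xpath("//div[@class = 'table-responsive']/table")]


# Hypothetical stand-in for the downloaded page source.
SAMPLE = """<html><body><div class="table-responsive">
<table class="table table-striped table-hover"><tbody></tbody></table>
</div></body></html>"""

print(table_classes(SAMPLE))  # ['table table-striped table-hover']
```

If the printed string differs from the one used in the `@class = '...'` predicate, an exact-match XPath will return an empty list.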

【Question discussion】:

Tags: python xpath lxml

【Solution 1】:

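Just remove the "dataTable" and "no-footer" class names from the table predicate. These classes are added when the table is rendered in the browser, but they are not present in the page source that requests downloads: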

trends_name = doc.xpath("//div[@class = 'table-responsive']/table[@class = 'table table-striped table-hover']/tbody/tr/th/a/text()")
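If the class list might change again, a more tolerant alternative (not part of the original answer) is to match individual class tokens with the common XPath 1.0 idiom `contains(concat(' ', normalize-space(@class), ' '), ' name ')` instead of comparing the whole `@class` string. This is a minimal sketch: the `has_class` and `trend_names` helpers and the sample HTML are hypothetical, and the sample assumes the raw source and the rendered DOM differ only by the extra `dataTable no-footer` classes.

```python
import lxml.html


def has_class(cls):
    # XPath 1.0 has no class-token test, so pad the normalized class list with
    # spaces and look for " cls " to avoid partial matches like "table-striped-x".
    return f"contains(concat(' ', normalize-space(@class), ' '), ' {cls} ')"


TREND_XPATH = (
    f"//div[{has_class('table-responsive')}]"
    f"/table[{has_class('table-striped')}]"
    "/tbody/tr/th/a/text()"
)


def trend_names(html_text):
    """Hypothetical helper: return the link texts from the trends table as plain str."""
    doc = lxml.html.fromstring(html_text)
    return [str(t) for t in doc.xpath(TREND_XPATH)]


# Hypothetical raw source, plus a variant with the classes added at render time.
RAW = """<html><body><div class="table-responsive">
<table class="table table-striped table-hover"><tbody>
<tr><th><a href="#">#TrendOne</a></th></tr>
</tbody></table></div></body></html>"""
RENDERED = RAW.replace('table-hover"', 'table-hover dataTable no-footer"')

print(trend_names(RAW))       # ['#TrendOne']
print(trend_names(RENDERED))  # ['#TrendOne']
```

Because each predicate checks for a single token, the same expression matches both the raw source and the browser-rendered markup, at the cost of a somewhat noisier XPath.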

【Discussion】: