Scrapy selector: get nth-child text of an element

【Question title】: Scrapy selector: get nth-child text of an element
【Posted】: 2020-04-24 12:20:04
【Question】:

I am using Scrapy selectors to extract fields from HTML:

xpath = /html/body/path/to/element/text()

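This is similar to the question "scrapy get nth-child text of same class". Following the documentation, we can use the .getall() method to get all matching nodes and then pick a specific one from the list: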

selected_list = Selector(text=soup.prettify()).xpath(xpath).getall()

Is it possible to specify the nth node to select directly in the XPath itself? Something like this:

xpath = /html/body/path/to/element/text(2) #to select 3 child text

Example:

<body>
  <div>
    <i class="ent_sprite remind_icon">
    </i> 
    text that needs to be
  </div>
</body>

The result of response.xpath('/body/div/text()').getall() consists of 2 elements.

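【Comments】:

You can do .../element[3]/text()
<div><i class="ent_sprite remind_icon"></i> text that needs to be selected</div> @Piron The problem is that I want to extract the second text node, not the second element.
Can you post the XML? The first and second text elements.
@Piron I have added an example to the question.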

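【Solution 1】:

You can use the following-sibling:: axis to get the siblings that come after the node matched by an expression. In this case you want the text() node nearest to the <i> tag, so you can do: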

response.xpath('//i[@class="ent_sprite remind_icon"]/following-sibling::text()').get()

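Because .get() returns only the first match, this gives you the text() node closest to <i class="ent_sprite remind_icon">. If you want the nth nearest following sibling of a node, the XPath is following-sibling::node()[n], which in our case is: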

'//i[@class="ent_sprite remind_icon"]/following-sibling::text()[n]'

【Comments】: