scrapy: get the entire text including children

【Question title】: scrapy get the entire text including children
【Posted】: 2014-12-21 06:58:27
【Question description】:

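The document I am scraping with scrapy contains a series of <p> elements.
Some of them hold a single run of text (bla bla bla); others split their text across child elements (bla bla bla, followed by second bla bla inside a child).

I want to extract all of the text of each <p>, including the text of its children (assume I already have a selector for the <p>).
(For the second example, the result should be one string: bla bla bla second bla bla)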

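【Solution 1】:

There are two options here. Which one is better depends on whether you want the individual text nodes or a single string.

Sample HTML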

<p>Something outside the span<span> and something inside the span</span></p>

Option 01: use //text() -> returns a list of text nodes

response.xpath('//p//text()').getall()

# returns
>>> ['Something outside the span', ' and something inside the span']

Option 02: use string() -> returns a single string

response.xpath('string(//p)').get()

# returns
>>> 'Something outside the span and something inside the span'

【Solution 2】:

You can simply use //text() to extract the text of an element together with the text of all its child nodes.

For example:

.//p//text()