Get all text including html in a single node scrapy xpath: answers

[Question title]: Get all text including html in a single node scrapy xpath
[Posted]: 2017-12-01 06:33:30
[Question]:

response.xpath('//*[@id="blah"]//text()')

Suppose my HTML is

<p id="blah">This is a simple text <a href="#">foo</a> and this is after tag. </p>

What happens is that I get a list of text fragments, even though there is only a single <p> tag. For example:

[u'This is a simple text', u' and this is after tag.']

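My actual HTML content is huge, and I have to join the fragments to get a single string. I also lose foo when joining. Is there a specific XPath or Scrapy mechanism for this?

The result I want is: This is a simple text foo and this is after tag.

Note the foo here.

Thanks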

[Comments]:

[Solution 1]:

You can get all the text nodes as a single string like this:

response.xpath('string(//*[@id="blah"])').get()

Output:

'This is a simple text foo and this is after tag. '

[Discussion]:

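[Solution 2]:

If XPath 2.0 is available, you can use the string-join function, which takes the sequence of text nodes and a separator. This only works when response.xpath is backed by an XPath 2.0 engine; Scrapy's default lxml-based selectors implement XPath 1.0 and reject string-join.

response.xpath('string-join(//*[@id="blah"]//text(), "")')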

[Discussion]: