Answers to: Scrapy CSS/XPath Selectors

【Question title】: Scrapy CSS/XPath Selectors
【Posted】: 2020-03-31 21:15:41
【Question】:

I'm trying to capture the text of a number of HTML elements on this page, for example:

<p dir="ltr" data-test-bidi="">I’ve really loved using this app...</p>

<time data-test-customer-review-date="" datetime="2019-07-21T22:32:44.000Z" aria-label="July 21, 2019" class="we-customer-review__date">07/21/2019</time>

I've tried several different selectors:

response.xpath('//time[@class="we-customer-review__date"]/text()')

response.css('p').attrib['dir']

response.xpath('//p/text()').getall() returns the text of every <p> element, but I can't filter it down to the ones I want. Any suggestions?

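【Comments on the question】:

"This returns an error": what error? "But I can't filter it down to the ones I want": what does that mean? Please provide a minimal reproducible example along with the complete error message, and see How to Ask and the help center.

Tags: python xpath scrapy css-selectors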

【Solution 1】:

You can do the following:

ptags = response.css('p')  # select all <p> tags
texts = []
for ptag in ptags:
    # keep only the <p> tags that have a dir attribute
    if ptag.css('p::attr(dir)').get() is not None:
        text = ptag.css('p::text').get()  # extract the first text node
        texts.append(text)  # append to the list, or process it however you need

【Comments】: