Using a CSS selector to extract text outside of a span in Scrapy crawling: answers

【Question title】: Using css selector to extract text outside of span in scrapy crawling
【Posted】: 2018-09-30 20:17:36
【Question description】:

I have the following HTML code:

    <h1>
        <a href="https://www.google.com">
            <span>448587: </span>Brian McMills
        </a>
    </h1>

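I am only interested in Brian McMills. I want to select that text using Scrapy's CSS selector functions.

When I use h1 a ::text, it selects only the 448587: part. I tried several combinations with :not(span), but none of them worked.

Note: I am not interested in XPath or scripting solutions, only CSS.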

【Comments on the question】:

Take a look at w3schools.com/cssref/css_selectors.asp; h1 a:not(span) should work. Have you tried writing it that way?
I had to use h1 a:not(span)::text to make it work. Thanks.

Tags: python html scrapy css-selectors scrapy-spider

【Solution 1】:

The only thing that worked was h1 a:not(span)::text.

Thanks.

【Discussion】: