How to scrape URLs using Scrapy in Python

[Question title]: How to scrape URLs using Scrapy in Python
[Posted]: 2016-01-31 07:52:52
[Question description]:

I want to use Scrapy in Python to extract all of the product URLs from the page "http://presskr.com/category/Mobiles--Tablets/35". This is the function I am using to do that:

def parse(self, response):
    print("hello");

    hxs = HtmlXPathSelector(response)
    sites = hxs.select('//div[@id="pagination_contents"]')
    items = []
    i=3
    for site in sites:
        item = DmozItem()
        item['link'] = site.select('div[2]/div['+str(i)+']/a/@href').extract()
        i=int(i)+1;
        print i
        items.append(item)
    return items

The XPath for each product div is: //div[@id="pagination_contents"]/div[2]/div['+str(i)+']/a/@href

But I only get one link instead of the URLs of all the products.

[Question comments]:

Tags: python scrapy

[Solution 1]:

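Try the following. I recommend working through the Scrapy tutorial; if you follow it, you will not need much manual work. Your case is very similar to http://doc.scrapy.org/en/latest/intro/tutorial.html#extracting-the-data, so keep that page at hand.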

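def parse(self, response):
    # Select the href of every product link on the page in one query
    for href in response.xpath('//span[@class="itemlistinginfo"]/a/@href'):
        # Resolve relative links against the URL of the current page
        full_url = response.urljoin(href.extract())
        item = DmozItem()
        item['link'] = full_url
        yield item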
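
A likely reason the original loop produced only one link: an id such as "pagination_contents" normally matches a single element, so the for loop runs once and the counter i never reaches the other products. The sketch below illustrates this without running a crawl. It uses the standard library's ElementTree and urllib.parse.urljoin as stand-ins for Scrapy's response.xpath and response.urljoin, and the SAMPLE markup is a hypothetical, simplified guess at the page structure, not the real site's HTML.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical markup (an assumption): three products inside the second
# child div of the container with id "pagination_contents".
SAMPLE = """
<html><body>
  <div id="pagination_contents">
    <div>filters</div>
    <div>
      <div>banner</div>
      <div>sorting</div>
      <div><a href="/product/1">img</a><span class="itemlistinginfo"><a href="/product/1">One</a></span></div>
      <div><a href="/product/2">img</a><span class="itemlistinginfo"><a href="/product/2">Two</a></span></div>
      <div><a href="/product/3">img</a><span class="itemlistinginfo"><a href="/product/3">Three</a></span></div>
    </div>
  </div>
</body></html>
""".strip()

BASE_URL = "http://presskr.com/category/Mobiles--Tablets/35"

root = ET.fromstring(SAMPLE)


def links_with_single_container(root):
    """Mimic the question's loop: iterate over the container matches."""
    links = []
    i = 3
    # Only one element carries this id, so the loop body runs once.
    for container in root.findall(".//div[@id='pagination_contents']"):
        for a in container.findall("div[2]/div[%d]/a" % i):
            links.append(a.get("href"))
        i += 1
    return links


def links_from_all_products(root, base_url):
    """Mimic the answer: match every product link, then make it absolute."""
    links = []
    for a in root.findall(".//span[@class='itemlistinginfo']/a"):
        links.append(urljoin(base_url, a.get("href")))
    return links


if __name__ == "__main__":
    print(links_with_single_container(root))
    print(links_from_all_products(root, BASE_URL))
```

On this sample, the first function returns a single relative link, while the second returns three absolute URLs. In a real spider the same idea applies: write the XPath so that it matches the repeated element (each product link) rather than the single container around it.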

[Comments]:

Thanks a lot, Turo, I got it.