【Posted】: 2018-01-18 03:26:59
【Question】:
I'm new to Python and I'm trying out Scrapy against Amazon. I'm trying to get
the item names and counts from a page there. This is my Python code:
import scrapy


class ToScrapeCSSSpider(scrapy.Spider):
    name = "toscrapeamazon-css"
    start_urls = [
        "https://www.amazon.ca/gp/search/other/ref=sr_in_a_-2?rh=i%3Akitchen%2Cn%3A2206275011%2Cn%3A%212206276011%2Cn%3A2224068011%2Cn%3A6647367011%2Cn%3A6647368011&page=2&bbn=6647368011&pickerToList=lbr_brands_browse-bin&indexField=a&ie=UTF8&qid=1515436664",
    ]

    def parse(self, response):
        for item in response.css("span.a-list-item"):
            yield {
                "item_name": item.css("span.refinementLink::text").extract_first(),
                "item_cnt": item.css("span.narrowValue::text").extract_first(),
            }

        next_page_url = response.css("span.pagnLink > a::attr(href)").extract_first()
        if next_page_url is not None:
            yield scrapy.Request(response.urljoin(next_page_url))
I'm able to get most of the data I want, but I'm not getting any data for the letters D, E, I, and J. Any idea what I'm doing wrong?
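One way to narrow this down might be to check which page and which letter index each crawled URL actually targets. The sketch below only uses the standard library to decode the query string of the start URL from the spider. The helper name `describe_listing_url` is made up for illustration, and reading `indexField` as the letter selector is an assumption based on the URL alone, not something confirmed by Amazon.

```python
from urllib.parse import parse_qs, urlsplit

# The start URL from the spider, split across lines for readability.
START_URL = (
    "https://www.amazon.ca/gp/search/other/ref=sr_in_a_-2"
    "?rh=i%3Akitchen%2Cn%3A2206275011%2Cn%3A%212206276011%2Cn%3A2224068011"
    "%2Cn%3A6647367011%2Cn%3A6647368011&page=2&bbn=6647368011"
    "&pickerToList=lbr_brands_browse-bin&indexField=a&ie=UTF8&qid=1515436664"
)


def describe_listing_url(url):
    """Hypothetical helper: return the `page` and `indexField` query values, or None if absent."""
    query = parse_qs(urlsplit(url).query)
    return {
        "page": query.get("page", [None])[0],
        # Assumption: `indexField` is the letter the listing is filtered by.
        "index_field": query.get("indexField", [None])[0],
    }


print(describe_listing_url(START_URL))
```

Calling the same helper on `response.url` inside `parse` and logging the result would show which pages the crawl actually reached, and whether any request for the missing letters was ever issued.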
【Discussion】:
Tags: python-3.x web-scraping scrapy