[Posted]: 2017-11-23 06:03:10
[Question]:
This is my basic Scrapy spider:
def parse(self, response):
    item = CruiseItem()
    item['Cruise'] = {}
    item['Cruise']['Cruiseline'] = response.xpath('//title/text()').extract()
    item['Cruise']['Itinerary'] = response.xpath('//*[@id="brochureName1"]/text()').extract()
    item['Cruise']['Price'] = response.xpath('//*[@id="interiorPrice1"]/text()').extract()
    item['Cruise']['PerNight'] = response.xpath('//*[@id="perNightinteriorPrice1"]/text()').extract()
    return item
This extracts all the elements I want correctly. For example, my JSON feed output looks like this:
[
    {
        "Cruise": {
            "Cruiseline": [
                "Ship Name"
            ],
            "Itinerary": [
                "3 Night Bahamas ",
                "4 Night Western Caribbean ",
                "4 Night Bahamas ",
                "3 Night Bahamas ",
                "5 Night Western Caribbean ",
                "5 Night Eastern Caribbean ",
                "7 Night Western Caribbean ",
                "7 Night Southern Caribbean ",
                "6 Night Western Caribbean ",
                "7 Night Western Caribbean ",
                "8 Night Eastern Caribbean "
            ],
            "Price": [
                "$169",
                "$179",
                "$289",
                "$349",
                "$359",
                "$389",
                "$389",
                "$409",
                "$424",
                "$524",
                "$939"
            ],
            "PerNight": [
                "$56/night",
                "$45/night",
                "$72/night",
                "$116/night",
                "$72/night",
                "$78/night",
                "$56/night",
                "$58/night",
                "$71/night",
                "$75/night",
                "$117/night"
            ]
        }
    }
]
However, the JSON output I am aiming for is different:
[
    {
        "Cruise": {
            "Cruiseline": [
                "Ship Name"
            ],
            "Itinerary": [
                "3 Night Bahamas "
            ],
            "Price": [
                "$169"
            ],
            "PerNight": [
                "$56/night"
            ]
        },
        "Cruise": {
            "Cruiseline": [
                "Ship Name"
            ],
            "Itinerary": [
                "4 Night Bahamas "
            ],
            "Price": [
                "$79"
            ],
            "PerNight": [
                "$86/night"
            ]
        }
    }
]
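Basically, I want to return each cruise as its own entry, with exactly one ship, itinerary, price, and per-night rate in each.
Does this make sense? Happy to discuss.
Edit: I asked this question a few days ago, but decided to clarify it and repost. Thanks!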
[Discussion]:
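One possible way to get one entry per cruise is to pair up the parallel lists that the XPath queries already return and emit a separate record for each position. The sketch below is plain Python with no Scrapy dependency. The helper name `split_into_cruises` is hypothetical, and it assumes that the n-th itinerary, price, and per-night rate all belong to the same cruise, which the page structure would need to confirm.

```python
def split_into_cruises(cruiseline, itineraries, prices, per_night):
    """Yield one record per cruise from the parallel lists extracted in parse().

    Hypothetical helper: assumes the three lists are aligned by position.
    """
    # zip() stops at the shortest list, so a missing price or rate would
    # silently drop trailing cruises. Fail loudly instead.
    if not (len(itineraries) == len(prices) == len(per_night)):
        raise ValueError("extracted lists have different lengths")

    for itinerary, price, rate in zip(itineraries, prices, per_night):
        yield {
            "Cruise": {
                # Copy the ship name list so records do not share one object.
                "Cruiseline": list(cruiseline),
                "Itinerary": [itinerary],
                "Price": [price],
                "PerNight": [rate],
            }
        }


if __name__ == "__main__":
    # Sample values taken from the feed output shown in the question.
    records = list(split_into_cruises(
        ["Ship Name"],
        ["3 Night Bahamas ", "4 Night Western Caribbean "],
        ["$169", "$179"],
        ["$56/night", "$45/night"],
    ))
    for record in records:
        print(record)
```

Inside the spider, `parse` could loop over this generator and `yield` each record instead of returning a single item, so the feed exporter would write a list of separate objects. Note that this produces one `"Cruise"` key per object; repeating the same key inside a single JSON object, as in the target output above, is not something a standard JSON exporter would emit.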