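【Posted】: 2017-05-10 16:42:50
【Problem description】:
My output comes out as one continuous run of data on a single row instead of in proper record format (one record per row). Here is my code: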
import scrapy
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor


class famousPeopleItem(scrapy.Item):
    # define the fields for your item here like:
    Name = scrapy.Field()
    Profession = scrapy.Field()
    Birth_Date = scrapy.Field()
    Birth_Place = scrapy.Field()
    Nationality = scrapy.Field()
    Died_On = scrapy.Field()


# item class included here
class famousPeople(CrawlSpider):
    name = 'famous'
    start_urls = [
        'http://www.thefamouspeople.com/famous-people-by-zodiac-sign.php'
    ]
    custom_settings = {
        'DEPTH_LIMIT': '1',
    }
    rules = (
        Rule(LinkExtractor(restrict_xpaths=('//div[@class="table_list"]//a',)),
             callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        item = famousPeopleItem()
        item["Name"] = response.xpath('//div[@class="section"]//a[2]//text()').extract()
        item["Profession"] = response.xpath('//div[@class="section"]//span//text()').extract()
        item["Birth_Date"] = response.xpath('//div[@class="section"]//p[1]//text()').extract()
        item["Birth_Place"] = response.xpath('//div[@class="section"]//p[2]//text()').extract()
        item["Nationality"] = response.xpath('//div[@class="section"]//p[3]//text()').extract()
        item["Died_On"] = response.xpath('//div[@class="section"]//p[4]//text()').extract()
        yield item
Using extract_first() does give me the data in the right format, but it does not fetch all the records.
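A likely reading of the behaviour described above: extract() on a page-wide XPath returns every match on the page as one list, so each field ends up holding a whole column, while extract_first() keeps only the first match and therefore only one person. A common way around this is to select each container node once and run relative queries inside it, yielding one item per container. In Scrapy that would look roughly like `for section in response.xpath('//div[@class="section"]')` followed by `section.xpath('.//a[2]//text()').extract_first()`. The sketch below shows the same pattern with the standard library's ElementTree so it runs without a crawl. The markup, the sample names, and the helpers `first_text` and `parse_sections` are hypothetical, and it assumes each `div.section` holds exactly one person, which the question does not confirm.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the real page structure is not shown in the question.
SAMPLE = """
<html><body>
  <div class="section">
    <a>photo</a><a>Ada Example</a>
    <span>Mathematician</span>
    <p>1 January 1900</p><p>London</p><p>British</p><p>2 February 1980</p>
  </div>
  <div class="section">
    <a>photo</a><a>Bob Example</a>
    <span>Painter</span>
    <p>3 March 1950</p><p>Paris</p><p>French</p>
  </div>
</body></html>
"""


def first_text(node, path):
    """Return the stripped text of the first match under node, or None."""
    found = node.find(path)
    if found is None or not found.text:
        return None
    return found.text.strip()


def parse_sections(markup):
    """Build one record per div.section using queries relative to it."""
    root = ET.fromstring(markup.strip())
    records = []
    for section in root.iterfind(".//div[@class='section']"):
        records.append({
            "Name": first_text(section, "a[2]"),
            "Profession": first_text(section, "span"),
            "Birth_Date": first_text(section, "p[1]"),
            "Birth_Place": first_text(section, "p[2]"),
            "Nationality": first_text(section, "p[3]"),
            # Missing for the second sample person, so this becomes None.
            "Died_On": first_text(section, "p[4]"),
        })
    return records


records = parse_sections(SAMPLE)
for record in records:
    print(record)
```

Because every query is scoped to one section, a missing field (such as Died_On here) stays empty for that person instead of shifting values between records.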
【Discussion】:
- How do you want your output displayed?
- In a CSV file, one record per row.
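For the CSV layout requested here, each record has to be a flat mapping of field name to a single value. When a spider yields one such item per person, Scrapy's feed export (for example `scrapy crawl famous -o people.csv`) writes one row per item. The sketch below shows the same idea with the standard library's csv module, assuming Python 3. The sample records and the `records_to_csv` helper are hypothetical and only illustrate the shape of the data.

```python
import csv
import io

# Column order taken from the item fields in the question.
FIELDS = ["Name", "Profession", "Birth_Date",
          "Birth_Place", "Nationality", "Died_On"]


def records_to_csv(records):
    """Serialize a list of per-person dicts as CSV text, one row each."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Missing or None values become empty cells.
        writer.writerow({key: record.get(key) or "" for key in FIELDS})
    return buffer.getvalue()


# Hypothetical sample data, not taken from the real site.
example_records = [
    {"Name": "Ada Example", "Profession": "Mathematician",
     "Birth_Date": "1 January 1900", "Birth_Place": "London",
     "Nationality": "British", "Died_On": "2 February 1980"},
    {"Name": "Bob Example", "Profession": "Painter",
     "Birth_Date": "3 March 1950", "Birth_Place": "Paris",
     "Nationality": "French", "Died_On": None},
]

csv_text = records_to_csv(example_records)
print(csv_text)
```

If a field holds a list of values rather than a single string, it cannot map cleanly onto one cell per record, which would be consistent with the single continuous row described in the question.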
Tags: python-2.7 web-scraping scrapy