【Posted】: 2016-10-05 14:53:59
【Question】:
I get this error when I run my Scrapy project. My spider.py code is:
import scrapy
import re
from tutorial.items import TutorialItem

class tutorialSpider(scrapy.Spider):
    name = "tutorial"
    allowed_domain = ['examble.com']
    start_urls = ["examble.com/something"]

    def parse(self, response):
        for sel in response.xpath('//*[@id="post-entry"]/div/article'):
            item = TutorialItem()
            item['Title'] = sel.xpath('div[2]/h2/a/text()').extract[0]
            item['MainPageUrl'] = sel.xpath('div[2]/h2/a/@href').extract[0]
            item['Author'] = sel.xpath('div[2]/div/span/a/text()').extract[0]
            request = scrapy.Request(item['MianPageUrl'], callback=self.parseContentDetails)
            request.meta['item'] = item
            yield request

    def parseContentDetails(self, response):
        item = response.meta['item']
        item['Content'] = response.xpath()
        item['Count'] = response.xpath()
        print type(item)
        return item
My pipeline.py is:
class TutorialPipeline(object):
    def __init__(self):
        #self.setupDBCon()
        #self.createTables()
        pass  # both setup calls are commented out; pass keeps the method body valid

    def process_item(self, item, spider):
        for key, value in item.iteritems():
            if(isinstance(value, list)):
                if value:
                    templist = []
                    for obj in value:
                        temp = self.stripHTML(obj)
                        templist.append(temp)
                    item[key] = templist
                else:
                    item[key] = ""
            else:
                item[key] = self.stripHTML(value)
        print item.get('Title', '')
        return item
My items.py is:
from scrapy.item import Item, Field

class TutorialItem(Item):
    Title = Field()
    Author = Field()
    MianPageUrl = Field()
    Content = Field()
    Count = Field()
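Please tell me how to fix this error. I have searched many sites, but they only cover the "instancemethod object has no attribute" error in Django, and I need a solution for Scrapy.
【Discussion】: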
- Please post your traceback: the lines before, and including, the TypeError line.
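- Without the traceback this is only a guess, but one likely cause is visible in spider.py: `extract` is a method, and `.extract[0]` subscripts the method object instead of the list it returns. In Python 2 that would raise a TypeError mentioning 'instancemethod', which matches the error described. Below is a minimal sketch that reproduces the mistake without Scrapy. `FakeSelectorList` is a hypothetical stand-in written for this illustration, not a Scrapy class; it only mimics a selector list whose `extract()` returns a list of strings.

```python
# FakeSelectorList is a hypothetical stand-in for the object returned by
# sel.xpath(...); it mimics only the extract() method used in the spider.
class FakeSelectorList(object):
    def __init__(self, values):
        self._values = list(values)

    def extract(self):
        # Return the matched strings as a plain list.
        return list(self._values)


sel = FakeSelectorList(["First post title"])

# Wrong: this subscripts the bound method itself, as the spider does.
try:
    sel.extract[0]
except TypeError as exc:
    # The exact wording differs between Python 2 and Python 3.
    error_message = str(exc)
else:
    error_message = None

# Right: call the method first, then index the list it returns.
title = sel.extract()[0]

print(error_message)
print(title)
```

If that is the cause, changing the three assignments in `parse` to `.extract()[0]` should get past the TypeError. A few other things in the posted code look likely to fail next: the spider assigns `item['MainPageUrl']` while items.py declares `MianPageUrl`, and a Scrapy Item raises KeyError for undeclared fields; Scrapy reads the attribute `allowed_domains`, not `allowed_domain`; and the entry in `start_urls` has no scheme such as `http://`, which Scrapy requires in request URLs.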
Tags: python-2.7 web-scraping scrapy web-crawler scrapy-spider