【Posted】:2017-07-02 09:06:02
【Question】:
I am a beginner with Scrapy and Python. I tried to deploy my spider code to Scrapinghub and ran into the error shown below. Here is the code.
import scrapy
from bs4 import BeautifulSoup, SoupStrainer
import urllib2
from scrapy.selector import Selector
from scrapy.http import HtmlResponse
import re
import pkgutil
from pkg_resources import resource_string
from tues1402.items import Tues1402Item

data = pkgutil.get_data("tues1402", "resources/urllist.txt")

class SpiderTuesday(scrapy.Spider):
    name = 'tuesday'
    self.start_urls = [url.strip() for url in data]

    def parse(self, response):
        story = Tues1402Item()
        story['url'] = response.url
        story['title'] = response.xpath("//title/text()").extract()
        return story
That is my spider.py code.
import scrapy

class Tues1402Item(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    title = scrapy.Field()
    url = scrapy.Field()
That is the items.py code, and
from setuptools import setup, find_packages

setup(
    name = 'tues1402',
    version = '1.0',
    packages = find_packages(),
    entry_points = {'scrapy': ['settings = tues1402.settings']},
    package_data = {'tues1402': ['resources/urllist.txt']},
    zip_safe = False,
)
That is the setup.py code.
The error is below.
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/scrapy/core/engine.py", line 126, in _next_request
    request = next(slot.start_requests)
  File "/usr/local/lib/python2.7/site-packages/scrapy/spiders/__init__.py", line 70, in start_requests
    yield self.make_requests_from_url(url)
  File "/usr/local/lib/python2.7/site-packages/scrapy/spiders/__init__.py", line 73, in make_requests_from_url
    return Request(url, dont_filter=True)
  File "/usr/local/lib/python2.7/site-packages/scrapy/http/request/__init__.py", line 25, in __init__
    self._set_url(url)
  File "/usr/local/lib/python2.7/site-packages/scrapy/http/request/__init__.py", line 57, in _set_url
    raise ValueError('Missing scheme in request url: %s' % self._url)
ValueError: Missing scheme in request url: h
Thanks in advance.
【Discussion】:
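The trailing `h` in the error message suggests that Scrapy received a single character as a URL. A likely cause, assuming urllist.txt holds one URL per line, is that `pkgutil.get_data` returns the whole file as one string, so `for url in data` walks it character by character. A minimal sketch of the difference, using a made-up two-line string in place of the real file:

```python
# Stand-in for the contents of resources/urllist.txt.
# The URLs are hypothetical; the real file is not shown in the question.
data = "http://example.com/a\nhttp://example.org/b\n"

# What the spider does: iterating over a string yields single characters,
# so the first "URL" handed to Scrapy would be just "h".
per_char = [url.strip() for url in data]

# Possible fix: split the text into lines first and skip blank lines.
per_line = [line.strip() for line in data.splitlines() if line.strip()]

print(per_char[:4])
print(per_line)
```

In the spider, this would probably mean building `start_urls` from `data.splitlines()` as a plain class attribute, since `self` is not defined inside a class body. On Python 3, `pkgutil.get_data` returns bytes, so the data would need to be decoded before splitting.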
Tags: python-2.7 scrapy scrapinghub