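【Posted】: 2020-02-06 04:47:11
【Question】:
I have a problem in the `start_requests` function of my Python Scrapy spider. I'm using a proxy host and port to scrape data from another site, but I get:
[scrapy.extensions.logstats] INFO: Crawled 1 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
[scrapy.downloadermiddlewares.retry] DEBUG: Retrying http://....../> (failed 2 times): TCP connection timed out: 110: Connection timed out.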
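"TCP connection timed out: 110" (errno 110 is ETIMEDOUT on Linux) typically means the TCP handshake with the proxy never completes, so the problem is often the proxy row itself rather than the spider logic. A minimal sketch for checking that outside Scrapy, assuming the host and port are the values `get_proxy()` returns; the helper name `proxy_reachable` is hypothetical:

```python
import socket


def proxy_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    This only tests the TCP handshake; it does not check proxy authentication.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake;
        # int() handles a port column that comes back from MySQL as a string
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # socket.timeout, ConnectionRefusedError and DNS errors are all OSError subclasses
        return False


if __name__ == "__main__":
    # Hypothetical values; substitute the (proxy, port) row returned by get_proxy()
    print(proxy_reachable("127.0.0.1", 8080, timeout=3))
```

If this returns False for the selected proxy, marking that row inactive (or retrying `get_proxy()` for another row) is more likely to help than changing the request code.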
My code is:
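import base64
import logging

import MySQLdb
import scrapy

logger = logging.getLogger(__name__)
# `settings` is defined elsewhere in the project (not shown)


def get_proxy(self):
    # Fetch one random active (proxy, port) row from MySQL
    self.conn = MySQLdb.connect(
        settings['MYSQL_HOST'],
        settings['MYSQL_USER'],
        settings['MYSQL_PASSWD'],
        settings['MYSQL_DBNAME'],
        charset="utf8", use_unicode=True)
    self.cursor = self.conn.cursor()
    try:
        results = self.cursor.execute(
            "SELECT proxy, port FROM geme_proxies "
            "WHERE is_active = '1' AND is_deleted = '0' "
            "ORDER BY RAND() LIMIT 1")
        if results > 0:
            return self.cursor.fetchone()  # (proxy, port)
        return None
    except Exception as e:  # Python 3 syntax; "except Exception, e" is Python 2 only
        logger.error('Exception Message: ' + str(e))
        return None


def start_requests(self):
    proxy_data = self.get_proxy()
    if not proxy_data:
        # get_proxy() returns None when no active proxy exists or the query fails
        logger.error('No active proxy available')
        return
    urls = [settings['OBERWIL_NEWS_URL']]
    for url in urls:
        request = scrapy.Request(url=url, callback=self.parse)
        # str() in case the port column is numeric
        request.meta['proxy'] = 'http://' + proxy_data[0] + ':' + str(proxy_data[1])
        proxy_user_pass = settings['PROXY_USERNAME'] + ':' + settings['PROXY_PASSWORD']
        # base64.encodestring() was removed in Python 3.9 and appended a newline;
        # b64encode() works on bytes, so encode the str and decode the result back
        encoded_user_pass = base64.b64encode(proxy_user_pass.encode('utf-8')).decode('ascii')
        request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass
        yield request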
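The proxy URL and the `Proxy-Authorization` value can also be built in one small pure function, which makes the Python 3 bytes/str handling easy to check on its own. A minimal sketch; the function name `proxy_settings` is hypothetical, and the inputs are assumed to be the `(proxy, port)` row plus the username/password from settings:

```python
import base64


def proxy_settings(host, port, user, password):
    """Return (proxy_url, proxy_authorization_value) for a Scrapy request.

    base64 operates on bytes in Python 3, so the credentials are encoded first
    and the result decoded back to str. b64encode() adds no trailing newline.
    """
    proxy_url = "http://{}:{}".format(host, port)  # format() accepts an int port
    raw = "{}:{}".format(user, password).encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return proxy_url, "Basic " + token


# Hypothetical usage inside start_requests():
#     proxy_url, auth = proxy_settings(proxy_data[0], proxy_data[1],
#                                      settings['PROXY_USERNAME'],
#                                      settings['PROXY_PASSWORD'])
#     request.meta['proxy'] = proxy_url
#     request.headers['Proxy-Authorization'] = auth
```

As an alternative, Scrapy's built-in `HttpProxyMiddleware` can read credentials embedded in the proxy URL itself (`http://user:pass@host:port`) and set the header for you. Note that neither approach fixes a timeout if the proxy host is unreachable.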
Please help me solve this problem.
【Discussion】:
Tags: python-3.x