1. Proxies

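Scrapy's built-in proxy support comes from HttpProxyMiddleware, which is enabled by default and takes the proxy from environment variables, so the proxy must first be set there:

from scrapy.downloadermiddlewares.httpproxy import HttpProxyMiddleware  # older Scrapy: scrapy.contrib.downloadermiddleware.httpproxy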

Method 1: use the default middleware

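import os

# Set these before the crawler starts: HttpProxyMiddleware reads them once, when it is created
os.environ["http_proxy"] = "http://root:woshiniba@192.168.11.11:9999/"
os.environ["https_proxy"] = "http://192.168.11.11:9999/"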

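Drawback: the built-in approach depends on the proxy being stored in the Python process's environment variables. The proxy URL has to be looked up there and split apart (scheme, credentials, host) before it can be used, which is roundabout and inflexible. It also allows only one proxy per scheme (http/https), so proxies cannot be rotated per request.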
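To make the lookup described above concrete, here is a small standard-library sketch, assuming the two variables from Method 1. Scrapy's HttpProxyMiddleware obtains these variables through `urllib.request.getproxies()` when it is created and then separates the credentials from the host; `split_proxy` below is a hypothetical helper that imitates that step with `urllib.parse.urlsplit`, not Scrapy's own code.

```python
import base64
import os
from urllib.parse import unquote, urlsplit
from urllib.request import getproxies


def split_proxy(url):
    """Hypothetical helper: split 'scheme://user:pass@host:port/' into
    (proxy URL without credentials, Proxy-Authorization value or None)."""
    parts = urlsplit(url)
    host_port = parts.netloc.rpartition("@")[2]  # drop "user:pass@" if present
    proxy = "%s://%s" % (parts.scheme, host_port)
    if parts.username is None:
        return proxy, None
    user_pass = "%s:%s" % (unquote(parts.username), unquote(parts.password or ""))
    # HTTP Basic auth: "Basic " + base64("user:password")
    return proxy, b"Basic " + base64.b64encode(user_pass.encode("utf-8"))


# Method 1: the proxies live in the process environment
os.environ["http_proxy"] = "http://root:woshiniba@192.168.11.11:9999/"
os.environ["https_proxy"] = "http://192.168.11.11:9999/"

# getproxies() scans the environment for *_proxy variables and returns {scheme: url}
proxies = getproxies()
for scheme in ("http", "https"):
    print(scheme, split_proxy(proxies[scheme]))
```

Because this parsing happens once per scheme, every request for a given scheme ends up going through the same proxy.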

Method 2: use a custom downloader middleware

import base64
import random


def to_bytes(text, encoding='utf-8', errors='strict'):
    """Return text encoded as bytes; bytes are returned unchanged."""
    if isinstance(text, bytes):
        return text
    if not isinstance(text, str):
        raise TypeError('to_bytes must receive a str or bytes object, '
                        'got %s' % type(text).__name__)
    return text.encode(encoding, errors)


class ProxyMiddleware(object):
    # Proxy pool: user_pass is "user:password", or '' when the proxy needs no auth
    PROXIES = [
        {'ip_port': '111.11.228.75:80', 'user_pass': ''},
        {'ip_port': '120.198.243.22:80', 'user_pass': ''},
        {'ip_port': '111.8.60.9:8123', 'user_pass': ''},
        {'ip_port': '101.71.27.120:80', 'user_pass': ''},
        {'ip_port': '122.96.59.104:80', 'user_pass': ''},
        {'ip_port': '122.224.249.122:8088', 'user_pass': ''},
    ]

    def process_request(self, request, spider):
        # Pick a random proxy for every request; Scrapy expects the proxy URL as str
        proxy = random.choice(self.PROXIES)
        request.meta['proxy'] = "http://%s" % proxy['ip_port']
        # '' is not None, so test truthiness: only send credentials when there are some
        if proxy['user_pass']:
            # Basic auth: "Basic " + base64("user:password"); b64encode adds no trailing newline
            encoded_user_pass = base64.b64encode(to_bytes(proxy['user_pass']))
            request.headers['Proxy-Authorization'] = b'Basic ' + encoded_user_pass
            spider.logger.debug("ProxyMiddleware with auth: %s", proxy['ip_port'])
        else:
            spider.logger.debug("ProxyMiddleware without auth: %s", proxy['ip_port'])


# settings.py: priority 500 runs before the built-in HttpProxyMiddleware (750)
DOWNLOADER_MIDDLEWARES = {
    'step8_king.middlewares.ProxyMiddleware': 500,
}
Custom proxies
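The core of Method 2 can be tried without Scrapy. The sketch below assumes a stand-in request object (a `types.SimpleNamespace` with `meta` and `headers` dicts, in place of `scrapy.Request`) and a hypothetical `apply_proxy` helper; the second pool entry's credentials are made up for illustration. It shows why the check must be on truthiness: the pool stores `''` for "no password", and `''` is not `None`.

```python
import base64
import random
from types import SimpleNamespace

# Proxy pool in the same shape as PROXIES above; the credentials are hypothetical
PROXIES = [
    {"ip_port": "111.11.228.75:80", "user_pass": ""},
    {"ip_port": "122.224.249.122:8088", "user_pass": "user:secret"},
]


def apply_proxy(request, proxy):
    """Hypothetical helper: set request.meta['proxy'] and, only when the
    entry has credentials, a Basic Proxy-Authorization header."""
    request.meta["proxy"] = "http://%s" % proxy["ip_port"]
    if proxy["user_pass"]:  # '' means "no auth"; an `is not None` test would be True here
        token = base64.b64encode(proxy["user_pass"].encode("utf-8"))
        request.headers["Proxy-Authorization"] = b"Basic " + token
    else:
        # Drop a header possibly left over from a previously assigned proxy
        request.headers.pop("Proxy-Authorization", None)
    return request


# SimpleNamespace stands in for scrapy.Request so this runs without Scrapy
req = SimpleNamespace(meta={}, headers={})
apply_proxy(req, random.choice(PROXIES))
print(req.meta["proxy"], req.headers.get("Proxy-Authorization"))
```

In a real project the same logic would sit inside `ProxyMiddleware.process_request`, applied to the request Scrapy passes in.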
