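Using single-threaded asynchronous coroutines in a crawler, covering single-task and multi-task cases, as well as parsing data with callback functions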

Introduction to `aiohttp`

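aiohttp performs concurrent I/O within a single thread. Use it in place of the non-asynchronous requests module to send requests. The User-Agent, headers, and request parameters can all be set, as shown below: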
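To illustrate why a blocking module such as requests has to be replaced, here is a minimal sketch that needs no network. In it, `time.sleep` stands in for a blocking request and `asyncio.sleep` stands in for an awaited aiohttp call. The function names and the 0.2-second delay are assumptions made purely for illustration.

```python
import asyncio
import time

DELAY = 0.2  # assumed duration of one simulated request, in seconds


async def blocking_request(name):
    # time.sleep blocks the whole thread, so the event loop cannot switch tasks
    time.sleep(DELAY)
    return name


async def async_request(name):
    # asyncio.sleep hands control back to the event loop while "waiting"
    await asyncio.sleep(DELAY)
    return name


async def timed(worker, names):
    """Run one worker per name via gather and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = await asyncio.gather(*(worker(n) for n in names))
    return results, time.perf_counter() - start


async def main():
    names = ['tiger', 'jay', 'tom']
    _, blocking_time = await timed(blocking_request, names)
    _, async_time = await timed(async_request, names)
    print(f'blocking: {blocking_time:.2f}s, async: {async_time:.2f}s')
    return blocking_time, async_time


blocking_time, async_time = asyncio.run(main())
```

The blocking version should take roughly three times the delay because the simulated requests run one after another, while the awaited version should finish in about one delay.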

Installation

pip install aiohttp

Using aiohttp

1. Sending a request

import asyncio
import aiohttp

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get('https://www.baidu.com') as response:
            print(await response.text())

# asyncio.run() creates the event loop, runs the coroutine, and closes the loop
asyncio.run(fetch())

2. Adding request parameters:

import asyncio
import aiohttp

params = {'key': 'value', 'page': 10}

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get('https://www.baidu.com/s', params=params) as response:
            # response.url is a plain attribute, not a coroutine, so it is not awaited
            print(response.url)

asyncio.run(fetch())

3. Spoofing the User-Agent:

import asyncio
import aiohttp

url = 'http://httpbin.org/user-agent'
headers = {'User-Agent': 'test_user_agent'}

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(url, headers=headers) as response:
            print(await response.text())

asyncio.run(fetch())

4. Setting custom cookies:

import asyncio
import aiohttp

url = 'http://httpbin.org/cookies'
cookies = {'cookies_name': 'test_cookies'}

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(url, cookies=cookies) as response:
            print(await response.text())

asyncio.run(fetch())

5. POST request parameters

import asyncio
import aiohttp

# httpbin accepts POST requests on the /post endpoint
url = 'http://httpbin.org/post'
payload = {'username': 'zhang', 'password': '123456'}

async def fetch():
    async with aiohttp.ClientSession() as session:
        # data= sends the payload as form-encoded body fields
        async with session.post(url, data=payload) as response:
            print(await response.text())

asyncio.run(fetch())

6. Setting a proxy

import asyncio
import aiohttp

url = "http://python.org"

async def fetch():
    async with aiohttp.ClientSession() as session:
        # "http://some.proxy.com" is a placeholder; substitute a real proxy address
        async with session.get(url, proxy="http://some.proxy.com") as response:
            print(response.status)

asyncio.run(fetch())

Asynchronous I/O handling

# Installation: pip install aiohttp
# Use the ClientSession class from this module
import asyncio
import time
import aiohttp

start = time.time()
urls = [
    'http://127.0.0.1:5000/tiger','http://127.0.0.1:5000/jay','http://127.0.0.1:5000/tom',
    'http://127.0.0.1:5000/tiger', 'http://127.0.0.1:5000/jay', 'http://127.0.0.1:5000/tom',
    'http://127.0.0.1:5000/tiger', 'http://127.0.0.1:5000/jay', 'http://127.0.0.1:5000/tom',
    'http://127.0.0.1:5000/tiger', 'http://127.0.0.1:5000/jay', 'http://127.0.0.1:5000/tom',

]

async def get_page(url):
    async with aiohttp.ClientSession() as session:
        # get() and post() accept:
        # headers, params/data, proxy='http://ip:port'
        async with session.get(url) as response:
            # text() returns the response body as a string
            # read() returns the response body as bytes
            # json() returns the parsed JSON object
            # Note: reading the response body must always be awaited
            page_text = await response.text()
            print(page_text)

async def main():
    # Wrap each coroutine in a task so that all requests run concurrently
    tasks = [asyncio.ensure_future(get_page(url)) for url in urls]
    await asyncio.wait(tasks)

asyncio.run(main())

end = time.time()

print('Total time:', end - start)
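The comments in `get_page` distinguish `text()`, `read()`, and `json()`. The following is a minimal sketch of the three calls against a throwaway local server, so that it does not depend on the service at port 5000 being up. The `/tiger` handler, its JSON payload, and the use of `aiohttp.web` to host it are assumptions made for illustration, not part of the original setup.

```python
import asyncio
import aiohttp
from aiohttp import web


async def handler(request):
    # Hypothetical endpoint: always answers with a small JSON document
    return web.json_response({'name': 'tiger'})


async def main():
    # Start a temporary server; port 0 lets the OS pick a free port
    app = web.Application()
    app.router.add_get('/tiger', handler)
    runner = web.AppRunner(app)
    await runner.setup()
    site = web.TCPSite(runner, '127.0.0.1', 0)
    await site.start()
    port = runner.addresses[0][1]

    try:
        async with aiohttp.ClientSession() as session:
            async with session.get(f'http://127.0.0.1:{port}/tiger') as response:
                raw = await response.read()    # body as bytes
                text = await response.text()   # body decoded to str
                data = await response.json()   # body parsed as JSON
    finally:
        # Shut the temporary server down again
        await runner.cleanup()
    return raw, text, data


raw, text, data = asyncio.run(main())
print(type(raw).__name__, type(text).__name__, data)
```

All three methods are coroutines, which is why each call is awaited. aiohttp keeps the body in memory after the first read, so calling all three on the same response works.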

# Use aiohttp in place of the requests module
import time
import asyncio
import aiohttp

async def get_page(url):
    async with aiohttp.ClientSession() as session:
        # Any operation that waits on I/O must be suspended (async with / await)
        async with session.get(url=url) as response:
            page_text = await response.text()  # use read() for bytes, json() for JSON
            print('Response data', page_text)

start = time.time()
urls = [
    'http://127.0.0.1:5000/tiger',
    'http://127.0.0.1:5000/jay',
    'http://127.0.0.1:5000/tom',
]
async def main():
    tasks = []
    for url in urls:
        coroutine = get_page(url)
        task = asyncio.ensure_future(coroutine)
        tasks.append(task)
    # Wait until every task has finished
    await asyncio.wait(tasks)

asyncio.run(main())
print('Total time: ', time.time() - start)
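The opening line mentions parsing data in a callback function, which the examples above do not show. Here is a minimal sketch of that pattern using `Task.add_done_callback`. To keep it runnable without a server, `get_page` only simulates the request with `asyncio.sleep`. The `PAGES` contents and the `parse` function are hypothetical stand-ins.

```python
import asyncio
import re

# Hypothetical page bodies standing in for real HTTP responses
PAGES = {
    'http://127.0.0.1:5000/tiger': '<html><title>tiger</title></html>',
    'http://127.0.0.1:5000/jay': '<html><title>jay</title></html>',
    'http://127.0.0.1:5000/tom': '<html><title>tom</title></html>',
}

titles = []


async def get_page(url):
    # Simulated network wait; a real crawler would await an aiohttp request here
    await asyncio.sleep(0.1)
    return PAGES[url]


def parse(task):
    # The event loop calls this with the finished task as its only argument
    page_text = task.result()
    match = re.search(r'<title>(.*?)</title>', page_text)
    titles.append(match.group(1) if match else None)


async def main():
    tasks = []
    for url in PAGES:
        task = asyncio.ensure_future(get_page(url))
        # Register the parser to run as soon as this task completes
        task.add_done_callback(parse)
        tasks.append(task)
    await asyncio.wait(tasks)


asyncio.run(main())
print(sorted(titles))
```

The callback is an ordinary function, not a coroutine, so it should only do quick work such as parsing. The coroutine's return value is read with `task.result()`.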
