【发布时间】:2021-07-27 00:40:50
【问题描述】:
当我使用第三方 aiobotocore 时,它可以达到 NUM_WORKERS=500,如果我想达到 1000,我会收到此错误:
r, w, _ = self._select(self._readers, self._writers, [], timeout)
File ".....\lib\selectors.py", line 314, in _select
r, w, x = select.select(r, w, w, timeout)
ValueError: too many file descriptors in select()
如果有办法并行执行1000?
来源:
import os, sys, time, json
import asyncio
from itertools import chain
from typing import List
import logging
from functools import partial
from pprint import pprint
# Third Party
import asyncpool
import aiobotocore.session
import aiobotocore.config
_NUM_WORKERS=500
async def execute_lambda( lambda_name: str, key: str, client):
# Get json content from s3 object
if 1:
name=lambda_name
response = await client.invoke(
InvocationType='RequestResponse',
FunctionName=name,
LogType='Tail',
Payload=json.dumps({
'exec_id':key,
})
)
out=[]
async for event in response['Payload']:
out.append(event.decode())
#await asyncio.sleep(1)
return out
async def submit(lambda_name: str) -> List[dict]:
"""
Returns list of AWS Lambda outputs executed in parallel
:param name: name of lambda function
:return: list of lambda returns
"""
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger()
session = aiobotocore.session.AioSession()
config = aiobotocore.config.AioConfig(max_pool_connections=_NUM_WORKERS)
contents = []
#client = boto3.client('lambda', region_name='us-west-2')
async with session.create_client('lambda', region_name='us-west-2', config=config) as client:
worker_co = partial(execute_lambda, lambda_name)
async with asyncpool.AsyncPool(None, _NUM_WORKERS, 'lambda_work_queue', logger, worker_co,
return_futures=True, raise_on_join=True, log_every_n=10) as work_pool:
for x in range(_NUM_WORKERS):
contents.append(await work_pool.push(x, client))
# retrieve results from futures
contents = [c.result() for c in contents]
return list(chain.from_iterable(contents))
def main(name, files):
s = time.perf_counter()
_loop = asyncio.get_event_loop()
_result = _loop.run_until_complete(submit(name))
pprint(_result)
elapsed = time.perf_counter() - s
print(f"{__file__} executed in {elapsed:0.2f} seconds.")
Lambda 函数:
import time
def lambda_handler(event, context):
time.sleep(10)
return {'code':0, 'exec_id':event['exec_id']}
结果:
'{"code": 0, "exec_id": 0}',
'{"code": 0, "exec_id": 1}',
'{"code": 0, "exec_id": 2}',
'{"code": 0, "exec_id": 3}',
...
'{"code": 0, "exec_id": 496}',
'{"code": 0, "exec_id": 497}',
'{"code": 0, "exec_id": 498}',
'{"code": 0, "exec_id": 499}']
my_cli_script.py executed in 14.56 seconds.
【问题讨论】:
-
你的意思是异步的吗?
-
@Evert,不,我想在我的本地 cli 脚本中处理结果
-
异步通常表示“并行”,同步表示“一个接一个”
-
@Evert 我必须是 Lambda 术语中的“同步” - 意味着我想等待 Lambda 结果与“异步”当我不这样做时。但是每个 exec 在 Python 术语中都必须是异步的。
-
我编辑了你的问题标题以澄清一点。
标签: python-3.x amazon-web-services aws-lambda boto3 python-asyncio