【Posted】: 2019-10-01 20:18:21
【Question】:
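I have some code that fetches URLs, parses the information, and then saves it to a database with SQLAlchemy. I am trying to do this asynchronously while capping the maximum number of simultaneous requests.
Here is my code: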
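A common way to cap the number of simultaneous requests in asyncio is a single asyncio.Semaphore shared by every request coroutine. The sketch below is a minimal illustration under stated assumptions: fake_fetch is a hypothetical stand-in for an aiohttp request (it only sleeps), MAX_CONCURRENT = 3 is an arbitrary limit, and the stats dictionary exists only to show how many fetches were running at the same time.

```python
import asyncio

MAX_CONCURRENT = 3  # hypothetical cap on simultaneous requests


async def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for an aiohttp request; it just sleeps briefly."""
    await asyncio.sleep(0.01)
    return f"payload for {url}"


async def bounded_fetch(sem: asyncio.Semaphore, url: str, stats: dict) -> str:
    # At most MAX_CONCURRENT coroutines can be inside this block at once;
    # the others wait here until a slot is released.
    async with sem:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        try:
            return await fake_fetch(url)
        finally:
            stats["active"] -= 1


async def fetch_all(urls):
    # Create the semaphore inside the running event loop.
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    stats = {"active": 0, "peak": 0}
    # gather() schedules every request but returns results in input order.
    results = await asyncio.gather(*(bounded_fetch(sem, u, stats) for u in urls))
    return results, stats["peak"]
```

For example, asyncio.run(fetch_all([f"https://example.org/{i}" for i in range(10)])) returns the ten payloads in input order, and the reported peak never exceeds MAX_CONCURRENT.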
```python
import asyncio
from datetime import datetime

# DATETIME_FORMAT is defined elsewhere in my module (not shown)


async def get_url(session, url1, url2):
    # Fetch both URLs with the shared aiohttp session; return both JSON bodies
    async with session.get(url1) as r_url1:
        if r_url1.status == 200:
            async with session.get(url2) as r_url2:
                if r_url2.status == 200:
                    return await r_url1.json(), await r_url2.json()


async def url_generator(formatted_start_date, formatted_end_date, machine_id, interval):
    # Yield one (url1, url2) pair per time interval
    interval_start = formatted_start_date
    interval_end = formatted_start_date + interval
    while interval_end <= formatted_end_date:
        yield (f"https://example.org/start={interval_start}"
               f"Start={datetime.strftime(interval_start, DATETIME_FORMAT)}"
               f"&End={datetime.strftime(interval_end, DATETIME_FORMAT)}"
               f"&machines={machine_id}",
               f"https://example.org/start={interval_start}"
               f"Start={datetime.strftime(interval_start, DATETIME_FORMAT)}"
               f"&End={datetime.strftime(interval_end, DATETIME_FORMAT)}"
               f"&machines={machine_id}&groupby=Job"
               )
        interval_start += interval
        interval_end += interval


async def parse(database, url1_json, url2_json):
    """ Do some parsing and save it using credentials stored in the database object """


def main(database, formatted_start_date, formatted_end_date, machine_id, interval):
    async for url1_json, url2_json in asyncio.as_completed(url_generator(formatted_start_date, formatted_end_date, machine_id, interval)):
        parse(database, url1_json, url2_json)
```
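I get the error: yield from should be used as context manager expression.
I have tried reading the documentation here, as well as the section on synchronization primitives, but I am still confused about where I am going wrong and how I should create tasks from my generator.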
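One way to create tasks from the generator is sketched below, assuming Python 3.7+ (for asyncio.run and asyncio.create_task). Because async for is only valid inside an async def, main becomes a coroutine; it drains the async generator with an async comprehension, wraps each URL pair in a task, and then awaits the tasks via asyncio.as_completed, which expects an ordinary iterable of awaitables rather than an async generator. This is a sketch, not a drop-in fix: fetch_pair is a hypothetical stand-in for get_url (it sleeps instead of calling aiohttp), parse is a synchronous placeholder without the database argument, and the DATETIME_FORMAT value and simplified URL layout are assumptions.

```python
import asyncio
from datetime import datetime, timedelta

DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%S"  # assumed; the question does not show it


async def url_pairs(start, end, machine_id, interval):
    """Simplified version of url_generator: yields one URL pair per interval."""
    interval_start = start
    interval_end = start + interval
    while interval_end <= end:
        params = (f"Start={interval_start.strftime(DATETIME_FORMAT)}"
                  f"&End={interval_end.strftime(DATETIME_FORMAT)}"
                  f"&machines={machine_id}")
        yield (f"https://example.org/?{params}",
               f"https://example.org/?{params}&groupby=Job")
        interval_start += interval
        interval_end += interval


async def fetch_pair(sem, url1, url2):
    """Hypothetical stand-in for get_url(); a real version would use aiohttp."""
    async with sem:  # limits how many pairs are fetched at once
        await asyncio.sleep(0.01)
        return {"url": url1}, {"url": url2}


def parse(url1_json, url2_json):
    """Placeholder for the parsing/saving step; returns the first URL."""
    return url1_json["url"]


async def main(start, end, machine_id, interval, limit=5):
    sem = asyncio.Semaphore(limit)
    # 1. Drain the async generator and turn each URL pair into a Task.
    tasks = [asyncio.create_task(fetch_pair(sem, u1, u2))
             async for u1, u2 in url_pairs(start, end, machine_id, interval)]
    parsed = []
    # 2. as_completed() takes a plain list of awaitables and yields them
    #    in completion order; each item must still be awaited.
    for fut in asyncio.as_completed(tasks):
        url1_json, url2_json = await fut
        parsed.append(parse(url1_json, url2_json))
    return parsed
```

Usage: asyncio.run(main(datetime(2019, 10, 1), datetime(2019, 10, 2), 42, timedelta(hours=6))) processes four six-hour intervals. Draining the generator creates every task up front; the semaphore, not the number of tasks, is what bounds how many requests are in flight at once.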
【Discussion】:
Tags: python python-asyncio python-3.7