【Question title】: How to make this loop asynchronous?
【Posted】: 2017-11-21 17:05:09
【Question description】:

I am currently working with the GitHub API, and this is a function that fetches all pull requests for every repo in a list:

async def get_all_pulls(repos, api):
    pulls = []
    for repo in repos:
        try:
            async for pull in api.getiter(f'/repos/{org}/{repo}/pulls?state=all'):
                pull['repo'] = repo
                if pull not in pulls:
                    pulls.append(pull)
        except Exception:
            print(f"Bad repo/no access=> [{repo}]")
            continue

    return pulls

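Everything works, but there is one small problem: iterating over the repos takes a long time (say there are 30 of them).

I tried to make it asynchronous like this (of course, I removed the for loop from the function definition when using this):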

        # gather all prs for all repos

        tasks = [asyncio.ensure_future(get_all_pulls_for_repo(api, repo)) for repo in repos]
        results = await asyncio.gather(*tasks)
        # unwrap list of lists
        for res in results:
            all_pull_requests += res

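But it crashes and reports that the repos are bad, and so on. I think I am missing something important here, but I cannot figure out what.

Why does it crash with the async for loop? Can I make it work?

Update 1: traceback for get_all_reviews: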

Traceback (most recent call last):
  File "/home/metal/Documents/projects/-git/async_git_tool.py", line 193, in <module>
    loop.run_until_complete(main())
  File "/home/metal/.pyenv/versions/3.6.0/lib/python3.6/asyncio/base_events.py", line 466, in run_until_complete
    return future.result()
  File "/home/metal/Documents/projects/-git/async_git_tool.py", line 113, in main
    reviewed = await get_all_reviews(created, api, ss_programmers)
  File "/home/metal/Documents/projects/-git/async_git_tool.py", line 181, in get_all_reviews
    async for review in api.getiter(f'/repos/{org}/{pr_repo}/pulls/{pr_number}/reviews'):
  File "/home/metal/Documents/projects/-git/venv/lib/python3.6/site-packages/gidgethub/abc.py", line 85, in getiter
    data, more = await self._make_request("GET", url, url_vars, b"", accept)
  File "/home/metal/Documents/projects/-git/venv/lib/python3.6/site-packages/gidgethub/abc.py", line 66, in _make_request
    data, self.rate_limit, more = sansio.decipher_response(*response)
  File "/home/metal/Documents/projects/-git/venv/lib/python3.6/site-packages/gidgethub/sansio.py", line 284, in decipher_response
    rate_limit = RateLimit.from_http(headers)
  File "/home/metal/Documents/projects/-git/venv/lib/python3.6/site-packages/gidgethub/sansio.py", line 226, in from_http
    limit = int(headers["x-ratelimit-limit"])
  File "multidict/_multidict.pyx", line 140, in multidict._multidict._Base.__getitem__
  File "multidict/_multidict.pyx", line 135, in multidict._multidict._Base._getone
KeyError: "Key not found: 'x-ratelimit-limit'"

Here is the function itself:

async def get_all_reviews(pulls, api, programmers):
    reviewed_pulls = []
    for pull in pulls:
        pr_repo = pull['repo']
        pr_number = str(pull['number'])

        async for review in api.getiter(f'/repos/{org}/{pr_repo}/pulls/{pr_number}/reviews'):
            if review['user']['login'] not in programmers \
                    and pull not in reviewed_pulls:
                reviewed_pulls.append(pull)

    return reviewed_pulls

I call it like this:

reviewed = await get_all_reviews(softserve_created, api, ss_programmers)

【Question comments】:

Tags: python-3.x asynchronous python-asyncio


【Solution 1】:

The idea you described works fine for me:

import asyncio
import aiohttp
import gidgethub
from gidgethub.aiohttp import GitHubAPI


# TODO:
# Paste your personal access token here so requests are authenticated
# and get the higher rate limit.
# https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/
TOKEN = '...'


async def get_all_pulls_for_repo(gh, org, repo):
    pulls = []
    async for pull in gh.getiter(f'/repos/{org}/{repo}/pulls?state=all'):
        pulls.append(pull)
        # Crude throttle to avoid RateLimitExceeded; ideally track the remaining quota instead.
        await gh.sleep(0.1)
    return pulls


async def main():
    org = 'brettcannon'
    repos = ['gidgethub', 'caniusepython3', 'importlib_resources']

    async with aiohttp.ClientSession() as session:
        gh = GitHubAPI(session, 'requester', oauth_token=TOKEN)
        tasks = [
            asyncio.ensure_future(get_all_pulls_for_repo(gh, org, repo))
            for repo 
            in repos
        ]
        results = await asyncio.gather(*tasks)

    for res in results:
        for pull in res:
            print(pull['url'])


loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(main())
finally:
    loop.run_until_complete(loop.shutdown_asyncgens())
    loop.close()

Create a token for the requests, paste it in, and you will see a list of PR URLs.
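
If the fixed sleep feels too crude, one alternative is to cap how many repos are fetched at once and to collect failures instead of letting one bad repo abort the whole gather. The following is only a minimal sketch of that pattern: `fake_fetch_pulls`, `fetch_with_limit`, `gather_pulls`, and the repo names are hypothetical stand-ins (no real GitHub calls are made), and it assumes Python 3.7+ for `asyncio.run`.

```python
import asyncio


async def fake_fetch_pulls(repo):
    """Hypothetical stand-in for paging through one repo's pull requests."""
    await asyncio.sleep(0.01)  # simulate network latency
    if repo == "private-repo":
        raise PermissionError(f"no access to {repo}")
    return [{"repo": repo, "number": n} for n in (1, 2)]


async def fetch_with_limit(semaphore, repo):
    # At most `limit` coroutines can be inside this block at any moment.
    async with semaphore:
        return await fake_fetch_pulls(repo)


async def gather_pulls(repos, limit=2):
    semaphore = asyncio.Semaphore(limit)
    results = await asyncio.gather(
        *(fetch_with_limit(semaphore, repo) for repo in repos),
        return_exceptions=True,  # failures are returned as values, in input order
    )
    pulls, failed = [], []
    for repo, result in zip(repos, results):
        if isinstance(result, Exception):
            failed.append(repo)  # bad repo or no access: remember it and move on
        else:
            pulls.extend(result)
    return pulls, failed


pulls, failed = asyncio.run(gather_pulls(["a", "private-repo", "b"]))
print(len(pulls), failed)
```

In real code the body of `fake_fetch_pulls` would be the `async for pull in gh.getiter(...)` loop from the answer above; the semaphore and `return_exceptions=True` parts stay the same.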

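【Comments】:

  • sleep() solved the problem in get_all_pulls nicely, but I still get a traceback in get_all_reviews; please see the updated question at the top. Thanks.
  • Also, I just noticed that if I iterate over the repos non-asynchronously I get 9983 pull requests in total, but asynchronously I only get about 7k.
  • @Roman, the error you are getting has nothing to do with your code; it is a bug in gidgethub: github.com/brettcannon/gidgethub/issues/25. I think the number of PRs you retrieve is also related to how gidgethub works.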
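
To illustrate the failure in the traceback: the KeyError comes from indexing the response headers for `x-ratelimit-limit` when that header is absent. The sketch below shows one tolerant way to read such headers. It is not gidgethub's actual code or its fix; `parse_rate_limit` is a hypothetical helper, and the sample header values are made up.

```python
from typing import Mapping, Optional, Tuple


def parse_rate_limit(headers: Mapping[str, str]) -> Optional[Tuple[int, int]]:
    """Return (limit, remaining), or None if the rate-limit headers are absent."""
    try:
        limit = int(headers["x-ratelimit-limit"])
        remaining = int(headers["x-ratelimit-remaining"])
    except KeyError:
        # The response carried no rate-limit headers, as in the traceback above.
        return None
    return limit, remaining


print(parse_rate_limit({"x-ratelimit-limit": "5000", "x-ratelimit-remaining": "4990"}))
print(parse_rate_limit({"content-type": "application/json"}))
```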