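【Question title】: How to use asyncio to read JSON files from S3?
【Posted】: 2021-03-09 20:21:08
【Question】:

Hi all. I have the following code, which reads JSON files from S3 and returns some of their values. It uses multithreading. My question is how to modify it to use asyncio instead.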

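import json
import time
from concurrent.futures import ThreadPoolExecutor
from itertools import repeat

import boto3

# assumed: the original snippet uses `s3` without showing how it was created
s3 = boto3.resource('s3')

def get_keys_from_prefix(bucket, prefix):
    """
    List the objects under a prefix and return the keys ending in .json
    """
    keys_list = []
    paginator = s3.meta.client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is missing when a page holds no objects
        for content in page.get('Contents', []):
            if content['Key'].endswith('.json'):
                keys_list.append(content['Key'])
    return keys_list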

def read_json_file_from_s3(bucket, key):
    """
    Read a JSON file from S3 and print its key and location
    """
    try:
        obj = boto3.client('s3').get_object(Bucket=bucket, Key=key)
        data = obj['Body'].read().decode('utf-8')
        json_content = json.loads(data)
        info = json_content['info']
        location = info.get("location")
        print(key)
        print(location)
    except Exception as exc:
        # report the failure instead of silently swallowing it
        print(f'Failed to read {key}: {exc}')

def multithreading():
    bucket = "bucket-name"
    prefix = "prefix"
    start = time.perf_counter()

    key_list = get_keys_from_prefix(bucket, prefix)
    # leaving the with block waits for all submitted reads to finish
    with ThreadPoolExecutor() as executor:
        executor.map(read_json_file_from_s3, repeat(bucket), key_list)

    finish = time.perf_counter()
    print(f'Finished in {round(finish - start, 2)} second(s)')

multithreading()

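【Comments on the question】:

  • StackOverflow is not a code-writing service. Please show your own attempt at solving the problem, and we will help you with the issues you run into.

Tags: python-3.x multithreading asynchronous amazon-s3 python-asyncio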


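【Solution 1】:

The boto3 client uses blocking functions. If you want to use it together with the asyncio module, you can run those blocking calls in a ThreadPoolExecutor.

For example: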

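import asyncio
import time

# blocking function
def test():
    time.sleep(1)  # the duration is arbitrary; it stands in for a blocking call
    return 10

async def main():
    loop = asyncio.get_running_loop()
    # None selects the event loop's default ThreadPoolExecutor
    return await loop.run_in_executor(None, test)

asyncio.run(main())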
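
The same pattern can be extended to many keys at once. Below is a minimal sketch, not a definitive implementation: `read_all` and `fake_read_location` are hypothetical names introduced for illustration, and `fake_read_location` only simulates a blocking S3 read with `time.sleep` so the example runs without AWS credentials. In practice you would probably pass a function like the question's `read_json_file_from_s3`, changed to return the location rather than print it.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def fake_read_location(bucket, key):
    """Hypothetical stand-in for a blocking get_object + json.loads call."""
    time.sleep(0.05)  # simulate network latency
    return key, f"location-of-{key}"


async def read_all(fetch, bucket, keys, max_workers=8):
    """Run the blocking fetch(bucket, key) for every key in a thread pool."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            loop.run_in_executor(pool, partial(fetch, bucket, key))
            for key in keys
        ]
        # gather returns the results in the same order as `keys`
        return await asyncio.gather(*futures)


if __name__ == "__main__":
    keys = ["a.json", "b.json", "c.json"]
    for key, location in asyncio.run(read_all(fake_read_location, "bucket-name", keys)):
        print(key, location)
```

Note that the work still runs in threads; asyncio only coordinates it. That is usually fine for I/O-bound calls such as S3 reads, and `max_workers` caps how many requests are in flight at once.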
