【Question title】: AZURE Blob-File Storage // Shared access signature // Python
【Posted】: 2022-02-10 19:38:28
【Question】:

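I am working as a Python developer in an AZURE Blob/File Storage environment. These are my credentials:

Storage type: Storage account

Access rule: Shared access signature

Available credentials:

  • Connection string
  • SAS token
  • Blob service SAS URL
  • File service SAS URL

Could you help me find a Python routine to establish a connection, list the files, and download them?

Thanks in advance!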

【Discussion】:

    Tags: python azure azure-blob-storage


    【Answer 1】:

    find a python routine to establish a connection, list and download files?

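    You can use the connection string in a Python routine to connect to the Azure storage account and download files from Azure Blob Storage.

    You need the Azure Storage SDK for Python to download all blobs in a storage container to a specified local folder.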
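The question also lists a SAS token and a Blob service SAS URL. As a minimal sketch, assuming the `azure-storage-blob` v12 package, a service SAS URL can be split into its endpoint and its token, and both can be passed to `BlobServiceClient`. The helper names `split_sas_url` and `client_from_sas_url` are hypothetical and are not part of the original answer.

```python
from urllib.parse import urlsplit, urlunsplit


def split_sas_url(sas_url):
    """Split a SAS URL into (endpoint URL without query, SAS token)."""
    parts = urlsplit(sas_url)
    endpoint = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return endpoint, parts.query


def client_from_sas_url(sas_url):
    """Build a BlobServiceClient from a Blob service SAS URL (assumes the v12 SDK)."""
    # Imported lazily so the URL helper above works without the SDK installed
    from azure.storage.blob import BlobServiceClient

    account_url, sas_token = split_sas_url(sas_url)
    return BlobServiceClient(account_url=account_url, credential=sas_token)
```

The returned client can then be used like the connection-string client, for example `client_from_sas_url(url).get_container_client("myimages").list_blobs()`. The SAS presumably needs list and read permissions for listing and downloading to succeed.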

    The following sample code creates local folders for blobs whose names use virtual folders (names containing slashes):

    # Python program to bulk download blob files from Azure storage
    # Uses the latest Python SDK for Azure Blob Storage (azure-storage-blob)
    # Requires Python 3.6 or above
    import os
    from azure.storage.blob import BlobServiceClient

    # IMPORTANT: Replace with your storage account connection string
    # Usually starts with DefaultEndpointsProtocol=https;...
    MY_CONNECTION_STRING = "REPLACE_THIS"

    # Replace with the blob container name
    MY_BLOB_CONTAINER = "myimages"

    # Replace with the local folder where you want files to be downloaded
    LOCAL_BLOB_PATH = "REPLACE_THIS"

    class AzureBlobFileDownloader:
        def __init__(self):
            print("Initializing AzureBlobFileDownloader")

            # Initialize the connection to the Azure storage account
            self.blob_service_client = BlobServiceClient.from_connection_string(MY_CONNECTION_STRING)
            self.my_container = self.blob_service_client.get_container_client(MY_BLOB_CONTAINER)

        def save_blob(self, file_name, file_content):
            # Get the full local path to the file
            download_file_path = os.path.join(LOCAL_BLOB_PATH, file_name)

            # For nested blobs, create the local folders as well
            os.makedirs(os.path.dirname(download_file_path), exist_ok=True)

            with open(download_file_path, "wb") as file:
                file.write(file_content)

        def download_all_blobs_in_container(self):
            my_blobs = self.my_container.list_blobs()
            for blob in my_blobs:
                print(blob.name)
                # readall() loads the whole blob into memory before it is saved
                content = self.my_container.get_blob_client(blob).download_blob().readall()
                self.save_blob(blob.name, content)

    # Initialize the class and download the files
    azure_blob_file_downloader = AzureBlobFileDownloader()
    azure_blob_file_downloader.download_all_blobs_in_container()
    

    Note: Replace the MY_CONNECTION_STRING, LOCAL_BLOB_PATH and MY_BLOB_CONTAINER variables with your own values.
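To illustrate how a blob name with a virtual folder maps to a local path, here is a small sketch that uses only the standard library. The function name `local_path_for_blob` and the sample blob name are hypothetical. The check that rejects names escaping the download folder is an extra precaution and is not in the original answer.

```python
import os
import tempfile


def local_path_for_blob(local_root, blob_name):
    """Return the local file path for a blob name, creating parent folders."""
    root = os.path.abspath(local_root)
    target = os.path.abspath(os.path.join(root, *blob_name.split("/")))
    # Refuse blob names such as "../x" that would land outside local_root
    if os.path.commonpath([root, target]) != root:
        raise ValueError("blob name escapes the download folder: " + blob_name)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    return target


root = tempfile.mkdtemp()
path = local_path_for_blob(root, "photos/2022/cat.png")
# Shows the nested path relative to the download folder
print(os.path.relpath(path, root))
```

The folders `photos` and `2022` are created under the temporary root, which mirrors what `os.makedirs(..., exist_ok=True)` does in the answer's `save_blob` method.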

    Update (parallel download with ThreadPool):

     
    # Place the same connection string, local blob path and blob container
    # definitions from the code above here
    import os
    from multiprocessing.pool import ThreadPool
    from azure.storage.blob import BlobServiceClient

    class AzureBlobFileDownloader:
        def __init__(self):
            # Same initialization as in the code above
            self.blob_service_client = BlobServiceClient.from_connection_string(MY_CONNECTION_STRING)
            self.my_container = self.blob_service_client.get_container_client(MY_BLOB_CONTAINER)

        def download_all_blobs_in_container(self):
            # Get a list of blobs
            my_blobs = self.my_container.list_blobs()
            result = self.run(my_blobs)
            print(result)

        def run(self, blobs):
            # Download 10 files at a time!
            with ThreadPool(processes=10) as pool:
                return pool.map(self.save_blob_locally, blobs)

        def save_blob_locally(self, blob):
            file_name = blob.name
            print(file_name)
            content = self.my_container.get_blob_client(blob).download_blob().readall()

            # Get the full local path to the file
            download_file_path = os.path.join(LOCAL_BLOB_PATH, file_name)
            # For nested blobs, create the local folders as well
            os.makedirs(os.path.dirname(download_file_path), exist_ok=True)

            with open(download_file_path, "wb") as file:
                file.write(content)
            return file_name

    # Initialize the class and download the files
    azure_blob_file_downloader = AzureBlobFileDownloader()
    azure_blob_file_downloader.download_all_blobs_in_container()

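    【Comments】:

    • Thanks for this nice example. I wasn't sure about the download part, but it works very well. Do you know whether I need any kind of streaming library to handle very large blobs?
    • You can use Python's ThreadPool class to handle a large number of blob files. I have added code that uses the ThreadPool class to the answer; please take a look.
    • Awesome! Thank you very much.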
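On the question about very large blobs: `readall()` loads the whole blob into memory, and a thread pool does not change that. As a sketch, assuming the v12 `azure-storage-blob` SDK, the object returned by `download_blob()` also offers `chunks()`, so the content can be written to disk piece by piece. The function name `save_stream_to_file` is hypothetical, and `FakeDownloader` is only a stand-in so the snippet runs without an Azure account.

```python
import os
import tempfile


def save_stream_to_file(downloader, download_file_path):
    """Write a blob to disk chunk by chunk; return the number of bytes written."""
    os.makedirs(os.path.dirname(download_file_path) or ".", exist_ok=True)
    written = 0
    with open(download_file_path, "wb") as file:
        for chunk in downloader.chunks():
            file.write(chunk)
            written += len(chunk)
    return written


class FakeDownloader:
    """Stand-in for the object returned by download_blob() (illustration only)."""

    def __init__(self, pieces):
        self._pieces = pieces

    def chunks(self):
        return iter(self._pieces)


target = os.path.join(tempfile.mkdtemp(), "big", "file.bin")
print(save_stream_to_file(FakeDownloader([b"abc", b"de"]), target))
```

With the real SDK, the call inside the answer's class would look like `save_stream_to_file(self.my_container.get_blob_client(blob).download_blob(), download_file_path)`, replacing the `readall()` and `file.write(...)` pair.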