【Question title】: Extracting multiple zipped JSON files in-memory and saving them to Azure Blob Storage with Python
【Posted】: 2021-06-13 18:21:33
【Question】:

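I have a function that talks to a SOAP API, receives binary files, and ultimately extracts a handful of JSON files, which I want to save to an Azure Blob Storage container using Python.

Microsoft's official documentation and samples are useful for saving a single file, but when I try to do the same for multiple files I get this error:

TypeError: Blob data should be of type bytes.

See the code cell and the full traceback below.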

# Extract Pre Survey JSON responses from binaries and send to Azure Blob storage:

import os
import io, zipfile
from io import BytesIO
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, __version__
from functools import reduce

blob = BlobClient.from_connection_string(conn_str="connection string", container_name="container name", blob_name=name)

local_path = "./temp"

def temp_extract():
  for i in binaries:  # N.B. binaries comes from a previous cell
    with zipfile.ZipFile(io.BytesIO(i)) as zfile: 
      for name in zfile.namelist():
        if name.endswith('.json'):
          zfile.extract(name, local_path)

def  upload_blobs():
  upload_file_path = os.path.join(local_path, name)
  onlyfiles = reduce(lambda x,y : x+y, [map(lambda x: root + "/" + x, files) for root, dirs, files in os.walk(local_path)])
  onlyfiles = [file for file in onlyfiles if file.endswith('.json')]
  for file in onlyfiles:
      print(os.path.getsize(file))
      with open(file, 'r') as f:
        blob.upload_blob(data = f, overwrite=True)

if __name__ == '__main__':
    temp_extract()
    upload_blobs()

I get the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-273-fb49f396fab5> in <module>
     26 if __name__ == '__main__':
     27     temp_extract()
---> 28     upload_blobs()
     29 

<ipython-input-273-fb49f396fab5> in upload_blobs()
     22       print(os.path.getsize(file))
     23       with open(file, 'r') as f:
---> 24         blob.upload_blob(data = f, overwrite=True)
     25 
     26 if __name__ == '__main__':

~/Library/Python/3.7/lib/python/site-packages/azure/core/tracing/decorator.py in wrapper_use_tracer(*args, **kwargs)
     81             span_impl_type = settings.tracing_implementation()
     82             if span_impl_type is None:
---> 83                 return func(*args, **kwargs)
     84 
     85             # Merge span is parameter is set, but only if no explicit parent are passed

~/Library/Python/3.7/lib/python/site-packages/azure/storage/blob/_blob_client.py in upload_blob(self, data, blob_type, length, metadata, **kwargs)
    683             **kwargs)
    684         if blob_type == BlobType.BlockBlob:
--> 685             return upload_block_blob(**options)
    686         if blob_type == BlobType.PageBlob:
    687             return upload_page_blob(**options)

~/Library/Python/3.7/lib/python/site-packages/azure/storage/blob/_upload_helpers.py in upload_block_blob(client, data, stream, length, overwrite, headers, validate_content, max_concurrency, blob_settings, encryption_options, **kwargs)
     86                 data = data.read(length)
     87                 if not isinstance(data, six.binary_type):
---> 88                     raise TypeError('Blob data should be of type bytes.')
     89             except AttributeError:
     90                 pass

**TypeError: Blob data should be of type bytes.**

【Discussion】:

    Tags: python azure azure-storage azure-blob-storage


    【Solution 1】:

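    You get this error because you are passing a file object opened in text mode ('r') as data to the upload_blob method. The method reads from that object and, as the traceback shows, expects the result to be bytes.

    What you need to do is read the file's contents and pass those contents to upload_blob.

    Something like: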

    with open(file, 'r') as f:
        file_content = f.read()  
        blob.upload_blob(data = file_content, overwrite=True)
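
An alternative, sketched here under the assumption that the only goal is to hand bytes to upload_blob, is to open the file in binary mode ('rb') so that reading from it yields bytes rather than str. The helper names read_modes and upload_file are hypothetical and are not part of the Azure SDK; blob_client is assumed to be a BlobClient already bound to the target blob name.

```python
def read_modes(path):
    """Return the types produced by reading the same file in text and binary mode."""
    with open(path, "r") as f:
        text = f.read()   # text mode yields str
    with open(path, "rb") as f:
        raw = f.read()    # binary mode yields bytes
    return type(text), type(raw)


def upload_file(blob_client, path):
    """Upload one file through a binary-mode stream.

    blob_client is assumed to be an azure.storage.blob.BlobClient
    (hypothetical wiring; any object with a compatible upload_blob works).
    """
    with open(path, "rb") as f:
        blob_client.upload_blob(data=f, overwrite=True)
```

With a binary stream the SDK reads bytes directly, so the type check seen in the traceback should no longer fail.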
    

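    【Comments】:

    • Thanks for your help. That got rid of the error and actually uploaded something to the container, but for some reason it only saved one of the three files, and with the wrong file format and file name. In the previous step (extracting data from the SOAP API) I get three json files and three txt files describing those json files. What ended up in the blob was the last json file, but named after one of the txt files that I filter out with "onlyfiles = [file for file in onlyfiles if file.endswith('.json')]".
    • Could it be because you set a fixed name for the blob: blob = BlobClient.from_connection_string(conn_str="connection string", container_name="container name", blob_name=name)? Could you try creating a BlobClient instance right before calling the upload_blob method and see if that solves your problem?
    • Perfect! I'm glad to hear that.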
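
The suggestion from the comments (one BlobClient per file, instead of a single client with a fixed blob name) can be combined with fully in-memory extraction, roughly as sketched below. This is a minimal sketch and not the asker's final code: iter_json_members and upload_json_members are hypothetical helper names, and container_client is assumed to be an azure.storage.blob.ContainerClient, for example one built with ContainerClient.from_connection_string.

```python
import io
import zipfile


def iter_json_members(binaries):
    """Yield (name, payload_bytes) for each .json member of each in-memory zip."""
    for raw in binaries:
        with zipfile.ZipFile(io.BytesIO(raw)) as zfile:
            for name in zfile.namelist():
                if name.endswith(".json"):
                    # zfile.read returns the member's contents as bytes,
                    # so nothing is written to local disk.
                    yield name, zfile.read(name)


def upload_json_members(container_client, binaries):
    """Upload every JSON member under its own blob name; return the names used.

    container_client is assumed to be an azure.storage.blob.ContainerClient.
    """
    uploaded = []
    for name, payload in iter_json_members(binaries):
        # A fresh BlobClient per member, so each file gets its own blob name.
        blob_client = container_client.get_blob_client(name)
        blob_client.upload_blob(payload, overwrite=True)
        uploaded.append(name)
    return uploaded
```

Because each member is uploaded under its own name, later files no longer overwrite earlier ones, and the txt files are never touched. If two archives contain members with the same name, the later one would still overwrite the earlier one, so a prefix per archive may be needed in that case.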