Reading a file from Azure Blob Storage with pickle or dill without saving to disk (answers)

【Question title】: Reading a file from Azure Blob Storage using pickle or dill without saving to disk
【Posted】: 2020-10-05 09:38:17
【Question】:

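I'm trying to read the weights of a machine learning model from an Azure Storage blob in Python. This is supposed to run in Azure Functions, so I don't believe I can use any approach that saves the blob to disk.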

I'm using azure-storage-blob 12.5.0, not an older version.

I've tried using dill.loads to load the .pkl file, like this:

from io import BytesIO

import dill
from azure.storage.blob import BlobClient

connection_string = 'my_connection_string'
blob_client = BlobClient.from_connection_string(connection_string, container_name, blob_name)
downloader = blob_client.download_blob(0)

with BytesIO() as f:
    downloader.readinto(f)
    weights = dill.loads(f)

>>> TypeError: a bytes-like object is required, not '_io.BytesIO'

I'm also not sure how this would work with pickle instead. How can I fix this?
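The TypeError comes from mixing up the two loader functions: `dill.loads` (like `pickle.loads`) expects a bytes object, while `dill.load` (like `pickle.load`) expects a file-like object such as a `BytesIO`. The sketch below uses the standard-library `pickle` module, since dill exposes the same `loads`/`load` pair, and an in-memory buffer stands in for the Azure download. It assumes the blob simply contains a pickled object; the `weights` dictionary is made-up sample data.

```python
import io
import pickle

# Stand-in for the blob content: in the real code these bytes would come
# from the Azure download stream (assumption: the blob holds a pickle).
weights = {"layer1": [0.1, 0.2], "bias": 0.5}
blob_bytes = pickle.dumps(weights)

with io.BytesIO() as f:
    f.write(blob_bytes)  # plays the role of downloader.readinto(f)

    # Option 1: loads() wants bytes, so pass the buffer's contents.
    from_bytes = pickle.loads(f.getvalue())

    # Option 2: load() wants a file-like object. Rewind first, because
    # writing left the stream position at the end of the buffer.
    f.seek(0)
    from_file = pickle.load(f)

print(from_bytes == weights, from_file == weights)
```

Applied to the code in the question, either `dill.loads(f.getvalue())` or `f.seek(0)` followed by `dill.load(f)` should avoid the error. Forgetting the `seek(0)` in the second option typically produces an `EOFError` instead, because the reader starts at the end of the buffer.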

【Comments on the question】:

Tags: azure-functions azure-blob-storage pickle dill

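【Solution 1】:

Skip the BytesIO entirely: read the whole blob into memory as bytes with readall() and pass those bytes to pickle.loads: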

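import pickle

from azure.storage.blob import BlobClient

def get_weights_blob(blob_name):
    connection_string = 'my_connection_string'
    # container_name is assumed to be defined elsewhere (e.g. at module level)
    blob_client = BlobClient.from_connection_string(connection_string, container_name, blob_name)
    downloader = blob_client.download_blob(0)

    # readall() returns the whole blob as bytes, which pickle.loads() accepts
    b = downloader.readall()
    weights = pickle.loads(b)

    return weights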
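Only the last two steps of this function actually deal with pickle: `readall()` returns the blob as bytes (when no `encoding` is passed to `download_blob`), and `pickle.loads` deserializes those bytes without touching the disk. The sketch below isolates that step so it can be tried without an Azure account. `FakeDownloader` and `load_pickle_from_downloader` are hypothetical names; the fake class only mimics the `readall()` method of the downloader object returned by `download_blob`.

```python
import pickle


class FakeDownloader:
    """Hypothetical stand-in for the downloader returned by
    BlobClient.download_blob(); it only mimics readall()."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    def readall(self) -> bytes:
        # Return the full payload as bytes, like the real downloader
        # does when no encoding is given.
        return self._data


def load_pickle_from_downloader(downloader):
    # readall() yields bytes, which is exactly what pickle.loads()
    # expects; nothing is written to disk.
    return pickle.loads(downloader.readall())


weights = load_pickle_from_downloader(FakeDownloader(pickle.dumps([1.0, 2.0, 3.0])))
print(weights)  # [1.0, 2.0, 3.0]
```

Because the helper only relies on a `readall()` method, the same function should accept the real downloader object from `blob_client.download_blob(0)`.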

Then call the function to retrieve the weights:

weights = get_weights_blob(blob_name = 'myPickleFile')

【Comments】:

Thanks! This saved me a lot of iterations :)

【Solution 2】:

Here is my working example:

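import pickle

import azure.functions as func
from azure.storage.blob import BlobClient

def main(req: func.HttpRequest) -> func.HttpResponse:
    connection_string = ''
    blob_client = BlobClient.from_connection_string(connection_string, 'blog-storage-containe', 'blobfile')
    downloader = blob_client.download_blob(0)

    # Read the whole blob into memory and unpickle the model from the bytes
    b = downloader.readall()
    loaded_model = pickle.loads(b)

    # An HTTP-triggered function must return an HttpResponse
    return func.HttpResponse(f"Loaded model of type {type(loaded_model).__name__}")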

And the requirements.txt file:

azure-functions
numpy
joblib
azure-storage-blob
scikit-learn

【Comments】:

If this is meant to be Python code, it isn't valid Python (the indentation is wrong).