[Posted]: 2021-08-26 22:54:54
[Question]:
Looking through some Azure Log Analytics logs, I noticed that every time my Python Azure Function downloads a blob from Azure Storage, there is an initial 32 MB read, and all subsequent GetBlob operations come in 4 MB chunks.
How can I increase these sizes to reduce my function's execution time?
Python sample that downloads a blob from Storage (inside an Azure Function):
import io

def load_blob_to_memory(blob_client):
    # Download the entire blob into memory and wrap it in a seekable stream
    blob_data = blob_client.download_blob().readall()
    blob_bytes = io.BytesIO(blob_data)
    return blob_bytes
Sample Log Analytics query showing ResponseBodySize:
- Query:
//==================================================//
// Assign variables
//==================================================//
let varStart = ago(2d);
let varEnd = now();
let varStorageAccount = 'stgtest';
let varIngressContainerName = 'cont-test';
let varFileName = 'test.csv';
let varSep = '/';
let varSampleUploadUri = strcat('https://', varStorageAccount, '.dfs.core.windows.net', varSep, varIngressContainerName, varSep, varFileName);
let varSampleDownloadUri = replace(@'%2F', @'/', replace(@'.dfs.', @'.blob.', tostring(varSampleUploadUri)));
//==================================================//
// Filter table
//==================================================//
StorageBlobLogs
| where TimeGenerated between (varStart .. varEnd)
and AccountName == varStorageAccount
//and StatusText == varStatus
    and (split(Uri, '?')[0] == varSampleUploadUri
    or split(Uri, '?')[0] == varSampleDownloadUri)
| summarize
count() by OperationName,
TimeGenerated,
UserAgent = tostring(split(UserAgentHeader, '(')[0]),
FileName = tostring(split(tostring(parse_url(url_decode(Uri))['Path']), '/')[-1]),
DownloadChunkSize = format_bytes(ResponseBodySize, 2, 'MB'),
StatusCode,
StatusText
| order by TimeGenerated asc
- Output:
6/9/2021, 6:24:22.226 PM GetBlob azsdk-python-storage-blob/12.8.1 Python/3.8.10 test.csv 32 MB 206 Success 1
6/9/2021, 6:24:22.442 PM GetBlob azsdk-python-storage-blob/12.8.1 Python/3.8.10 test.csv 4 MB 206 Success 1
6/9/2021, 6:24:22.642 PM GetBlob azsdk-python-storage-blob/12.8.1 Python/3.8.10 test.csv 4 MB 206 Success 1
6/9/2021, 6:24:22.780 PM GetBlob azsdk-python-storage-blob/12.8.1 Python/3.8.10 test.csv 4 MB 206 Success 1
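The pattern in the log output above follows directly from the client defaults: the first GetBlob request reads up to `max_single_get_size` (32 MB by default), and the remainder is fetched in `max_chunk_get_size` pieces (4 MB by default). A rough request-count calculation, assuming those default values:

```python
import math

def getblob_request_count(blob_size, max_single_get_size=32 * 1024 * 1024,
                          max_chunk_get_size=4 * 1024 * 1024):
    """Estimate GetBlob requests: one initial read, then fixed-size chunks."""
    if blob_size <= max_single_get_size:
        return 1
    remainder = blob_size - max_single_get_size
    return 1 + math.ceil(remainder / max_chunk_get_size)

# A 44 MB blob with the defaults: 1 x 32 MB + 3 x 4 MB = 4 requests,
# matching the four GetBlob log lines above
print(getblob_request_count(44 * 1024 * 1024))  # → 4
```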
The BlobClient class's download_blob() method has a max_concurrency parameter, but I'm not sure whether it requires a full async/await code rewrite.
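For the synchronous client, max_concurrency should not require an async/await rewrite: the sync SDK parallelizes chunk downloads with a thread pool internally. A minimal sketch of that idea, with the chunk fetch simulated rather than a real Azure call:

```python
from concurrent.futures import ThreadPoolExecutor

def download_chunks(chunk_ranges, fetch, max_concurrency=4):
    """Fetch byte ranges in parallel threads and reassemble them in order."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # map() preserves input order, so the chunks reassemble correctly
        return b"".join(pool.map(fetch, chunk_ranges))

# Simulated fetch: each (start, end) range yields the corresponding bytes
ranges = [(0, 3), (4, 7), (8, 11)]
data = download_chunks(ranges, lambda r: bytes(range(r[0], r[1] + 1)))
print(len(data))  # → 12
```

With the real SDK, the equivalent is simply `blob_client.download_blob(max_concurrency=4).readall()` in ordinary synchronous code.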
Edit 1: Thanks @Gaurav. This raises the default chunk size to 32 MB.
from azure.storage.blob import BlobClient

def create_blob_client(credentials):
    # `event` is the trigger payload from the enclosing Azure Function.
    # Larger first-request and chunk sizes mean fewer GetBlob round trips.
    blob_client = BlobClient.from_blob_url(
        event.get_json()["blobUrl"],
        credentials,
        max_single_get_size=64 * 1024 * 1024,  # first GetBlob request: 64 MB
        max_chunk_get_size=32 * 1024 * 1024    # subsequent chunks: 32 MB
    )
    return blob_client
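Applying the same request-count arithmetic as before shows why this reduces execution time: far fewer round trips per blob. For example, taking 64 MB / 32 MB as the configured values from the snippet above:

```python
import math

def request_count(blob_size, first, chunk):
    """GetBlob requests: one initial read of `first`, then `chunk`-sized reads."""
    if blob_size <= first:
        return 1
    return 1 + math.ceil((blob_size - first) / chunk)

MB = 1024 * 1024
blob = 200 * MB
# Defaults: 32 MB first request, 4 MB chunks
print(request_count(blob, 32 * MB, 4 * MB))   # → 43
# Tuned: 64 MB first request, 32 MB chunks
print(request_count(blob, 64 * MB, 32 * MB))  # → 6
```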
[Discussion]:
Tags: python concurrency azure-functions azure-blob-storage