【Question title】: How to access csv file from Google Cloud Storage in a Google Cloud Function via Pandas?
【Posted】: 2021-06-16 05:40:20
【Question description】:

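I'm new to Cloud Functions, so I followed the default GCP Cloud Functions "hello world" tutorial. It worked fine and printed "hello world" as expected. The only change I made to requirements.txt was to add pandas and google-cloud-storage. Likewise, all of my edits to main.py are in the import section before the function definition and in the else branch of the function.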

requirements.txt

pandas 
google-cloud-storage

main.py:

import pandas as pd
from google.cloud import storage   

def hello_world(request):
    """Responds to any HTTP request.
    Args:
        request (flask.Request): HTTP request object.
    Returns:
        The response text or any set of values that can be turned into a
        Response object using
        `make_response <http://flask.pocoo.org/docs/1.0/api/#flask.Flask.make_response>`.
    """
    request_json = request.get_json()
    if request.args and 'message' in request.args:
        return request.args.get('message')
    elif request_json and 'message' in request_json:
        return request_json['message']
    else:       
        storage_client = storage.Client()
        bucket = storage_client.bucket('my_bucket')
        model_filename = "my_file.csv"
        blob = bucket.blob(model_filename)
        blob.download_to_filename('temp.csv')        
        with open('temp.csv','rb') as f:
            df = pd.read_csv(f)
        
        return str(df.columns)

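When I test the function in the "test cloud function" area of GCP, the following error is captured in the logs. The first seven lines appear to be boilerplate framework frames, while the last two are specific to my program: File "/layers/google.python.pip/pip/lib/python3.8/site-packages/google/cloud/storage/blob.py", line 1183, in download_to_filename with open(filename, "wb") as file_obj: OSError: [Errno 30] Read-only file system: 'temp.csv'. I don't know why this error is triggered.

Error: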

Traceback (most recent call last):
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/functions_framework/__init__.py", line 87, in view_func
    return function(request._get_current_object())
  File "/workspace/main.py", line 25, in hello_world
    blob.download_to_filename('temp.csv')
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/google/cloud/storage/blob.py", line 1183, in download_to_filename
    with open(filename, "wb") as file_obj:
OSError: [Errno 30] Read-only file system: 'temp.csv'

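For context, I have already added credentials to the service account that this cloud function uses according to my configuration. So, authorization aside, I have no idea why this function fails. What should I change?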

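For context, I am only trying to open an arbitrary csv file from Cloud Storage with Pandas and return the column names as a string. This has no practical value; it is just a functional test before building something useful.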

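Edit1: The specific IAM role granted to the service account for this cloud function is "roles/editor", which as far as I understand should be sufficient.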

Edit2: It appears that GCP Cloud Functions run in a read-only environment. So there must be some other way to open the file without using the blob.download_to_filename command.

【Comments】:

  • Can you try blob.download_to_filename('/tmp/temp.csv')? And then use that name?

Tags: python pandas google-cloud-platform google-cloud-functions google-cloud-storage


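【Solution 1】:

You are new to Cloud Functions, and there are a few things to know and some pitfalls to avoid. One of them: Cloud Functions are stateless, and you cannot write to the file system.

The one exception is the /tmp directory. It is an in-memory file system, so size your Cloud Function's memory to cover both your application's memory footprint and the size of the files stored in /tmp.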
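Because files in /tmp count against the function's memory, it can help to delete them as soon as they are no longer needed. The following is a minimal sketch of that idea, not part of the original answer. `download_and_process`, `download`, and `process` are hypothetical names; `download` stands in for any callable that writes to a path, such as `blob.download_to_filename`. It assumes the system temp directory resolves to /tmp, which is the usual default on Linux when TMPDIR is not set.

```python
import os
import tempfile


def download_and_process(download, process, suffix=".csv"):
    """Download to a temp file, process it, and always delete the file.

    `download` and `process` are hypothetical callables that each take
    a file path; `process` returns the result the caller wants.
    """
    # Create a uniquely named file in the system temp directory.
    fd, path = tempfile.mkstemp(suffix=suffix, dir=tempfile.gettempdir())
    os.close(fd)  # only the path is needed; the callables reopen the file
    try:
        download(path)
        return process(path)
    finally:
        # Remove the file so it stops occupying the in-memory file system.
        if os.path.exists(path):
            os.remove(path)
```

In the function from the question, this could be called as `download_and_process(blob.download_to_filename, pd.read_csv)`, since both callables accept a path.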

Update your Cloud Function like this:

....
    else:       
        storage_client = storage.Client()
        bucket = storage_client.bucket('my_bucket')
        model_filename = "my_file.csv"
        blob = bucket.blob(model_filename)
        blob.download_to_filename('/tmp/temp.csv')        
        with open('/tmp/temp.csv','rb') as f:
            df = pd.read_csv(f)
        
        return str(df.columns)
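As an alternative to writing into /tmp, the file can be read straight into memory, which is what Edit2 of the question was looking for. This is a hedged sketch rather than the answerer's method: `read_csv_blob` and `csv_columns` are hypothetical helper names, and it assumes a google-cloud-storage release recent enough to provide `Blob.download_as_bytes` (older releases offered `download_as_string` instead).

```python
import io

import pandas as pd


def read_csv_blob(blob, **read_csv_kwargs):
    """Parse a CSV blob into a DataFrame without touching the file system."""
    # download_as_bytes returns the whole object as a bytes value.
    data = blob.download_as_bytes()
    # Wrap the bytes in a file-like object that pandas can read.
    return pd.read_csv(io.BytesIO(data), **read_csv_kwargs)


def csv_columns(bucket_name, blob_name):
    """Return the column index of a CSV stored in Cloud Storage as a string."""
    # Imported here so read_csv_blob can be tried with a stand-in blob.
    from google.cloud import storage

    blob = storage.Client().bucket(bucket_name).blob(blob_name)
    return str(read_csv_blob(blob).columns)
```

Inside the else branch, `return csv_columns('my_bucket', 'my_file.csv')` would replace the download and open steps. The whole file still sits in memory, so the memory sizing advice above applies either way.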

【Comments】:
