【Title】: s3fs timeout issue on an AWS Lambda function within a VPC
【Posted】: 2020-08-07 08:02:28
【Question】:

s3fs seems to fail from time to time when reading from an S3 bucket in an AWS Lambda function running inside a VPC. I am using s3fs==0.4.0 and pandas==1.0.1.

import s3fs
import pandas as pd


def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    s3_file = event['Records'][0]['s3']['object']['key']
    s3fs.S3FileSystem.connect_timeout = 1800
    s3fs.S3FileSystem.read_timeout = 1800
    with s3fs.S3FileSystem(anon=False).open(f"s3://{bucket}/{s3_file}", 'rb') as f:
        data = pd.read_json(f)

The stack trace is as follows:

Traceback (most recent call last):
  File "/var/task/urllib3/connection.py", line 157, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "/var/task/urllib3/util/connection.py", line 84, in create_connection
    raise err
  File "/var/task/urllib3/util/connection.py", line 74, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/task/botocore/httpsession.py", line 263, in send
    chunked=self._chunked(request.headers),
  File "/var/task/urllib3/connectionpool.py", line 720, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/var/task/urllib3/util/retry.py", line 376, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/var/task/urllib3/packages/six.py", line 735, in reraise
    raise value
  File "/var/task/urllib3/connectionpool.py", line 672, in urlopen
    chunked=chunked,
  File "/var/task/urllib3/connectionpool.py", line 376, in _make_request
    self._validate_conn(conn)
  File "/var/task/urllib3/connectionpool.py", line 994, in _validate_conn
    conn.connect()
  File "/var/task/urllib3/connection.py", line 300, in connect
    conn = self._new_conn()
  File "/var/task/urllib3/connection.py", line 169, in _new_conn
    self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <botocore.awsrequest.AWSHTTPSConnection object at 0x7f4d578e3ed0>: Failed to establish a new connection: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/task/botocore/endpoint.py", line 200, in _do_get_response
    http_response = self._send(request)
  File "/var/task/botocore/endpoint.py", line 244, in _send
    return self.http_session.send(request)
  File "/var/task/botocore/httpsession.py", line 283, in send
    raise EndpointConnectionError(endpoint_url=request.url, error=e)
botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "https://my_bucket.s3.eu-west-1.amazonaws.com/?list-type=2&prefix=my_folder%2Fsomething%2F&delimiter=%2F&encoding-type=url"

Has anyone run into the same problem? Why does it only fail some of the time? Is there an s3fs configuration that could help with this particular issue?
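For context, the root cause at the bottom of the chained trace is `TimeoutError: [Errno 110] Connection timed out`: the TCP connection to the S3 endpoint is never established at all, which points at networking rather than s3fs. A minimal stdlib sketch of the same failure mode, connecting to a non-routable address (`10.255.255.1` is a made-up address chosen purely for illustration):

```python
import socket

# Connecting to an address that never answers mimics a subnet with no
# route to S3: the SYN goes unanswered and the attempt times out.
try:
    socket.create_connection(("10.255.255.1", 443), timeout=0.5)
except OSError as exc:  # covers TimeoutError / socket.timeout
    print(f"connect failed: {exc!r}")
```

No amount of raising `connect_timeout` on the s3fs side fixes this kind of error; the packets simply have nowhere to go.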

【Comments】:

  • Side note: using s3fs in production is not recommended. Amazon S3 is an object storage system, not a file system. Frankly, I have never heard of anyone trying s3fs from an AWS Lambda function. I suggest you use the AWS SDK to communicate with Amazon S3 instead.

Tags: amazon-s3 aws-lambda s3fs python-s3fs


【Solution 1】:

You can also use boto3, which is supported by AWS, to fetch the JSON from S3.

import json
import boto3

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']

    s3_resource = boto3.resource('s3')
    file_object = s3_resource.Object(bucket, key)
    json_content = json.loads(file_object.get()['Body'].read())
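One caveat that applies both here and to the question's handler: S3 event notifications deliver object keys URL-encoded, so keys containing spaces or special characters must be decoded before being passed to the API. A small self-contained sketch (the sample event below is made up for illustration):

```python
from urllib.parse import unquote_plus

def parse_s3_event(event):
    """Extract the bucket name and decoded object key from an S3 event record."""
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    # Keys arrive URL-encoded in event notifications: '+' for spaces, %XX escapes.
    key = unquote_plus(record['object']['key'])
    return bucket, key

event = {'Records': [{'s3': {'bucket': {'name': 'my_bucket'},
                             'object': {'key': 'my_folder/report+2020%3Dfinal.json'}}}]}
print(parse_s3_event(event))  # ('my_bucket', 'my_folder/report 2020=final.json')
```

Skipping the decode step produces a key that does not exist in the bucket, which surfaces as a confusing NoSuchKey error rather than the timeout discussed here.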

【Discussion】:

【Solution 2】:

Actually there was nothing wrong with s3fs at all. It turned out we were running the Lambda function in a VPC with two subnets: one worked fine, but the other did not allow access to S3 resources, so whenever the Lambda was spawned on the second subnet it could not connect at all.

Fixing this was as simple as removing the second subnet.
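A related fix, if the second subnet cannot simply be removed: attaching an S3 gateway VPC endpoint to that subnet's route table gives Lambdas there a route to S3 without any public internet access. A sketch with the AWS CLI (all IDs below are placeholders, and the service region must match the bucket's):

```shell
# Placeholder IDs; substitute your own VPC and route table.
aws ec2 create-vpc-endpoint \
    --vpc-id vpc-0123456789abcdef0 \
    --service-name com.amazonaws.eu-west-1.s3 \
    --vpc-endpoint-type Gateway \
    --route-table-ids rtb-0123456789abcdef0
```

Gateway endpoints for S3 are free, unlike interface endpoints, so this is usually the cheaper option for private subnets.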

【Discussion】:
