How to create a model from model.tar.gz in SageMaker?

【Question Title】: How to create a model on model.tar.gz in SageMaker?
【Posted】: 2021-05-14 23:11:09
【Question Description】:

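I want to create a model from my model artifact (s3://bucket/output/model.tar.gz) so I can use it for batch transform and deployment. My model is a simple random forest that I trained with the Python SDK and a training script. My training script contains only a model_fn function and a main function.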

Now I want to create a model for a batch transform job:

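import sagemaker
from sagemaker.image_uris import retrieve
from sagemaker.model import Model
session = sagemaker.Session()
image = retrieve(framework='sklearn', region=session.boto_session.region_name, version='0.23-1')
role = sagemaker.get_execution_role()  # assumes this runs inside a SageMaker notebook
model = Model(image_uri=image, model_data='s3://bucket/output/model.tar.gz', role=role, sagemaker_session=session)  # no entry-point script
model.deploy(initial_instance_count=1, instance_type='ml.p2.xlarge')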

I got this error:

Error hosting endpoint sagemaker-scikit-learn-2021-05-14-19-43-21-320: Failed. Reason:  The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint...

I also tried running a transform job, but the job kept running and produced this error:

transformer = model.transformer(instance_count=1, instance_type="ml.m5.xlarge")
transformer.transform('address to s3 input')

Error:

Traceback (most recent call last):
  File "/miniconda3/lib/python3.7/site-packages/gunicorn/workers/base_async.py", line 55, in handle
    self.handle_request(listener_name, req, client, addr)
  File "/miniconda3/lib/python3.7/site-packages/gunicorn/workers/ggevent.py", line 143, in handle_request
    super().handle_request(listener_name, req, sock, addr)
  File "/miniconda3/lib/python3.7/site-packages/gunicorn/workers/base_async.py", line 106, in handle_request
    respiter = self.wsgi(environ, resp.start_response)
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_sklearn_container/serving.py", line 128, in main
    serving_env.module_dir)
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_sklearn_container/serving.py", line 105, in import_module
    user_module = importlib.import_module(module_name)
  File "/miniconda3/lib/python3.7/importlib/__init__.py", line 118, in import_module
    if name.startswith('.'):
AttributeError: 'NoneType' object has no attribute 'startswith'

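Should I use sagemaker.sklearn.model.SKLearnModel instead? If so, what is the difference between the two?
If I want to use SKLearnModel, I need an inference.py. What would it look like? Any samples would be appreciated.
Why isn't it needed when I deploy and create the transform job right after training? Is it because model_fn is in my training script?
Is the absence of input_fn, output_fn and predict_fn in my training script the source of the problem?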
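The traceback's last frame shows the scikit-learn serving container calling importlib.import_module with a module name of None, which fails before any of model_fn, input_fn, predict_fn or output_fn can even be looked up. The stdlib-only sketch below reproduces that symptom. load_user_module is a hypothetical, simplified stand-in for the container's import step; the assumption that the module name comes from the SAGEMAKER_PROGRAM environment variable, which a bare sagemaker.model.Model created without an entry point does not set, is an inference from the error rather than something the logs confirm.

```python
import importlib


def load_user_module(env):
    """Hypothetical, simplified model of the container importing the user script.

    `env` stands in for os.environ; the entry point is assumed to be read
    from SAGEMAKER_PROGRAM (e.g. "inference.py").
    """
    entry_point = env.get("SAGEMAKER_PROGRAM")  # None when never configured
    if entry_point and entry_point.endswith(".py"):
        module_name = entry_point[:-3]  # "inference.py" -> "inference"
    else:
        module_name = entry_point
    # importlib.import_module calls name.startswith('.') first, so a None
    # name raises AttributeError, matching the last line of the traceback.
    return importlib.import_module(module_name)


error_message = None
try:
    load_user_module({})  # no entry point configured, as with a bare Model
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

Read this way, the missing input_fn/predict_fn/output_fn are not what breaks the import; the container simply has no script to load.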


Tags: python amazon-web-services scikit-learn sdk amazon-sagemaker

【Solution 1】:

If your model artifact is already stored in S3 as model.tar.gz, you should:

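1. Create an SKLearnModel.
2. Package your inference.py file (which should contain model_fn, predict_fn and input_fn) into an archive named source.tar.gz.
3. In the SKLearnModel object, set the environment variable SAGEMAKER_SUBMIT_DIRECTORY to the S3 path of source.tar.gz.
4. Deploy the model to an endpoint.
5. Create a batch transform job.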
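Step 2 can be done with Python's standard tarfile module. The sketch below is one way to do it, assuming inference.py sits in a local directory; the helper name make_source_archive and the folder name code/ are hypothetical. It places each file at the root of source.tar.gz, which matches how the SageMaker SDK lays out its own source archives.

```python
import os
import tarfile


def make_source_archive(src_dir, archive_path="source.tar.gz"):
    """Pack every entry of src_dir (e.g. inference.py) at the root of a gzipped tarball."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in sorted(os.listdir(src_dir)):
            # arcname=name drops the local directory prefix, so inference.py
            # lands at the top level of the archive instead of under src_dir/.
            tar.add(os.path.join(src_dir, name), arcname=name)
    return archive_path
```

For example, make_source_archive("code") would produce source.tar.gz in the current directory. After uploading it to S3 and setting SAGEMAKER_SUBMIT_DIRECTORY, the container also needs SAGEMAKER_PROGRAM=inference.py to know which module to import. Alternatively, SKLearnModel(model_data=..., role=..., entry_point="inference.py", framework_version="0.23-1") builds and uploads this archive and sets both variables itself.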

See the example in the docs.
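As a starting point for the inference.py asked about in the question, here is a minimal sketch for a random forest. It assumes the training script saved the estimator with joblib as model.joblib inside the model directory (a common convention in the SageMaker scikit-learn examples; use whatever filename your script actually wrote) and that requests arrive as CSV rows of numeric features. Only model_fn is strictly required by the scikit-learn container; input_fn, predict_fn and output_fn override its defaults and are included here to show their signatures.

```python
import csv
import io
import os


def model_fn(model_dir):
    """Load the estimator saved by the training script (assumed: model.joblib)."""
    # joblib ships with the scikit-learn container; it is imported lazily so
    # this module can be imported for local testing without it installed.
    import joblib

    return joblib.load(os.path.join(model_dir, "model.joblib"))


def input_fn(request_body, request_content_type):
    """Parse a CSV payload into a list of numeric feature rows."""
    content_type = request_content_type.split(";")[0].strip()
    if content_type != "text/csv":
        raise ValueError(f"Unsupported content type: {request_content_type}")
    if isinstance(request_body, bytes):
        request_body = request_body.decode("utf-8")
    reader = csv.reader(io.StringIO(request_body))
    return [[float(value) for value in row] for row in reader if row]


def predict_fn(input_data, model):
    """Run the fitted estimator on the parsed rows."""
    return model.predict(input_data)


def output_fn(prediction, accept):
    """Serialize predictions as one value per line (CSV); `accept` is ignored in this sketch."""
    return "\n".join(str(value) for value in prediction)
```

With this file as the entry point, the container no longer has to guess which module to import, and the same handlers serve both the endpoint and the batch transform job.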
