Posted: 2021-06-27 10:59:55
Problem description:
I want to deploy a PyTorch model to an AWS SageMaker endpoint and am running into problems. The code and logs are below. In short, I have a .pth file for a model trained outside SageMaker (on Azure), and I want to move it to AWS. macro_model.tar.gz contains the model weights file macro_model.pth and the inference code along with a requirements.txt.
Expected behavior
The model deploys to an endpoint and serves predictions.
Problem
The SageMaker model_fn function cannot see the model weights.
Any ideas what might be wrong?
Deployment code in the SageMaker notebook:
import boto3
import sagemaker
from sagemaker.pytorch import PyTorchModel
from sagemaker import get_execution_role
import json
import numpy as np
role = get_execution_role()
conn = boto3.client('s3')
client = boto3.client('sagemaker')
pytorch_model = PyTorchModel(model_data='s3://.../macro_model.tar.gz',
                             framework_version="1.7", py_version="py3",
                             role=role, entry_point='inference.py', source_dir='code')
predictor = pytorch_model.deploy(initial_instance_count=1, instance_type='ml.p2.xlarge', endpoint_name='...')
Structure of macro_model.tar.gz:
| macro_model
| |--macro_model.pth
|
| code
| |--inference.py
| |--requirements.txt
|
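Since model_fn resolves the weights as os.path.join(model_dir, 'macro_model.pth'), it is worth verifying where the .pth file actually sits inside the archive. A minimal inspection sketch, assuming a local copy of the artifact (the path and helper name are assumptions for illustration):

```python
import tarfile

def list_archive_members(path):
    """List member names in a .tar.gz, to check whether macro_model.pth
    sits at the archive root (SageMaker extracts the archive contents
    into the directory passed to model_fn as model_dir)."""
    with tarfile.open(path, "r:gz") as tar:
        return [m.name for m in tar.getmembers()]

# Hypothetical local copy of the artifact uploaded to S3:
# print(list_archive_members("macro_model.tar.gz"))
```

If this prints macro_model/macro_model.pth rather than macro_model.pth, the open() call in model_fn will not resolve, because model_dir corresponds to the archive root, not to the macro_model subfolder.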
model_fn implementation:
import logging
import os

import torch
import torch.nn as nn

def model_fn(model_dir):
    logging.info('Loading the model...')
    layers = [
        nn.Linear(512, 512),
        nn.ReLU(),
        nn.Dropout(0.3),
        nn.Linear(512, 2)
    ]
    logging.info('Layers initiated...')
    model = VideoRecog_Model1(layers, 7)
    logging.info('Model initiated...')
    with open(os.path.join(model_dir, 'macro_model.pth'), 'rb') as f:
        model.load_state_dict(torch.load(f))
    model.to(device).eval()
    logging.info('Done loading model')
    return model
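To debug why the open() call fails, one option is to log the extracted directory tree at the top of model_fn before touching the weights file. A small diagnostic sketch (the helper name is an assumption, not part of the SageMaker API):

```python
import logging
import os

def locate_weights(model_dir, filename="macro_model.pth"):
    """Walk model_dir, log its contents, and return the first path whose
    basename matches `filename`, or None if it is absent."""
    for root, _dirs, files in os.walk(model_dir):
        logging.info("model_dir contents: %s -> %s", root, files)
        if filename in files:
            return os.path.join(root, filename)
    return None
```

Calling locate_weights(model_dir) as the first line of model_fn would show whether SageMaker extracted the weights into a macro_model/ subdirectory rather than into model_dir itself.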
Logs:
2021-06-27T12:24:56.241+02:00 Collecting opencv-python-headless Downloading opencv_python_headless-4.5.2.54-cp36-cp36m-manylinux2014_x86_64.whl (38.2 MB)
2021-06-27T12:24:57.242+02:00 Collecting moviepy==1.0.3 Downloading moviepy-1.0.3.tar.gz (388 kB)
2021-06-27T12:24:57.242+02:00 Collecting av==8.0.3 Downloading av-8.0.3-cp36-cp36m-manylinux2010_x86_64.whl (37.2 MB)
2021-06-27T12:24:58.243+02:00 Collecting decorator<5.0,>=4.0.2 Downloading decorator-4.4.2-py2.py3-none-any.whl (9.2 kB)
2021-06-27T12:24:58.243+02:00 Requirement already satisfied: tqdm<5.0,>=4.11.2 in /opt/conda/lib/python3.6/site-packages (from moviepy==1.0.3->-r /opt/ml/model/code/requirements.txt (line 2)) (4.59.0)
2021-06-27T12:24:58.243+02:00 Requirement already satisfied: requests<3.0,>=2.8.1 in /opt/conda/lib/python3.6/site-packages (from moviepy==1.0.3->-r /opt/ml/model/code/requirements.txt (line 2)) (2.22.0)
2021-06-27T12:24:58.243+02:00 Collecting proglog<=1.0.0 Downloading proglog-0.1.9.tar.gz (10 kB)
2021-06-27T12:24:59.244+02:00 Requirement already satisfied: numpy>=1.17.3 in /opt/conda/lib/python3.6/site-packages (from moviepy==1.0.3->-r /opt/ml/model/code/requirements.txt (line 2)) (1.19.1)
2021-06-27T12:24:59.244+02:00 Collecting imageio<3.0,>=2.5 Downloading imageio-2.9.0-py3-none-any.whl (3.3 MB)
2021-06-27T12:24:59.244+02:00 Collecting imageio_ffmpeg>=0.2.0 Downloading imageio_ffmpeg-0.4.4-py3-none-manylinux2010_x86_64.whl (26.9 MB)
2021-06-27T12:25:00.244+02:00 Requirement already satisfied: pillow in /opt/conda/lib/python3.6/site-packages (from imageio<3.0,>=2.5->moviepy==1.0.3->-r /opt/ml/model/code/requirements.txt (line 2)) (8.2.0)
2021-06-27T12:25:00.244+02:00 Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.6/site-packages (from requests<3.0,>=2.8.1->moviepy==1.0.3->-r /opt/ml/model/code/requirements.txt (line 2)) (2020.12.5)
2021-06-27T12:25:00.244+02:00 Requirement already satisfied: idna<2.9,>=2.5 in /opt/conda/lib/python3.6/site-packages (from requests<3.0,>=2.8.1->moviepy==1.0.3->-r /opt/ml/model/code/requirements.txt (line 2)) (2.8)
2021-06-27T12:25:00.244+02:00 Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /opt/conda/lib/python3.6/site-packages (from requests<3.0,>=2.8.1->moviepy==1.0.3->-r /opt/ml/model/code/requirements.txt (line 2)) (1.25.11)
2021-06-27T12:25:00.244+02:00 Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /opt/conda/lib/python3.6/site-packages (from requests<3.0,>=2.8.1->moviepy==1.0.3->-r /opt/ml/model/code/requirements.txt (line 2)) (3.0.4)
2021-06-27T12:25:00.244+02:00 Building wheels for collected packages: moviepy, proglog Building wheel for moviepy (setup.py): started Building wheel for moviepy (setup.py): finished with status 'done' Created wheel for moviepy: filename=moviepy-1.0.3-py3-none-any.whl size=110726 sha256=d90e44b117edbb3d061d7d3d800e6e687392ac6b06f9001f82559c4dd88f19c4 Stored in directory: /root/.cache/pip/wheels/be/dc/17/8b4d5a63bcd05dc44db7da57e193372ccd333617293f9deebe Building wheel for proglog (setup.py): started
2021-06-27T12:25:01.245+02:00 Building wheel for proglog (setup.py): finished with status 'done' Created wheel for proglog: filename=proglog-0.1.9-py3-none-any.whl size=6147 sha256=da1b510090c3b87cf4c564558a64b996bc1fdf34d6d15fd56c14c9c776f5b366 Stored in directory: /root/.cache/pip/wheels/e7/11/a0/7e65f734d33043735a557b1244569cca327353db9068158076
2021-06-27T12:25:01.245+02:00 Successfully built moviepy proglog
2021-06-27T12:25:01.245+02:00 Installing collected packages: proglog, imageio-ffmpeg, imageio, decorator, opencv-python-headless, moviepy, av
2021-06-27T12:25:02.246+02:00 Attempting uninstall: decorator Found existing installation: decorator 5.0.9 Uninstalling decorator-5.0.9:
2021-06-27T12:25:03.246+02:00 Successfully uninstalled decorator-5.0.9
2021-06-27T12:25:05.252+02:00 Successfully installed av-8.0.3 decorator-4.4.2 imageio-2.9.0 imageio-ffmpeg-0.4.4 moviepy-1.0.3 opencv-python-headless-4.5.2.54 proglog-0.1.9
2021-06-27T12:25:05.252+02:00 WARNING: Running pip as root will break packages and permissions. You should install packages reliably by using venv: https://pip.pypa.io/warnings/venv
2021-06-27T12:25:07.256+02:00 2021-06-27 10:25:06,537 [INFO ] main org.pytorch.serve.ModelServer -
2021-06-27T12:25:07.256+02:00 Torchserve version: 0.3.1
2021-06-27T12:25:07.256+02:00 TS Home: /opt/conda/lib/python3.6/site-packages
2021-06-27T12:25:07.256+02:00 Current directory: /
2021-06-27T12:25:07.256+02:00 Temp directory: /home/model-server/tmp
2021-06-27T12:25:07.257+02:00 Number of GPUs: 1
2021-06-27T12:25:07.257+02:00 Number of CPUs: 1
2021-06-27T12:25:07.257+02:00 Max heap size: 14097 M
2021-06-27T12:25:07.257+02:00 Python executable: /opt/conda/bin/python3.6
2021-06-27T12:25:07.257+02:00 Config file: /etc/sagemaker-ts.properties
2021-06-27T12:25:07.257+02:00 Inference address: http://0.0.0.0:8080
2021-06-27T12:25:07.257+02:00 Management address: http://0.0.0.0:8080
2021-06-27T12:25:07.257+02:00 Metrics address: http://127.0.0.1:8082
2021-06-27T12:25:07.257+02:00 Model Store: /.sagemaker/ts/models
2021-06-27T12:25:07.257+02:00 Initial Models: model.mar
2021-06-27T12:25:07.257+02:00 Log dir: /logs
2021-06-27T12:25:07.257+02:00 Metrics dir: /logs
2021-06-27T12:25:07.257+02:00 Netty threads: 0
2021-06-27T12:25:07.257+02:00 Netty client threads: 0
2021-06-27T12:25:07.257+02:00 Default workers per model: 1
2021-06-27T12:25:07.257+02:00 Blacklist Regex: N/A
2021-06-27T12:25:07.257+02:00 Maximum Response Size: 6553500
2021-06-27T12:25:07.258+02:00 Maximum Request Size: 6553500
2021-06-27T12:25:07.258+02:00 Prefer direct buffer: false
2021-06-27T12:25:07.258+02:00 Allowed Urls: [file://.*|http(s)?://.*]
2021-06-27T12:25:07.258+02:00 Custom python dependency for model allowed: false
2021-06-27T12:25:07.258+02:00 Metrics report format: prometheus
2021-06-27T12:25:07.258+02:00 Enable metrics API: true
2021-06-27T12:25:07.258+02:00 2021-06-27 10:25:06,597 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: model.mar
2021-06-27T12:25:09.259+02:00 2021-06-27 10:25:08,725 [INFO ] main org.pytorch.serve.archive.ModelArchive - eTag a6e950a7055442d88ff2f182fdef1da3
2021-06-27T12:25:09.259+02:00 2021-06-27 10:25:08,744 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model model loaded.
2021-06-27T12:25:09.259+02:00 2021-06-27 10:25:08,782 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2021-06-27T12:25:09.259+02:00 2021-06-27 10:25:08,935 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://0.0.0.0:8080
2021-06-27T12:25:09.259+02:00 2021-06-27 10:25:08,935 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2021-06-27T12:25:09.259+02:00 2021-06-27 10:25:08,937 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
2021-06-27T12:25:09.259+02:00 2021-06-27 10:25:08,996 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Listening on port: /home/model-server/tmp/.ts.sock.9000
2021-06-27T12:25:09.259+02:00 2021-06-27 10:25:08,999 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - [PID]59
2021-06-27T12:25:09.259+02:00 2021-06-27 10:25:08,999 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Torch worker started.
2021-06-27T12:25:09.259+02:00 2021-06-27 10:25:08,999 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Python runtime: 3.6.13
2021-06-27T12:25:09.259+02:00 2021-06-27 10:25:09,013 [INFO ] W-9000-model_1 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9000
2021-06-27T12:25:09.259+02:00 2021-06-27 10:25:09,039 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Connection accepted: /home/model-server/tmp/.ts.sock.9000.
2021-06-27T12:25:10.260+02:00 Model server started.
2021-06-27T12:25:10.260+02:00 2021-06-27 10:25:09,768 [INFO ] pool-2-thread-1 TS_METRICS - CPUUtilization.Percent:100.0|#Level:Host|#hostname:model.aws.local,timestamp:1624789509
2021-06-27T12:25:10.260+02:00 2021-06-27 10:25:09,776 [INFO ] pool-2-thread-1 TS_METRICS - DiskAvailable.Gigabytes:11.663764953613281|#Level:Host|#hostname:model.aws.local,timestamp:1624789509
2021-06-27T12:25:10.260+02:00 2021-06-27 10:25:09,777 [INFO ] pool-2-thread-1 TS_METRICS - DiskUsage.Gigabytes:12.690078735351562|#Level:Host|#hostname:model.aws.local,timestamp:1624789509
2021-06-27T12:25:10.260+02:00 2021-06-27 10:25:09,777 [INFO ] pool-2-thread-1 TS_METRICS - DiskUtilization.Percent:52.1|#Level:Host|#hostname:model.aws.local,timestamp:1624789509
2021-06-27T12:25:10.260+02:00 2021-06-27 10:25:09,778 [INFO ] pool-2-thread-1 TS_METRICS - MemoryAvailable.Megabytes:59581.45703125|#Level:Host|#hostname:model.aws.local,timestamp:1624789509
2021-06-27T12:25:10.260+02:00 2021-06-27 10:25:09,787 [INFO ] pool-2-thread-1 TS_METRICS - MemoryUsed.Megabytes:1238.21875|#Level:Host|#hostname:model.aws.local,timestamp:1624789509
2021-06-27T12:25:10.260+02:00 2021-06-27 10:25:09,788 [INFO ] pool-2-thread-1 TS_METRICS - MemoryUtilization.Percent:3.0|#Level:Host|#hostname:model.aws.local,timestamp:1624789509
2021-06-27T12:25:12.261+02:00 2021-06-27 10:25:11,779 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Generating new fontManager, this may take some time...
2021-06-27T12:25:13.262+02:00 2021-06-27 10:25:12,875 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Loading the model...
2021-06-27T12:25:13.262+02:00 2021-06-27 10:25:12,906 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Layers initiated...
2021-06-27T12:25:14.262+02:00 2021-06-27 10:25:13,871 [WARN ] W-9000-model_1-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Downloading: "https://download.pytorch.org/models/r3d_18-b3b3357e.pth" to /root/.cache/torch/hub/checkpoints/r3d_18-b3b3357e.pth
2021-06-27T12:25:14.262+02:00 2021-06-27 10:25:13,872 [WARN ] W-9000-model_1-stderr org.pytorch.serve.wlm.WorkerLifeCycle -
2021-06-27T12:25:14.262+02:00 2021-06-27 10:25:13,972 [WARN ] W-9000-model_1-stderr org.pytorch.serve.wlm.WorkerLifeCycle - 0%| | 0.00/127M [00:00<?, ?B/s]
2021-06-27T12:25:14.262+02:00 2021-06-27 10:25:14,072 [WARN ] W-9000-model_1-stderr org.pytorch.serve.wlm.WorkerLifeCycle - 6%|▌ | 7.30M/127M [00:00<00:01, 76.5MB/s]
2021-06-27T12:25:14.262+02:00 2021-06-27 10:25:14,172 [WARN ] W-9000-model_1-stderr org.pytorch.serve.wlm.WorkerLifeCycle - 16%|█▋ | 20.7M/127M [00:00<00:00, 114MB/s]
2021-06-27T12:25:15.263+02:00 2021-06-27 10:25:14,272 [WARN ] W-9000-model_1-stderr org.pytorch.serve.wlm.WorkerLifeCycle - 27%|██▋ | 34.0M/127M [00:00<00:00, 126MB/s]
2021-06-27T12:25:15.263+02:00 2021-06-27 10:25:14,372 [WARN ] W-9000-model_1-stderr org.pytorch.serve.wlm.WorkerLifeCycle - 38%|███▊ | 48.1M/127M [00:00<00:00, 134MB/s]
2021-06-27T12:25:15.263+02:00 2021-06-27 10:25:14,472 [WARN ] W-9000-model_1-stderr org.pytorch.serve.wlm.WorkerLifeCycle - 49%|████▉ | 62.1M/127M [00:00<00:00, 139MB/s]
2021-06-27T12:25:15.263+02:00 2021-06-27 10:25:14,572 [WARN ] W-9000-model_1-stderr org.pytorch.serve.wlm.WorkerLifeCycle - 60%|██████ | 76.8M/127M [00:00<00:00, 144MB/s]
2021-06-27T12:25:15.263+02:00 2021-06-27 10:25:14,692 [WARN ] W-9000-model_1-stderr org.pytorch.serve.wlm.WorkerLifeCycle - 72%|███████▏ | 91.9M/127M [00:00<00:00, 149MB/s]
2021-06-27T12:25:15.263+02:00 2021-06-27 10:25:14,794 [WARN ] W-9000-model_1-stderr org.pytorch.serve.wlm.WorkerLifeCycle - 83%|████████▎ | 106M/127M [00:00<00:00, 140MB/s]
2021-06-27T12:25:15.263+02:00 2021-06-27 10:25:14,852 [WARN ] W-9000-model_1-stderr org.pytorch.serve.wlm.WorkerLifeCycle - 94%|█████████▍| 120M/127M [00:00<00:00, 139MB/s]
2021-06-27T12:25:15.263+02:00 2021-06-27 10:25:14,987 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Model initiated...
2021-06-27T12:25:15.263+02:00 2021-06-27 10:25:14,988 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Backend worker process died.
2021-06-27T12:25:15.263+02:00 2021-06-27 10:25:14,988 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Traceback (most recent call last):
2021-06-27T12:25:15.263+02:00 2021-06-27 10:25:14,988 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.6/site-packages/ts/model_service_worker.py", line 182, in <module>
2021-06-27T12:25:15.263+02:00 2021-06-27 10:25:14,989 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - worker.run_server()
2021-06-27T12:25:15.263+02:00 2021-06-27 10:25:14,989 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.6/site-packages/ts/model_service_worker.py", line 154, in run_server
2021-06-27T12:25:15.263+02:00 2021-06-27 10:25:14,989 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - self.handle_connection(cl_socket)
2021-06-27T12:25:15.263+02:00 2021-06-27 10:25:14,989 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.6/site-packages/ts/model_service_worker.py", line 116, in handle_connection
2021-06-27T12:25:15.263+02:00 2021-06-27 10:25:14,989 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - service, result, code = self.load_model(msg)
2021-06-27T12:25:15.263+02:00 2021-06-27 10:25:14,989 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.6/site-packages/ts/model_service_worker.py", line 89, in load_model
2021-06-27T12:25:15.263+02:00 2021-06-27 10:25:14,990 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - service = model_loader.load(model_name, model_dir, handler, gpu, batch_size, envelope)
2021-06-27T12:25:15.263+02:00 2021-06-27 10:25:14,990 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.6/site-packages/ts/model_loader.py", line 104, in load
2021-06-27T12:25:15.263+02:00 2021-06-27 10:25:14,990 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - initialize_fn(service.context)
2021-06-27T12:25:15.263+02:00 2021-06-27 10:25:14,990 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "/home/model-server/tmp/models/a6e950a7055442d88ff2f182fdef1da3/handler_service.py", line 51, in initialize
2021-06-27T12:25:15.264+02:00 2021-06-27 10:25:14,991 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - super().initialize(context)
2021-06-27T12:25:15.264+02:00 2021-06-27 10:25:14,991 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.6/site-packages/sagemaker_inference/default_handler_service.py", line 66, in initialize
2021-06-27T12:25:15.264+02:00 2021-06-27 10:25:14,991 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - self._service.validate_and_initialize(model_dir=model_dir)
2021-06-27T12:25:15.264+02:00 2021-06-27 10:25:14,991 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.6/site-packages/sagemaker_inference/transformer.py", line 158, in validate_and_initialize
2021-06-27T12:25:15.264+02:00 2021-06-27 10:25:14,991 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - self._model = self._model_fn(model_dir)
2021-06-27T12:25:15.264+02:00 2021-06-27 10:25:14,991 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - File "/opt/ml/model/code/inference.py", line 105, in model_fn
2021-06-27T12:25:15.264+02:00 2021-06-27 10:25:14,992 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - with open(os.path.join(model_dir, 'macro_model.pth'), 'rb') as f:
2021-06-27T12:25:15.264+02:00 **2021-06-27 10:25:14,992 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - FileNotFoundError: [Errno 2] No such file or directory: '/home/model-server/tmp/models/a6e950a7055442d88ff2f182fdef1da3/macro_model.pth'**
2021-06-27T12:25:15.264+02:00 2021-06-27 10:25:14,994 [WARN ] W-9000-model_1-stderr org.pytorch.serve.wlm.WorkerLifeCycle - 100%|██████████| 127M/127M [00:00<00:00, 136MB/s]
2021-06-27T12:25:15.264+02:00 2021-06-27 10:25:14,998 [INFO ] epollEventLoopGroup-5-1 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STARTED
2021-06-27T12:25:15.264+02:00 2021-06-27 10:25:14,999 [WARN ] W-9000-model_1 org.pytorch.serve.wlm.BatchAggregator - Load model failed: model, error: Worker died.
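The FileNotFoundError in the trace shows model_fn looking for macro_model.pth directly under model_dir, while the archive layout described above places it inside a macro_model/ subfolder. One way to repack the artifact so the weights land at the archive root is sketched below; the local paths and function name are assumptions for illustration:

```python
import tarfile

def repack_at_root(weights_path, code_dir, out_path="macro_model.tar.gz"):
    """Repack the artifact so macro_model.pth sits at the archive root,
    matching os.path.join(model_dir, 'macro_model.pth') in model_fn."""
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(weights_path, arcname="macro_model.pth")  # root, no subfolder
        tar.add(code_dir, arcname="code")  # inference.py + requirements.txt
```

After uploading the repacked archive to S3, the same PyTorchModel deployment code should find the weights without any change to model_fn.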
Comments:
- You need to check the PyTorchModel method.
Tags: pytorch endpoint amazon-sagemaker