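[Posted]: 2020-10-23 22:45:19
[Problem description]:
I am following this tutorial and trying to deploy a deep learning model on Google Cloud Engine. I was able to containerize the model, which is wrapped in the Flask framework, without problems. However, when I try to connect the container to Kubernetes, I get an error.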
$ kubectl run keras-app --image=stamatelou/keras-app --port 5000
pod/keras-app created
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
keras-app 0/1 ContainerCreating 0 20s
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
keras-app 1/1 Running 0 98s
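At this point the application appears to have been created and to be running as expected, but when I run the following command I get an error message.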
$ kubectl expose deployment keras-app --type=LoadBalancer --port 80 --target-port 5000
Error from server (NotFound): deployments.extensions "keras-app" not found
Here are the logs of the container "keras-app":
$ kubectl logs keras-app
2020-07-03 06:56:10.730502: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-07-03 06:56:10.730899: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: UNKNOWN ERROR (303)
2020-07-03 06:56:10.731013: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (keras-app): /proc/driver/nvidia/version does not exist
2020-07-03 06:56:10.731416: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-07-03 06:56:10.740235: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2300000000 Hz
2020-07-03 06:56:10.740653: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fb760000b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-03 06:56:10.740769: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
* Loading Keras model and Flask starting server...please wait until server has fully started
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels.h5
102973440/102967424 [==============================] - 1s 0us/step
* Serving Flask app "app" (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
* Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
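The logs above end with Flask listening on port 5000, so the container itself looks healthy. The `NotFound` error is consistent with the earlier output `pod/keras-app created`: a Pod exists, but no Deployment named `keras-app` does. The following is a minimal sketch of two possible ways forward, assuming kubectl 1.18 or later (where `kubectl run` creates a bare Pod rather than a Deployment). The `KUBECTL` variable and the two function names are hypothetical helpers introduced here so the commands can be printed instead of executed; they are not part of the original tutorial.

```shell
#!/bin/sh
# Assumption: kubectl >= 1.18, where `kubectl run` creates only a Pod.
# KUBECTL is a hypothetical override (e.g. KUBECTL=echo) for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# Option 1: expose the Pod that already exists instead of a Deployment.
expose_pod() {
  "$KUBECTL" expose pod keras-app --type=LoadBalancer --port 80 --target-port 5000
}

# Option 2: create a Deployment from the same image, then expose it
# with the command from the question.
expose_deployment() {
  "$KUBECTL" create deployment keras-app --image=stamatelou/keras-app
  "$KUBECTL" expose deployment keras-app --type=LoadBalancer --port 80 --target-port 5000
}
```

Calling either `expose_pod` or `expose_deployment` (not both, since each creates a Service named `keras-app`) should create the Service; `kubectl get service keras-app` can then be used to check whether an external IP has been assigned. With option 2, the bare Pod from `kubectl run` is no longer needed and could be removed with `kubectl delete pod keras-app`.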
[Discussion]:
Tags: docker machine-learning kubernetes containers google-kubernetes-engine