【Question title】: tf-serving abnormal exit without error message
【Posted】: 2019-01-10 00:52:08
【Question description】:

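tf-serving exits abnormally without any error message.

System information

OS platform and distribution (e.g., Linux Ubuntu 16.04): Red Hat EL6

TensorFlow Serving installed from (source or binary): source, built with bazel 0.18.0

TensorFlow Serving version: 1.12.0

Describe the problem

I compiled tf-serving with bazel on RHEL 6.9 and start it with the following command: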

./model_servers/tensorflow_model_server --model_config_file=./data/models.conf --rest_api_port=8502

models.conf:

model_config_list: {
  config: {
    name: "model_1",
    base_path: "/search/work/tf_serving_bin/tensorflow_serving/data/model_data/model_1",
    model_platform: "tensorflow",
    model_version_policy: {
      latest: {
        num_versions: 1
      }
    }
  }
}

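The client is written in C++ and uses libcurl to call the tf-serving REST API. However, tf-serving frequently exits abnormally within a few minutes, without any error message.

The problem appears frequently when my client service sends requests to tf-serving on localhost. When the client service sends requests to tf-serving on another machine, the problem does not appear, qps

I checked memory, CPU idle, and so on, and found nothing wrong, so this is very strange.

I also set export TF_CPP_MIN_VLOG_LEVEL=1, but there was still no error or critical message.

Source code / logs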

2019-01-09 09:28:35.118183: I tensorflow_serving/model_servers/server_core.cc:461] Adding/updating models.
2019-01-09 09:28:35.118259: I tensorflow_serving/model_servers/server_core.cc:558] (Re-)adding model: app_ks_nfm_1
2019-01-09 09:28:35.227383: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: app_ks_nfm_1 version: 201901072359}
2019-01-09 09:28:35.227424: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: app_ks_nfm_1 version: 201901072359}
2019-01-09 09:28:35.227443: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: app_ks_nfm_1 version: 201901072359}
2019-01-09 09:28:35.227492: I external/org_tensorflow/tensorflow/contrib/session_bundle/bundle_shim.cc:363] Attempting to load native SavedModelBundle in bundle-shim from: /search/work/bazel-bin-serving/tensorflow_serving/data/model_data/app_ks_nfm_1/201901072359
2019-01-09 09:28:35.227530: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /search/work/bazel-bin-serving/tensorflow_serving/data/model_data/app_ks_nfm_1/201901072359
2019-01-09 09:28:35.256712: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2019-01-09 09:28:35.267728: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-01-09 09:28:35.313087: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:162] Restoring SavedModel bundle.
2019-01-09 09:28:38.797633: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:138] Running MainOp with key legacy_init_op on SavedModel bundle.
2019-01-09 09:28:38.803984: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:259] SavedModel load for tags { serve }; Status: success. Took 3570131 microseconds.
2019-01-09 09:28:38.804027: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:83] No warmup data file found at /search/work/bazel-bin-serving/tensorflow_serving/data/model_data/app_ks_nfm_1/201901072359/assets.extra/tf_serving_warmup_requests
2019-01-09 09:28:38.804148: I tensorflow_serving/core/loader_harness.cc:86] Successfully loaded servable version {name: app_ks_nfm_1 version: 201901072359}
2019-01-09 09:28:38.831860: I tensorflow_serving/model_servers/server.cc:286] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
2019-01-09 09:28:38.865243: I tensorflow_serving/model_servers/server.cc:302] Exporting HTTP/REST API at:localhost:8502 ...
[evhttp_server.cc : 237] RAW: Entering the event loop ...

【Comments】:

  • I have a similar problem. Did you figure it out?

Tags: tensorflow-serving


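【Solution 1】:

This is not an abnormal exit. The log indicates that the server is ready to receive inference requests.

To clarify, see the following explanation: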

docker run --runtime=nvidia -p 8501:8501 \
  --mount type=bind,source=/tmp/tfserving/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_gpu,target=/models/half_plus_two \
  -e MODEL_NAME=half_plus_two -t tensorflow/serving:latest-gpu &

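This runs the Docker container with the nvidia-docker runtime, launches the TensorFlow Serving Model Server, binds REST API port 8501, and maps the desired model from the host to the location where the container expects models. It also passes the model's name as an environment variable, which matters when querying the model.

Tip: before querying the model, be sure to wait until you see a message like the following, which indicates that the server is ready to receive requests: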

2018-07-27 00:07:20.773693: I tensorflow_serving/model_servers/main.cc:333]
Exporting HTTP/REST API at:localhost:8501 ...
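
Instead of watching the log by eye, a script can poll the server until the model reports as loaded. The following is a minimal Python sketch, assuming the server exposes TensorFlow Serving's REST model status endpoint at http://localhost:8501/v1/models/half_plus_two and that the response contains a `model_version_status` list with a `state` field. The function names `is_available`, `fetch_status`, and `wait_until_available` are hypothetical helpers, not part of TensorFlow Serving.

```python
import json
import time
import urllib.error
import urllib.request


def is_available(status_doc):
    """Return True if any model version in the status document is AVAILABLE."""
    versions = status_doc.get("model_version_status", [])
    return any(v.get("state") == "AVAILABLE" for v in versions)


def fetch_status(url, timeout=2.0):
    """GET the model status URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_available(url, attempts=30, delay=1.0, fetch=fetch_status):
    """Poll `url` up to `attempts` times; return True once the model is AVAILABLE."""
    for _ in range(attempts):
        try:
            if is_available(fetch(url)):
                return True
        except (urllib.error.URLError, OSError, ValueError):
            # Server not listening yet, model not found, or body was not JSON.
            pass
        time.sleep(delay)
    return False
```

With the container from the command above running, `wait_until_available("http://localhost:8501/v1/models/half_plus_two")` would be the call to try. If the installed version does not provide the status endpoint, waiting for the log line remains the fallback.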

After that message appears, press Enter and query the model with the following command:

curl -d '{"instances": [1.0, 2.0, 5.0]}' \
  -X POST http://localhost:8501/v1/models/half_plus_two:predict
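
For a client that sends many requests, the same query can be issued from code over one persistent HTTP connection rather than one connection per call. The sketch below uses Python's standard `http.client` for illustration only; the client in the question is C++ with libcurl, where reusing a single easy handle gives similar connection reuse. `predict_path` and `predict_many` are hypothetical names, and the host, port, and model name are taken from the curl example above.

```python
import http.client
import json


def predict_path(model_name):
    """REST predict endpoint path, matching the curl command above."""
    return "/v1/models/{}:predict".format(model_name)


def predict_many(host, port, model_name, batches, timeout=5.0):
    """Send each batch as one predict request over a single TCP connection."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    results = []
    try:
        for instances in batches:
            body = json.dumps({"instances": instances})
            conn.request(
                "POST",
                predict_path(model_name),
                body=body,
                headers={"Content-Type": "application/json"},
            )
            response = conn.getresponse()
            # The body must be read fully before the connection can be reused.
            payload = response.read()
            if response.status != 200:
                raise RuntimeError(
                    "predict failed: {} {!r}".format(response.status, payload)
                )
            results.append(json.loads(payload))
    finally:
        conn.close()
    return results
```

A call such as `predict_many("localhost", 8501, "half_plus_two", [[1.0, 2.0, 5.0]])` corresponds to the curl request above. Whether the connection is actually kept open also depends on the server honoring keep-alive.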

For more information, see the following link:

https://www.tensorflow.org/tfx/serving/docker#gpu_serving_example

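【Comments】:

    【Solution 2】:

    Cause: short-lived connections leave a large number of TCP sockets in the TIME_WAIT state, which uses up the file handles available on the Linux system.

    【Comments】: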
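
One way to check whether this is happening on a given machine is to count sockets in TIME_WAIT for the serving port. The following is a minimal Python sketch that parses text in the format of Linux's /proc/net/tcp (IPv4 only), where the fourth column holds the socket state and `06` means TIME_WAIT. `count_time_wait` and `SAMPLE` are illustrative names, and the sample rows are made up for demonstration, not taken from the asker's machine.

```python
TIME_WAIT = "06"  # kernel state code for TIME_WAIT in /proc/net/tcp


def count_time_wait(proc_net_tcp_text, port=None):
    """Count TIME_WAIT rows; if `port` is given, match local or remote port."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4 or fields[3] != TIME_WAIT:
            continue
        # Addresses look like "0100007F:2136": hex IP, colon, hex port.
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        remote_port = int(fields[2].rsplit(":", 1)[1], 16)
        if port is None or port in (local_port, remote_port):
            count += 1
    return count


# Made-up sample in /proc/net/tcp layout (truncated columns); 0x2136 == 8502.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue
   0: 00000000:2136 00000000:0000 0A 00000000:00000000
   1: 0100007F:C350 0100007F:2136 06 00000000:00000000
   2: 0100007F:C351 0100007F:2136 06 00000000:00000000
   3: 0100007F:C352 0100007F:1F90 01 00000000:00000000
"""
```

On a real host, passing the contents of /proc/net/tcp with `port=8502` (the REST port from the question) would give the live count. A count that keeps growing while the client runs is consistent with the explanation above, though it does not prove it.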
