【Question title】: Mesos Marathon apps with persistent volumes stuck at suspended
【Posted】: 2016-05-24 11:55:17
【Question description】:

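I am having trouble running apps in Marathon with persistent local volumes. I followed the instructions, started Marathon with a role and a principal, and created a simple app with a persistent volume, but it just hangs at suspended. The slave seems to respond with a valid offer, yet the app never actually starts. The slave logs nothing about the task, even when I compile with the debug option and turn logging up directly with GLOG_v=2.

Marathon also seems to keep rolling the task ID because the task fails to start, but I cannot see the reason anywhere.

Strangely, when I run the same app without a persistent volume, it starts.

The debug logs on Marathon do not seem to show anything useful, though I may be missing something. Can anyone give me pointers on what the problem might be, or where to look for additional debug output? Many thanks in advance.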

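Here is some information about my environment, plus debug output:

Slave: Ubuntu 14.04 running the prebuilt 0.28; also tested with 0.29 built from source

Master: Mesos 0.28 running in a Docker Ubuntu 14.04 image on CoreOS

Marathon: 1.1.1 running in a Docker Ubuntu 14.04 image on CoreOS

App with persistent storage

App info from v2/apps/test/tasks on Marathon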

{
  "app": {
    "id": "/test",
    "cmd": "while true; do sleep 10; done",
    "args": null,
    "user": null,
    "env": {},
    "instances": 1,
    "cpus": 1,
    "mem": 128,
    "disk": 0,
    "executor": "",
    "constraints": [
      [
        "role",
        "CLUSTER",
        "persistent"
      ]
    ],
    "uris": [],
    "fetch": [],
    "storeUrls": [],
    "ports": [
      10002
    ],
    "portDefinitions": [
      {
        "port": 10002,
        "protocol": "tcp",
        "labels": {}
      }
    ],
    "requirePorts": false,
    "backoffSeconds": 1,
    "backoffFactor": 1.15,
    "maxLaunchDelaySeconds": 3600,
    "container": {
      "type": "MESOS",
      "volumes": [
        {
          "containerPath": "test",
          "mode": "RW",
          "persistent": {
            "size": 100
          }
        }
      ]
    },
    "healthChecks": [],
    "readinessChecks": [],
    "dependencies": [],
    "upgradeStrategy": {
      "minimumHealthCapacity": 0.5,
      "maximumOverCapacity": 0
    },
    "labels": {},
    "acceptedResourceRoles": null,
    "ipAddress": null,
    "version": "2016-05-19T11:31:54.861Z",
    "residency": {
      "relaunchEscalationTimeoutSeconds": 3600,
      "taskLostBehavior": "WAIT_FOREVER"
    },
    "versionInfo": {
      "lastScalingAt": "2016-05-19T11:31:54.861Z",
      "lastConfigChangeAt": "2016-05-18T16:46:59.684Z"
    },
    "tasksStaged": 0,
    "tasksRunning": 0,
    "tasksHealthy": 0,
    "tasksUnhealthy": 0,
    "deployments": [
      {
        "id": "4f3779e5-a805-4b95-9065-f3cf9c90c8fe"
      }
    ],
    "tasks": [
      {
        "id": "test.4b7d4303-1dc2-11e6-a179-a2bd870b1e9c",
        "slaveId": "9f7c6ed5-4bf5-475d-9311-05d21628604e-S17",
        "host": "ip-10-0-90-61.eu-west-1.compute.internal",
        "localVolumes": [
          {
            "containerPath": "test",
            "persistenceId": "test#test#4b7d4302-1dc2-11e6-a179-a2bd870b1e9c"
          }
        ],
        "appId": "/test"
      }
    ]
  }
}
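
One thing that stands out in the response above is that the task entry carries a `localVolumes` reservation but no `stagedAt` or `startedAt` timestamp. The following is a minimal sketch for spotting such tasks in a saved response. The function name `unlaunched_resident_tasks` is hypothetical, the sample is a trimmed copy of the JSON above, and treating a missing timestamp as "never launched" is an assumption rather than documented Marathon behavior.

```python
import json


def unlaunched_resident_tasks(app_response):
    """Return IDs of tasks that hold a local volume but show no launch timestamp."""
    app = app_response["app"]
    stuck = []
    for task in app.get("tasks", []):
        has_volume = bool(task.get("localVolumes"))
        # Assumption: a launched task reports stagedAt and/or startedAt.
        launched = "stagedAt" in task or "startedAt" in task
        if has_volume and not launched:
            stuck.append(task["id"])
    return stuck


# Trimmed copy of the /test response shown above.
sample = json.loads("""
{
  "app": {
    "id": "/test",
    "tasksRunning": 0,
    "tasks": [
      {
        "id": "test.4b7d4303-1dc2-11e6-a179-a2bd870b1e9c",
        "slaveId": "9f7c6ed5-4bf5-475d-9311-05d21628604e-S17",
        "localVolumes": [
          {
            "containerPath": "test",
            "persistenceId": "test#test#4b7d4302-1dc2-11e6-a179-a2bd870b1e9c"
          }
        ],
        "appId": "/test"
      }
    ]
  }
}
""")

print(unlaunched_resident_tasks(sample))
```

Run against a freshly fetched response, a non-empty result would suggest the volume was reserved for a task that has not been staged yet.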

App info in Marathon: (the deployment appears to be spinning)


App without persistent storage

App info from v2/apps/test2/tasks on Marathon

{
  "app": {
    "id": "/test2",
    "cmd": "while true; do sleep 10; done",
    "args": null,
    "user": null,
    "env": {},
    "instances": 1,
    "cpus": 1,
    "mem": 128,
    "disk": 100,
    "executor": "",
    "constraints": [
      [
        "role",
        "CLUSTER",
        "persistent"
      ]
    ],
    "uris": [],
    "fetch": [],
    "storeUrls": [],
    "ports": [
      10002
    ],
    "portDefinitions": [
      {
        "port": 10002,
        "protocol": "tcp",
        "labels": {}
      }
    ],
    "requirePorts": false,
    "backoffSeconds": 1,
    "backoffFactor": 1.15,
    "maxLaunchDelaySeconds": 3600,
    "container": null,
    "healthChecks": [],
    "readinessChecks": [],
    "dependencies": [],
    "upgradeStrategy": {
      "minimumHealthCapacity": 0.5,
      "maximumOverCapacity": 0
    },
    "labels": {},
    "acceptedResourceRoles": null,
    "ipAddress": null,
    "version": "2016-05-19T13:44:01.831Z",
    "residency": null,
    "versionInfo": {
      "lastScalingAt": "2016-05-19T13:44:01.831Z",
      "lastConfigChangeAt": "2016-05-19T13:09:20.106Z"
    },
    "tasksStaged": 0,
    "tasksRunning": 1,
    "tasksHealthy": 0,
    "tasksUnhealthy": 0,
    "deployments": [],
    "tasks": [
      {
        "id": "test2.bee624f1-1dc7-11e6-b98e-568f3f9dead8",
        "slaveId": "9f7c6ed5-4bf5-475d-9311-05d21628604e-S18",
        "host": "ip-10-0-90-61.eu-west-1.compute.internal",
        "startedAt": "2016-05-19T13:44:02.190Z",
        "stagedAt": "2016-05-19T13:44:02.023Z",
        "ports": [
          31926
        ],
        "version": "2016-05-19T13:44:01.831Z",
        "ipAddresses": [
          {
            "ipAddress": "10.0.90.61",
            "protocol": "IPv4"
          }
        ],
        "appId": "/test2"
      }
    ],
    "lastTaskFailure": {
      "appId": "/test2",
      "host": "ip-10-0-90-61.eu-west-1.compute.internal",
      "message": "Slave ip-10-0-90-61.eu-west-1.compute.internal removed: health check timed out",
      "state": "TASK_LOST",
      "taskId": "test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c",
      "timestamp": "2016-05-19T13:15:24.155Z",
      "version": "2016-05-19T13:09:20.106Z",
      "slaveId": "9f7c6ed5-4bf5-475d-9311-05d21628604e-S17"
    }
  }
}
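
To make the difference between the two definitions easier to see, here is a small sketch that compares selected fields of the two apps. The helper `diff_app_settings` and the choice of keys are illustrative only, and the dictionaries are trimmed copies of the two responses above (JSON `null` written as `None`).

```python
def diff_app_settings(first, second, keys):
    """Return {key: (first_value, second_value)} for every key whose values differ."""
    return {
        key: (first.get(key), second.get(key))
        for key in keys
        if first.get(key) != second.get(key)
    }


# Trimmed copies of the /test and /test2 definitions shown above.
constraints = [["role", "CLUSTER", "persistent"]]
app_with_volume = {
    "disk": 0,
    "constraints": constraints,
    "container": {
        "type": "MESOS",
        "volumes": [
            {"containerPath": "test", "mode": "RW", "persistent": {"size": 100}}
        ],
    },
    "residency": {
        "relaunchEscalationTimeoutSeconds": 3600,
        "taskLostBehavior": "WAIT_FOREVER",
    },
    "tasksRunning": 0,
}
app_without_volume = {
    "disk": 100,
    "constraints": constraints,
    "container": None,
    "residency": None,
    "tasksRunning": 1,
}

keys = ("disk", "constraints", "container", "residency", "tasksRunning")
for key, (left, right) in diff_app_settings(app_with_volume, app_without_volume, keys).items():
    print(key, "|", left, "|", right)
```

The constraint is identical in both apps, so only `disk`, `container`, `residency`, and `tasksRunning` are reported.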

Slave log while running the app:

I0519 13:09:22.471876 12459 status_update_manager.cpp:320] Received status update TASK_RUNNING (UUID: 36c1f0cb-2fcd-44b9-ab79-cef81c2094be) for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000
I0519 13:09:22.471906 12459 status_update_manager.cpp:497] Creating StatusUpdate stream for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000
I0519 13:09:22.472262 12459 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_RUNNING (UUID: 36c1f0cb-2fcd-44b9-ab79-cef81c2094be) for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000
I0519 13:09:22.477686 12459 status_update_manager.cpp:374] Forwarding update TASK_RUNNING (UUID: 36c1f0cb-2fcd-44b9-ab79-cef81c2094be) for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000 to the agent
I0519 13:09:22.477830 12453 process.cpp:2605] Resuming slave(1)@10.0.90.61:5051 at 2016-05-19 13:09:22.477814016+00:00
I0519 13:09:22.477967 12453 slave.cpp:3638] Forwarding the update TASK_RUNNING (UUID: 36c1f0cb-2fcd-44b9-ab79-cef81c2094be) for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000 to master@10.0.82.230:5050
I0519 13:09:22.478185 12453 slave.cpp:3532] Status update manager successfully handled status update TASK_RUNNING (UUID: 36c1f0cb-2fcd-44b9-ab79-cef81c2094be) for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000
I0519 13:09:22.478229 12453 slave.cpp:3548] Sending acknowledgement for status update TASK_RUNNING (UUID: 36c1f0cb-2fcd-44b9-ab79-cef81c2094be) for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000 to executor(1)@10.0.90.61:34262
I0519 13:09:22.488315 12460 pid.cpp:95] Attempting to parse 'master@10.0.82.230:5050' into a PID
I0519 13:09:22.488370 12460 process.cpp:646] Parsed message name 'mesos.internal.StatusUpdateAcknowledgementMessage' for slave(1)@10.0.90.61:5051 from master@10.0.82.230:5050
I0519 13:09:22.488452 12452 process.cpp:2605] Resuming slave(1)@10.0.90.61:5051 at 2016-05-19 13:09:22.488441856+00:00
I0519 13:09:22.488600 12458 process.cpp:2605] Resuming (14)@10.0.90.61:5051 at 2016-05-19 13:09:22.488590080+00:00
I0519 13:09:22.488632 12458 status_update_manager.cpp:392] Received status update acknowledgement (UUID: 36c1f0cb-2fcd-44b9-ab79-cef81c2094be) for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000
I0519 13:09:22.488726 12458 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_RUNNING (UUID: 36c1f0cb-2fcd-44b9-ab79-cef81c2094be) for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000
I0519 13:09:22.492985 12452 process.cpp:2605] Resuming slave(1)@10.0.90.61:5051 at 2016-05-19 13:09:22.492974080+00:00
I0519 13:09:22.493021 12452 slave.cpp:2629] Status update manager successfully handled status update acknowledgement (UUID: 36c1f0cb-2fcd-44b9-ab79-cef81c2094be) for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000
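
Since the log above only mentions `test2`, one way to confirm that the slave never logs anything for `/test` is to filter the log by task name. This is a rough sketch: the regular expression assumes the glog line layout seen in the excerpt above, `entries_for_app` is a hypothetical helper, and the sample uses three lines copied from that excerpt.

```python
import re

# Matches lines such as:
# I0519 13:09:22.471876 12459 status_update_manager.cpp:320] Received ...
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<source>[^\s:]+):(?P<line>\d+)\] (?P<message>.*)$"
)


def entries_for_app(log_text, app_name):
    """Return (time, source file, message) for log lines about tasks of app_name."""
    # The trailing dot keeps "test" from matching tasks of "test2".
    needle = "task " + app_name + "."
    entries = []
    for raw in log_text.splitlines():
        match = GLOG_LINE.match(raw.strip())
        if match and needle in match.group("message"):
            entries.append(
                (match.group("time"), match.group("source"), match.group("message"))
            )
    return entries


sample_log = """\
I0519 13:09:22.471876 12459 status_update_manager.cpp:320] Received status update TASK_RUNNING (UUID: 36c1f0cb-2fcd-44b9-ab79-cef81c2094be) for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000
I0519 13:09:22.477830 12453 process.cpp:2605] Resuming slave(1)@10.0.90.61:5051 at 2016-05-19 13:09:22.477814016+00:00
I0519 13:09:22.477967 12453 slave.cpp:3638] Forwarding the update TASK_RUNNING (UUID: 36c1f0cb-2fcd-44b9-ab79-cef81c2094be) for task test2.e74fb439-1dc2-11e6-a179-a2bd870b1e9c of framework 1a6352a6-d690-41a2-967e-07342bba56d2-0000 to master@10.0.82.230:5050
"""

for app_name in ("test", "test2"):
    print(app_name, len(entries_for_app(sample_log, app_name)))
```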

【Comments】:

  • Can you post the Marathon logs? Especially the part where the offer is accepted.

Tags: mesos marathon


【Solution 1】:

This may be caused by insufficient disk space or RAM. The minimum free resources required are specified in the link below.
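
To check this suggestion against the app definition from the question, here is a minimal sketch that compares what the app asks for with what a slave has free. It assumes the disk requirement is the app's `disk` plus the `size` of each persistent volume; the function name `missing_resources` and the free-resource figures are made up for illustration and do not come from the question.

```python
def missing_resources(app, free):
    """Return {resource: (needed, free)} for every resource the slave is short of."""
    volumes = (app.get("container") or {}).get("volumes", [])
    volume_size = sum(v["persistent"]["size"] for v in volumes if "persistent" in v)
    needed = {
        "cpus": app["cpus"],
        "mem": app["mem"],
        # Assumption: persistent volume sizes are counted on top of "disk".
        "disk": app["disk"] + volume_size,
    }
    return {
        name: (amount, free.get(name, 0))
        for name, amount in needed.items()
        if free.get(name, 0) < amount
    }


# Resource fields of the /test app from the question.
app = {
    "cpus": 1,
    "mem": 128,
    "disk": 0,
    "container": {
        "type": "MESOS",
        "volumes": [
            {"containerPath": "test", "mode": "RW", "persistent": {"size": 100}}
        ],
    },
}

# Hypothetical free resources on a slave, not taken from the question.
free_on_slave = {"cpus": 2, "mem": 512, "disk": 50}

print(missing_resources(app, free_on_slave))
```

An empty result would mean the resources in question are sufficient under these assumptions, which would point the investigation elsewhere.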

【Comments】:
