【Question title】: Core dumped when running a Mesos cluster on Docker
【Posted】: 2019-01-23 12:25:41
【Question description】:

I have a Docker image called ubuntu_mesos_spark, and I installed ZooKeeper in it. I changed the "zoo.cfg" file as follows. This is the "zoo.cfg" on node1 (150.20.11.157):

tickTime=2000
initLimit=10
syncLimit=5
clientPort=2187
dataDir=/var/lib/zookeeper
server.1=0.0.0.0:2888:3888
server.2=150.20.11.157:2888:3888
server.3=150.20.11.137:2888:3888

This is the "zoo.cfg" on node2 (150.20.11.134):

tickTime=2000
initLimit=10
syncLimit=5
clientPort=2187
dataDir=/var/lib/zookeeper
server.1=150.20.11.157:2888:3888
server.2=0.0.0.0:2888:3888
server.3=150.20.11.137:2888:3888

This is the "zoo.cfg" on node3 (150.20.11.137):

tickTime=2000
initLimit=10
syncLimit=5
clientPort=2187
dataDir=/var/lib/zookeeper
server.1=150.20.11.157:2888:3888
server.2=150.20.11.134:2888:3888
server.3=0.0.0.0:2888:3888
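The three zoo.cfg files are meant to be identical except that each node writes 0.0.0.0 in its own server.N slot, and each node's myid file (in dataDir) must contain that same N. One way to keep them consistent is to generate all of them from one list. The sketch below is a hypothetical helper, not part of ZooKeeper; it assumes the server order and ports used in the question. Note that the node1 listing gives server.2 as 150.20.11.157 (node1's own address), while the other two files use 150.20.11.134 for server.2; the sketch follows the latter.

```python
from pathlib import Path
from typing import List

# Assumed cluster members in server-id order (server.1, server.2, server.3),
# taken from the question's zoo.cfg files.
SERVERS = ["150.20.11.157", "150.20.11.134", "150.20.11.137"]


def server_lines(servers: List[str], my_id: int) -> List[str]:
    """Return the server.N lines, using 0.0.0.0 for this node's own entry."""
    lines = []
    for n, ip in enumerate(servers, start=1):
        host = "0.0.0.0" if n == my_id else ip
        lines.append(f"server.{n}={host}:2888:3888")
    return lines


def render_zoo_cfg(servers: List[str], my_id: int, client_port: int = 2187) -> str:
    """Render a complete zoo.cfg for the node whose server id is my_id."""
    header = [
        "tickTime=2000",
        "initLimit=10",
        "syncLimit=5",
        f"clientPort={client_port}",
        "dataDir=/var/lib/zookeeper",
    ]
    return "\n".join(header + server_lines(servers, my_id)) + "\n"


def zookeeper_servers_env(servers: List[str], my_id: int) -> str:
    """The same list in the semicolon-separated ZOOKEEPER_SERVERS form."""
    return ";".join(line.split("=", 1)[1] for line in server_lines(servers, my_id))


def write_node_files(out_dir: Path, servers: List[str], my_id: int) -> None:
    """Write zoo.cfg and myid for one node into out_dir (copy them to dataDir)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "zoo.cfg").write_text(render_zoo_cfg(servers, my_id))
    # myid holds only the number N that matches this node's server.N line.
    (out_dir / "myid").write_text(f"{my_id}\n")
```

For example, `zookeeper_servers_env(SERVERS, 2)` gives the value node2's compose file would need (0.0.0.0 in the second slot). As far as I know, the stock zkServer.sh reads zoo.cfg and myid rather than the ZOOKEEPER_* environment variables, so the generated files are what the server actually uses.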

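I also created a "myid" file in "/var/lib/zookeeper" on each node; for example, on "150.20.11.157" the myid file contains "1". I also installed Mesos and Spark in the Docker image, and I have a Mesos cluster made of these three nodes. I defined the IP addresses of the slave nodes in the file "spark/conf/slaves":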

150.20.11.134
150.20.11.137

I added these lines to "spark/conf/spark-env.sh":

export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
export SPARK_EXECUTOR_URI=/home/spark/program_file/spark-2.3.2-bin-hadoop2.7.tgz

In addition, I added these lines to the "~/.bashrc" file:

export SPARK_HOME="/home/spark"
PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHO$
export PYSPARK_HOME=/usr/bin/python3.6
export PYSPARK_DRIVER_PYTHON=python3.6
export ZOO_LOG_DIR=/var/log/zookeeper
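The PYTHONPATH line is there to make the pyspark package and the bundled py4j zip importable. The sketch below does the same thing from inside a Python process, which can help check whether the paths are right on a given node. It assumes SPARK_HOME=/home/spark and py4j 0.10.7 as in the lines above; the tail of the original PYTHONPATH line is cut off, so only the two Spark entries are covered. The helper names are hypothetical.

```python
import os
import sys
from typing import List

# Assumed values, taken from the ~/.bashrc lines above.
SPARK_HOME = os.environ.get("SPARK_HOME", "/home/spark")
PY4J_ZIP = "py4j-0.10.7-src.zip"  # must match the zip in $SPARK_HOME/python/lib


def spark_python_paths(spark_home: str, py4j_zip: str = PY4J_ZIP) -> List[str]:
    """The two entries the PYTHONPATH line puts in front of the existing path."""
    return [
        os.path.join(spark_home, "python"),
        os.path.join(spark_home, "python", "lib", py4j_zip),
    ]


def add_spark_to_sys_path(spark_home: str = SPARK_HOME) -> List[str]:
    """Prepend the PySpark entries to sys.path, keeping their order.

    Returns the entries that do not exist on disk, so a wrong SPARK_HOME or
    py4j version shows up immediately instead of as an ImportError later.
    """
    paths = spark_python_paths(spark_home)
    for p in reversed(paths):
        if p not in sys.path:
            sys.path.insert(0, p)
    return [p for p in paths if not os.path.exists(p)]
```

Also note that the line as shown assigns PYTHONPATH without `export`; in bash the value only reaches child processes such as python3.6 if PYTHONPATH was already exported.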

I want to run the master on "150.20.11.157". My docker-compose file is:

version: '3.7'
services:
  zookeeper:
    image: ubuntu_mesos_spark
    command: /zookeeper-3.4.12/bin/zkServer.sh start
    environment:
      ZOOKEEPER_SERVER_ID: 1
      ZOOKEEPER_CLIENT_PORT: 2187
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_INIT_LIMIT: 10
      ZOOKEEPER_SYNC_LIMIT: 5
      ZOOKEEPER_SERVERS: 0.0.0.0:2888:3888;150.20.11.134:2888:3888;150.20.11.137:2888:3888
    network_mode: host
    expose:
      - 2187
      - 2888
      - 3888
    ports:
      - 2187:2187
      - 2888:2888
      - 3888:3888

  master:
    image: ubuntu_mesos_spark
    command: bash -c "sleep 20; /home/mesos-1.7.0/build/bin/mesos-master.sh --ip=150.20.11.157 --work_dir=/var/run/mesos"
    restart: always
    depends_on:
      - zookeeper
    environment:
      - MESOS_HOSTNAME="150.20.11.157,150.20.11.134,150.20.11.137"
      - MESOS_QUORUM=1
      - MESOS_LOG_DIR=/var/log/mesos
    expose:
      - 5050
      - 4040
      - 7077
      - 8080
    ports:
      - 5050:5050
      - 4040:4040
      - 7077:7077
      - 8080:8080

In addition, I run this compose file on the slave nodes "150.20.11.134" and "150.20.11.137":

version: '3.7'
services:
  zookeeper:
    image: ubuntu_mesos_spark
    command: /zookeeper-3.4.12/bin/zkServer.sh start
    environment:
      ZOOKEEPER_SERVER_ID: 2
      ZOOKEEPER_CLIENT_PORT: 2187
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_INIT_LIMIT: 10
      ZOOKEEPER_SYNC_LIMIT: 5
      ZOOKEEPER_SERVERS: 0.0.0.0:2888:3888;150.20.11.134:2888:3888;150.20.11.137:2888:3888
    network_mode: host
    expose:
      - 2187
      - 2888
      - 3888
    ports:
      - 2187:2187
      - 2888:2888
      - 3888:3888

  slave:
    image: ubuntu_mesos_spark
    command: bash -c "/home/mesos-1.7.0/build/bin/mesos-slave.sh --master=150.20.11.157:5050 --work_dir=/var/run/mesos --systemd_enable_support=false"
    restart: always
    privileged: true
    network_mode: host
    depends_on:
      - zookeeper
    environment:
      - MESOS_HOSTNAME="150.20.11.157,150.20.11.134,150.20.11.137"
      - MESOS_MASTER=150.20.11.157
      - MESOS_EXECUTOR_REGISTRATION_TIMEOUT=5mins #also in Dockerfile
      - MESOS_CONTAINERIZERS=docker,mesos
      - MESOS_LOG_DIR=/var/log/mesos
      - MESOS_LOGGING_LEVEL=INFO
    expose:
      - 5051
    ports:
      - 5051:5051

First I run "sudo docker-compose up" on the master node, and then on the slave nodes. But I get these errors:

On the master node, the error is:

Starting marzieh-compose_zookeeper_1 ... done
Recreating marzieh-compose_master_1 ... done
Attaching to marzieh-compose_zookeeper_1, marzieh-compose_master_1
zookeeper_1 | ZooKeeper JMX enabled by default
zookeeper_1 | Using config: /zookeeper-3.4.12/bin/../conf/zoo.cfg
zookeeper_1 | Starting zookeeper ... STARTED
marzieh-compose_zookeeper_1 exited with code 0
master_1 | I0123 11:46:59.585522 7 logging.cpp:201] INFO level logging started!
master_1 | I0123 11:46:59.586066 7 main.cpp:242] Build: 2019-01-21 05:16:39 by
master_1 | I0123 11:46:59.586097 7 main.cpp:243] Version: 1.7.0
master_1 | F0123 11:46:59.587368 7 process.cpp:1115] Failed to initialize: Failed to bind on 150.20.11.157:5050: Cannot assign requested address
master_1 | *** Check failure stack trace: ***
master_1 | @ 0x7f505ce54b9c google::LogMessage::Fail()
master_1 | @ 0x7f505ce54ae0 google::LogMessage::SendToLog()
master_1 | @ 0x7f505ce544b2 google::LogMessage::Flush()
master_1 | @ 0x7f505ce57770 google::LogMessageFatal::~LogMessageFatal()
master_1 | @ 0x7f505cd19ed1 process::initialize()
master_1 | @ 0x55fb7b12981a main
master_1 | @ 0x7f504f0d0830 (unknown)
master_1 | @ 0x55fb7b1288b9 _start
master_1 | bash: line 1: 7 Aborted (core dumped) /home/mesos-1.7.0/build/bin/mesos-master.sh --ip=150.20.11.157 --work_dir=/var/run/mesos
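The crash itself is libprocess failing to bind its listening socket. "Cannot assign requested address" is the operating system's message for EADDRNOTAVAIL (the IP is not assigned to any interface visible to the process), and "Address already in use" is EADDRINUSE. In the compose file above, the master service has no `network_mode: host`, so it most likely runs on Docker's default bridge network, where the host address 150.20.11.157 does not exist inside the container. The hypothetical helper below reproduces the same bind attempt so it can be run inside a container to see which case applies.

```python
import errno
import socket


def try_bind(ip: str, port: int) -> str:
    """Attempt the same TCP bind a listening server performs.

    Returns "ok", "address-not-available" (EADDRNOTAVAIL, "Cannot assign
    requested address") or "address-in-use" (EADDRINUSE, "Address already
    in use"). Any other error is re-raised unchanged.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((ip, port))
        return "ok"
    except OSError as exc:
        if exc.errno == errno.EADDRNOTAVAIL:
            return "address-not-available"
        if exc.errno == errno.EADDRINUSE:
            return "address-in-use"
        raise
    finally:
        # Release the port immediately; this is only a probe.
        sock.close()
```

For example, running `try_bind("150.20.11.157", 5050)` inside the master container would be expected to return "address-not-available" on a bridge network and "ok" with host networking (if nothing else holds port 5050).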

Also, when I run "sudo docker-compose up" on the slave nodes, I get this error:

slave_1 | F0123 11:40:06.878793 1 process.cpp:1115] Failed to initialize: Failed to bind on 0.0.0.0:5051: Address already in use
slave_1 | *** Check failure stack trace: ***
slave_1 | @ 0x7fee9d319b9c google::LogMessage::Fail()
slave_1 | @ 0x7fee9d319ae0 google::LogMessage::SendToLog()
slave_1 | @ 0x7fee9d3194b2 google::LogMessage::Flush()
slave_1 | @ 0x7fee9d31c770 google::LogMessageFatal::~LogMessageFatal()
slave_1 | @ 0x7fee9d1deed1 process::initialize()
slave_1 | @ 0x55e99f661784 main
slave_1 | @ 0x7fee8f595830 (unknown)
slave_1 | @ 0x55e99f65f139 _start
slave_1 | *** Aborted at 1548243606 (unix time) try "date -d @1548243606" if you are using GNU date ***
slave_1 | PC: @ 0x7fee8f5ac196 (unknown)
slave_1 | *** SIGSEGV (@0x0) received by PID 1 (TID 0x7fee9f9f38c0) from PID 0; stack trace: ***
slave_1 | @ 0x7fee8fee8390 (unknown)
slave_1 | @ 0x7fee8f5ac196 (unknown)
slave_1 | @ 0x7fee9d32055b google::DumpStackTraceAndExit()
slave_1 | @ 0x7fee9d319b9c google::LogMessage::Fail()
slave_1 | @ 0x7fee9d319ae0 google::LogMessage::SendToLog()
slave_1 | @ 0x7fee9d3194b2 google::LogMessage::Flush()
slave_1 | @ 0x7fee9d31c770 google::LogMessageFatal::~LogMessageFatal()
slave_1 | @ 0x7fee9d1deed1 process::initialize()
slave_1 | @ 0x55e99f661784 main
slave_1 | @ 0x7fee8f595830 (unknown)
slave_1 | @ 0x55e99f65f139 _start
slave_1 | I0123 11:41:07.818897 1 logging.cpp:201] INFO level logging started!
slave_1 | I0123 11:41:07.819437 1 main.cpp:349] Build: 2019-01-21 05:16:39 by
slave_1 | I0123 11:41:07.819470 1 main.cpp:350] Version: 1.7.0
slave_1 | I0123 11:41:07.823354 1 resolver.cpp:69] Creating default secret resolver
slave_1 | E0123 11:41:07.927773 1 main.cpp:483] EXIT with status 1: Failed to create a containerizer: Could not create DockerContainerizer: Failed to create docker: Failed to get docker version: Failed to execute 'docker -H unix:///var/run/docker.sock --version': exited with status 127
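The last agent error is different from the bind failure. Exit status 127 is the shell's "command not found" status, so the agent with `MESOS_CONTAINERIZERS=docker,mesos` could not find a `docker` binary inside the image. Even with the CLI installed, it would also need the Docker socket at /var/run/docker.sock, which this compose file does not mount. The function below is a hypothetical pre-flight check for those two conditions (its name and parameters are illustrative, not a Mesos API); the lookup functions are injectable so the logic can be tested without Docker.

```python
import os
import shutil
from typing import Callable, List, Optional


def containerizer_problems(
    containerizers: str,
    socket_path: str = "/var/run/docker.sock",
    which: Callable[[str], Optional[str]] = shutil.which,
    exists: Callable[[str], bool] = os.path.exists,
) -> List[str]:
    """List reasons the docker containerizer would fail to start, if requested."""
    names = [c.strip() for c in containerizers.split(",") if c.strip()]
    if "docker" not in names:
        return []  # only the mesos containerizer: no Docker CLI needed
    problems = []
    if which("docker") is None:
        # This is the case that surfaces as "exited with status 127".
        problems.append("no 'docker' binary on PATH")
    if not exists(socket_path):
        problems.append(f"Docker socket {socket_path} not found (not mounted?)")
    return problems
```

The accepted answer below avoids this path altogether by commenting out `MESOS_CONTAINERIZERS`, which leaves the agent on its default containerizer.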

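I have searched a lot about this, but I cannot figure it out. Could you please tell me the correct way to write the docker-compose files to run a Mesos and Spark cluster on Docker?

Any help would be appreciated.

Thanks in advance.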

【Question discussion】:

    Tags: docker apache-zookeeper mesos


    【Solution 1】:

    The problem is solved. I changed the docker-compose files as follows, and the master and the slaves now run without problems:

    The "docker-compose.yaml" on the master node is:

    version: '3.7'
    services:
      zookeeper:
        image: ubuntu_mesos_spark_python3.6_client
        command: /home/zookeeper-3.4.12/bin/zkServer.sh start
        environment:
          ZOOKEEPER_SERVER_ID: 1
          ZOOKEEPER_CLIENT_PORT: 2188
          ZOOKEEPER_TICK_TIME: 2000
          ZOOKEEPER_INIT_LIMIT: 10
          ZOOKEEPER_SYNC_LIMIT: 5
          ZOOKEEPER_SERVERS: 0.0.0.0:2888:3888;150.20.11.157:2888:3888
        network_mode: host
        expose:
          - 2188
          - 2888
          - 3888
        ports:
          - 2188:2188
          - 2888:2888
          - 3888:3888

      master:
        image: ubuntu_mesos_spark_python3.6_client
        command: bash -c "sleep 30; /home/mesos-1.7.0/build/bin/mesos-master.sh --ip=150.20.10.136 --work_dir=/var/run/mesos --hostname=x.x.x.x"  ## hostname: IP of the master node
        restart: always
        network_mode: host
        depends_on:
          - zookeeper
        environment:
          - MESOS_HOSTNAME="150.20.11.136"
          - MESOS_QUORUM=1
          - MESOS_LOG_DIR=/var/log/mesos
        expose:
          - 5050
          - 4040
          - 7077
          - 8080
        ports:
          - 5050:5050
          - 4040:4040
          - 7077:7077
          - 8080:8080
    

    The "docker-compose.yaml" file on the slave nodes looks like this:

    version: '3.7'
    services:
      zookeeper:
        image: ubuntu_mesos_spark_python3.6_client
        command: /home/zookeeper-3.4.12/bin/zkServer.sh start
        environment:
          ZOOKEEPER_SERVER_ID: 2
          ZOOKEEPER_CLIENT_PORT: 2188
          ZOOKEEPER_TICK_TIME: 2000
          ZOOKEEPER_INIT_LIMIT: 10
          ZOOKEEPER_SYNC_LIMIT: 5
          ZOOKEEPER_SERVERS: 150.20.11.136:2888:3888;0.0.0.0:2888:3888
        network_mode: host
        expose:
          - 2188
          - 2888
          - 3888
        ports:
          - 2188:2188
          - 2888:2888
          - 3888:3888

      slave:
        image: ubuntu_mesos_spark_python3.6_client
        command: bash -c "sleep 30; /home/mesos-1.7.0/build/bin/mesos-slave.sh --master=150.20.11.136:5050 --work_dir=/var/run/mesos --systemd_enable_support=false"
        restart: always
        privileged: true
        network_mode: host
        depends_on:
          - zookeeper
        environment:
          - MESOS_HOSTNAME="150.20.11.157"
          #- MESOS_MASTER=172.28.10.136
          #- MESOS_EXECUTOR_REGISTRATION_TIMEOUT=5mins #also in Dockerfile
          #- MESOS_CONTAINERIZERS=docker,mesos
          - MESOS_LOG_DIR=/var/log/mesos
          - MESOS_LOGGING_LEVEL=INFO
        expose:
          - 5051
        ports:
          - 5051:5051
    

    Then I run "docker-compose up" on each node, and they all run without any problems.
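In the working files, the master address appears in several places: `--ip` (150.20.10.136), `--hostname` (the x.x.x.x placeholder, described as the master's IP), `MESOS_HOSTNAME` (150.20.11.136) and the agents' `--master` (150.20.11.136). Presumably these are all meant to be the same host. A minimal sketch, assuming a single hypothetical `MASTER_IP` value, that derives every one of these flags from one variable and renders them in the same `bash -c "sleep N; ..."` shape the compose files use:

```python
import shlex
from typing import List

# Hypothetical single source of truth; substitute your master's real address.
MASTER_IP = "150.20.11.136"
MESOS_BIN = "/home/mesos-1.7.0/build/bin"  # path used in the compose files


def master_argv(master_ip: str, work_dir: str = "/var/run/mesos") -> List[str]:
    """mesos-master.sh arguments with --ip and --hostname from one value."""
    return [
        f"{MESOS_BIN}/mesos-master.sh",
        f"--ip={master_ip}",
        f"--hostname={master_ip}",
        f"--work_dir={work_dir}",
    ]


def agent_argv(master_ip: str, work_dir: str = "/var/run/mesos") -> List[str]:
    """mesos-slave.sh arguments pointing at the same master address."""
    return [
        f"{MESOS_BIN}/mesos-slave.sh",
        f"--master={master_ip}:5050",
        f"--work_dir={work_dir}",
        "--systemd_enable_support=false",
    ]


def compose_command(argv: List[str], delay_s: int = 30) -> str:
    """Render argv as a compose `command:` value: bash -c "sleep N; ..."."""
    inner = f"sleep {delay_s}; " + " ".join(shlex.quote(a) for a in argv)
    return f'bash -c "{inner}"'
```

`compose_command(master_argv(MASTER_IP))` and `compose_command(agent_argv(MASTER_IP))` then give the two `command:` strings; whether you generate the files this way or with Compose variable substitution, the point is that the address is written once.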
