[Posted]: 2019-07-02 19:01:32
[Problem description]:
I submitted the sample Spark job (SparkPi, which ships with the Spark distribution) to my k8s cluster, and it fails with java.net.UnknownHostException: kubernetes.default.svc. It would be very helpful if you could help me resolve this.
My environment:
- Ubuntu 18.04 LTS amd64, bionic image built on 2019-06-17
- 2 vCPUs, 7.5 GB memory
- Cloud provider: Google Compute Engine
- Single master node only (no worker nodes)
How to reproduce my problem:
$ kubectl cluster-info
Kubernetes master is running at https://10.128.0.10:6443
KubeDNS is running at https://10.128.0.10:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
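Node and DNS pod status can be verified as a follow-up check (a suggested check only; this output was not captured in the original report):
$ kubectl get nodes -o wide
$ kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide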
$ bin/spark-submit \
--master k8s://https://10.128.0.10:6443 \
--deploy-mode cluster \
--conf spark.executor.instances=3 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image=yohei1126/spark:v2.3.3 \
--class org.apache.spark.examples.SparkPi \
--name spark-pi \
local:///opt/spark/examples/jars/spark-examples_2.11-2.3.3.jar
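The command references a service account named spark via spark.kubernetes.authenticate.driver.serviceAccountName. Its creation is not shown above; a minimal sketch along the lines of the Spark-on-Kubernetes RBAC docs would be (the account name spark and binding name spark-role are assumptions, not taken from the question):
$ kubectl create serviceaccount spark
$ kubectl create clusterrolebinding spark-role \
  --clusterrole=edit \
  --serviceaccount=default:spark \
  --namespace=default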
Error log:
- KubeDNS is running, but name resolution does not seem to be working (see the DNS check sketch after the log below).
$ kubectl logs spark-pi-67ed1ddda23e32799371677bf1e795c4-driver
...
2019-06-24 08:40:16 INFO SparkContext:54 - Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: External scheduler
cannot be instantiated
...
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation:
[get] for kind: [Pod] with name: [spark-pi-67ed1ddda23e32799371677bf1e795c4-driver]
in namespace: [default] failed.
...
Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again
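The UnknownHostException means the driver pod cannot resolve the in-cluster DNS name of the API server. A quick in-cluster DNS check, sketched with a throwaway busybox pod (the pod name dns-test is an arbitrary choice):
$ kubectl run dns-test --rm -it --restart=Never \
  --image=busybox:1.28 -- nslookup kubernetes.default.svc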
How I installed k8s on a clean Ubuntu:
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
$ apt-get update && apt-get install -y apt-transport-https curl
$ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
$ cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
$ apt-get update
$ apt-get install -y kubelet kubeadm kubectl
$ apt-mark hold kubelet kubeadm kubectl
I also installed docker-ce, because kubeadm requires it.
$ sudo apt update
$ sudo apt install -y \
apt-transport-https \
ca-certificates \
curl \
software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo apt-key fingerprint 0EBFCD88
$ sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
$ sudo apt update
$ sudo apt install -y docker-ce
How I initialized the cluster:
- A network range is specified for --pod-network-cidr (see the note on flannel's default Pod CIDR after this block).
$ sudo kubeadm init --pod-network-cidr=10.128.0.0/20
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
$ sudo sysctl net.bridge.bridge-nf-call-iptables=1
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
$ kubectl taint nodes test-k8s node-role.kubernetes.io/master:NoSchedule-
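Note on the Pod CIDR: the stock kube-flannel.yml applied above ships with Network: 10.244.0.0/16 in its net-conf.json, so by default it only matches a cluster initialized with that CIDR; 10.128.0.0/20 also appears to be the node's own GCE subnet (the master is 10.128.0.10), so it overlaps the host network. A sketch of the combination flannel expects out of the box (10.244.0.0/16 is flannel's default, not a value taken from the question):
$ sudo kubeadm init --pod-network-cidr=10.244.0.0/16
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml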
How I built the Docker image:
- I used the prebuilt Spark tarball.
$ wget http://ftp.meisei-u.ac.jp/mirror/apache/dist/spark/spark-2.3.3/spark-2.3.3-bin-hadoop2.7.tgz
$ tar zxvf spark-2.3.3-bin-hadoop2.7.tgz
$ cd spark-2.3.3-bin-hadoop2.7
$ sudo bin/docker-image-tool.sh -r yohei1126 -t v2.3.3 build
$ sudo bin/docker-image-tool.sh -r yohei1126 -t v2.3.3 push
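To confirm the pushed image actually contains the example jar referenced by the local:// URI in spark-submit, it can be inspected like this (a sketch; --entrypoint simply bypasses the Spark entrypoint script):
$ sudo docker pull yohei1126/spark:v2.3.3
$ sudo docker run --rm --entrypoint ls yohei1126/spark:v2.3.3 /opt/spark/examples/jars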
[Discussion]:
Tags: docker apache-spark kubernetes pyspark google-compute-engine