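ZooKeeper is essentially a distributed, real-time key-value store, and it is widely deployed in modern commercial systems.
I have set up ZooKeeper many times. It is not hard, but the steps are tedious: building a 5-node cluster by hand took at least an hour each time.
Later I switched to Ansible and packaged the deployment as a standalone Ansible role, which turned it into a standard procedure. Users only need to set a few parameters, so it is very convenient to use.
In my own use, a full deployment takes about 8 minutes and has succeeded every time, regardless of the operator's mood. The actual speed depends mainly on network speed and on the number of nodes being deployed.
For commercial use, I recommend deploying at least 5 nodes. 3 nodes work, but the cluster is fragile: a 3-node ensemble survives the loss of only one server, while a 5-node ensemble survives the loss of two.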
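The 3-versus-5 recommendation follows from majority quorum arithmetic: a ZooKeeper ensemble stays available only while more than half of its servers are up. The following is a minimal sketch of that arithmetic; the function names are illustrative and not part of the role.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that still forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can fail while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 5):
    print(f"{n} nodes: quorum={quorum_size(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

With 3 nodes, one failure during a rolling restart or maintenance window already leaves no margin, which is why 5 nodes are the safer choice.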
The deployment process and environment requirements are described below.
The code is available on my GitHub: https://github.com/HappyFreeAngel/zookeeper-cluster-offline-install.git
| Component | Version | Required | Download link |
| --- | --- | --- | --- |
| Operating system | centos7 1608 | Yes | See the official site |
| JDK | jdk8 | Yes | See the official site |
| zookeeper | 3.5.4-beta | Yes | https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.5.4-beta/zookeeper-3.5.4-beta.tar.gz |
| lsof | | No | |
| nc | | No | |
| ssh-passwordless-login | 1.0.0 | No | https://github.com/HappyFreeAngel/passwordless-ssh-login.git |

| No. | VM name | IP |
| --- | --- | --- |
| 1 | zkb1 | 10.20.3.51 |
| 2 | zkb2 | 10.20.3.52 |
| 3 | zkb3 | 10.20.3.53 |
| 4 | zkb4 | 10.20.3.54 |
| 5 | zkb5 | 10.20.3.55 |
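Installation procedure:
1. Prepare the virtual or physical machines to install on. Create the machines, assign their IP addresses, and make sure they can ping each other.
2. Deploy ZooKeeper.
3. Test and confirm that the deployment succeeded.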
- name: zookeeper-cluster offline install playbook include many books.
  hosts: localhost
  gather_facts: False
  # become: yes
  # become_method: sudo
  vars:
    projectinfo: "{{ lookup('file','input.yml') | from_yaml }}"
    vm_host_list: []
    domain_group_dict: {}
  pre_tasks:
    - set_fact: task_startup_timestamp="{{lookup('pipe','date \"+%Y-%m-%d %H:%M:%S\"')}}"
    - name: "Task that runs before every other task."
      shell: echo "Task starting, checking that the required files exist."; ./before-run.sh;
    - name: "Check that the files in the local project folder exist"
      shell: ./check-file-exist-status.sh
      register: files_status
    - name: "if stdout check failed,interrupt execution"
      fail: msg="Error, a file link is broken and the file does not exist"
      when: '"does not exist" in files_status.stdout'
    - name: "Check that role dependencies are present and at the correct version" #todo
      shell: ./check-role-dependency.sh
      register: role_dependency_status
    - name: "Role dependency missing"
      fail: msg="There is a problem with the role dependencies"
      when: '"role does not exist" in role_dependency_status.stdout'
    - name: "set projectroot short hand vars"
      set_fact: projectroot="{{projectinfo['project_root']}}"
    - name: "set commonsetting short hand vars"
      set_fact: commonsetting="{{projectroot['common']}}"
    - name: "set hostdict short hand vars"
      set_fact: hostdict="{{projectroot['all_hosts']}}"
    - name: "set hostconfig short hand vars"
      set_fact: hostconfig="{{projectroot['host_config']}}"
    - name: "set zookeeperconfig short hand vars"
      set_fact: zookeeperconfig="{{projectroot['host_config']['zookeeper_config']}}"
    - name: "vcenterconfig"
      set_fact: vcenterconfig="{{projectroot['vsphere_platform']['vmware_esxi']}}"
    - name: "set fact"
      set_fact: virtualbox_template_name="{{projectroot['host_config']['vagrant_config']['virtualbox_template_name']}}"
      when: projectroot['deploy_vsphere_platform']=='vmware_esxi'
    - name: "set fact"
      set_fact: vm_bridge_nic_name="eth1"
    - name: "Merge the host lists of all groups into vm_host_list"
      set_fact: vm_host_list="{{ vm_host_list + hostdict[item] }}"
      with_items: "{{ hostdict.keys() | list }}"
      when: hostdict[item][0].ismaster == true
    - name: "Generate the temporary group-domain-ip mapping file /tmp/group_domain_ip_user_password.txt"
      template: src=templates/group_domain_ip_user_password.txt.j2 dest=/tmp/group_domain_ip_user_password.txt
    - name: "Load the contents of /tmp/group_domain_ip_user_password.txt into a registered variable"
      shell: cat /tmp/group_domain_ip_user_password.txt
      register: group_domain_ip_user_password
    # Note: usernames and passwords must not contain ':' or ','. These are the separators, so the string could not be split correctly.
    #hadoop-namenode-hosts:hadoop-namenode1.yourdomain.com:10.20.2.1:centos:yourpassword,hadoop-namenode-hosts:hadoop-namenode2.yourdomain.com:10.20.2.2:centos:yourpassword,hadoop-namenode-hosts:hadoop-namenode3.yourdomain.com:10.20.2.3:centos:yourpassword,hadoop-datanode-hosts:hadoop-datanode1.yourdomain.com:10.20.2.11:centos:yourpassword,hadoop-datanode-hosts:hadoop-datanode2.yourdomain.com:10.20.2.12:centos:yourpassword,hadoop-datanode-hosts:hadoop-datanode3.yourdomain.com:10.20.2.13:centos:yourpassword
    - set_fact: group_domain_ip_user_password_list={{ group_domain_ip_user_password.stdout.split(',') }}
    - add_host:
        hostname: "{{item.split(':')[1]}}"
        groups: "{{item.split(':')[0]}}"
        ansible_host: "{{item.split(':')[2]}}"
        # ansible_port: 22
        ansible_user: "{{item.split(':')[3]}}"
        ansible_ssh_pass: "{{item.split(':')[4]}}"
      with_items: "{{group_domain_ip_user_password_list}}"
    # Important: every host uses the root user here; the hadoop user has not been created yet.
    - name: "set short hand vars"
      set_fact: dnsconfig="{{hostconfig['dns_config']}}"
    - name: "Dynamically create or update DNS records (DDNS); a record is added only when the name does not resolve or resolves incorrectly. the current host is {{ansible_hostname}}. create A record {{ item.name }}-->ip:{{ item.ip }}"
      nsupdate:
        key_name: "{{dnsconfig['key_name']}}"
        key_secret: "{{dnsconfig['dns_update_key']}}"
        server: "{{commonsetting['citybox_work_network']['dnsserver1']}}"
        zone: "{{dnsconfig['zone']}}"
        record: "{{item.name.split('.')[0]}}"
        value: "{{ item.ip }}"
      with_items: "{{hostdict['zookeeper-hosts']}}"
      when: lookup('dig', item.name) != item.ip
  # Top-level playbook include, not a task include
  roles:
    - role: vmware-del-vm
      user_vcenterconfig: "{{ vcenterconfig }}"
      user_host_list: "{{ hostdict['zookeeper-hosts'] }}" # This variable must not be named appconfig; it would conflict.
      async: 300
      poll: 0
      when: projectroot['deploy_vsphere_platform']=="vmware_esxi"
      # when: inventory_hostname.find('zookeeper')!=-1
    - role: vmware-create-vm
      user_vcenterconfig: "{{ vcenterconfig }}"
      user_host_list: "{{ hostdict['zookeeper-hosts'] }}" # This name must not be user_host_list; it would conflict.
      user_vm_network: "{{commonsetting['citybox_work_network']}}"
      async: 600
      poll: 0
      when: projectroot['deploy_vsphere_platform']=="vmware_esxi"
    - role: wait-in-second
      max_wait_time_in_seconds: "{{ (hostdict['zookeeper-hosts'] | length | int )* 30 + 150 }}"
    - role: vmware-poweredon-vm
      user_vcenterconfig: "{{ vcenterconfig }}"
      user_host_list: "{{ hostdict['zookeeper-hosts'] }}"
      async: 240
      poll: 0
    - role: waitfor-vm-startup
      max_wait_time_in_seconds: "{{ (hostdict['zookeeper-hosts'] | length | int )* 30 + 150 }}"
      user_host_list: "{{ hostdict['zookeeper-hosts'] }}"
    - role: system-storage-increase
      host_list: "{{ hostdict['zookeeper-hosts'] }}"
      target_device: "/dev/sda"
      virtual_machine_template_disk_size_in_gb: "{{ vcenterconfig['virtual_machine_template_disk_size_in_gb'] }}"
      file_system: "xfs"
      mount_dir: "/var/server"
    - role: dns-resolve
      host_list: "{{ hostdict['zookeeper-hosts'] }}"
      dns_server_ip: "{{vcenterconfig['dnsserver1']}}"
    - role: dns-resolve
      host_list: "{{ hostdict['zookeeper-hosts'] }}"
      dns_server_ip: "8.8.8.8"
##- import_playbook: tasks/test-password-less-login.yml
- import_playbook: tasks/system-performance-tune.yml
- import_playbook: tasks/zookeeper.yml
- import_playbook: tasks/reboot-host-and-wait-for-host-up.yml host_list="{{ hostdict['zookeeper-hosts'] }}" max_wait_time_in_seconds=200
- import_playbook: tasks/notify.yml
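The playbook builds its dynamic inventory from one comma-separated string whose records have the form `group:hostname:ip:user:password`, and it warns that usernames and passwords must not contain `:` or `,`. The following Python sketch mirrors that splitting logic outside Ansible and shows why the restriction exists. The `HostEntry` type and `parse_host_entries` function are hypothetical names for illustration and are not part of the role.

```python
from typing import List, NamedTuple


class HostEntry(NamedTuple):
    group: str
    hostname: str
    ip: str
    user: str
    password: str


def parse_host_entries(text: str) -> List[HostEntry]:
    """Split 'group:hostname:ip:user:password' records separated by commas."""
    entries = []
    for record in text.strip().split(","):
        fields = record.split(":")
        # A ':' or ',' inside a username or password changes the field count,
        # which is exactly the failure the playbook comment warns about.
        if len(fields) != 5:
            raise ValueError(
                f"expected 5 colon-separated fields, got {len(fields)}: {record!r}"
            )
        entries.append(HostEntry(*fields))
    return entries


sample = (
    "zookeeper-hosts:zkb1.yourdomain.com:10.20.3.51:root:yourpassword,"
    "zookeeper-hosts:zkb2.yourdomain.com:10.20.3.52:root:yourpassword"
)
for entry in parse_host_entries(sample):
    print(entry.group, entry.hostname, entry.ip, entry.user)
```

Unlike the playbook, which indexes the split result directly, this sketch rejects malformed records up front, so a password containing a separator fails loudly instead of silently producing a wrong host entry.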
##### The configuration file format is shown below
--- #config file version-1.1.0 2018-08-22
project_root: # Indent nested dictionary keys by 2 spaces; indent list items by 2 spaces.
  project_info:
    project_descripton: "Offline automated deployment of a ZooKeeper cluster"
    version: "1.0"
    source_code: "your-git-download-link"
    created_date: "2017-06-01"
    author_list:
      - name: "author"
        phone: "dianhua"
        email: "[email protected]"
        weixin: "todo"
        QQ: "todo"
  vsphere_platform:
    virtualbox:
      vagrant_offline_install_file: "vagrant_2.0.2_x86_64.rpm"
      virtualbox_offline_install_file: "VirtualBox-5.2-5.2.6_120293_el7-1.x86_64.rpm"
      vagrant_box_name: "centos1708-kernel4.4.116-docker-17.12.0-jre9-ce-go1.9"
    vmware_esxi:
      vcenterhostname: "" # e.g. vcenter.yourdomain.com; if the name does not resolve, an entry in /etc/hosts on the control machine also works
      vcenterusername: "[email protected]"
      vcenterpassword: ""
      datacenter: ""
      default_datastore: "cw_m4_sas_datastore" #"cw_m4_pcie_datastore2 cw_m4_sas_datastore"
      template: "centos1611_docker_jdk8_template"
      virtual_machine_template_disk_size_in_gb: 30
      resource_pool: "hadoopcluster"
      folder: "/vm"
      dnsserver1: "10.20.1.1" # The IP that create-dns-record.yml needs to reach; same as dns-host[0].ip
      dnsserver2: "114.114.114.114"
      state: "poweredon"
      esxi_nic_network:
        vlan: "VM Network" #"192.100.x.x"
        gateway: "10.20.0.1" # sudo route add -net 11.23.3.0 -netmask 255.255.255.128 11.23.3.1
        netmask: "255.255.0.0"
        dnsserver1: "10.20.1.1"
        dnsserver2: "114.114.114.114"
      datastore:
        rabbitmq_datastore: "cw_m4_sas_datastore"
    vmware_workstation:
    openstack:
    huawei_fusion_vsphere:
  deploy_vsphere_platform: "vmware_esxi"
  common:
    vm_platform: "vmware-vsphere" #vagrant, vmware-vsphere,huawei-vsphere
    period_force_time_sync: "yes"
    nic_name: "ens160" #ens160 enp0s3
    is_internet_up: false
    rabbitmq_datastore: "cw_m4_sas_datastore"
    software_root_dir: "/var/server" # The settings below depend on this; if you change it, update the related directories below as well.
    citybox_work_network:
      vlan: "10.20.0.0_10G-port" #"10.20.x.x"
      gateway: "10.20.0.1" #10.20.1.1 to do
      netmask: "255.255.0.0"
      dnsserver1: "10.20.1.1"
      dnsserver2: "114.114.114.114"
      network: "10.20.0.0/16"
  host_config:
    mail_agent_info:
      host: "smtp.mxhichina.com"
      secure_smtp_port_ipv4: "465"
      secure: "always"
      username: "[email protected]"
      password: ""
      sender: "[email protected]"
    mail_notify_info:
      receiver_name: "Happy"
      to: "[email protected]"
      bcc: "[email protected]"
      cc: "[email protected]"
      charset: "utf-8"
      subject: "Ansible automated Hadoop cluster creation report"
      body: "The project's Hadoop cluster has been created successfully."
    dns_config:
      zone: "yourdomain.com"
      key_name: "yourdomain.com"
      dns_update_key: ""
    docker_config:
      docker_default_data_path: "/var/lib/docker"
      docker_data_folder_name: "docker-data" # Placed under /var/server by default
    vagrant_config:
      app_home: "/Volumes/linyingjie/mesos-test" # "/var/server/mesos-test" #
      virtualbox_template_file_path: "centos1708-kernel4.4.116-docker-17.12.0-jre9-ce-go1.9.box"
      virtualbox_template_name: "centos1708-kernel4.4.116-docker-17.12.0-jre9-ce-go1.9"
      vm_bridge_nic_name: "ens1f0"
    java_config:
      #app_home: "/var/server/jre" #jre-8u181-linux-x64.tar.gz
      jre_targz: "jre-8u181-linux-x64.tar.gz" #jre-10.0.1_linux-x64_bin.tar.gz #tar -zxvf jre-9.0.4_linux-x64_bin.tar.gz -C jre9 --strip-components=1
      jre_foldername: "jre"
      jre_version: "1.8"
      jdk_targz: "jdk-8u131-linux-x64.tar.gz"
      jdk_foldername: "jdk"
      jdk_version: "1.8"
    go_config:
      app_home: "/var/server/go"
      app_foldername: "go"
      install_filename: "go1.10.linux-amd64.tar.gz"
      version: "1.10"
    ansible_config:
      app_home: "/var/server/ansible"
      app_foldername: "ansible"
      install_filename_rpm_tgz: "ansible-offline-install-2.6.0.rpms.tgz"
      version: "2.6.0"
    ntp_config:
      app_home: "/var/server/ntp"
      timezone: "Asia/Shanghai"
      port: "123"
      ntp_server_list:
        - hostname: 10.20.1.1
          command: iburst
        - hostname: 1.asia.pool.ntp.org
          command: iburst
        # - hostname: 0.asia.pool.ntp.org
        #   command: iburst
        #
        # - hostname: 1.asia.pool.ntp.org
        #   command: iburst
    zookeeper_config:
      zookeeper_username: "zookeeper"
      zookeeper_salt_password: "$1$SomeSalt$.uTwnphKwuihqy2S2/v2l/"
      root_salt_password: "$1$SomeSalt$.uTwnphKwuihqy2S2/v2l/"
      app_home: "/var/server/zookeeper"
      zookeeper_tgz: "zookeeper-3.5.4-beta.tar.gz"
      docker_image_name: "docker.yourdomain.com/ascs/zookeeper"
      docker_image_version: "3.5.3-beta-alpine"
      docker_compressed_image_tgz: "zookeeper-3.5.3-beta-alpine.image.tgz"
      # Important: the paths below depend on the image; different images may use different paths.
      conf_dir: "/var/server/zookeeper/conf"
      data_dir: "/var/server/zookeeper/data"
      data_log_dir: "/var/server/zookeeper/log"
      # conf_dir: "/conf"
      # data_dir: "/data"
      # data_log_dir: "/datalog"
      open_port_list:
        - port_type: tcp
          port_number: 2181
          immediate: True
          permanent: True
          state: enabled # Four options: enabled, disabled, present, absent
          description: ""
        - port_type: tcp
          port_number: 2888
          immediate: True
          permanent: True
          state: enabled # Four options: enabled, disabled, present, absent
          description: ""
        - port_type: tcp
          port_number: 3888
          immediate: True
          permanent: True
          state: enabled # Four options: enabled, disabled, present, absent
          description: ""
      zookeeper_client_connection_tcp_port_ipv4: "2181"
      zookeeper_peer_communication_tcp_port_ipv4: "2888"
      zookeeper_leader_select_tcp_port_ipv4: "3888"
      #ENV ZOO_USER=zookeeper \
      # ZOO_CONF_DIR=/conf \
      # ZOO_DATA_DIR=/data \
      # ZOO_DATA_LOG_DIR=/datalog \
      # ZOO_PORT=2181 \
      # ZOO_TICK_TIME=2000 \
      # ZOO_INIT_LIMIT=5 \
      # ZOO_SYNC_LIMIT=2 \
      # ZOO_MAX_CLIENT_CNXNS=60 \
      # ZOO_STANDALONE_ENABLED=false
      a_4lw_commands_whitelist: "stat, ruok, conf, isro,wchs, wchc, wchp, cons, dump, envi, reqs"
      # echo ruok | nc 127.0.0.1 2181 : tests whether the server is running; a reply of imok means it is up (ruok = "are you ok").
      # echo dump | nc 127.0.0.1 2181 : lists outstanding sessions and ephemeral nodes.
      # echo kill | nc 127.0.0.1 2181 : shuts down the server.
      # echo conf | nc 127.0.0.1 2181 : prints details of the serving configuration.
      # echo cons | nc 127.0.0.1 2181 : lists full connection and session details for all clients connected to the server.
      # echo envi | nc 127.0.0.1 2181 : prints details of the serving environment (as distinct from the conf command).
      # echo reqs | nc 127.0.0.1 2181 : lists outstanding requests.
      # echo wchs | nc 127.0.0.1 2181 : lists brief information on the server's watches.
      # echo wchc | nc 127.0.0.1 2181 : lists detailed watch information by session; the output is a list of sessions with their associated watches.
      # echo wchp | nc 127.0.0.1 2181 : lists detailed watch information by path; the output is a list of paths with their associated sessions.
  all_hosts:
    zookeeper-hosts:
      - name: "zkb1.yourdomain.com"
        uuid: "zkb1.yourdomain.com"
        ip: "10.20.3.51"
        cpu: "1"
        memory: "4096" # at least 600 MB
        disk: 30
        username: "root"
        password: "yourpassword"
        datastore: "cw_m4_pcie_datastore1"
        host_machine: "192.168.3.11"
        ismaster: true
      - name: "zkb2.yourdomain.com"
        uuid: "zkb2.yourdomain.com"
        ip: "10.20.3.52"
        cpu: "1"
        memory: "4096"
        disk: 30
        username: "root"
        password: "yourpassword"
        datastore: "cw_m4_pcie_datastore2"
        host_machine: "192.168.3.11"
        ismaster: true
      - name: "zkb3.yourdomain.com"
        uuid: "zkb3.yourdomain.com"
        ip: "10.20.3.53"
        cpu: "1"
        memory: "4096"
        disk: 30
        username: "root"
        password: "yourpassword"
        datastore: "cw_m4_pcie_datastore1"
        host_machine: "192.168.3.11"
        ismaster: true
      - name: "zkb4.yourdomain.com"
        uuid: "zkb4.yourdomain.com"
        ip: "10.20.3.54"
        cpu: "1"
        memory: "4096"
        disk: 30
        username: "root"
        password: "yourpassword"
        datastore: "cw_m4_pcie_datastore2"
        host_machine: "192.168.3.11"
        ismaster: true
      - name: "zkb5.yourdomain.com"
        uuid: "zkb5.yourdomain.com"
        ip: "10.20.3.55"
        cpu: "1"
        memory: "4096"
        disk: 30
        username: "root"
        password: "yourpassword"
        datastore: "cw_m4_pcie_datastore1"
        host_machine: "192.168.3.11"
        ismaster: true
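The peer port (2888) and leader-election port (3888) in `zookeeper_config` end up in each node's `zoo.cfg` as `server.N=host:peerPort:electionPort` lines. The role's actual template is not shown here, so the following is only a sketch of how such lines could be generated from the host list. It assumes, purely for illustration, that server ids are assigned 1..N in list order; `server_lines` is a hypothetical helper.

```python
from typing import List


def server_lines(hostnames: List[str],
                 peer_port: int = 2888,
                 election_port: int = 3888) -> List[str]:
    """Build zoo.cfg 'server.N=host:peer:election' lines.

    Assumption: ids are 1-based and follow list order. Each node's
    myid file must then contain the matching number.
    """
    return [
        f"server.{server_id}={host}:{peer_port}:{election_port}"
        for server_id, host in enumerate(hostnames, start=1)
    ]


hosts = [f"zkb{i}.yourdomain.com" for i in range(1, 6)]
for line in server_lines(hosts):
    print(line)
```

Whatever numbering the role really uses, the id in each `server.N` line has to agree with the `myid` file in that node's data directory, otherwise the node cannot join the ensemble.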
[[email protected] ~]# more /etc/hosts
# Ansible managed
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
# The format looks like this:
#192.168.12.21 master.yourdomain master
10.20.3.51 zkb1.yourdomain zkb1
10.20.3.52 zkb2.yourdomain zkb2
10.20.3.53 zkb3.yourdomain zkb3
10.20.3.54 zkb4.yourdomain zkb4
10.20.3.55 zkb5.yourdomain zkb5
happy:~ happy$ echo stat | nc 10.20.3.51 2181
Zookeeper version: 3.5.4-beta-7f51e5b68cf2f80176ff944a9ebd2abbc65e7327, built on 05/11/2018 16:27 GMT
Clients:
/192.168.2.33:51162[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/0
Received: 1
Sent: 0
Connections: 1
Outstanding: 0
Zxid: 0x300000002
Mode: follower
Node count: 16
happy:~ happy$ echo stat | nc 10.20.3.52 2181
Zookeeper version: 3.5.4-beta-7f51e5b68cf2f80176ff944a9ebd2abbc65e7327, built on 05/11/2018 16:27 GMT
Clients:
/192.168.2.33:51163[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/0
Received: 1
Sent: 0
Connections: 1
Outstanding: 0
Zxid: 0x300000002
Mode: follower
Node count: 16
happy:~ happy$ echo stat | nc 10.20.3.53 2181
Zookeeper version: 3.5.4-beta-7f51e5b68cf2f80176ff944a9ebd2abbc65e7327, built on 05/11/2018 16:27 GMT
Clients:
/192.168.2.33:51164[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/0
Received: 1
Sent: 0
Connections: 1
Outstanding: 0
Zxid: 0x300000002
Mode: leader
Node count: 16
Proposal sizes last/min/max: 32/32/32
happy:~ happy$ echo stat | nc 10.20.3.54 2181
Zookeeper version: 3.5.4-beta-7f51e5b68cf2f80176ff944a9ebd2abbc65e7327, built on 05/11/2018 16:27 GMT
Clients:
/192.168.2.33:51167[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/0
Received: 1
Sent: 0
Connections: 1
Outstanding: 0
Zxid: 0x300000002
Mode: follower
Node count: 16
happy:~ happy$ echo stat | nc 10.20.3.55 2181
Zookeeper version: 3.5.4-beta-7f51e5b68cf2f80176ff944a9ebd2abbc65e7327, built on 05/11/2018 16:27 GMT
Clients:
/192.168.2.33:51169[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/0
Received: 1
Sent: 0
Connections: 1
Outstanding: 0
Zxid: 0x300000002
Mode: follower
Node count: 16
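The check above is done by hand with `echo stat | nc`. As a rough sketch, the same verification can be scripted: send the `stat` four-letter command over TCP, read the `Mode:` line, and confirm the ensemble has exactly one leader. The function names below are illustrative, and the `observed` dictionary simply repeats the modes from the output above instead of contacting a live cluster.

```python
import socket
from typing import Dict, Optional


def four_letter_word(host: str, port: int, command: str,
                     timeout: float = 5.0) -> str:
    """Send a four-letter command (e.g. 'stat') and return the full reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(stat_output: str) -> Optional[str]:
    """Extract the value of the 'Mode:' line, or None if it is missing."""
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def has_single_leader(modes: Dict[str, Optional[str]]) -> bool:
    """True when exactly one node is leader and all others are followers."""
    values = list(modes.values())
    return (values.count("leader") == 1
            and values.count("follower") == len(values) - 1)


# Modes copied from the stat output above (no network access needed).
observed = {
    "10.20.3.51": "follower",
    "10.20.3.52": "follower",
    "10.20.3.53": "leader",
    "10.20.3.54": "follower",
    "10.20.3.55": "follower",
}
print(has_single_leader(observed))
```

Against a running cluster, `parse_mode(four_letter_word(ip, 2181, "stat"))` would fill in each entry, provided `stat` is in the server's four-letter-word whitelist as it is in the configuration above.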