Setting up an Elasticsearch 7 cluster using discovery-ec2

[Question title]: Setting up an Elasticsearch 7 cluster using discovery-ec2
[Posted]: 2021-02-09 12:17:44
[Question]:

I am trying to set up a small Elasticsearch cluster on 7.10.2 using EC2 discovery. I have done this before on Elasticsearch 6.x, but I cannot get the nodes in the new cluster to talk to each other.

Modified settings in elasticsearch.yml:

cluster.name: my-cluster
network.host: _ec2_
discovery.seed_providers: ec2
discovery.ec2.groups: my-cluster
discovery.ec2.host_type: private_ip
discovery.ec2.endpoint: ec2.us-west-2.amazonaws.com
cloud.node.auto_attributes: true

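The nodes are in the my-cluster security group, which allows traffic on port 9300, so I can telnet between the nodes on that port.

They have an IAM role that grants them ec2:DescribeInstances.

From the logs I can tell that the discovery-ec2 plugin is loaded, but it does not seem to find anything, and it gives me no error to point me in the right direction.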

version[7.10.2], pid[33578], build[default/deb/747e1cc71def077253878a59143c1f785afa92b9/2021-01-13T00:42:12.435326Z], OS[Linux/5.4.0-1037-aws/amd64], JVM[AdoptOpenJDK/OpenJDK 64-Bit Server VM/15.0.1/15.0.1+9]
[2021-02-08T19:51:34,743][INFO ][o.e.n.Node               ] [ip-10-4-0-84] JVM home [/usr/share/elasticsearch/jdk], using bundled JDK [true]
[2021-02-08T19:51:34,744][INFO ][o.e.n.Node               ] [ip-10-4-0-84] JVM arguments [-Xshare:auto, -Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -XX:+ShowCodeDetailsInExceptionMessages, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dio.netty.allocator.numDirectArenas=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.locale.providers=SPI,COMPAT, -Xms2g, -Xmx2g, -XX:+UseG1GC, -XX:G1ReservePercent=25, -XX:InitiatingHeapOccupancyPercent=30, -Djava.io.tmpdir=/tmp/elasticsearch-2170055034768528894, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=/var/lib/elasticsearch, -XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -XX:MaxDirectMemorySize=1073741824, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/etc/elasticsearch, -Des.distribution.flavor=default, -Des.distribution.type=deb, -Des.bundled_jdk=true]
[2021-02-08T19:51:38,435][INFO ][o.e.p.PluginsService     ] [ip-10-4-0-84] loaded plugin [discovery-ec2]
[2021-02-08T19:51:38,436][INFO ][o.e.p.PluginsService     ] [ip-10-4-0-84] loaded plugin [repository-s3]
[2021-02-08T19:51:38,548][INFO ][o.e.e.NodeEnvironment    ] [ip-10-4-0-84] using [1] data paths, mounts [[/ (/dev/root)]], net usable_space [59.4gb], net total_space [61.9gb], types [ext4]
[2021-02-08T19:51:38,551][INFO ][o.e.e.NodeEnvironment    ] [ip-10-4-0-84] heap size [2gb], compressed ordinary object pointers [true]
[2021-02-08T19:51:38,659][INFO ][o.e.n.Node               ] [ip-10-4-0-84] node name [ip-10-4-0-84], node ID [Woo0ox_cTx26haob9pEHIQ], cluster name [my-cluster], roles [transform, master, remote_cluster_client, data, ml, data_content, data_hot, data_warm, data_cold, ingest]
[2021-02-08T19:51:43,415][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [ip-10-4-0-84] [controller/33777] [Main.cc@114] controller (64 bit): Version 7.10.2 (Build 40a3af639d4698) Copyright (c) 2021 Elasticsearch BV
[2021-02-08T19:51:44,534][INFO ][o.e.x.s.a.s.FileRolesStore] [ip-10-4-0-84] parsed [0] roles from file [/etc/elasticsearch/roles.yml]
[2021-02-08T19:51:45,671][INFO ][o.e.t.NettyAllocator     ] [ip-10-4-0-84] creating NettyAllocator with the following configs: [name=elasticsearch_configured, chunk_size=256kb, suggested_max_allocation_size=256kb, factors={es.unsafe.use_netty_default_chunk_and_page_size=false, g1gc_enabled=true, g1gc_region_size=1mb}]
[2021-02-08T19:51:45,751][INFO ][o.e.d.DiscoveryModule    ] [ip-10-4-0-84] using discovery type [zen] and seed hosts providers [settings, ec2]
[2021-02-08T19:51:46,279][WARN ][o.e.g.DanglingIndicesState] [ip-10-4-0-84] gateway.auto_import_dangling_indices is disabled, dangling indices will not be automatically detected or imported and must be managed manually
[2021-02-08T19:51:46,737][INFO ][o.e.n.Node               ] [ip-10-4-0-84] initialized
[2021-02-08T19:51:46,737][INFO ][o.e.n.Node               ] [ip-10-4-0-84] starting ...
[2021-02-08T19:51:46,864][INFO ][o.e.t.TransportService   ] [ip-10-4-0-84] publish_address {10.4.0.84:9300}, bound_addresses {10.4.0.84:9300}
[2021-02-08T19:51:47,054][INFO ][o.e.b.BootstrapChecks    ] [ip-10-4-0-84] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2021-02-08T19:51:57,069][WARN ][o.e.c.c.ClusterFormationFailureHelper] [ip-10-4-0-84] master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.initial_master_nodes] is empty on this node: have discovered [{ip-10-4-0-84}{Woo0ox_cTx26haob9pEHIQ}{BOsDS3OES1aRCUk3r1STEA}{10.4.0.84}{10.4.0.84:9300}{cdhilmrstw}{aws_availability_zone=us-west-2a, ml.machine_memory=4064808960, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}, {ip-10-4-0-97}{Ilu5yqtNSe-BnSvlJ49egw}{IPAfgbDERkqBmDUYalVd6Q}{10.4.0.97}{10.4.0.97:9300}{cdhilmrstw}{aws_availability_zone=us-west-2a, ml.machine_memory=4064808960, ml.max_open_jobs=20, xpack.installed=true, transform.node=true}, {ip-10-4-0-223}{AOlhfJv8T-K9jGz-pqF9kg}{YN6tY2XwRiO4_UzGPzc7WA}{10.4.0.223}{10.4.0.223:9300}{cdhilmrstw}{aws_availability_zone=us-west-2a, ml.machine_memory=4064800768, ml.max_open_jobs=20, xpack.installed=true, transform.node=true}]; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, [::1]:9300, [::1]:9301, [::1]:9302, [::1]:9303, [::1]:9304, [::1]:9305, 10.4.0.97:9300, 10.4.0.84:9300, 10.4.0.223:9300] from hosts providers and [{ip-10-4-0-84}{Woo0ox_cTx26haob9pEHIQ}{BOsDS3OES1aRCUk3r1STEA}{10.4.0.84}{10.4.0.84:9300}{cdhilmrstw}{aws_availability_zone=us-west-2a, ml.machine_memory=4064808960, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}] from last-known cluster state; node term 0, last-accepted version 0 in term 0

[Comments]:

Tags: elasticsearch amazon-ec2

[Solution 1]:

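Looking at your cluster logs, I can see that no master node has been discovered, which is probably what is causing the problem, as shown in the log line below.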

[2021-02-08T19:51:57,069][WARN ][o.e.c.c.ClusterFormationFailureHelper] [ip-10-4-0-84] master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.initial_master_nodes] is empty on this node: have discovered [{ip-10-4-0-84}{Woo0ox_cTx26haob9pEHIQ}{BOsDS3OES1aRCUk3r1STEA}{10.4.0.84}{10.4.0.84:9300}{cdhilmrstw}{aws_availability_zone=us-west-2a, ml.machine_memory=4064808960, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}, {ip-10-4-0-97}{Ilu5yqtNSe-BnSvlJ49egw}{IPAfgbDERkqBmDUYalVd6Q}{10.4.0.97}{10.4.0.97:9300}{cdhilmrstw}{aws_availability_zone=us-west-2a, ml.machine_memory=4064808960, ml.max_open_jobs=20, xpack.installed=true, transform.node=true}, {ip-10-4-0-223}{AOlhfJv8T-K9jGz-pqF9kg}{YN6tY2XwRiO4_UzGPzc7WA}{10.4.0.223}{10.4.0.223:9300}{cdhilmrstw}{aws_availability_zone=us-west-2a, ml.machine_memory=4064800768, ml.max_open_jobs=20, xpack.installed=true, transform.node=true}]; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, [::1]:9300, [::1]:9301, [::1]:9302, [::1]:9303, [::1]:9304, [::1]:9305, 10.4.0.97:9300, 10.4.0.84:9300, 10.4.0.223:9300] from hosts providers and [{ip-10-4-0-84}{Woo0ox_cTx26haob9pEHIQ}{BOsDS3OES1aRCUk3r1STEA}{10.4.0.84}{10.4.0.84:9300}{cdhilmrstw}{aws_availability_zone=us-west-2a, ml.machine_memory=4064808960, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}] from last-known cluster state; node term 0, last-accepted version 0 in term 0

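According to the log, the node did discover the other two nodes (10.4.0.97 and 10.4.0.223 appear in the "have discovered" list), but the cluster.initial_master_nodes setting is empty, so a brand-new 7.x cluster cannot bootstrap itself. You should refer to "bootstrapping a cluster" in the Elasticsearch documentation for more information about this setting and about bootstrapping a cluster.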
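
As a rough illustration of the setting discussed above, here is a minimal sketch of the extra lines in elasticsearch.yml. It assumes that all three nodes from the log are meant to be master-eligible and that each node's node.name matches the names shown in the log (ip-10-4-0-84, ip-10-4-0-97, ip-10-4-0-223); adjust both assumptions to your own setup.

```yaml
# elasticsearch.yml (sketch, not a complete configuration)
cluster.name: my-cluster

# Assumption: node.name is set explicitly on each node.
# If it is left unset, it defaults to the hostname.
node.name: ip-10-4-0-84   # differs on every node

# Same list on every master-eligible node; only consulted the
# first time a brand-new cluster starts up.
cluster.initial_master_nodes:
  - ip-10-4-0-84
  - ip-10-4-0-97
  - ip-10-4-0-223
```

The entries in cluster.initial_master_nodes have to match the node.name values exactly. Once the cluster has formed, the setting is no longer needed and the documentation recommends removing it from each node's configuration.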

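For more verbose logs, you can enable TRACE logging on the org.elasticsearch.discovery.ec2 package, which is the package behind the discovery-ec2 plugin. That should give you more detailed output to help you identify the cause and fix it.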
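
One way to turn this logging on is a logger entry in elasticsearch.yml, sketched below. This is a sketch that assumes you can restart the node; the static file is used here because the cluster settings API is not usable while no master has been elected.

```yaml
# elasticsearch.yml (sketch): verbose logging for the EC2 discovery plugin only
logger.org.elasticsearch.discovery.ec2: TRACE
```

TRACE output is very noisy, so it is probably worth removing this line again once the problem has been diagnosed.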

[Discussion]:

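The bootstrapping link was exactly what I needed. I had assumed that all nodes would automatically become masters. Using Ansible I set node.name: "{{ inventory_hostname }}" and cluster.initial_master_nodes: {{ ansible_play_hosts }} and it bootstrapped like a charm.
@MichaelOConnell Glad I could help, and thanks a lot for the upvote and for accepting the answer.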