[Posted]: 2015-10-30 11:13:02
[Problem description]:
I am trying to bulk-insert data into a 4-node Elasticsearch cluster with 3 data nodes.
Data node specs: 16 CPUs, 7 GB RAM, 500 GB SSD.
Data is inserted through the non-data node into an index split into 5 shards with 1 replica. There are about 250 GB of data to insert.
However, after about 40 GB has been inserted per node and roughly one hour of processing, with at most ~60% CPU and ~30% RAM used over the whole period, some shards go into the INITIALIZING state:
~$ curl -XGET 'http://localhost:9200/_cluster/health/osm?level=shards&pretty=true'
{
  "cluster_name" : "elastic_osm",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 4,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 5,
  "active_shards" : 9,
  "relocating_shards" : 1,
  "initializing_shards" : 1,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "indices" : {
    "osm" : {
      "status" : "yellow",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 9,
      "relocating_shards" : 1,
      "initializing_shards" : 1,
      "unassigned_shards" : 0,
      "shards" : {
        "0" : {
          "status" : "yellow",
          "primary_active" : true,
          "active_shards" : 1,
          "relocating_shards" : 0,
          "initializing_shards" : 1,
          "unassigned_shards" : 0
        },
        "1" : {
          "status" : "green",
          "primary_active" : true,
          "active_shards" : 2,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 0
        },
        "2" : {
          "status" : "green",
          "primary_active" : true,
          "active_shards" : 2,
          "relocating_shards" : 1,
          "initializing_shards" : 0,
          "unassigned_shards" : 0
        },
        "3" : {
          "status" : "green",
          "primary_active" : true,
          "active_shards" : 2,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 0
        },
        "4" : {
          "status" : "green",
          "primary_active" : true,
          "active_shards" : 2,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 0
        }
      }
    }
  }
}
Digging a little deeper, I found that one node has a heap-space problem:
~$ curl -XGET 'localhost:9200/osm/_search_shards?pretty=true'
{
  "nodes" : {
    "1DpvDUf7SKywJrBgQqs9eg" : {
      "name" : "elastic-osm-node-1",
      "transport_address" : "inet[/xxx.xxx.x.x:xxxx]",
      "attributes" : {
        "master" : "true"
      }
    },
    "FiBYw-v_QfO3nJQfHflf_w" : {
      "name" : "elastic-osm-node-3",
      "transport_address" : "inet[/xxx.xxx.x.x:x]",
      "attributes" : {
        "master" : "true"
      }
    },
    "ibpt8lGiS6yDJf4e09RN9Q" : {
      "name" : "elastic-osm-node-2",
      "transport_address" : "inet[/xxx.xxx.x.x:xxxx]",
      "attributes" : {
        "master" : "true"
      }
    }
  },
  "shards" : [ [ {
    "state" : "STARTED",
    "primary" : true,
    "node" : "ibpt8lGiS6yDJf4e09RN9Q",
    "relocating_node" : null,
    "shard" : 0,
    "index" : "osm"
  }, {
    "state" : "INITIALIZING",
    "primary" : false,
    "node" : "FiBYw-v_QfO3nJQfHflf_w",
    "relocating_node" : null,
    "shard" : 0,
    "index" : "osm",
    "unassigned_info" : {
      "reason" : "ALLOCATION_FAILED",
      "at" : "2015-10-30T10:42:25.539Z",
      "details" : "shard failure [engine failure, reason [already closed by tragic event]][OutOfMemoryError[Java heap space]]"
    }
  } ], [ {
    "state" : "STARTED",
    "primary" : true,
    "node" : "FiBYw-v_QfO3nJQfHflf_w",
    "relocating_node" : null,
    "shard" : 1,
    "index" : "osm"
  }, {
    "state" : "STARTED",
    "primary" : false,
    "node" : "1DpvDUf7SKywJrBgQqs9eg",
    "relocating_node" : null,
    "shard" : 1,
    "index" : "osm"
  } ], [ {
    "state" : "RELOCATING",
    "primary" : false,
    "node" : "FiBYw-v_QfO3nJQfHflf_w",
    "relocating_node" : "1DpvDUf7SKywJrBgQqs9eg",
    "shard" : 2,
    "index" : "osm"
  }, {
    "state" : "STARTED",
    "primary" : true,
    "node" : "ibpt8lGiS6yDJf4e09RN9Q",
    "relocating_node" : null,
    "shard" : 2,
    "index" : "osm"
  }, {
    "state" : "INITIALIZING",
    "primary" : false,
    "node" : "1DpvDUf7SKywJrBgQqs9eg",
    "relocating_node" : "FiBYw-v_QfO3nJQfHflf_w",
    "shard" : 2,
    "index" : "osm"
  } ], [ {
    "state" : "STARTED",
    "primary" : false,
    "node" : "FiBYw-v_QfO3nJQfHflf_w",
    "relocating_node" : null,
    "shard" : 3,
    "index" : "osm"
  }, {
    "state" : "STARTED",
    "primary" : true,
    "node" : "1DpvDUf7SKywJrBgQqs9eg",
    "relocating_node" : null,
    "shard" : 3,
    "index" : "osm"
  } ], [ {
    "state" : "STARTED",
    "primary" : false,
    "node" : "ibpt8lGiS6yDJf4e09RN9Q",
    "relocating_node" : null,
    "shard" : 4,
    "index" : "osm"
  }, {
    "state" : "STARTED",
    "primary" : true,
    "node" : "FiBYw-v_QfO3nJQfHflf_w",
    "relocating_node" : null,
    "shard" : 4,
    "index" : "osm"
  } ] ]
}
But ES_HEAP_SIZE on the servers is set to half the memory:
~$ echo $ES_HEAP_SIZE
7233.0m
And only 5 GB is actually in use:
~$ free -g
total used
Mem: 14 5
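As a side note (a hypothetical helper of my own, not an Elasticsearch API), the 50%-of-RAM guideline can be sanity-checked by parsing the heap string. Note that `free -g` rounds down, so "14" here likely means a little more than 14 GiB, which is why 7233m can still be "half the memory" while failing a strict check against exactly 14 GiB:

```python
def parse_heap(size: str) -> float:
    """Parse a JVM-style heap size string like '7233.0m' or '7g' into MiB."""
    units = {"k": 1 / 1024, "m": 1.0, "g": 1024.0}
    return float(size[:-1]) * units[size[-1].lower()]

def within_half_of_ram(heap: str, ram_gb: float) -> bool:
    """Common guideline: the JVM heap should be at most ~50% of physical RAM,
    leaving the rest for the OS filesystem cache that Lucene relies on."""
    return parse_heap(heap) <= ram_gb * 1024 / 2

print(parse_heap("7233.0m"))              # 7233.0 (MiB)
# Against the truncated `free -g` figure of 14 GiB this is just over half,
# because free -g rounds the real total (~14.1 GiB) down.
print(within_half_of_ram("7233.0m", 14))  # False
```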
If I wait a little longer, the node leaves the cluster entirely and all its replicas go into the INITIALIZING state, which makes my inserts fail and stop:
{
  "state" : "INITIALIZING",
  "primary" : false,
  "node" : "ibpt8lGiS6yDJf4e09RN9Q",
  "relocating_node" : null,
  "shard" : 3,
  "index" : "osm",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2015-10-30T10:53:32.044Z",
    "details" : "node_left[FiBYw-v_QfO3nJQfHflf_w]"
  }
}
Config: to speed up insertion, I set these parameters in the data nodes' Elasticsearch configuration:
index.refresh_interval: -1, threadpool.bulk.size: 16, threadpool.bulk.queue_size: 1000
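(For reference, the refresh interval is a per-index dynamic setting and does not have to live in the node config; in Elasticsearch 1.x/2.x it can be changed at runtime through the index settings API. A sketch, assuming the index is named osm and the node listens on localhost:9200:)

```sh
curl -XPUT 'localhost:9200/osm/_settings' -d '{
  "index" : { "refresh_interval" : "-1" }
}'
```

Setting it back to a positive value (e.g. "30s") after the bulk load re-enables near-real-time search.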
Why is this happening? How can I fix it so that my bulk insert succeeds? Does the max heap size need to be more than 50% of RAM?
Edit: since tweaking the Elasticsearch threadpool settings is considered bad practice, I removed the threadpool parameters and it now works, but very slowly. Elasticsearch is apparently not designed to ingest this much data this fast.
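Rather than enlarging the bulk threadpool queue (which just parks more payloads on the server's heap), the usual way to keep heavy bulk indexing from exhausting memory is to cap the size of each bulk request on the client side. A minimal sketch of such batching logic in Python (the function name and limits are my own; the official elasticsearch-py `helpers.bulk` applies similar count/byte caps):

```python
def batched(docs, max_docs=1000, max_bytes=5 * 2**20):
    """Group serialized documents into bulk batches, capped by count and bytes.

    Keeping each bulk request modest (a few MB) lets the server free memory
    between requests instead of accumulating huge payloads on the heap.
    """
    batch, size = [], 0
    for doc in docs:
        doc_len = len(doc.encode("utf-8"))
        if batch and (len(batch) >= max_docs or size + doc_len > max_bytes):
            yield batch
            batch, size = [], 0
        batch.append(doc)
        size += doc_len
    if batch:
        yield batch

# Example: 2500 small documents split into batches of at most 1000 docs each.
batches = list(batched((f'{{"id": {i}}}' for i in range(2500)), max_docs=1000))
print([len(b) for b in batches])  # [1000, 1000, 500]
```

Each yielded batch would then be sent as one _bulk request, waiting for the response before sending the next, so the cluster is never asked to queue more than one oversized payload per client.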
[Discussion]:
Tags: elasticsearch