redis master does not see slave

【Question title】: redis master does not see slave
【Posted】: 2018-07-09 22:58:38
【Question description】:

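I have a problem with my redis sentinel setup, which has 4 nodes (1 master, 3 slaves). I patched the first slave node (Docker version changed from 17.03.1-ce to 17.12.0-ce). My problem is that the master no longer takes slave node1 into its pool of members.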

Slave (node1) info (it recognizes the master node):

$ docker exec -it redis-sentinel redis-cli info replication
 # Replication
 role:slave
 master_host:<master_ip>
 master_port:6379
 master_link_status:down

Master info:

$ docker exec -it redis-sentinel redis-cli info replication
# Replication
role:master
connected_slaves:2    
slave0:ip=<slave_2_ip>,port=6379,state=online,offset=191580670534,lag=0   
slave1:ip=<slave_3_ip>,port=6379,state=online,offset=191580666435,lag=0
master_repl_offset:191580672343

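The master should have 3 slaves. The master IP configured on node1 (the patched node) is correct. Nodes 2, 3 and 4 run Docker 17.03.1-ce. When I tested the same scenario in dev, everything worked. Could you suggest what I need to do to enable replication between the master and slave node1?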
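To confirm the symptom from the master side, you can count the replicas that `INFO replication` reports as online and compare the result with the expected 3. The snippet below is a minimal sketch, not part of the original post: the function name `count_online_slaves` is hypothetical, and it only assumes the `slaveN:...state=online...` line format shown in the output above.

```shell
# Sketch: count replicas that the master reports as online.
# Reads `INFO replication` text on stdin and prints the number of
# slaveN lines containing state=online (prints 0 if there are none).
count_online_slaves() {
    # tr strips carriage returns, which redis-cli output may carry
    # when it is run through a TTY (docker exec -t).
    tr -d '\r' | awk '/^slave[0-9]+:/ && /state=online/ { n++ } END { print n + 0 }'
}

# Hypothetical usage on the master host (container name taken from the question):
#   docker exec redis-sentinel redis-cli info replication | count_online_slaves
```

For the master output shown above this would print 2, one short of the 3 replicas the setup is supposed to have.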

After a docker restart (@node1), I see messages like this (msg="unknown container"):

Jan 31 08:16:12 node1 dockerd[17288]: time="2018-01-31T08:16:12.150892519+02:00" level=warning msg="unknown container" container=23e48b7846bd325ba5af772217085b60708660f5f5d8bb6fefd23094235ac01f module=libcontainerd namespace=plugins.moby
Jan 31 08:16:12 node1 dockerd[17288]: time="2018-01-31T08:16:12.177513187+02:00" level=warning msg="unknown container" container=23e48b7846bd325ba5af772217085b60708660f5f5d8bb6fefd23094235ac01f module=libcontainerd namespace=plugins.moby

When I check the master logs on node4, I see that node1 is converted to a slave:

1:X 30 Jan 21:35:09.301 # +sdown sentinel 66f6a8950a72952ac7df18f6a653718445fad5db node1_slave 26379 @ sentinel-xx node4_master 6379
1:X 30 Jan 21:35:10.276 # +sdown slave node1_slave:6379 node1_slave 6379 @ sentinel-xx node4_master 6379
1:X 30 Jan 21:58:10.388 * +reboot slave node1_slave:6379 node1_slave 6379 @ sentinel-xx node4_master 6379
1:X 30 Jan 21:58:10.473 # -sdown slave node1_slave:6379 node1_slave 6379 @ sentinel-xx node4_master 6379
1:X 30 Jan 21:58:10.473 # -sdown sentinel 66f6a8950a72952ac7df18f6a653718445fad5db node1_slave 26379 @ sentinel-xx node4_master 6379
1:X 30 Jan 21:58:20.436 * +convert-to-slave slave node1_slave:6379 node1_slave 6379 @ sentinel-xx node4_master 6379
1:X 30 Jan 21:58:30.516 * +convert-to-slave slave node1_slave:6379 node1_slave 6379 @ sentinel-xx node4_master 6379
1:X 30 Jan 21:58:40.529 * +convert-to-slave slave node1_slave:6379 node1_slave 6379 @ sentinel-xx node4_master 6379
1:X 30 Jan 22:39:48.284 * +reboot slave node1_slave:6379 node1_slave 6379 @ sentinel-xx node4_master 6379
1:X 30 Jan 22:39:58.391 * +convert-to-slave slave node1_slave:6379 node1_slave 6379 @ sentinel-xx node4_master 6379
1:X 30 Jan 22:40:08.447 * +convert-to-slave slave node1_slave:6379 node1_slave 6379 @ sentinel-xx node4_master 6379

On the other hand, the redis-client log shows that it cannot save the DB to disk.

$ docker logs --follow redis-client
1:M 31 Jan 07:47:09.451 * Slave node3_slave:6379 asks for synchronization
1:M 31 Jan 07:47:09.451 * Full resync requested by slave node3_slave:6379
1:M 31 Jan 07:47:09.451 * Starting BGSAVE for SYNC with target: disk
1:M 31 Jan 07:47:09.452 # Can't save in background: fork: Out of memory
1:M 31 Jan 07:47:09.452 # BGSAVE for replication failed
1:M 31 Jan 07:47:24.628 * Slave node1_slave:6379 asks for synchronization
1:M 31 Jan 07:47:24.628 * Full resync requested by slave node1_slave:6379
1:M 31 Jan 07:47:24.628 * Starting BGSAVE for SYNC with target: disk
1:M 31 Jan 07:47:24.628 # Can't save in background: fork: Out of memory
1:M 31 Jan 07:47:24.628 # BGSAVE for replication failed
1:M 31 Jan 07:48:10.560 * Slave node3_slave:6379 asks for synchronization
1:M 31 Jan 07:48:10.560 * Full resync requested by slave node3_slave:6379
1:M 31 Jan 07:48:10.560 * Starting BGSAVE for SYNC with target: disk
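The log above suggests why the link never comes up: every full resync request makes the master start a BGSAVE, the fork for that BGSAVE fails with "Out of memory", and so no snapshot is ever sent to the slave. A quick way to see how often this happens is to count the failure lines. This is only a sketch; the function name `count_fork_failures` is hypothetical and it matches the exact message text shown in this log.

```shell
# Sketch: count BGSAVE attempts that failed at fork() in Redis log text
# read from stdin. Prints the number of matching lines.
count_fork_failures() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps the
    # function usable in scripts that run under "set -e".
    grep -c "Can't save in background: fork" || true
}

# Hypothetical usage (container name taken from the question):
#   docker logs redis-client 2>&1 | count_fork_failures
```

A non-zero count on the master, together with `master_link_status:down` on the slave, points at the master's memory settings rather than at the network or at the Docker upgrade itself.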

【Comments】:

Tags: docker redis redis-sentinel

【Solution 1】:

The problem was solved by switching vm.overcommit_memory to 1.

sysctl vm.overcommit_memory=1
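Note that `sysctl vm.overcommit_memory=1` only changes the running kernel, so the value is lost on reboot. As far as I know this sysctl is not namespaced, so it presumably has to be set on the Docker host rather than inside the container. The sketch below reads the current mode and prints what it means; the function name `describe_overcommit` and its optional path argument are hypothetical, added here so the check can be pointed at any file.

```shell
# Sketch: describe the kernel overcommit mode stored in a file.
# The path argument defaults to the standard procfs location.
describe_overcommit() {
    oc_file="${1:-/proc/sys/vm/overcommit_memory}"
    oc_mode=$(cat "$oc_file") || return 1
    case "$oc_mode" in
        0) echo "0: heuristic overcommit (fork of a large process may be refused)" ;;
        1) echo "1: always overcommit" ;;
        2) echo "2: strict accounting, no overcommit" ;;
        *) echo "unexpected value: $oc_mode"; return 1 ;;
    esac
}

# To keep the setting across reboots (run as root on the Docker host):
#   echo 'vm.overcommit_memory = 1' >> /etc/sysctl.conf
#   sysctl -p
```

Running `describe_overcommit` with no argument on the master's host before and after the change should show the mode moving from 0 to 1.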

Thanks to yanhan's comment.

The logs now look like this:

1:M 31 Jan 07:48:10.560 * Slave node2_slave:6379 asks for synchronization
1:M 31 Jan 07:48:10.560 * Full resync requested by slave node2_slave:6379
1:M 31 Jan 07:48:10.560 * Starting BGSAVE for SYNC with target: disk
1:M 31 Jan 07:48:10.569 * Background saving started by pid 16
1:M 31 Jan 07:49:15.773 # Connection with slave client id #388090 lost.
1:M 31 Jan 07:49:16.219 # Connection with slave node2_slave:6379 lost.
1:M 31 Jan 07:49:25.394 * Slave node1_slave:6379 asks for synchronization
1:M 31 Jan 07:49:25.395 * Full resync requested by slave node1_slave:6379
1:M 31 Jan 07:49:25.395 * Can't attach the slave to the current BGSAVE. Waiting for next BGSAVE for SYNC
1:S 31 Jan 07:49:35.421 # Connection with slave node1_slave:6379 lost.
1:S 31 Jan 07:49:35.518 * SLAVE OF node2_slave:6379 enabled (user request from 'id=395598 addr=node2_slave:33026 fd=7 name=sentinel-52caa67d-cmd age=10 idle=0 flags=x db=0     sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=r cmd=exec')
1:S 31 Jan 07:49:36.121 * Connecting to MASTER node2_slave:6379
1:S 31 Jan 07:49:36.122 * MASTER <-> SLAVE sync started
1:S 31 Jan 07:49:36.135 * Non blocking connect for SYNC fired the event.
1:S 31 Jan 07:49:36.138 * Master replied to PING, replication can continue...
1:S 31 Jan 07:49:36.147 * Partial resynchronization not possible (no cached master)
1:S 31 Jan 07:49:36.153 * Full resync from master: f15e28b26604bda49ad515b38cba2639ee8e13bc:191935552685
1:S 31 Jan 07:49:46.523 * MASTER <-> SLAVE sync: receiving 1351833877 bytes from master
1:S 31 Jan 07:49:57.888 * MASTER <-> SLAVE sync: Flushing old data
16:C 31 Jan 07:50:17.083 * DB saved on disk
16:C 31 Jan 07:50:17.114 * RDB: 3465 MB of memory used by copy-on-write
1:S 31 Jan 07:51:22.749 * MASTER <-> SLAVE sync: Loading DB in memory
1:S 31 Jan 07:51:46.609 * MASTER <-> SLAVE sync: Finished with success
1:S 31 Jan 07:51:46.609 * Background saving terminated with success

【Discussion】: