【问题标题】:MongoError No primary node is availableMongoError 没有可用的主节点
【发布时间】:2019-07-09 10:56:52
【问题描述】:

我们最近升级到 Play 2.7.0 并使用 play2-reactivemongo 版本 0.16.2 和 reactivemongo 0.16.3。我们使用reactivemongo-shaded-native,但也尝试不使用。

我们正在连接到具有 3 个节点的副本集,即 MongoDB Atlas 上的 MongoDB 3.6.10。

初始连接正常,服务运行一段时间正常。

但最后我们遇到了这个错误:

[error] 2019-02-15 09:40:30,466 r.api.Failover2 - [Supervisor-1/Connection-2] Got an error, no more attempts to do. Completing with a failure...

 reactivemongo.core.actors.Exceptions$PrimaryUnavailableException: MongoError['No primary node is available! (Supervisor-1/Connection-2)']
 Caused by: reactivemongo.core.actors.Exceptions$InternalState: null
    at reactivemongo.ConnectAll$IsMaster(400, {{NodeSet Some(PlayappDevelopment-shard-0) Node[playappdevelopment-shard-00-00-tedsb.mongodb.net:27017: Unknown (20/20 available connections), latency=9223372036854775807, authenticated={Authenticated(admin,playapp_rw)}] | Node[playappdevelopment-shard-00-01-tedsb.mongodb.net:27017: Secondary (20/20 available connections), latency=135, authenticated={Authenticated(admin,playapp_rw)}] | Node[playappdevelopment-shard-00-02-tedsb.mongodb.net:27017: Secondary (20/20 available connections), latency=140, authenticated={Authenticated(admin,playapp_rw)}] }})(<time:1550223615628>)
    at reactivemongo.IsMaster(400, {{NodeSet Some(PlayappDevelopment-shard-0) Node[playappdevelopment-shard-00-00-tedsb.mongodb.net:27017: Unknown (20/20 available connections), latency=9223372036854775807, authenticated={Authenticated(admin,playapp_rw)}] | Node[playappdevelopment-shard-00-01-tedsb.mongodb.net:27017: Secondary (20/20 available connections), latency=135, authenticated={Authenticated(admin,playapp_rw)}] | Node[playappdevelopment-shard-00-02-tedsb.mongodb.net:27017: Secondary (20/20 available connections), latency=21, authenticated={Authenticated(admin,playapp_rw)}] }})(<time:1550223615626>)
    at reactivemongo.ConnectAll$IsMaster(399, {{NodeSet Some(PlayappDevelopment-shard-0) Node[playappdevelopment-shard-00-00-tedsb.mongodb.net:27017: Unknown (20/20 available connections), latency=9223372036854775807, authenticated={}] | Node[playappdevelopment-shard-00-01-tedsb.mongodb.net:27017: Secondary (20/20 available connections), latency=135, authenticated={Authenticated(admin,playapp_rw)}] | Node[playappdevelopment-shard-00-02-tedsb.mongodb.net:27017: Secondary (20/20 available connections), latency=21, authenticated={Authenticated(admin,playapp_rw)}] }})(<time:1550223615623>)
    at reactivemongo.IsMaster(399, {{NodeSet Some(PlayappDevelopment-shard-0) Node[playappdevelopment-shard-00-00-tedsb.mongodb.net:27017: Unknown (20/20 available connections), latency=9223372036854775807, authenticated={}] | Node[playappdevelopment-shard-00-01-tedsb.mongodb.net:27017: Secondary (20/20 available connections), latency=17, authenticated={Authenticated(admin,playapp_rw)}] | Node[playappdevelopment-shard-00-02-tedsb.mongodb.net:27017: Secondary (20/20 available connections), latency=21, authenticated={Authenticated(admin,playapp_rw)}] }})(<time:1550223615622>)
    at reactivemongo.RefreshAll({{NodeSet Some(PlayappDevelopment-shard-0) Node[playappdevelopment-shard-00-00-tedsb.mongodb.net:27017: Primary (20/20 available connections), latency=30591, authenticated={Authenticated(admin,playapp_rw)}] | Node[playappdevelopment-shard-00-01-tedsb.mongodb.net:27017: Secondary (20/20 available connections), latency=17, authenticated={Authenticated(admin,playapp_rw)}] | Node[playappdevelopment-shard-00-02-tedsb.mongodb.net:27017: Secondary (20/20 available connections), latency=21, authenticated={Authenticated(admin,playapp_rw)}] }})(<time:1550223615486>)
    at reactivemongo.PrimaryUnavailable(<time:1550223615485>)
    at reactivemongo.ConnectAll({{NodeSet Some(PlayappDevelopment-shard-0) Node[playappdevelopment-shard-00-00-tedsb.mongodb.net:27017: Primary (20/20 available connections), latency=30591, authenticated={Authenticated(admin,playapp_rw)}] | Node[playappdevelopment-shard-00-01-tedsb.mongodb.net:27017: Secondary (20/20 available connections), latency=17, authenticated={Authenticated(admin,playapp_rw)}] | Node[playappdevelopment-shard-00-02-tedsb.mongodb.net:27017: Secondary (20/20 available connections), latency=21, authenticated={Authenticated(admin,playapp_rw)}] }})(<time:1550223615475>)
    at reactivemongo.ConnectAll$IsMaster(397, {{NodeSet Some(PlayappDevelopment-shard-0) Node[playappdevelopment-shard-00-00-tedsb.mongodb.net:27017: Primary (20/20 available connections), latency=30591, authenticated={Authenticated(admin,playapp_rw)}] | Node[playappdevelopment-shard-00-01-tedsb.mongodb.net:27017: Secondary (20/20 available connections), latency=17, authenticated={Authenticated(admin,playapp_rw)}] | Node[playappdevelopment-shard-00-02-tedsb.mongodb.net:27017: Secondary (20/20 available connections), latency=21, authenticated={Authenticated(admin,playapp_rw)}] }})(<time:1550223605498>)
    at reactivemongo.IsMaster(397, {{NodeSet Some(PlayappDevelopment-shard-0) Node[playappdevelopment-shard-00-00-tedsb.mongodb.net:27017: Primary (20/20 available connections), latency=30591, authenticated={Authenticated(admin,playapp_rw)}] | Node[playappdevelopment-shard-00-01-tedsb.mongodb.net:27017: Secondary (20/20 available connections), latency=17, authenticated={Authenticated(admin,playapp_rw)}] | Node[playappdevelopment-shard-00-02-tedsb.mongodb.net:27017: Secondary (20/20 available connections), latency=15, authenticated={Authenticated(admin,playapp_rw)}] }})(<time:1550223605498>)
    at reactivemongo.ConnectAll$IsMaster(396, {{NodeSet Some(PlayappDevelopment-shard-0) Node[playappdevelopment-shard-00-00-tedsb.mongodb.net:27017: Primary (20/20 available connections), latency=30591, authenticated={Authenticated(admin,playapp_rw)}] | Node[playappdevelopment-shard-00-01-tedsb.mongodb.net:27017: Secondary (20/20 available connections), latency=17, authenticated={Authenticated(admin,playapp_rw)}] | Node[playappdevelopment-shard-00-02-tedsb.mongodb.net:27017: Secondary (20/20 available connections), latency=15, authenticated={Authenticated(admin,playapp_rw)}] }})(<time:1550223605493>)

还有这个:

[error] 2019-02-15 09:40:25,157 r.c.a.MongoDBSystem - [Supervisor-1/Connection-2] Oups. 80193 not found! complete message is Response(MessageHeader(16775938,220053,80193,1), Reply(8,0,0,1), ResponseInfo(b0706d7f))
[error] 2019-02-15 09:40:25,720 r.c.a.MongoDBSystem - [Supervisor-1/Connection-2] Oups. 79580 not found! complete message is Response(MessageHeader(16776408,220065,79580,1), Reply(8,0,0,1), ResponseInfo(5c65523c))
[error] 2019-02-15 09:40:26,616 r.c.a.MongoDBSystem - [Supervisor-1/Connection-2] Oups. 80346 not found! complete message is Response(MessageHeader(16776173,220060,80346,1), Reply(8,0,0,1), ResponseInfo(5613ea02))
[error] 2019-02-15 09:40:26,702 r.c.a.MongoDBSystem - [Supervisor-1/Connection-2] Oups. 80457 not found! complete message is Response(MessageHeader(16776643,220079,80457,1), Reply(8,0,0,1), ResponseInfo(2fa23bb9))
[error] 2019-02-15 09:40:29,612 r.c.a.MongoDBSystem - [Supervisor-1/Connection-2] Oups. 80174 not found! complete message is Response(MessageHeader(16776878,220102,80174,1), Reply(8,0,0,1), ResponseInfo(4faa6eec))
[error] 2019-02-15 09:40:30,257 r.c.a.MongoDBSystem - [Supervisor-1/Connection-2] Oups. 80191 not found! complete message is Response(MessageHeader(16776643,220111,80191,1), Reply(8,0,0,1), ResponseInfo(1fb6b5e0))

似乎应用程序无法(完全)恢复——20 个请求中有 1 个成功。

2019-02-15 09:59:45,476 r.c.a.MongoDBSystem - [Supervisor-1/Connection-2] Node[playappdevelopment-shard-00-00-tedsb.mongodb.net:27017: \
  Unknown (20/20 available connections), latency=9223372036854775807, authenticated={Authenticated(admin,playapp_rw)}] \
  hasn't answered in time to last ping! Please check its connectivity

有大量“没有请求渠道”消息:

2019-03-05 15:02:58,433 r.c.a.MongoDBSystem - [Supervisor-1/Connection-2] No channel for request: Request(48879, 0, Query(0,assets.$cmd,0,1), Primary, None)
2019-03-05 15:02:58,580 r.c.a.MongoDBSystem - [Supervisor-1/Connection-2] No channel for request: Request(48880, 0, Query(0,assets.$cmd,0,1), Primary, None)
2019-03-05 15:02:58,678 r.c.a.MongoDBSystem - [Supervisor-1/Connection-2] No channel for request: Request(48881, 0, Query(0,assets.$cmd,0,1), Primary, None)
2019-03-05 15:02:58,777 r.c.a.MongoDBSystem - [Supervisor-1/Connection-2] No channel for request: Request(48882, 0, Query(0,assets.$cmd,0,1), Primary, None)
2019-03-05 15:02:58,874 r.c.a.MongoDBSystem - [Supervisor-1/Connection-2] No channel for request: Request(48883, 0, Query(0,assets.$cmd,0,1), Primary, None)
2019-03-05 15:02:58,972 r.c.a.MongoDBSystem - [Supervisor-1/Connection-2] No channel for request: Request(48884, 0, Query(0,assets.$cmd,0,1), Primary, None)
2019-03-05 15:02:59,069 r.c.a.MongoDBSystem - [Supervisor-1/Connection-2] No channel for request: Request(48885, 0, Query(0,assets.$cmd,0,1), Primary, None)
2019-03-05 15:02:59,220 r.c.a.MongoDBSystem - [Supervisor-1/Connection-2] No channel for request: Request(48886, 0, Query(0,assets.$cmd,0,1), Primary, None)
2019-03-05 15:02:59,318 r.c.a.MongoDBSystem - [Supervisor-1/Connection-2] No channel for request: Request(48887, 0, Query(0,assets.$cmd,0,1), Primary, None)
2019-03-05 15:02:59,416 r.c.a.MongoDBSystem - [Supervisor-1/Connection-2] No channel for request: Request(48888, 0, Query(0,assets.$cmd,0,1), Primary, None)
2019-03-05 15:02:59,514 r.c.a.MongoDBSystem - [Supervisor-1/Connection-2] No channel for request: Request(48889, 0, Query(0,assets.$cmd,0,1), Primary, None)
2019-03-05 15:02:59,611 r.c.a.MongoDBSystem - [Supervisor-1/Connection-2] No channel for request: Request(48890, 0, Query(0,assets.$cmd,0,1), Primary, None)

这似乎与对 Play HTTP 服务器的大量并发请求有关,这可能会使主节点的所有可用通道饱和,导致任何主节点状态刷新失败。

显然,当连接到只有一个节点的本地副本集时,我们没有这个问题。


问题是,我该如何解决这个问题?

任何人都可以理解所提供的日志消息吗?

我的分析是否正确,即。 ReactiveMongo 会不会因为通道只是“忙”而无法查询主状态?


更新:我们从 ReactiveMongo(在过去 4 年中为我们提供了很好的服务)切换到了官方的 mongo-scala-driver ——这真的很痛苦,并且仍然是(主要是因为“Registry”-no-compile-time-type-safety 概念通过Class[T]s 等)我们不再看到连接问题。

【问题讨论】:

  • 尝试发布 0.16.6(顺便说一句,每个节点 200 个通道对我来说似乎“很多”)
  • @cchantep 感谢您的回复!不幸的是,现在为时已晚,我们已经切换到官方驱动并且不想回滚(可能除了使用 reactivemongo-bson 库)。 200 个连接对我们来说只是一个绝望的选择,但目前使用官方驱动程序,在运行一段时间后,我们在 prod 上有大约 500 个活动连接(我们在 MongoDB Atlas 上的集群最大为 2000 个)(连接是池化的,但除非需要,否则不会主动创建)。相比之下,在 dev 上我们只有大约 60 个连接。

标签: scala reactivemongo play-reactivemongo play-framework-2.7


【解决方案1】:

将 Reactive Mongo 升级到 0.17.1 后它对我们有效

【讨论】:

    【解决方案2】:

    您似乎达到了可用连接的限制:您是否尝试过调整连接参数http://reactivemongo.org/releases/0.11/documentation/tutorial/connect-database.html? Atlas通常支持20多个连接https://docs.atlas.mongodb.com/connection-limits/

    我解决了一个类似的问题,mongo 原生 scala 驱动程序设置了 50 个连接池、更长的等待队列和更长的等待时间(由于与 Atlas 连接的延迟)

    【讨论】:

    • 我们目前正在运行 200 个连接。问题似乎仍然存在(必须调查日志)。我会尝试按照您的建议配置等待时间。
    • 这并没有为我解决,我怀疑reactivemongo中确实存在一个错误,因为即使服务器上没有负载并且有200个稳定连接,它也无法查询主节点每个节点。
    【解决方案3】:

    你试过了吗:http://reactivemongo.org/releases/0.1x/documentation/tutorial/connect-database.html#netty-native

    遇到了类似的错误,一位同事提出了上述建议。似乎对我们有用。

    【讨论】:

    • 是的,这就是我所说的“我们正在使用响应式mongo-shaded-native”的意思。
    【解决方案4】:

    请确保您的用户有权访问您尝试连接的数据库。我有同样的问题。我正在使用属于其他数据库的一个用户进行测试。后来我为我的数据库创建了一个用户,然后进行了测试,它工作正常。

    【讨论】:

    • 这显然不是问题所在,因为初始连接工作正常,几分钟后才开始出现问题。
    【解决方案5】:

    我曾为同样的问题苦苦挣扎了一段时间。如果您使用的是 M0、M2 或 M5 实例,一个可能的原因是服务器带宽非常有限。对于这些实例,很容易发出需要很长时间的查询。

    您可以在配置中增加超时设置,以便 reactivemongo 不会放弃不返回的连接。在我的情况下,我需要使超时至少 30 秒。诊断此问题的一种方法是查看 mongo atlas 控制台中的指标,并查看在遇到问题期间网络带宽是否固定(一条直线)。如果是,那很可能是您的问题。

    我将我的 Mongo Atlas 服务器升级到了 M10,从那以后没有出现任何问题。

    【讨论】:

    • 实际上,我们从 M10 实例开始(产品上的 M30)。这对我们来说根本没有什么不同。不过谢谢分享!
    • 一个更新——当主节点发生变化时,我在 M10 实例中看到了这一点。它并不总是发生,但每当我收到该错误时,我都可以在 Atlas 中看到主节点已更新。
    【解决方案6】:

    我在本地遇到了同样的错误,但它没有在 prod 中复制。 当我将连接时间从 150 毫秒增加到更大的数字(即 15000)时,它开始正常工作。

    val mdbConnectionFS = FailoverStrategy.apply(
        Duration(15000, TimeUnit.MILLISECONDS),
        5,
        FactorFun(1.0)
      )
    
    val connOptions : MongoConnectionOptions = MongoConnectionOptions.default.copy(
        keepAlive = true,
        credentials = mongoClientMap,
        failoverStrategy = mdbConnectionFS,
        nbChannelsPerNode = channels
      )
    

    【讨论】:

      猜你喜欢
      • 2020-04-01
      • 1970-01-01
      • 1970-01-01
      • 2014-06-08
      • 2014-09-23
      • 1970-01-01
      • 1970-01-01
      • 2022-11-30
      • 2014-06-24
      相关资源
      最近更新 更多