Load balancing fails when a server is down: answers

[Question title]: Loadbalancing fails when a server is down
[Posted]: 2017-10-25 08:59:58
[Question description]:

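I have written a simple set of microservices with the following architecture:

In all of them I have added spring-boot-starter-actuator in order to expose the /health endpoint.

In the Zuul/Ribbon configuration I have added: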

zuul:
  ignoredServices: "*"
  routes:
    home-service:
      path: /service/**
      serviceId: home-service
      retryable: true

home-service:
  ribbon:
    listOfServers: localhost:8080,localhost:8081
    eureka.enabled: false
    ServerListRefreshInterval: 1

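So every time a client calls GET http://localhost:7070/service/home, the load balancer chooses one of the two HomeService instances running on port 8080 or 8081 and calls its /home endpoint.

However, when one of the HomeService instances is shut down, the load balancer does not seem to notice (despite the ServerListRefreshInterval setting): whenever it picks the stopped instance, the call fails with error=500.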
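The failure mode described above can be modelled with a minimal, stdlib-only Java sketch: a round-robin chooser over a fixed server list with no liveness check. This is not Ribbon's API; StaticRoundRobin and the set of "down" servers are hypothetical stand-ins that only reproduce the behavior in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a load balancer with a static server list and no health check.
class StaticRoundRobin {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger();

    StaticRoundRobin(List<String> servers) {
        this.servers = new ArrayList<>(servers);
    }

    // Picks servers in strict rotation, whether or not they are alive.
    String choose() {
        int i = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(i);
    }

    // Sends `requests` calls and counts how many land on a server listed in `down`
    // (without retries, each of those surfaces to the client as an error).
    static int countFailures(List<String> servers, Set<String> down, int requests) {
        StaticRoundRobin lb = new StaticRoundRobin(servers);
        int failures = 0;
        for (int r = 0; r < requests; r++) {
            if (down.contains(lb.choose())) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<String> servers = List.of("localhost:8080", "localhost:8081");
        // With 8080 stopped, every other request fails: prints 5
        System.out.println(countFailures(servers, Set.of("localhost:8080"), 10));
    }
}
```

In this model, refreshing a static listOfServers only re-reads the same two entries, so nothing ever takes the stopped instance out of rotation, which is the point raised in the comments.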

How can I fix this?

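[Comments]:

I posted the same question (as an issue) in the spring-cloud-netflix GitHub repository: github.com/spring-cloud/spring-cloud-netflix/issues/1984
You have static hosts, so I doubt this will work. AFAIK it only works when you use service discovery; since your routes are static, it will always go to the same 2 instances (regardless of the state of the services). Zuul is meant for proxying, not for service discovery.
Agreed. The goal is not to use Zuul as a discovery service, but to use Ribbon for load balancing inside the same gateway as Zuul, on its own, without any discovery service (such as Consul or Eureka).

Tags: spring load-balancing netflix-zuul netflix-ribbon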

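[Answer 1]:

I received and tested a solution from the spring-cloud team.

The solution is here on GitHub.

To summarize:

I added org.springframework.retry:spring-retry to my Zuul classpath
I added @EnableRetry to my Zuul application
I put the following properties in my Zuul configuration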

application.yml

server:
  port: ${PORT:7070}

spring:
  application:
    name: gateway

endpoints:
  health:
    enabled: true
    sensitive: true
  restart:
    enabled: true
  shutdown:
    enabled: true

zuul:
  ignoredServices: "*"
  routes:
    home-service:
      path: /service/**
      serviceId: home-service
      retryable: true
  retryable: true

home-service:
  ribbon:
    listOfServers: localhost:8080,localhost:8081
    eureka.enabled: false
    ServerListRefreshInterval: 100
    retryableStatusCodes: 500
    MaxAutoRetries: 2
    MaxAutoRetriesNextServer: 1
    OkToRetryOnAllOperations: true
    ReadTimeout: 10000
    ConnectTimeout: 10000
    EnablePrimeConnections: true

ribbon:
  eureka:
    enabled: false

hystrix:
  command:
    default:
      execution:
        isolation:
          thread:
            timeoutInMilliseconds: 30000
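The retry settings above can be read as: retry the same server up to MaxAutoRetries extra times, then move on to up to MaxAutoRetriesNextServer other servers. Below is a minimal stdlib-only sketch of that reading, assuming each counter excludes the first attempt (so at most (2 + 1) * (1 + 1) = 6 attempts with these values). RetrySequence and its method are hypothetical and model a connection failure generically; this is not Ribbon's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of Ribbon-style retries: same-server retries first, then next-server retries.
class RetrySequence {

    // Returns the servers tried, in order, until one succeeds or the retry budget runs out.
    static List<String> attempts(List<String> servers, int start, Set<String> down,
                                 int maxAutoRetries, int maxAutoRetriesNextServer) {
        List<String> tried = new ArrayList<>();
        for (int s = 0; s <= maxAutoRetriesNextServer; s++) {
            String server = servers.get(Math.floorMod(start + s, servers.size()));
            for (int r = 0; r <= maxAutoRetries; r++) {
                tried.add(server);
                if (!down.contains(server)) {
                    return tried; // success: stop retrying
                }
            }
        }
        return tried; // budget exhausted: the gateway would report an error
    }

    public static void main(String[] args) {
        List<String> servers = List.of("localhost:8080", "localhost:8081");
        // Values from the application.yml above: MaxAutoRetries=2, MaxAutoRetriesNextServer=1
        System.out.println(attempts(servers, 0, Set.of("localhost:8080"), 2, 1));
        // [localhost:8080, localhost:8080, localhost:8080, localhost:8081]
    }
}
```

Under this reading, repeated attempts against a stopped instance mostly add latency; the failover itself comes from MaxAutoRetriesNextServer.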

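[Comments]:

[Answer 2]:

Since a single route goes through 3 levels (Zuul → Hystrix → Ribbon), not counting the asynchronous execution layer and the retry engine, debugging timeouts can be tricky. The following applies to Spring Cloud Camden.SR6 and newer (I checked it on Dalston.SR1):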

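Zuul routes the request through RibbonRoutingFilter, which creates a Ribbon command from the request context. The Ribbon command then creates a LoadBalancer command, which executes the call with spring-retry, choosing the retry policy for the RetryTemplate according to the Zuul settings. @EnableRetry does nothing in this case, because that annotation only enables wrapping methods annotated with @Retryable in retrying proxies.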
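To illustrate why no annotation is involved, here is a stdlib-only Java sketch of the template/policy/callback split described above. MiniRetryTemplate, its nested RetryCallback and the BiPredicate used as a policy are hypothetical, simplified stand-ins, not spring-retry's actual interfaces; the point is only that the caller invokes the template directly, so nothing depends on an @EnableRetry proxy.

```java
import java.util.function.BiPredicate;

// Hypothetical, simplified stand-in for a retry template; not spring-retry's API.
class MiniRetryTemplate {

    // The work to retry (in the answer, the HTTP call to the chosen server).
    interface RetryCallback<T> {
        T doWithRetry(int attempt) throws Exception;
    }

    // Decides, from the number of failed attempts and the last failure, whether to try again.
    private final BiPredicate<Integer, Exception> retryPolicy;

    MiniRetryTemplate(BiPredicate<Integer, Exception> retryPolicy) {
        this.retryPolicy = retryPolicy;
    }

    // Called programmatically by client code, so no annotation or proxy is needed.
    <T> T execute(RetryCallback<T> callback) throws Exception {
        int attempt = 0;
        while (true) {
            try {
                return callback.doWithRetry(attempt);
            } catch (Exception e) {
                attempt++;
                if (!retryPolicy.test(attempt, e)) {
                    throw e; // policy says stop: propagate the last failure
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Policy: allow up to 3 attempts in total.
        MiniRetryTemplate template = new MiniRetryTemplate((failures, e) -> failures < 3);
        String result = template.execute(attempt -> {
            if (attempt < 2) {
                throw new java.io.IOException("connection refused"); // simulated stopped server
            }
            return "ok after " + (attempt + 1) + " attempts";
        });
        System.out.println(result); // ok after 3 attempts
    }
}
```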

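This means that the duration of your command is limited by the smaller of these two values (see this post):

[HystrixTimeout], the timeout for invoking the Hystrix command;
[RibbonTimeout * (MaxAutoRetries + 1) * (MaxAutoRetriesNextServer + 1)] (this only applies when retries are enabled in the Zuul configuration), where [RibbonTimeout = ConnectTimeout + ReadTimeout] on the HTTP client.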
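As a worked example of the two limits, here is a stdlib-only Java sketch plugging in the values from the application.yml in the first answer. It assumes each Ribbon retry counter excludes the first attempt, so the Ribbon budget is (ConnectTimeout + ReadTimeout) * (MaxAutoRetries + 1) * (MaxAutoRetriesNextServer + 1); TimeoutBudget is a hypothetical helper, not a Spring Cloud class.

```java
// Hypothetical helper that computes the effective time limit of a routed Zuul call.
class TimeoutBudget {

    // Worst case for Ribbon: every attempt runs into both the connect and the read timeout.
    // Assumes the retry counters exclude the first attempt on each server.
    static long ribbonBudgetMillis(long connectTimeout, long readTimeout,
                                   int maxAutoRetries, int maxAutoRetriesNextServer) {
        long perAttempt = connectTimeout + readTimeout;
        long attempts = (long) (maxAutoRetries + 1) * (maxAutoRetriesNextServer + 1);
        return perAttempt * attempts;
    }

    // The command is cut off by whichever limit is reached first.
    static long effectiveLimitMillis(long hystrixTimeout, long ribbonBudget) {
        return Math.min(hystrixTimeout, ribbonBudget);
    }

    public static void main(String[] args) {
        // ConnectTimeout=10000, ReadTimeout=10000, MaxAutoRetries=2, MaxAutoRetriesNextServer=1
        long ribbon = ribbonBudgetMillis(10_000, 10_000, 2, 1);
        System.out.println(ribbon);                               // 120000
        System.out.println(effectiveLimitMillis(30_000, ribbon)); // 30000
    }
}
```

With these values the 30-second Hystrix timeout is the binding limit. If a stopped instance refuses connections immediately, the retries finish well inside that window; only slow failures (hanging connects or reads) run into it.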

For debugging, it is convenient to set a breakpoint in RetryableRibbonLoadBalancingHttpClient#executeWithRetry or RetryableRibbonLoadBalancingHttpClient#execute. At that point you have:

a ContextAwareRequest instance with the request context (e.g. RibbonApacheHttpRequest or OkHttpRibbonRequest), which holds Zuul's retryable property;
a LoadBalancedRetryPolicy instance with the load balancer context, which holds Ribbon's maxAutoRetries, maxAutoRetriesNextServer and okToRetryOnAllOperations properties;
a RetryCallback instance with a requestConfig, which holds HttpClient's connectTimeout and socketTimeout properties;
a RetryTemplate instance with the selected retry policy.

If the breakpoint is never hit, the org.springframework.cloud.netflix.ribbon.apache.RetryableRibbonLoadBalancingHttpClient bean was not instantiated. This happens when the spring-retry library is not on the classpath.
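A quick way to check the classpath condition without a debugger is to probe for a spring-retry class by name. This stdlib-only sketch uses Class.forName; it assumes org.springframework.retry.support.RetryTemplate as the marker class for spring-retry, and ClasspathProbe is a hypothetical helper you could call from the gateway, for example at startup.

```java
// Hypothetical helper: reports whether a class can be loaded from the current classpath.
class ClasspathProbe {

    static boolean isPresent(String className) {
        try {
            // initialize=false: only check that the class is loadable, without running static initializers
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // false means spring-retry's RetryTemplate is not on this classpath
        System.out.println(isPresent("org.springframework.retry.support.RetryTemplate"));
    }
}
```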

[Comments]: