[Preface] I have recently been looking into circuit-breaking techniques for services, so here is a summary of what I found.
I. Candidate Solutions
- Sentinel
- Hystrix
- resilience4j
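  - Documentation: https://resilience4j.readme.io/docs
  - A lightweight, simple circuit-breaking library with very clear and thorough documentation. It is designed as a replacement for Hystrix, follows the same overall approach, and was under active development at the time of writing.
  - Integration with Micrometer, Prometheus, or Dropwizard Metrics has to be wired up yourself
  - CircuitBreaker: circuit breaking
  - Bulkhead: isolation
  - RateLimiter: QPS limiting
  - Retry: retries
  - TimeLimiter: timeouts
  - Cache: caching
- Roll your own (based on Guava)
  - Guava's token-bucket RateLimiter makes it easy to limit QPS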
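The token-bucket idea behind the Guava option can be sketched with the JDK alone. The block below is a hand-rolled bucket, not Guava's RateLimiter (Guava uses a smoothed "stored permits" model rather than this literal bucket); the class name SimpleTokenBucket and the explicit nowNanos parameter are assumptions made so the behaviour is deterministic.

```java
import java.util.concurrent.TimeUnit;

/**
 * Minimal token bucket (hand-rolled sketch, NOT Guava's RateLimiter).
 * Holds at most `capacity` tokens and refills continuously at `permitsPerSecond`;
 * each call consumes one token or is rejected. Time is passed in explicitly.
 */
final class SimpleTokenBucket {
    private final double capacity;
    private final double tokensPerNano;
    private double tokens;
    private long lastRefillNanos;

    SimpleTokenBucket(double permitsPerSecond, double capacity, long nowNanos) {
        this.capacity = capacity;
        this.tokensPerNano = permitsPerSecond / TimeUnit.SECONDS.toNanos(1);
        this.tokens = capacity;            // start with a full bucket
        this.lastRefillNanos = nowNanos;
    }

    /** Non-blocking: take one token if one is available at time nowNanos. */
    synchronized boolean tryAcquire(long nowNanos) {
        refill(nowNanos);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;                      // over the QPS limit: reject or degrade
    }

    /** Add the tokens earned since the last refill, capped at capacity. */
    private void refill(long nowNanos) {
        long elapsed = nowNanos - lastRefillNanos;
        if (elapsed > 0) {
            tokens = Math.min(capacity, tokens + elapsed * tokensPerNano);
            lastRefillNanos = nowNanos;
        }
    }
}
```

With Guava itself the equivalent check is RateLimiter.create(5.0) followed by tryAcquire() on each request; a false return means the request exceeds the configured QPS.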
II. Technical Comparison
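| Feature | Sentinel | Hystrix | resilience4j | Guava-based |
|---|---|---|---|---|
| Isolation strategy | Semaphore isolation (limits concurrent threads) | Thread-pool isolation / semaphore isolation | Semaphore isolation (a thread-pool bulkhead is also available) | |
| Circuit-breaking strategy | Response time, exception ratio, exception count | Exception ratio | Exception ratio, response time | |
| Real-time statistics | Sliding window (LeapArray) | Sliding window (based on RxJava) | Ring Bit Buffer (count- or time-based sliding window since 1.0) | Token bucket |
| Dynamic rule configuration | Multiple data sources | Multiple data sources | Limited support | |
| Extensibility | Multiple extension points | Plugins | Interfaces | |
| Annotation support | Yes | Yes | Yes | Yes |
| Single-node rate limiting | QPS-based; also supports limiting by call relationship | Limited support | Rate Limiter | QPS-based |
| Cluster flow control | Yes | No | No | |
| Traffic shaping | Warm-up and uniform queueing (pacing) modes | No | Simple Rate Limiter mode | |
| Adaptive system protection | Yes | No | No | |
| Hotspot detection/protection | Yes | No | No | |
| Service Mesh support | Envoy/Istio | No | No | |
| Console | Out-of-the-box console for rule configuration, real-time monitoring, machine discovery, etc. | Simple monitoring view | No console; can feed other monitoring systems | |
| Default rules | No; rules must be configured per endpoint | Yes | Yes | |
| Exception filtering | Per endpoint via the annotation | Annotation and global default configuration | Annotation and global default configuration | |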
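The "real-time statistics" row is what decides when a breaker opens. As a rough illustration of the count-based approach, here is a sketch of a window over the last N call outcomes that reports a failure rate only once a minimum number of calls has been seen, similar in spirit to the slidingWindowSize, minimumNumberOfCalls and failureRateThreshold settings in the resilience4j configuration in 3.3.4. The class and method names are hypothetical; this is not resilience4j's implementation.

```java
/**
 * Count-based sliding window over the last N call outcomes (ring buffer of booleans).
 * Simplified, hypothetical model; not resilience4j's actual code.
 */
final class CountBasedFailureWindow {
    private final boolean[] failures;      // true = failed call
    private final int minimumNumberOfCalls;
    private int next;                      // slot to overwrite next
    private int size;                      // recorded calls, capped at window length
    private int failureCount;

    CountBasedFailureWindow(int windowSize, int minimumNumberOfCalls) {
        this.failures = new boolean[windowSize];
        this.minimumNumberOfCalls = minimumNumberOfCalls;
    }

    /** Record one outcome; once the window is full, the oldest outcome is evicted. */
    void record(boolean failed) {
        if (size == failures.length && failures[next]) {
            failureCount--;                // oldest failure leaves the window
        }
        failures[next] = failed;
        if (failed) {
            failureCount++;
        }
        next = (next + 1) % failures.length;
        if (size < failures.length) {
            size++;
        }
    }

    /** Failure rate in percent, or -1 if fewer than minimumNumberOfCalls were recorded. */
    float failureRate() {
        if (size < minimumNumberOfCalls) {
            return -1f;
        }
        return failureCount * 100f / size;
    }

    /** The breaker should open when the rate is defined and reaches the threshold. */
    boolean shouldOpen(float failureRateThreshold) {
        float rate = failureRate();
        return rate >= 0 && rate >= failureRateThreshold;
    }
}
```

With windowSize 4 and minimumNumberOfCalls 4, the rate stays undefined (-1) until the fourth call; after that, each new outcome pushes the oldest one out of the window.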
III. Adapting the Application
3.1 Sentinel
3.1.1 Add the dependency
```xml
<dependency>
    <groupId>com.alibaba.cloud</groupId>
    <artifactId>spring-cloud-starter-alibaba-sentinel</artifactId>
    <version>2.0.3.RELEASE</version>
</dependency>
```
3.1.2 Annotate the endpoint or the service layer
Annotate the method with @SentinelResource(value = "allInfos", fallback = "errorReturn"). The annotation is declared as follows:
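```java
@Target({ElementType.METHOD, ElementType.TYPE})
@Retention(RetentionPolicy.RUNTIME)
@Inherited
public @interface SentinelResource {
    // Resource name; rules are attached to this name
    String value() default "";
    // Traffic direction (OUT = outbound call, IN = inbound request)
    EntryType entryType() default EntryType.OUT;
    // Resource classification (e.g. general, web, RPC)
    int resourceType() default 0;
    // Method invoked when a rule rejects the call (BlockException)
    String blockHandler() default "";
    // Class holding a static blockHandler, if it is not in the same class
    Class<?>[] blockHandlerClass() default {};
    // Method invoked when the call throws; also handles BlockException if no blockHandler is set
    String fallback() default "";
    // Generic fallback shared by many resources (no parameters, optionally a Throwable)
    String defaultFallback() default "";
    // Class holding a static fallback, if it is not in the same class
    Class<?>[] fallbackClass() default {};
    // Exceptions counted toward exception-based circuit breaking
    Class<? extends Throwable>[] exceptionsToTrace() default {Throwable.class};
    // Exceptions that are neither counted nor sent to the fallback; they are rethrown as-is
    Class<? extends Throwable>[] exceptionsToIgnore() default {};
}
```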
```java
@RequestMapping("/get")
@ResponseBody
@SentinelResource(value = "allInfos", fallback = "errorReturn")
public JsonResult allInfos(HttpServletRequest request, HttpServletResponse response,
                           @RequestParam Integer num) {
    try {
        if (num % 2 == 0) {
            log.info("num % 2 == 0");
            // Not caught below (assuming BaseException is unchecked and unrelated to
            // ProgramException), so it escapes and Sentinel routes it to errorReturn
            throw new BaseException("something bad with 2", 400);
        }
        return JsonResult.ok();
    } catch (ProgramException e) {
        // Handled locally; never reaches the fallback
        log.info("error");
        return JsonResult.error("error");
    }
}
```
3.1.3 Configure circuit-breaking or rate-limiting methods for the endpoint
By default, every Controller endpoint is intercepted.
```java
// Fallback signature: same return type and parameters as allInfos
// (an extra trailing Throwable parameter is optional)
public JsonResult errorReturn(HttpServletRequest request, HttpServletResponse response,
                              Integer num) {
    return JsonResult.error("error: rate limited " + num);
}

// blockHandler signature: the original parameters plus a trailing BlockException.
// Sentinel only uses this overload if the annotation also sets blockHandler = "errorReturn".
public JsonResult errorReturn(HttpServletRequest request, HttpServletResponse response,
                              Integer num, BlockException b) {
    return JsonResult.error("error: circuit broken " + num);
}
```
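Note that you can also skip per-endpoint rate-limiting and circuit-breaking methods altogether and instead catch BlockException or UndeclaredThrowableException in a global exception handler, which saves a lot of development work. UndeclaredThrowableException appears because BlockException is a checked exception: when it is thrown through a proxy for a method that does not declare it, the proxy wraps it.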
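The wrapping can be reproduced with a plain JDK dynamic proxy. In this sketch, FakeBlockException is a hypothetical checked exception standing in for Sentinel's BlockException, InfoService is a hypothetical interface, and handle mimics what a global exception handler (for example one annotated with @RestControllerAdvice) might do: unwrap the proxy's wrapper first, then map the real cause to a response.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.lang.reflect.UndeclaredThrowableException;

/** Hypothetical checked exception standing in for Sentinel's BlockException. */
class FakeBlockException extends Exception {
    FakeBlockException(String msg) {
        super(msg);
    }
}

/** Hypothetical business interface; allInfos does NOT declare the checked exception. */
interface InfoService {
    String allInfos(int num);
}

final class ProxyExceptionDemo {
    /** Proxy that rejects every call with a checked exception, as an interceptor might. */
    static InfoService blockingProxy() {
        InvocationHandler handler = (proxy, method, args) -> {
            throw new FakeBlockException("blocked: " + method.getName());
        };
        return (InfoService) Proxy.newProxyInstance(
                InfoService.class.getClassLoader(),
                new Class<?>[] {InfoService.class},
                handler);
    }

    /** Global-handler logic: unwrap UndeclaredThrowableException, then map the cause. */
    static String handle(Throwable t) {
        Throwable cause = t;
        if (cause instanceof UndeclaredThrowableException) {
            cause = ((UndeclaredThrowableException) cause).getUndeclaredThrowable();
        }
        if (cause instanceof FakeBlockException) {
            return "error: blocked by flow control or circuit breaker";
        }
        return "error: " + cause.getMessage();
    }

    public static void main(String[] args) {
        try {
            blockingProxy().allInfos(2);
        } catch (RuntimeException e) {
            System.out.println(e.getClass().getSimpleName() + " -> " + handle(e));
        }
    }
}
```

Running main prints "UndeclaredThrowableException -> error: blocked by flow control or circuit breaker": the proxy wrapped the checked exception because InfoService.allInfos does not declare it.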
3.1.4 Connect to the dashboard
```yaml
spring:
  cloud:
    sentinel:
      transport:
        port: 8719
        dashboard: localhost:8080
```
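3.1.5 Rule persistence and dynamic updates
Store the rules in a configuration center such as ZooKeeper and use push mode: clients listen for changes, so rules survive restarts and updates reach every instance without a redeploy.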
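Push mode can be pictured as a small rule holder that the configuration-center client calls whenever the stored rules change, replacing the whole rule set in one step. Everything here is hypothetical: the PushedQpsRules class, the resource=qps text format and the listener hook are made up for illustration. In Sentinel itself this role is played by its data-source extensions (for example the ZooKeeper data source), whose property is registered with FlowRuleManager.register2Property.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/**
 * Hypothetical push-mode rule holder. A config-center client would call onConfigPushed()
 * with the new rule text each time it changes. Text format: one "resource=qps" per line.
 */
final class PushedQpsRules {
    private volatile Map<String, Double> rules = Collections.emptyMap();
    private final List<Consumer<Map<String, Double>>> listeners = new CopyOnWriteArrayList<>();

    void addListener(Consumer<Map<String, Double>> listener) {
        listeners.add(listener);
    }

    /** Parse the pushed text and swap in the complete new rule set. */
    void onConfigPushed(String text) {
        Map<String, Double> parsed = new HashMap<>();
        for (String line : text.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;                          // skip blank lines and comments
            }
            String[] kv = trimmed.split("=", 2);
            if (kv.length == 2) {
                parsed.put(kv[0].trim(), Double.parseDouble(kv[1].trim()));
            }
        }
        Map<String, Double> snapshot = Collections.unmodifiableMap(parsed);
        rules = snapshot;                          // one volatile write: old set or new set, never a mix
        for (Consumer<Map<String, Double>> listener : listeners) {
            listener.accept(snapshot);
        }
    }

    /** Current QPS limit for a resource, or null if no rule is configured. */
    Double qpsLimit(String resource) {
        return rules.get(resource);
    }
}
```

Swapping an immutable map through a volatile field means a request never observes a half-applied update, and a rule removed from the configuration center disappears on the next push.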
3.2 Hystrix
3.2.1 Add the dependencies
```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-hystrix-dashboard</artifactId>
    <version>2.0.4.RELEASE</version>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
    <version>2.0.4.RELEASE</version>
</dependency>
```
3.2.2 Annotate the endpoint
Annotate the method with @HystrixCommand(fallbackMethod = "fallbackMethod"). The annotation is declared as follows:
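```java
@Target({ElementType.METHOD})
@Retention(RetentionPolicy.RUNTIME)
@Inherited
@Documented
public @interface HystrixCommand {
    // Command group, used for reporting and, by default, to pick the thread pool
    String groupKey() default "";
    // Command name; defaults to the method name
    String commandKey() default "";
    // Thread pool to run in; defaults to the group key
    String threadPoolKey() default "";
    // Fallback method in the same class with a compatible signature
    String fallbackMethod() default "";
    // Per-command settings, e.g. execution.isolation.thread.timeoutInMilliseconds
    HystrixProperty[] commandProperties() default {};
    // Thread-pool settings, e.g. coreSize, maxQueueSize
    HystrixProperty[] threadPoolProperties() default {};
    // Exceptions rethrown without counting as failures or triggering the fallback
    Class<? extends Throwable>[] ignoreExceptions() default {};
    // EAGER or LAZY subscription for methods returning an Observable
    ObservableExecutionMode observableExecutionMode() default ObservableExecutionMode.EAGER;
    // Opt in to propagating HystrixRuntimeException instead of the unwrapped cause
    HystrixException[] raiseHystrixExceptions() default {};
    // Class-wide fallback; takes no parameters (optionally a Throwable)
    String defaultFallback() default "";
}
```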
```java
@RequestMapping("/get")
@ResponseBody
@HystrixCommand(fallbackMethod = "fallbackMethod")
public JsonResult allInfos(HttpServletRequest request, HttpServletResponse response,
                           @RequestParam Integer num) {
    try {
        if (num % 3 == 0) {
            log.info("num % 3 == 0");
            // Escapes the method, counts as a failure and triggers fallbackMethod
            throw new BaseException("something bad with 3", 400);
        }
        return JsonResult.ok();
    } catch (ProgramException exception) {
        // Only a checked exception that the try block can actually throw may be caught here
        log.info("error");
        return JsonResult.error("error");
    }
}
```
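3.2.3 Configure the fallback method for the endpoint
```java
// Same return type and parameters as allInfos; called on exceptions, timeouts,
// thread-pool rejections and when the circuit is open
public JsonResult fallbackMethod(HttpServletRequest request, HttpServletResponse response,
                                 Integer num) {
    response.setStatus(500);
    log.info("Circuit breaker fallback triggered!!");
    return JsonResult.error("circuit broken");
}
```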
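With strategy: THREAD (set in 3.2.4), Hystrix runs the annotated method on a separate thread pool and calls the fallback on a timeout, a rejection or an exception. The sketch below models that flow with a plain ExecutorService; IsolatedCommand and boundedPool are hypothetical names, and real Hystrix adds metrics, circuit breaking and per-command pools on top.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Simplified model of thread isolation + timeout + fallback; not Hystrix's code. */
final class IsolatedCommand<T> {
    private final ExecutorService pool;
    private final long timeoutMillis;

    IsolatedCommand(ExecutorService pool, long timeoutMillis) {
        this.pool = pool;
        this.timeoutMillis = timeoutMillis;
    }

    /** Bounded pool: when threads and queue are busy, submit() throws (AbortPolicy). */
    static ExecutorService boundedPool(int threads, int queueSize) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize));
    }

    T execute(Callable<T> run, Supplier<T> fallback) {
        final Future<T> future;
        try {
            future = pool.submit(run);
        } catch (RejectedExecutionException e) {
            return fallback.get();                 // pool saturated: fail fast
        }
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);                   // interrupt the worker thread
            return fallback.get();
        } catch (ExecutionException e) {
            return fallback.get();                 // the call itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback.get();
        }
    }
}
```

Because the caller only waits on the Future, a slow downstream dependency ties up threads in its own bounded pool instead of the web server's request threads, which is the point of thread-pool isolation.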
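3.2.4 Configure the default policy
```yaml
hystrix:
  command:
    default:
      execution:
        isolation:
          strategy: THREAD
          thread:
            # Calls running longer than 15 s time out and go to the fallback method
            timeoutInMilliseconds: 15000
      metrics:
        rollingStats:
          # Length of the statistics window: 15 s
          timeInMilliseconds: 15000
      circuitBreaker:
        # Within the window, once at least 3 requests have arrived and the error rate
        # reaches 50% (the default errorThresholdPercentage), the breaker opens and
        # calls go straight to the fallback method
        requestVolumeThreshold: 3
        # After 10 s in the open state, a single trial request is let through
        sleepWindowInMilliseconds: 10000
```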
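The circuitBreaker settings above can be read as a small state machine. This sketch plugs in the same numbers (a 15 s window, requestVolumeThreshold 3, the default 50% error threshold, a 10 s sleep window). It is a simplification: one fixed window instead of Hystrix's rolling buckets, explicit timestamps instead of a clock, and SimpleHystrixBreaker is a hypothetical name.

```java
/** Simplified Hystrix-style circuit breaker; not Hystrix's actual implementation. */
final class SimpleHystrixBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int requestVolumeThreshold;
    private final int errorThresholdPercentage;
    private final long sleepWindowMillis;
    private final long windowMillis;

    private State state = State.CLOSED;
    private long windowStart;
    private int total;
    private int errors;
    private long openedAt;

    SimpleHystrixBreaker(int requestVolumeThreshold, int errorThresholdPercentage,
                         long sleepWindowMillis, long windowMillis) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.errorThresholdPercentage = errorThresholdPercentage;
        this.sleepWindowMillis = sleepWindowMillis;
        this.windowMillis = windowMillis;
    }

    /** false means "skip the call and go to the fallback". */
    synchronized boolean allowRequest(long nowMillis) {
        switch (state) {
            case OPEN:
                if (nowMillis - openedAt >= sleepWindowMillis) {
                    state = State.HALF_OPEN;       // let a single trial request through
                    return true;
                }
                return false;
            case HALF_OPEN:
                return false;                      // trial request still in flight
            default:
                return true;
        }
    }

    /** Report the outcome of a call that was allowed through. */
    synchronized void onResult(long nowMillis, boolean success) {
        if (state == State.OPEN) {
            return;                                // late result from before the breaker opened
        }
        if (state == State.HALF_OPEN) {
            if (success) {
                state = State.CLOSED;
                resetWindow(nowMillis);
            } else {
                state = State.OPEN;
                openedAt = nowMillis;
            }
            return;
        }
        if (nowMillis - windowStart >= windowMillis) {
            resetWindow(nowMillis);                // window expired: start counting afresh
        }
        total++;
        if (!success) {
            errors++;
        }
        if (total >= requestVolumeThreshold
                && errors * 100 >= errorThresholdPercentage * total) {
            state = State.OPEN;
            openedAt = nowMillis;
        }
    }

    synchronized State state() {
        return state;
    }

    private void resetWindow(long nowMillis) {
        windowStart = nowMillis;
        total = 0;
        errors = 0;
    }
}
```

With these numbers, a failure, a success and a second failure inside the window (3 requests, 67% errors) open the breaker; 10 s later a single trial request decides whether it closes again.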
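3.2.5 Add monitoring
In the Hystrix Dashboard, the curve records the relative change in traffic over the last 2 minutes, so you can see whether traffic is rising or falling.
Cluster-wide monitoring requires a service registry.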
3.3 resilience4j
3.3.1 Add the dependencies
```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-test</artifactId>
    <scope>test</scope>
</dependency>

<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-spring-boot2</artifactId>
    <version>1.6.1</version>
</dependency>

<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-bulkhead</artifactId>
    <version>1.6.1</version>
</dependency>

<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-ratelimiter</artifactId>
    <version>1.6.1</version>
</dependency>

<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-timelimiter</artifactId>
    <version>1.6.1</version>
</dependency>
```
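Pull in only the modules you need, such as bulkhead, ratelimiter, and timelimiter. The annotations are applied through Spring AOP, so spring-boot-starter-aop must also be on the classpath.
3.3.2 Annotate the endpoint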
```java
@RequestMapping("/get")
@ResponseBody
// The name must match an instance under resilience4j.circuitbreaker / resilience4j.bulkhead
// in the configuration; unknown names fall back to the default configuration
@CircuitBreaker(name = "backendA", fallbackMethod = "fallbackMethod")
@Bulkhead(name = "backendA", fallbackMethod = "fallbackMethod")
public JsonResult allInfos(HttpServletRequest request, HttpServletResponse response,
                           @RequestParam Integer num) {
    log.info("param----->" + num);
    try {
        if (num % 2 == 0) {
            log.info("num % 2 == 0");
            throw new BaseException("something bad with 2", 400);
        }
        if (num % 3 == 0) {
            log.info("num % 3 == 0");
            throw new BaseException("something bad with 3", 400);
        }
        if (num % 5 == 0) {
            log.info("num % 5 == 0");
            throw new ProgramException("something bad with 5", 400);
        }
        if (num % 7 == 0) {
            log.info("num % 7 == 0");
            int res = 1 / 0; // ArithmeticException
        }
        return JsonResult.ok();
    } catch (BufferUnderflowException e) {
        // Never thrown above, so every exception escapes and is recorded by the circuit breaker
        log.info("error");
        return JsonResult.error("error");
    }
}
```
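3.3.3 Configure the fallback methods for the endpoint
```java
// resilience4j picks the fallback whose exception parameter most closely matches
// the exception that was thrown

// Bulkhead full: too many concurrent calls
public JsonResult fallbackMethod(HttpServletRequest request, HttpServletResponse response,
                                 Integer num, BulkheadFullException exception) {
    return JsonResult.error("error: bulkhead full " + num);
}

// Circuit breaker open: the call was not permitted
public JsonResult fallbackMethod(HttpServletRequest request, HttpServletResponse response,
                                 Integer num, CallNotPermittedException exception) {
    return JsonResult.error("error: circuit open " + num);
}
```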
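The bulkhead half of the annotation limits concurrency; when it is full, resilience4j raises BulkheadFullException, which the fallback above receives. A minimal semaphore-based model of that behaviour, mirroring maxConcurrentCalls and maxWaitDuration from the configuration in 3.3.4, might look like this. SemaphoreBulkhead and its nested BulkheadFull exception are hypothetical stand-ins, not resilience4j classes.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Minimal semaphore bulkhead (hypothetical model, not resilience4j's code). */
final class SemaphoreBulkhead {
    /** Hypothetical stand-in for resilience4j's BulkheadFullException. */
    static final class BulkheadFull extends RuntimeException {
        BulkheadFull(String msg) {
            super(msg);
        }
    }

    private final Semaphore permits;
    private final long maxWaitMillis;

    SemaphoreBulkhead(int maxConcurrentCalls, long maxWaitMillis) {
        this.permits = new Semaphore(maxConcurrentCalls, true);  // fair: FIFO for waiters
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Run the call if a permit frees up within maxWaitMillis; otherwise reject. */
    <T> T execute(Supplier<T> call) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new BulkheadFull("interrupted while waiting for a permit");
        }
        if (!acquired) {
            throw new BulkheadFull("bulkhead full");
        }
        try {
            return call.get();
        } finally {
            permits.release();                    // always give the slot back
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Unlike thread-pool isolation, the call still runs on the caller's thread; the semaphore only caps how many callers can be inside at once, which keeps the overhead low.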
3.3.4 Configure the rules
```yaml
resilience4j.circuitbreaker:
  instances:
    backendA:
      registerHealthIndicator: true
      slidingWindowSize: 100
    backendB:
      registerHealthIndicator: true
      slidingWindowSize: 10
      permittedNumberOfCallsInHalfOpenState: 3
      slidingWindowType: TIME_BASED
      minimumNumberOfCalls: 20
      waitDurationInOpenState: 50s
      failureRateThreshold: 50
      eventConsumerBufferSize: 10
      recordFailurePredicate: io.github.robwin.exception.RecordFailurePredicate

resilience4j.retry:
  instances:
    backendA:
      maxRetryAttempts: 3
      waitDuration: 10s
      enableExponentialBackoff: true
      exponentialBackoffMultiplier: 2
      retryExceptions:
        - org.springframework.web.client.HttpServerErrorException
        - java.io.IOException
      ignoreExceptions:
        - io.github.robwin.exception.BusinessException
    backendB:
      maxRetryAttempts: 3
      waitDuration: 10s
      retryExceptions:
        - org.springframework.web.client.HttpServerErrorException
        - java.io.IOException
      ignoreExceptions:
        - io.github.robwin.exception.BusinessException

resilience4j.bulkhead:
  instances:
    backendA:
      maxConcurrentCalls: 10
    backendB:
      maxWaitDuration: 10ms
      maxConcurrentCalls: 20

resilience4j.thread-pool-bulkhead:
  instances:
    backendC:
      maxThreadPoolSize: 1
      coreThreadPoolSize: 1
      queueCapacity: 1

resilience4j.ratelimiter:
  instances:
    backendA:
      limitForPeriod: 10
      limitRefreshPeriod: 1s
      timeoutDuration: 0
      registerHealthIndicator: true
      eventConsumerBufferSize: 100
    backendB:
      limitForPeriod: 6
      limitRefreshPeriod: 500ms
      timeoutDuration: 3s

resilience4j.timelimiter:
  instances:
    backendA:
      timeoutDuration: 2s
      cancelRunningFuture: true
    backendB:
      timeoutDuration: 1s
      cancelRunningFuture: false
```
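The ratelimiter settings work in fixed refresh periods rather than as a continuously refilling bucket: the resilience4j documentation describes its default limiter as splitting time into cycles of limitRefreshPeriod and resetting the available permits to limitForPeriod at the start of each cycle. A simplified model of that idea with backendA's numbers (10 permits per 1 s, timeoutDuration 0, so no waiting) is sketched below; CycleRateLimiter and the explicit nowNanos parameter are hypothetical.

```java
/** Simplified fixed-cycle rate limiter (hypothetical model, not resilience4j's code). */
final class CycleRateLimiter {
    private final int limitForPeriod;
    private final long refreshPeriodNanos;
    private long currentCycle = -1;
    private int permitsLeft;

    CycleRateLimiter(int limitForPeriod, long refreshPeriodNanos) {
        this.limitForPeriod = limitForPeriod;
        this.refreshPeriodNanos = refreshPeriodNanos;
    }

    /** timeoutDuration = 0 semantics: grant a permit now or reject, never wait. */
    synchronized boolean tryAcquire(long nowNanos) {
        long cycle = nowNanos / refreshPeriodNanos;   // which refresh period we are in
        if (cycle != currentCycle) {
            currentCycle = cycle;
            permitsLeft = limitForPeriod;             // unused permits do not carry over
        }
        if (permitsLeft > 0) {
            permitsLeft--;
            return true;
        }
        return false;
    }
}
```

Unlike a token bucket, permits left unused in one period are discarded, so no period can grant more than limitForPeriod calls.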
Rules defined in the configuration file can be overridden in code.
3.3.5 Set up monitoring
Use a dashboard tool such as Grafana.
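As an example of a rule expressed in code rather than YAML, here is a sketch of retry with exponential backoff that takes the backendA retry values (maxRetryAttempts 3, waitDuration 10 s, multiplier 2) as plain constructor arguments. It assumes, as resilience4j documents for its attempt count, that the first call counts as attempt 1, so there are two waits: 10 s, then 20 s. BackoffRetry, the retryOn predicate (standing in for retryExceptions/ignoreExceptions) and the injected sleeper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Retry with exponential backoff (hypothetical sketch, not resilience4j's Retry). */
final class BackoffRetry {
    private final int maxAttempts;                  // includes the first call
    private final long initialWaitMillis;
    private final double multiplier;
    private final Predicate<RuntimeException> retryOn;
    private final LongConsumer sleeper;             // injected so tests need not really sleep

    BackoffRetry(int maxAttempts, long initialWaitMillis, double multiplier,
                 Predicate<RuntimeException> retryOn, LongConsumer sleeper) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
        this.initialWaitMillis = initialWaitMillis;
        this.multiplier = multiplier;
        this.retryOn = retryOn;
        this.sleeper = sleeper;
    }

    /** Waits between attempts: w, w*m, w*m^2, ... (maxAttempts - 1 values). */
    List<Long> schedule() {
        List<Long> waits = new ArrayList<>();
        double wait = initialWaitMillis;
        for (int i = 1; i < maxAttempts; i++) {
            waits.add(Math.round(wait));
            wait *= multiplier;
        }
        return waits;
    }

    <T> T call(Supplier<T> task) {
        List<Long> waits = schedule();
        for (int attempt = 0; ; attempt++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                boolean lastAttempt = attempt == waits.size();
                if (lastAttempt || !retryOn.test(e)) {
                    throw e;                        // out of attempts or not retryable
                }
                sleeper.accept(waits.get(attempt)); // back off before the next attempt
            }
        }
    }
}
```

In production the sleeper would be something like ms -> Thread.sleep(ms) wrapped to handle InterruptedException; injecting it keeps the backoff schedule testable.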
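IV. Points to Consider
- Whether some exceptions need to be filtered out (not counted as failures)
- Whether global default rules are needed
- Whether additional middleware has to be introduced
- Traffic control in Kubernetes (k8s)
- Rule storage and dynamic updates
- The cost of adapting existing code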
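[Afterword]
Personally, I recommend Sentinel: it exposes many extension points for developers to build on, and I find its dynamic rule updates convenient. Related sample code: sample code for a monolithic application.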