Service Governance: Rate Limiting, Circuit Breaking, and Degradation

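Written: February 2026 | Author: Bobot 🦐
🎯 Chapter goal: understand the protection mechanisms of distributed systems and master the core techniques of service governance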

I. Why Do We Need Service Governance?

1.1 The Fragility of Distributed Systems

The risk chain in a distributed system:

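┌─────────────────────────────────────────────────────────────┐
│                                                             │
│   Surge in user requests                                    │
│        │                                                    │
│        ▼                                                    │
│   A service starts responding slowly                        │
│        │                                                    │
│        ▼                                                    │
│   Callers wait and hold resources                           │
│        │                                                    │
│        ▼                                                    │
│   Thread pools and connection pools fill up                 │
│        │                                                    │
│        ▼                                                    │
│   Cascading failure: the whole system goes down             │
│                                                             │
└─────────────────────────────────────────────────────────────┘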

1.2 A Real-World Scenario

java

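// Cascading-failure example (anti-pattern: no timeout, blind retry)
public class ServiceA {

    private final ServiceB serviceB = new ServiceB();

    public void doSomething() {
        try {
            // Call service B
            serviceB.call();
        } catch (Exception e) {
            // Service B timed out.
            // Retrying immediately adds load to a service that is already struggling.
            serviceB.call(); // retry
        }
    }
}

// What actually happens:
// 1. Service B slows down
// 2. All of ServiceA's threads are stuck waiting on service B
// 3. ServiceA can no longer handle new requests
// 4. Services that call ServiceA block as well
// 5. The whole call chain collapses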
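
One way to break the chain above is to bound how long the caller is willing to wait. The following is a minimal sketch rather than a production implementation: `BoundedCaller`, `callWithTimeout`, and the pool size of 4 are hypothetical names and values chosen for illustration. The downstream call runs on a separate bounded pool, and the caller gives up after a deadline instead of blocking indefinitely or retrying blindly.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper: bounds the time a caller waits on a downstream call.
class BoundedCaller {

    // Dedicated, bounded pool so slow downstream calls cannot occupy every caller thread.
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    <T> T callWithTimeout(Supplier<T> call, long timeoutMs, T fallbackValue) {
        Callable<T> task = call::get;
        Future<T> future = pool.submit(task);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);                 // interrupt the slow call, do not retry
            return fallbackValue;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt flag
            return fallbackValue;
        } catch (ExecutionException e) {
            return fallbackValue;                // the call itself threw an exception
        }
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

With this in place, a slow service B costs the caller at most `timeoutMs` per request. Because the pool is bounded, a sustained slowdown fills the pool rather than the caller's own request threads.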

1.3 The Three Weapons of Service Governance

The three main tools of service governance:

┌─────────────────────────────────────────────────────────────┐
│                    1. Rate Limiting                         │
│    Cap the request rate so the system is not overwhelmed    │
├─────────────────────────────────────────────────────────────┤
│                    2. Circuit Breaker                       │
│    Fail fast to stop a fault from spreading                 │
├─────────────────────────────────────────────────────────────┤
│                    3. Degradation (Fallback)                │
│    Serve reduced results to keep core functions available   │
└─────────────────────────────────────────────────────────────┘

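II. Rate Limiting

2.1 Rate-Limiting Algorithms

The core of rate limiting is controlling the request rate. Common algorithms include:

2.1.1 Counter Algorithm

The simplest algorithm, but a flawed one: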

┌─────────────────────────────────────────────┐
│ Fixed-window counter                        │
│                                             │
│  0-1s    1-2s    2-3s    3-4s               │
│  ████    ████    ████    ████               │
│  100     100     100     100                │
│                                             │
│ Problem: bursts can occur at window edges   │
│ e.g. 100 at 0.99s + 100 at 1.01s = 200      │
│ in about 20 ms, twice the limit             │
└─────────────────────────────────────────────┘

java

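// Fixed-window counter rate limiter
public class CounterRateLimiter {

    private final int maxRequests;   // maximum requests allowed per window
    private final long windowSize;   // window size in milliseconds
    private long count = 0;
    private long windowStart = System.currentTimeMillis();

    public CounterRateLimiter(int maxRequests, long windowSizeMs) {
        this.maxRequests = maxRequests;
        this.windowSize = windowSizeMs;
    }

    // synchronized so that the window reset and the count check happen atomically
    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();

        // Start a new window once the current one has elapsed
        if (now - windowStart >= windowSize) {
            count = 0;
            windowStart = now;
        }

        // Admit the request only while the window still has capacity
        if (count < maxRequests) {
            count++;
            return true;
        }

        return false;
    }
}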
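
To make the boundary problem concrete, the sketch below replays two bursts through a fixed-window counter. `FixedWindowDemo` and its explicit `nowMs` parameter are assumptions made for illustration: passing the time in lets the scenario run deterministically without sleeping.

```java
// Hypothetical demo class: a fixed-window counter driven by explicit timestamps.
class FixedWindowDemo {

    private final int maxRequests;
    private final long windowSizeMs;
    private long windowStart = 0;
    private int count = 0;

    FixedWindowDemo(int maxRequests, long windowSizeMs) {
        this.maxRequests = maxRequests;
        this.windowSizeMs = windowSizeMs;
    }

    // Same logic as a fixed-window limiter, but the caller supplies the time.
    synchronized boolean tryAcquire(long nowMs) {
        if (nowMs - windowStart >= windowSizeMs) {
            windowStart = nowMs - (nowMs % windowSizeMs); // align to the window grid
            count = 0;
        }
        if (count < maxRequests) {
            count++;
            return true;
        }
        return false;
    }

    // Sends n requests at time nowMs and returns how many were admitted.
    int burst(int n, long nowMs) {
        int admitted = 0;
        for (int i = 0; i < n; i++) {
            if (tryAcquire(nowMs)) {
                admitted++;
            }
        }
        return admitted;
    }

    public static void main(String[] args) {
        FixedWindowDemo limiter = new FixedWindowDemo(100, 1000);
        int endOfFirst = limiter.burst(100, 999);      // last millisecond of window [0, 1000)
        int startOfSecond = limiter.burst(100, 1000);  // first millisecond of window [1000, 2000)
        System.out.println(endOfFirst + startOfSecond); // prints 200
    }
}
```

Both bursts are admitted in full, so 200 requests pass within about 1 ms even though the configured limit is 100 per second.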

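2.1.2 Sliding Window Algorithm

Sliding window: split the time window into several smaller sub-windows

Timeline:
│←────── 1 s window ──────→│
│←0.2→←0.2→←0.2→←0.2→←0.2→│
│          ↑              │
│      current time       │
│          ↓              │
│    sum the requests in the 5 most recent sub-windows

Advantage: smoother than a fixed window, because a burst just before a boundary still counts against the limit after it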

java

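import java.util.Arrays;

// Sliding-window rate limiter: the window is split into windowParts sub-windows (slots)
public class SlidingWindowRateLimiter {

    private final int maxRequests;
    private final int windowParts;   // number of sub-windows
    private final long partSize;     // length of one sub-window in milliseconds
    private final long[] counters;   // requests counted per slot
    private final long[] slotIds;    // absolute sub-window number each slot currently holds

    public SlidingWindowRateLimiter(int maxRequests, long windowSizeMs, int windowParts) {
        this.maxRequests = maxRequests;
        this.windowParts = windowParts;
        this.partSize = Math.max(1, windowSizeMs / windowParts);
        this.counters = new long[windowParts];
        this.slotIds = new long[windowParts];
        Arrays.fill(slotIds, -1);    // -1 marks a slot that has never been used
    }

    public synchronized boolean tryAcquire() {
        // Absolute number of the current sub-window, and the slot it maps to
        long currentId = System.currentTimeMillis() / partSize;
        int index = (int) (currentId % windowParts);

        // The slot still holds an older sub-window: recycle it
        if (slotIds[index] != currentId) {
            slotIds[index] = currentId;
            counters[index] = 0;
        }

        // Sum only the slots that fall inside the most recent windowParts sub-windows
        long sum = 0;
        for (int i = 0; i < windowParts; i++) {
            if (currentId - slotIds[i] < windowParts) {
                sum += counters[i];
            }
        }

        if (sum < maxRequests) {
            counters[index]++;
            return true;
        }

        return false;
    }
}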
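
For comparison, here is a sketch of a related but different technique, the sliding-window log, which stores one timestamp per admitted request instead of per-slot counters. It is more precise but uses more memory. `SlidingLogLimiter` and the explicit `nowMs` parameter are hypothetical, chosen so the boundary scenario (100 requests at 999 ms, 100 more at 1000 ms) can be replayed deterministically.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a sliding-window log: one timestamp per admitted request.
class SlidingLogLimiter {

    private final int maxRequests;
    private final long windowSizeMs;
    private final Deque<Long> admitted = new ArrayDeque<>();

    SlidingLogLimiter(int maxRequests, long windowSizeMs) {
        this.maxRequests = maxRequests;
        this.windowSizeMs = windowSizeMs;
    }

    synchronized boolean tryAcquire(long nowMs) {
        // Drop timestamps that are no longer inside (nowMs - windowSizeMs, nowMs]
        while (!admitted.isEmpty() && admitted.peekFirst() <= nowMs - windowSizeMs) {
            admitted.pollFirst();
        }
        if (admitted.size() < maxRequests) {
            admitted.addLast(nowMs);
            return true;
        }
        return false;
    }

    // Sends n requests at time nowMs and returns how many were admitted.
    int burst(int n, long nowMs) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            if (tryAcquire(nowMs)) {
                count++;
            }
        }
        return count;
    }
}
```

With a limit of 100 per 1000 ms, the burst at 999 ms is admitted and the burst at 1000 ms is rejected in full, because the earlier timestamps are still inside the window. Capacity returns at 1999 ms, when they expire.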

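2.1.3 Token Bucket Algorithm

Token bucket:

┌──────────────┐      ┌──────────────┐
│ Token refill │ ───▶ │ Bucket size  │───▶ handle request
│ (fixed rate) │      │ (max burst)  │
└──────────────┘      └──────────────┘

Characteristics:
- Allows a bounded burst, up to the bucket capacity
- Over the long run, the average rate is capped at the refill rate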

java

// Token-bucket rate limiter
public class TokenBucketRateLimiter {

    private final double rate;        // tokens added per second
    private final double capacity;    // bucket capacity (maximum burst)
    private double tokens;            // tokens currently available
    private long lastRefillTime;      // last refill timestamp from System.nanoTime()

    public TokenBucketRateLimiter(double rate, double capacity) {
        this.rate = rate;
        this.capacity = capacity;
        this.tokens = capacity;       // start full so an initial burst is allowed
        this.lastRefillTime = System.nanoTime();
    }

    public synchronized boolean tryAcquire() {
        refill();

        // Each admitted request consumes one token
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }

        return false;
    }

    private void refill() {
        long now = System.nanoTime();
        long elapsed = now - lastRefillTime;

        // Add tokens in proportion to the elapsed time, never exceeding the capacity
        double newTokens = elapsed * rate / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + newTokens);

        lastRefillTime = now;
    }
}
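
The burst-then-steady behaviour is easier to see with concrete numbers. The sketch below is a hypothetical variant (`ReplayableTokenBucket`) that takes the current time as a parameter, which is an assumption made so the example runs deterministically. The capacity of 5 and the rate of 2 tokens per second are illustrative values.

```java
// Hypothetical variant of a token bucket that receives the time explicitly.
class ReplayableTokenBucket {

    private final double ratePerSecond;
    private final double capacity;
    private double tokens;
    private long lastRefillNanos;

    ReplayableTokenBucket(double ratePerSecond, double capacity, long startNanos) {
        this.ratePerSecond = ratePerSecond;
        this.capacity = capacity;
        this.tokens = capacity;              // start full: an initial burst is allowed
        this.lastRefillNanos = startNanos;
    }

    synchronized boolean tryAcquire(long nowNanos) {
        // Refill in proportion to the elapsed time, capped at the capacity
        double elapsedSeconds = (nowNanos - lastRefillNanos) / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + elapsedSeconds * ratePerSecond);
        lastRefillNanos = nowNanos;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }

    // Sends n requests at time nowNanos and returns how many were admitted.
    int burst(int n, long nowNanos) {
        int admitted = 0;
        for (int i = 0; i < n; i++) {
            if (tryAcquire(nowNanos)) {
                admitted++;
            }
        }
        return admitted;
    }

    public static void main(String[] args) {
        long oneSecond = 1_000_000_000L;
        ReplayableTokenBucket bucket = new ReplayableTokenBucket(2.0, 5.0, 0L);
        System.out.println(bucket.burst(10, 0L));         // prints 5: the full bucket absorbs a burst
        System.out.println(bucket.burst(10, oneSecond));  // prints 2: only the refilled tokens
    }
}
```

The first burst drains the 5 stored tokens. One second later only 2 tokens have been refilled, so the sustained rate settles at the refill rate.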

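2.2 Distributed Rate Limiting

A single-node limiter only protects one instance. When a service runs on several instances, the limit has to be enforced across all of them.

Distributed rate-limiting approaches: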

┌─────────────────────────────────────────────────────────────┐
│  1. Centralized limiting                                    │
│     - Redis counter                                         │
│     - Shared rate-limiting service                          │
├─────────────────────────────────────────────────────────────┤
│  2. Local limiting + quota                                  │
│     - Each instance limits locally                          │
│     - Redis holds and checks the quota                      │
└─────────────────────────────────────────────────────────────┘

java

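import java.util.List;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.data.redis.core.script.DefaultRedisScript;

// Distributed rate limiting with Redis (fixed window, evaluated atomically in Lua)
public class RedisRateLimiter {

    // StringRedisTemplate sends keys and arguments as plain strings,
    // which the Lua script needs for tonumber()
    @Autowired
    private StringRedisTemplate redis;

    public boolean tryAcquire(String key, int limit, int windowSeconds) {
        String luaScript = """
            local key = KEYS[1]
            local limit = tonumber(ARGV[1])
            local window = tonumber(ARGV[2])
            local current = tonumber(redis.call('get', key) or '0')
            if current < limit then
                local updated = redis.call('incr', key)
                -- set the expiry only on the first request, so the window is not extended
                if updated == 1 then
                    redis.call('expire', key, window)
                end
                return 1
            else
                return 0
            end
            """;

        DefaultRedisScript<Long> script = new DefaultRedisScript<>(luaScript, Long.class);
        Long result = redis.execute(script, List.of(key),
            String.valueOf(limit), String.valueOf(windowSeconds));

        return result != null && result == 1L;
    }
}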
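
The second approach, local limiting plus quota, can be sketched as follows. This is an illustration under stated assumptions, not a reference design: `QuotaStore` is a hypothetical abstraction of the central quota (in production it might be backed by a Redis counter), `InMemoryQuotaStore` is a stand-in that keeps the sketch runnable, and the batch size is arbitrary. Each instance leases a batch of permits and spends them locally, so the central store is contacted once per batch rather than once per request.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical abstraction of the central quota.
interface QuotaStore {
    // Tries to lease up to `requested` permits and returns how many were granted.
    long lease(long requested);
}

// In-memory stand-in for the central store, used only to keep the sketch runnable.
class InMemoryQuotaStore implements QuotaStore {

    private final AtomicLong remaining;

    InMemoryQuotaStore(long total) {
        this.remaining = new AtomicLong(total);
    }

    @Override
    public long lease(long requested) {
        while (true) {
            long current = remaining.get();
            long granted = Math.min(current, requested);
            if (granted == 0) {
                return 0;
            }
            // Compare-and-set so concurrent instances never lease the same permits
            if (remaining.compareAndSet(current, current - granted)) {
                return granted;
            }
        }
    }
}

// Per-instance limiter: spends locally leased permits and refills in batches.
class LeasedQuotaLimiter {

    private final QuotaStore store;
    private final long batchSize;
    private long localPermits = 0;

    LeasedQuotaLimiter(QuotaStore store, long batchSize) {
        this.store = store;
        this.batchSize = batchSize;
    }

    synchronized boolean tryAcquire() {
        if (localPermits == 0) {
            localPermits = store.lease(batchSize);  // one round trip per batch
        }
        if (localPermits > 0) {
            localPermits--;
            return true;
        }
        return false;
    }
}
```

The trade-off is accuracy against latency: larger batches mean fewer round trips, but permits leased by an idle instance are unavailable to busy ones. The sketch also omits resetting the quota at the start of each window.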

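III. Circuit Breaker

3.1 How a Circuit Breaker Works

A circuit breaker works like a fuse in an electrical circuit: it trips automatically when the current gets too high.

The three states of a circuit breaker:

1. Closed
   - Normal state
   - Requests pass through
   - The failure rate is tracked

2. Open
   - Fault state
   - Requests fail immediately
   - Callers get a fast response

3. Half-Open
   - Trial state
   - A small number of requests are let through
   - Probes whether the service has recovered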

java

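import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Circuit breaker implementation (simplified: counts consecutive failures)
public class CircuitBreaker {

    private final int failureThreshold;  // consecutive failures before opening
    private final long resetTimeout;     // how long to stay open before probing (ms)

    private final AtomicInteger failureCount = new AtomicInteger(0);
    private volatile State state = State.CLOSED;
    private volatile long lastFailureTime;

    public enum State {
        CLOSED,    // normal operation
        OPEN,      // failing fast
        HALF_OPEN  // probing for recovery
    }

    public CircuitBreaker(int failureThreshold, long resetTimeoutMs) {
        this.failureThreshold = failureThreshold;
        this.resetTimeout = resetTimeoutMs;
    }

    public <T> T execute(Supplier<T> operation, Supplier<T> fallback) {
        // Breaker is open: fail fast unless the reset timeout has elapsed
        if (state == State.OPEN) {
            if (shouldAttemptReset()) {
                // Move to half-open and let this request through as a probe.
                // Simplification: every request arriving in HALF_OPEN passes;
                // production breakers cap the number of probes.
                state = State.HALF_OPEN;
            } else {
                return fallback.get();
            }
        }

        try {
            T result = operation.get();

            // Success: reset the failure count and close the breaker
            onSuccess();
            return result;

        } catch (Exception e) {
            // Failure: record it and return the degraded result
            onFailure();
            return fallback.get();
        }
    }

    private void onSuccess() {
        failureCount.set(0);
        state = State.CLOSED;
    }

    private void onFailure() {
        lastFailureTime = System.currentTimeMillis();

        // Use the value returned by the increment to avoid a separate racy read
        if (failureCount.incrementAndGet() >= failureThreshold) {
            state = State.OPEN;
        }
    }

    private boolean shouldAttemptReset() {
        return System.currentTimeMillis() - lastFailureTime > resetTimeout;
    }
}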
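
The transitions between the three states can also be looked at in isolation. The sketch below models only the state machine. `BreakerStateMachine` and its method names are hypothetical, and time is passed in explicitly (an assumption for illustration) so the CLOSED to OPEN to HALF_OPEN to CLOSED cycle can be replayed without waiting.

```java
// Hypothetical sketch: only the state transitions of a circuit breaker.
class BreakerStateMachine {

    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long resetTimeoutMs;
    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAtMs = 0;

    BreakerStateMachine(int failureThreshold, long resetTimeoutMs) {
        this.failureThreshold = failureThreshold;
        this.resetTimeoutMs = resetTimeoutMs;
    }

    // Returns true if a request may be sent at time nowMs.
    synchronized boolean allowRequest(long nowMs) {
        if (state == State.OPEN && nowMs - openedAtMs >= resetTimeoutMs) {
            state = State.HALF_OPEN;   // cool-down elapsed: let a probe through
        }
        return state != State.OPEN;
    }

    synchronized void recordSuccess() {
        failures = 0;
        state = State.CLOSED;          // a successful probe closes the breaker
    }

    synchronized void recordFailure(long nowMs) {
        failures++;
        // A failed probe reopens immediately; otherwise open at the threshold
        if (state == State.HALF_OPEN || failures >= failureThreshold) {
            state = State.OPEN;
            openedAtMs = nowMs;
        }
    }

    synchronized State state() {
        return state;
    }
}
```

With a threshold of 3 and a reset timeout of 1000 ms, three failures open the breaker. Requests are refused until 1000 ms after it opened, and the next request is then admitted as a probe whose outcome decides whether the breaker closes or reopens.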

3.2 Using It in Practice

java

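// Using the circuit breaker
// (orderDao and getOrderFromCache are assumed to be defined elsewhere in the class)
public class OrderService {

    // Open after 5 failures, stay open for 30 seconds
    private final CircuitBreaker breaker = new CircuitBreaker(5, 30000);

    public Order getOrder(String orderId) {
        return breaker.execute(
            // Normal operation
            () -> orderDao.findById(orderId),
            // Degraded operation: read from the cache
            () -> getOrderFromCache(orderId)
        );
    }
}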

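IV. Degradation (Fallback)

4.1 Degradation Strategies

Degradation means sacrificing the less important to protect the essential, so that core functionality stays available.

Degradation strategies: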

┌─────────────────────────────────────────────────────────────┐
│ Return cached data                                          │
│  - Database timeout → return cached data                    │
├─────────────────────────────────────────────────────────────┤
│ Return a default value                                      │
│  - Service unavailable → return empty list / default        │
├─────────────────────────────────────────────────────────────┤
│ Return historical data                                      │
│  - Live computation fails → return the last result          │
├─────────────────────────────────────────────────────────────┤
│ Skip non-core services                                      │
│  - Keep the core flow, skip the non-core steps              │
├─────────────────────────────────────────────────────────────┤
│ Return a friendly message                                   │
│  - System busy, please try again later                      │
└─────────────────────────────────────────────────────────────┘

java

// Degradation implementation
public class ProductService {

    @Autowired
    private ProductDao productDao;

    @Autowired
    private RedisCache cache;

    public Product getProduct(String productId) {
        try {
            // 1. Try the cache first
            Product cached = cache.get(productId);
            if (cached != null) {
                return cached;
            }

            // 2. Fall through to the database
            Product product = productDao.findById(productId);

            // 3. Populate the cache
            if (product != null) {
                cache.set(productId, product);
            }

            return product;

        } catch (Exception e) {
            // 4. Degrade: the cache was already checked above (or is itself failing),
            //    so return a default product instead of querying it again
            return getDefaultProduct();
        }
    }

    // Degraded result
    private Product getDefaultProduct() {
        Product defaultProduct = new Product();
        defaultProduct.setName("Product temporarily unavailable");
        defaultProduct.setAvailable(false);
        return defaultProduct;
    }
}
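
The "skip non-core services" strategy can be sketched with a degrade switch. Everything named here is hypothetical (`PageAssembler`, `setDegraded`, and the two suppliers) and serves only to show the shape of the idea: the core call must succeed, while the optional call is skipped when the switch is on and tolerated when it fails.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical sketch of "skip non-core services".
class PageAssembler {

    // Degrade switch: could be flipped manually or by a monitoring rule
    private final AtomicBoolean degraded = new AtomicBoolean(false);

    void setDegraded(boolean on) {
        degraded.set(on);
    }

    // `core` stands for essential data (e.g. product details),
    // `optional` for a non-core extra (e.g. recommendations).
    List<String> assemble(Supplier<String> core, Supplier<List<String>> optional) {
        String main = core.get();                        // a core failure propagates
        List<String> extras = Collections.emptyList();   // default when degraded
        if (!degraded.get()) {
            try {
                extras = optional.get();
            } catch (RuntimeException e) {
                // Non-core failure: keep the empty default instead of failing the page
            }
        }
        List<String> page = new ArrayList<>();
        page.add(main);
        page.addAll(extras);
        return page;
    }
}
```

When the switch is on, the optional supplier is never invoked, which also removes load from the non-core service while the system is under pressure.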

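V. Putting It All Together

5.1 Using Hystrix/Sentinel

In real projects, prefer a mature library over a hand-written implementation. The examples below use Sentinel and Resilience4j; Hystrix is in maintenance mode and no longer under active development.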

java

// Using Sentinel
@SentinelResource(
    value = "getOrder",
    fallback = "getOrderFallback",
    blockHandler = "handleBlock"
)
public Order getOrder(String orderId) {
    return orderDao.findById(orderId);
}

// Fallback: called when getOrder throws an exception
public Order getOrderFallback(String orderId) {
    // Return cached data
    return cache.getOrder(orderId);
}

// Block handler: called when Sentinel blocks the request (for example, by a flow-control rule)
public Order handleBlock(String orderId, BlockException e) {
    // Return a friendly "system busy" response
    return Order.systemBusy();
}

java

// Using Resilience4j
@RateLimiter(name = "orderService")
@CircuitBreaker(name = "orderService", fallbackMethod = "fallback")
public Order getOrder(String orderId) {
    return orderDao.findById(orderId);
}

// Fallback: same parameters as getOrder plus the exception
public Order fallback(String orderId, Exception e) {
    return cache.getOrder(orderId);
}

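VI. Chapter Summary

Core Concepts

Concept	Key point
Rate limiting	Control the request rate to protect the system
Counter	Simple, but bursts can slip through at window boundaries
Sliding window	Smoother limiting
Token bucket	Allows bounded bursts
Circuit breaker	Fail fast to prevent cascading failure
Degradation	Reduced service that keeps core functions available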

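Protection Strategy

Layered protection:

Request → Rate limit → Circuit breaker → Degrade → Response
   │          │               │             │
   │      over limit       fault:     cached data /
   │                       fail fast  default value
   │                                        │
   └────────────────────────────────────────┘
       a response is always returned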
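
The layered flow above can be sketched in plain Java. This is a minimal illustration, not how Sentinel or Resilience4j compose these layers internally: `ProtectedCall` and its four collaborators are hypothetical, and the limiter and breaker are reduced to simple functional interfaces so that any implementation can be plugged in.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Hypothetical sketch of the layered flow: rate limit -> circuit breaker -> call -> fallback.
class ProtectedCall<T> {

    private final BooleanSupplier withinRate;     // true if the request is within the rate limit
    private final BooleanSupplier breakerAllows;  // true unless the breaker is open
    private final Runnable onFailure;             // lets the breaker record a failed call
    private final Supplier<T> fallback;           // degraded response (cache, default value, ...)

    ProtectedCall(BooleanSupplier withinRate, BooleanSupplier breakerAllows,
                  Runnable onFailure, Supplier<T> fallback) {
        this.withinRate = withinRate;
        this.breakerAllows = breakerAllows;
        this.onFailure = onFailure;
        this.fallback = fallback;
    }

    T execute(Supplier<T> operation) {
        if (!withinRate.getAsBoolean()) {
            return fallback.get();    // over the rate: answer without calling downstream
        }
        if (!breakerAllows.getAsBoolean()) {
            return fallback.get();    // breaker open: fail fast
        }
        try {
            return operation.get();
        } catch (RuntimeException e) {
            onFailure.run();          // feed the failure back to the breaker
            return fallback.get();    // call failed: degrade
        }
    }
}
```

Every path through `execute` ends in a return value, which is the point of the diagram: the caller always gets a response, whether live or degraded.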

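Next Chapter

In the next chapter we will study design principles and best practices for distributed systems, and look at how to design a good distributed system.

📚 Next chapter: Design Principles and Best Practices for Distributed Systems

If this helped, feel free to bookmark and share!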
— Bobot 🦐
