In modern app development, many applications are distributed, with multiple instances running in parallel. While this brings numerous benefits, it also adds complexity to things that used to be simple - one of them is rate limiting. In this blog post, we will take a closer look at what rate limiting is, some real-world applications, and how to implement it in a distributed environment.
Why distributed rate limiting?
Recently, I was tasked with building an integration platform for a client. The platform had a few downstream dependencies with very strict rate limiting policies. This meant that when many events occurred, there was a high risk of overwhelming these dependencies with numerous API calls. Now, typically, it’s possible to mitigate this by adding a Polly rate limiter strategy, but since this platform was built on Azure Integration Services using Azure Functions, an in-process limiter would only apply per instance: every scaled-out instance would enforce its own limit, so the combined call rate could still overwhelm the dependencies.
Distributed rate limiting solves this by turning the limit into a shared contract: every instance must obtain a token from a central store (Redis) before it can call the API. If no token is available, the instance must wait to obtain one.
Picking the right algorithm
When talking about rate limiting, it’s important to understand the different algorithms available. The table below gives a quick overview of the algorithms we will cover in this post and their typical use cases.
| Algorithm | Best for | Handles bursts? | Memory | Complexity |
|---|---|---|---|---|
| Fixed Window | Simple, predictable caps | Poor | Low | ★☆☆ |
| Sliding Window | Evenly spread traffic | Good | Medium | ★★☆ |
| Token Bucket | Bursts and sustained flow | Great | Low | ★★★ |
| Distributed Semaphore | Max concurrent actions | N/A (concurrency) | Low | ★☆☆ |
Algorithms in detail
Fixed Window
This algorithm uses fixed intervals to determine how many tokens are available. In each window, a fixed number of tokens is available for use; once they have all been consumed, further requests are rejected until the window closes. The available token count resets at the start of every window.
This strategy is fairly straightforward to implement, as all it tracks is a count and an expiration time: when the expiration time is reached, the count resets. That makes it a good candidate for use cases where consumption is fairly linear, meaning token use is spread out over the rate-limiting window. It also highlights the biggest drawback of this strategy: bursts. With a large window and token count, the strategy may allow many tokens to be consumed at once, causing spikes in usage.
Use when: limits are defined per interval and bursts are acceptable.
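Below is a minimal Lua sketch of this check, meant to run atomically via `EVAL` (or loaded once with `SCRIPT LOAD` and called via `EVALSHA`). The key name and parameter layout are illustrative assumptions:

```lua
-- KEYS[1] = counter key, e.g. "ratelimit:fixed:orders-api" (illustrative)
-- ARGV[1] = max calls per window
-- ARGV[2] = window length in seconds
-- Returns 1 if the call is allowed, 0 if the window's budget is spent.
local current = redis.call('INCR', KEYS[1])
if current == 1 then
  -- First call in a fresh window: start the expiry clock.
  redis.call('EXPIRE', KEYS[1], tonumber(ARGV[2]))
end
if current > tonumber(ARGV[1]) then
  return 0
end
return 1
```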
It’s effectively the same as Redis’s official INCR-based fixed-window rate-limiter pattern, so you can rely on well-tested behavior.
Sliding Window
In contrast to the fixed window strategy, this algorithm keeps track of when each token was consumed and makes it available again once the window has passed since that moment. This spreads out token usage to prevent spikes, and it is more flexible in where it can be applied. The drawback is the increased complexity of the algorithm: it needs to store a timestamp per consumed token rather than a single counter.
Use when: You must smooth traffic and avoid spikes.
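One way to sketch this in Redis is with a sorted set holding one member per consumed token, scored by timestamp. The script below is an illustration rather than the only approach; it assumes the caller supplies the current time and a unique request id, since Redis Lua scripts should stay deterministic:

```lua
-- KEYS[1] = sorted set of recent request timestamps (illustrative name)
-- ARGV[1] = max requests per window
-- ARGV[2] = window length in milliseconds
-- ARGV[3] = current time in milliseconds (supplied by the caller)
-- ARGV[4] = unique request id, e.g. a GUID
local now = tonumber(ARGV[3])
local window = tonumber(ARGV[2])
-- Evict timestamps that have slid out of the window.
redis.call('ZREMRANGEBYSCORE', KEYS[1], 0, now - window)
if redis.call('ZCARD', KEYS[1]) < tonumber(ARGV[1]) then
  redis.call('ZADD', KEYS[1], now, ARGV[4])
  redis.call('PEXPIRE', KEYS[1], window)
  return 1
end
return 0
```

The extra bookkeeping (one sorted-set member per request) is exactly the memory cost the comparison table above refers to.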
Token Bucket
The easiest way to understand this algorithm is to imagine a bucket of tokens, as the name implies. The bucket starts full, each action consumes one or more tokens, and the bucket slowly refills at a constant rate. This strategy allows for bursts if needed, but also helps spread out tokens after a burst period. The algorithm is very flexible, as there are a few parameters to tweak to get the desired rate limit.
Use when: You need burst tolerance and a steady refill rate.
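A sketch of the bucket as a small Redis hash follows, with the refill computed lazily from the elapsed time. As before, the key name and parameter layout are assumptions for illustration:

```lua
-- KEYS[1] = hash holding the bucket state (illustrative name)
-- ARGV[1] = bucket capacity
-- ARGV[2] = refill rate in tokens per second
-- ARGV[3] = current time in milliseconds (supplied by the caller)
-- ARGV[4] = tokens requested for this call
local capacity = tonumber(ARGV[1])
local rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local requested = tonumber(ARGV[4])

local state = redis.call('HMGET', KEYS[1], 'tokens', 'ts')
local tokens = tonumber(state[1])
local ts = tonumber(state[2])
if tokens == nil then
  -- Unknown bucket: start it full.
  tokens = capacity
  ts = now
end

-- Refill lazily for the time elapsed since the last call, capped at capacity.
tokens = math.min(capacity, tokens + (math.max(0, now - ts) / 1000) * rate)

local allowed = 0
if tokens >= requested then
  tokens = tokens - requested
  allowed = 1
end

-- Variadic HSET needs Redis 4.0+.
redis.call('HSET', KEYS[1], 'tokens', tokens, 'ts', now)
-- Let an idle bucket expire once it would have fully refilled anyway.
redis.call('EXPIRE', KEYS[1], math.ceil(capacity / rate) * 2)
return allowed
```

Capacity controls how big a burst can be, while the refill rate controls the sustained throughput - being able to tweak the two independently is what makes this algorithm so flexible.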
Distributed Semaphore
This strategy is a bit different from the others we have covered so far, but it has its place in rate limiting: it limits the number of concurrent actions at any given time. This is particularly useful if you want to cap how many applications access a database or an API at once. A caller obtains a token from the semaphore, performs its action, and releases the token when finished, allowing the next caller to proceed.
Use when: You must cap concurrency (e.g. max 10 simultaneous requests).
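One possible sketch uses a sorted set of current holders, scored by acquisition time so that leases from crashed callers can be reclaimed. The holder id and lease timeout are illustrative assumptions:

```lua
-- KEYS[1] = sorted set of current semaphore holders (illustrative name)
-- ARGV[1] = max concurrent holders
-- ARGV[2] = current time in milliseconds (supplied by the caller)
-- ARGV[3] = lease timeout in milliseconds
-- ARGV[4] = unique holder id, e.g. a GUID per caller
local now = tonumber(ARGV[2])
-- Reclaim slots from holders that never released their token.
redis.call('ZREMRANGEBYSCORE', KEYS[1], 0, now - tonumber(ARGV[3]))
if redis.call('ZCARD', KEYS[1]) < tonumber(ARGV[1]) then
  redis.call('ZADD', KEYS[1], now, ARGV[4])
  return 1
end
return 0
```

Releasing is then just a plain `ZREM` of the holder id once the work is done.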
Conclusion
Distributed rate limiting isn’t about picking a “best” algorithm - it’s about matching strategy to workload:
- Fixed Window – trivial to code, fine for low‑variance traffic.
- Sliding Window – smooths spikes when fairness matters.
- Token Bucket – mixes bursts with sustained flow; most versatile.
- Semaphore – caps pure concurrency where “one‑in, one‑out” is vital.
By utilizing Redis and wrapping the logic in Lua, we guarantee atomicity across every function instance, container, or worker that scales out (Redis Lua scripting guarantees atomic execution). Pair that with sensible TTLs and key namespacing, and you have a rate-limit implementation that’s:
- Centralised – one contract, many callers.
- Cloud‑ready – drop‑in for Azure Functions, Kubernetes jobs, or anything stateless.
I hope the walkthrough was useful. If you’d like to see these scripts packaged in a NuGet library or hosted in a GitHub repo, just let me know in the comments!