High Performance Rate Limiting MicroService and Library
Gubernator is a distributed, high performance, cloud native and stateless rate limiting service.
Gubernator is stateless in that it doesn’t require disk space to operate. No configuration or cache data is ever synced to disk. This is because every request to gubernator includes the config for the rate limit. At first you might think this is an unnecessary overhead for each request. However, in reality a rate limit config is made up of only four 64-bit integers.
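The sketch below puts that overhead in perspective. It models the config as four `int64` fields and measures the in-memory size. Which four values make up the config is an assumption here: this sketch picks limit, duration, algorithm and behavior. `RateLimitConfig` is a hypothetical type, not part of Gubernator's API.

```go
package main

import (
	"fmt"
	"unsafe"
)

// RateLimitConfig is a hypothetical struct used only to illustrate the
// size of a rate limit config; it is not Gubernator's actual type.
type RateLimitConfig struct {
	Limit     int64 // total hits allowed per duration
	Duration  int64 // duration in milliseconds
	Algorithm int64 // e.g. 0 = Token Bucket, 1 = Leaky Bucket
	Behavior  int64 // e.g. 0 = BATCHING
}

// configSize reports how many bytes one config occupies in memory.
func configSize() uintptr {
	return unsafe.Sizeof(RateLimitConfig{})
}

func main() {
	fmt.Printf("config size: %d bytes\n", configSize()) // config size: 32 bytes
}
```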
An example rate limit request sent via GRPC might look like the following:

```yaml
rate_limits:
  # Scopes the request to a specific rate limit
  - name: requests_per_sec
    # A unique_key that identifies this instance of a rate limit request
    unique_key: account_id=123|source_ip=172.0.0.1
    # The number of hits we are requesting
    hits: 1
    # The total number of requests allowed for this rate limit
    limit: 100
    # The duration of the rate limit in milliseconds
    duration: 1000
    # The algorithm used to calculate the rate limit
    # 0 = Token Bucket
    # 1 = Leaky Bucket
    algorithm: 0
    # The behavior of the rate limit in gubernator.
    # 0 = BATCHING (Enables batching of requests to peers)
    # 1 = NO_BATCHING (Disables batching)
    # 2 = GLOBAL (Enable global caching for this rate limit)
    behavior: 0
```
An example response would be:

```yaml
rate_limits:
  # The status of the rate limit. UNDER_LIMIT = 0, OVER_LIMIT = 1
  - status: 0
    # The current configured limit
    limit: 10
    # The number of requests remaining
    remaining: 7
    # A unix timestamp in milliseconds of when the bucket will reset, or if
    # OVER_LIMIT is set it is the time at which the rate limit will no
    # longer return OVER_LIMIT.
    reset_time: 1551309219226
    # Additional metadata about the request the client might find useful
    metadata:
      # This is the name of the coordinator that rate limited this request
      "owner": "api-n03.staging.us-east-1.mailgun.org:9041"
```
Gubernator currently supports 2 rate limit algorithms.
The Token Bucket implementation starts with an empty bucket. Each `Hit` then adds a token to the bucket until the bucket is full. Once the bucket is full, requests will return `OVER_LIMIT` until the `reset_time` is reached. At that point the bucket is emptied and requests will return `UNDER_LIMIT`. This algorithm is useful for enforcing very bursty limits (i.e., applications where a single request can add more than one `hit` to the bucket, or non-network-based queuing systems). The downside to this implementation is that once you have hit the limit, no more requests are allowed until the configured rate limit duration resets the bucket to zero.
Leaky Bucket is implemented similarly to Token Bucket, in that `OVER_LIMIT` is returned when the bucket is full. However, tokens leak from the bucket at a constant rate of one token every `duration / limit` milliseconds. This algorithm is useful for metering. Because the bucket leaks, traffic can continue without waiting for the configured rate limit duration to reset the bucket to zero.
In our production environment, we send two rate limit requests to gubernator for every request to our API. One rates the HTTP request, and the other rates the number of recipients a user can send an email to within the specified duration. Under this setup a single gubernator node fields over 2,000 requests a second, with most batched responses returned in under 1 millisecond.
Peer requests forwarded to owning nodes typically respond in under 30 microseconds.
NOTE: The above graphs only report the slowest request within each 1-second sample window, so you are seeing the slowest requests that gubernator fields to clients.
Gubernator allows users to choose non-batching behavior, which would further reduce latency for client rate limit requests. However, because of throughput requirements our production environment uses `Behavior=BATCHING` with the default 500 microsecond window. In production we have observed batch sizes of 1,000 during peak API usage. Users who don’t have the same high-traffic demands could disable batching and would see lower latencies, but at the cost of throughput.
Users may choose a behavior called `DURATION_IS_GREGORIAN`, which changes the meaning of the `Duration` field. When `Behavior` is set to `DURATION_IS_GREGORIAN`, the rate limit is reset whenever the end of the selected Gregorian calendar interval is reached.
This is useful when you want to impose daily or monthly limits on a resource. With this behavior, the limit on the resource is reset when the end of the day or month is reached, regardless of when Gubernator received the first rate limit request.
Given the following `Duration` values:

* 0 = Minutes
* 1 = Hours
* 2 = Days
* 3 = Weeks
* 4 = Months
* 5 = Years
Examples when using `Behavior = DURATION_IS_GREGORIAN`:

* If `Duration = 2` (Days), the rate limit will reset to `Current = 0` at the end of the day in which the rate limit was created.
* If `Duration = 0` (Minutes), the rate limit will reset to `Current = 0` at the end of the minute in which the rate limit was created.
* If `Duration = 4` (Months), the rate limit will reset to `Current = 0` at the end of the month in which the rate limit was created.
If you are using Go, you can use Gubernator as a library. This is useful if you wish to implement a rate limit service with your own company-specific model on top. We do this internally at Mailgun with a service we creatively called `ratelimits`, which keeps track of the limits imposed on a per-account basis. In this way you can utilize the power and speed of Gubernator but still layer business logic and integrate domain-specific problems into your rate limiting service.
When you use the library, your service becomes a full member of the cluster. It participates in the same consistent hashing and caching as a standalone Gubernator server would. All you need to do is provide the GRPC server instance and tell Gubernator where the peers in your cluster are located. The `cmd/gubernator/main.go` file is a great example of how to use Gubernator as a library.
While the Gubernator server currently doesn't directly support disk persistence, the Gubernator library does provide interfaces through which library users can implement persistence. The library has two interfaces available for disk persistence, and which one to implement depends on the use case:

* Implement the `Loader` interface to persist rate limits only at startup and shutdown.
* Implement the `Store` interface, and Gubernator will continuously call `OnChange()` and `Get()` to keep the in-memory cache and persistent store up to date with the latest rate limit data.

Both interfaces can be implemented simultaneously to ensure data is always saved to persistent storage.
Implementations of the `Store` interface are not required to store ALL the rate limits received via `OnChange()`. For instance, you might wish to support rate limit durations longer than a minute, day or month. In that case, `OnChange()` can check the duration of a rate limit and persist only those rate limits whose durations exceed a self-determined threshold.
All methods are accessed via GRPC but are also exposed via HTTP using the GRPC Gateway.
Health check returns `unhealthy` in the event a peer is reported by etcd or Kubernetes as `up` but the server instance is unable to contact that peer via its advertised address.
```
// GRPC
rpc HealthCheck (HealthCheckReq) returns (HealthCheckResp)

// HTTP
GET /v1/HealthCheck
```
Example response:

```json
{
  "status": "healthy",
  "peer_count": 3
}
```
Rate limits can be applied or retrieved using this interface. If the client makes a request to the server with `hits: 0`, the current state of the rate limit is retrieved but not incremented.
```
// GRPC
rpc GetRateLimits (GetRateLimitsReq) returns (GetRateLimitsResp)

// HTTP
POST /v1/GetRateLimits
```
Example payload:

```json
{
  "requests": [
    {
      "name": "requests_per_sec",
      "unique_key": "account.id=1234",
      "hits": 1,
      "duration": 60000,
      "limit": 10
    }
  ]
}
```
Example response:

```json
{
  "responses": [
    {
      "status": 0,
      "limit": "10",
      "remaining": "7",
      "reset_time": "1551309219226"
    }
  ]
}
```
NOTE: Gubernator uses etcd or kubernetes to discover peers and establish a cluster. If you don't have either, the docker-compose method is the simplest way to try gubernator out.
```bash
$ docker run -p 8081:81 -p 9080:80 -e GUBER_ETCD_ENDPOINTS=etcd1:2379,etcd2:2379 \
    thrawn01/gubernator:latest
```

Hit the HTTP API at `localhost:9080`.
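The docker compose file uses member-list for peer discovery:

```bash
# Download the docker-compose file
$ curl -O https://raw.githubusercontent.com/mailgun/gubernator/master/docker-compose.yaml

# Edit the environment config variables as needed
$ vi docker-compose.yaml

# Start the containers in the background
$ docker-compose up -d

# Check that the node is up via the HTTP API
$ curl http://localhost:9080/v1/HealthCheck
```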
```bash
# Download the kubernetes deployment spec
$ curl -O https://raw.githubusercontent.com/mailgun/gubernator/master/k8s-deployment.yaml

# Edit the deployment file to change the environment config variables
$ vi k8s-deployment.yaml

# Create the deployment (includes headless service spec)
$ kubectl create -f k8s-deployment.yaml
```
Gubernator supports TLS for both HTTP and GRPC connections. You can see an example with self-signed certs by running `docker-compose-tls.yaml`:

```bash
$ docker-compose -f docker-compose-tls.yaml up -d
$ curl --cacert certs/ca.pem --cert certs/gubernator.pem --key certs/gubernator.key https://localhost:9080/v1/HealthCheck
```
Gubernator is configured via environment variables with an optional `--config` flag, which takes a file of key/values and places them into the local environment before startup.
See the `example.conf` file for all available config options and their descriptions.
See architecture.md for a full description of the architecture and the inner workings of gubernator.