Resilient Microservices — Retry Pattern

Badri Narayanan Sugavanam
3 min read · Sep 27, 2020


Microservices are independent, but that independence does not make them fault tolerant by itself. A fault in an external service can degrade the performance of every microservice that depends on it, creating a rippling effect of performance degradation across the whole system.

To avoid that scenario, we need to design each microservice to be resilient to such faults. Let us assume AccountService depends on an external service, LocationService, and that LocationService is currently experiencing a performance degradation.

Assume LocationService returns HTTP 503, which means Service Unavailable. As per the standard approach, LocationService itself can inform the caller how long to wait before retrying, by setting the Retry-After response header:

https://tools.ietf.org/html/rfc7231#section-7.1.3

Servers send the "Retry-After" header field to indicate how long the user agent ought to wait before making a follow-up request. When sent with a 503 (Service Unavailable) response, Retry-After indicates how long the service is expected to be unavailable to the client. When sent with any 3xx (Redirection) response, Retry-After indicates the minimum time that the user agent is asked to wait before issuing the redirected request.

The value of this field can be either an HTTP-date or a number of seconds to delay after the response is received. A delay-seconds value is a non-negative decimal integer, representing time in seconds.

Retry-After = HTTP-date / delay-seconds
delay-seconds = 1*DIGIT

This adds some order to the retry process and avoids a scenario where callers jam LocationService with repeated requests, giving LocationService an opportunity to recover faster.
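
As a concrete illustration (not from the original article), here is a minimal sketch of how LocationService could return a 503 with a Retry-After header using Spring WebFlux. The controller class, the route and the 10-second delay are assumptions made for this example.

import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;

@RestController
public class LocationController {

    @GetMapping("/locations")
    public Mono<ResponseEntity<String>> getLocations() {
        // While the service is degraded, answer 503 and tell callers to wait
        // 10 seconds before retrying (the delay-seconds form of Retry-After).
        return Mono.just(ResponseEntity
                .status(HttpStatus.SERVICE_UNAVAILABLE)
                .header(HttpHeaders.RETRY_AFTER, "10")
                .body("LocationService is temporarily unavailable"));
    }
}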

We also need to look at this problem from the caller's perspective and handle the retry in AccountService. Let us see how.

AccountService retries with a fixed back-off time taken from the Retry-After header. It tries up to three times; if LocationService still fails after three attempts, the call is treated as a failure and is not retried again.

.retryWhen(Retry.onlyIf(this::is5xxServerError)   // retry only on 5xx responses
        .fixedBackoff(Duration.ofSeconds(10))     // fixed 10-second wait between attempts
        .retryMax(3))                             // give up after 3 retries
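
For context, here is a fuller sketch of how this fragment might sit inside AccountService, assuming a Reactor 3.3.x setup with the reactor-extra Retry API and Spring WebFlux, as in the snippet above. The class layout, the base URL and the is5xxServerError predicate are assumptions for illustration, not code from the original article.

import java.time.Duration;

import org.springframework.web.reactive.function.client.WebClient;
import org.springframework.web.reactive.function.client.WebClientResponseException;
import reactor.core.publisher.Mono;
import reactor.retry.Retry;
import reactor.retry.RetryContext;

public class AccountService {

    private final WebClient webClient = WebClient.create("http://location-service");

    public Mono<String> fetchLocation(String accountId) {
        return webClient.get()
                .uri("/locations/{id}", accountId)
                .retrieve()
                .bodyToMono(String.class)
                // Retry only on 5xx errors, waiting a fixed 10 seconds between
                // attempts and giving up after 3 retries.
                .retryWhen(Retry.onlyIf(this::is5xxServerError)
                        .fixedBackoff(Duration.ofSeconds(10))
                        .retryMax(3));
    }

    // Treat the failure as retryable only when LocationService answered with a 5xx.
    private boolean is5xxServerError(RetryContext<Object> context) {
        return context.exception() instanceof WebClientResponseException
                && ((WebClientResponseException) context.exception())
                        .getStatusCode().is5xxServerError();
    }
}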

Now assume LocationService does not set the Retry-After header. In that case, it is wise to increase the back-off time after each subsequent attempt, which gives LocationService a better chance to recover.

The general algorithm for an Exponential BackOff strategy is as follows (a plain-Java sketch follows the list):

1. Identify the maximum number of retries.

2. Make the call and see if it succeeds; if yes, return the result to the caller. Otherwise, increment the retry counter.

3. Retry; if the call fails again, increase the waiting period before the next retry.

4. If the maximum number of retries is reached, inform the caller that the service is unavailable.
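
The loop below is a minimal plain-Java sketch of those four steps (not from the original article); maxRetries, the initial one-second delay and the callLocationService() helper are hypothetical placeholders.

import java.time.Duration;

public class ExponentialBackoffSketch {

    static String callWithExponentialBackoff(int maxRetries) throws InterruptedException {
        Duration delay = Duration.ofSeconds(1);       // initial back-off
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return callLocationService();         // step 2: see if the call succeeds
            } catch (RuntimeException e) {
                if (attempt == maxRetries) {
                    // step 4: retries exhausted, surface the failure to the caller
                    throw new IllegalStateException("LocationService unavailable", e);
                }
                Thread.sleep(delay.toMillis());       // wait before the next attempt
                delay = delay.multipliedBy(2);        // step 3: increase the waiting period
            }
        }
        throw new IllegalStateException("unreachable");
    }

    // Hypothetical stand-in for the real HTTP call to LocationService.
    static String callLocationService() {
        throw new RuntimeException("HTTP 503 Service Unavailable");
    }
}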

Exponential back-off can be implemented with reactor-extra as follows:

.retryWhen(Retry.onlyIf(this::is5xxServerError)          // retry only on 5xx responses
        .exponentialBackoff(ofSeconds(1), ofSeconds(10)) // first back-off 1s, capped at 10s
        .retryMax(3))                                    // give up after 3 retries

The Retry pattern does not help when the downstream service keeps faulting for a longer period of time: we would keep retrying and simply exhaust the retry limit. For such longer-lasting faults there is the Circuit Breaker pattern, which I will detail in another article. Till then, have a good week ahead!
