Resilient Microservices — Retry Pattern

Badri Narayanan Sugavanam
3 min read · Sep 27, 2020


Microservices are independent, but that independence does not make them fault tolerant by itself. A fault in an external service can degrade the performance of every microservice that depends on it, creating a rippling effect of performance degradation across the whole system.

To avoid that scenario, we need to design each microservice to be resilient to such faults. Let us assume AccountService depends on an external service, LocationService, and that LocationService is currently experiencing a performance degradation.

Assume LocationService returns HTTP 503, which means Service Unavailable. As per the standard approach, LocationService itself can inform the caller how long to wait before retrying, by setting the Retry-After response header:

https://tools.ietf.org/html/rfc7231#section-7.1.3

Servers send the "Retry-After" header field to indicate how long the user agent ought to wait before making a follow-up request. When sent with a 503 (Service Unavailable) response, Retry-After indicates how long the service is expected to be unavailable to the client. When sent with any 3xx (Redirection) response, Retry-After indicates the minimum time that the user agent is asked to wait before issuing the redirected request.

The value of this field can be either an HTTP-date or a number of seconds to delay after the response is received. A delay-seconds value is a non-negative decimal integer, representing time in seconds.

Retry-After = HTTP-date / delay-seconds
delay-seconds = 1*DIGIT

This adds some order to the retry process and avoids a scenario where callers jam LocationService with repeated requests, giving LocationService an opportunity to recover faster.
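
As a concrete illustration (not from the original article), here is a minimal sketch of how LocationService could return a 503 with a Retry-After header using Spring WebFlux. The controller class, the route and the 10-second delay are assumptions made for this example.

import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;

@RestController
public class LocationController {

    @GetMapping("/locations")
    public Mono<ResponseEntity<String>> getLocations() {
        // While the service is degraded, answer 503 and tell callers to wait
        // 10 seconds before retrying (the delay-seconds form of Retry-After).
        return Mono.just(ResponseEntity
                .status(HttpStatus.SERVICE_UNAVAILABLE)
                .header(HttpHeaders.RETRY_AFTER, "10")
                .body("LocationService is temporarily unavailable"));
    }
}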

We also need to look at this problem from the caller's perspective and handle the retry in AccountService. Let us see how.

AccountService retries with a fixed back-off time taken from the Retry-After header. It tries up to three times; if LocationService still fails after three attempts, the call is treated as a failure and is not retried again.

.retryWhen(Retry.onlyIf(this::is5xxServerError)   // retry only on 5xx responses
        .fixedBackoff(Duration.ofSeconds(10))     // fixed 10-second wait between attempts
        .retryMax(3))                             // give up after 3 retries
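
For context, here is a fuller sketch of how this fragment might sit inside AccountService, assuming a Reactor 3.3.x setup with the reactor-extra Retry API and Spring WebFlux, as in the snippet above. The class layout, the base URL and the is5xxServerError predicate are assumptions for illustration, not code from the original article.

import java.time.Duration;

import org.springframework.web.reactive.function.client.WebClient;
import org.springframework.web.reactive.function.client.WebClientResponseException;
import reactor.core.publisher.Mono;
import reactor.retry.Retry;
import reactor.retry.RetryContext;

public class AccountService {

    private final WebClient webClient = WebClient.create("http://location-service");

    public Mono<String> fetchLocation(String accountId) {
        return webClient.get()
                .uri("/locations/{id}", accountId)
                .retrieve()
                .bodyToMono(String.class)
                // Retry only on 5xx errors, waiting a fixed 10 seconds between
                // attempts and giving up after 3 retries.
                .retryWhen(Retry.onlyIf(this::is5xxServerError)
                        .fixedBackoff(Duration.ofSeconds(10))
                        .retryMax(3));
    }

    // Treat the failure as retryable only when LocationService answered with a 5xx.
    private boolean is5xxServerError(RetryContext<Object> context) {
        return context.exception() instanceof WebClientResponseException
                && ((WebClientResponseException) context.exception())
                        .getStatusCode().is5xxServerError();
    }
}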

Now assume LocationService does not set the Retry-After header. In that case, it is wise to increase the back-off time after each subsequent attempt, which gives LocationService a better chance to recover.

The general algorithm for an Exponential BackOff strategy is as follows (a plain-Java sketch follows the list):

1. Identify the maximum number of retries.

2. Make the call and see if it succeeds; if yes, return the result to the caller. Otherwise, increment the retry counter.

3. Retry; if the call fails again, increase the waiting period before the next retry.

4. If the maximum number of retries is reached, inform the caller that the service is unavailable.
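
The loop below is a minimal plain-Java sketch of those four steps (not from the original article); maxRetries, the initial one-second delay and the callLocationService() helper are hypothetical placeholders.

import java.time.Duration;

public class ExponentialBackoffSketch {

    static String callWithExponentialBackoff(int maxRetries) throws InterruptedException {
        Duration delay = Duration.ofSeconds(1);       // initial back-off
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return callLocationService();         // step 2: see if the call succeeds
            } catch (RuntimeException e) {
                if (attempt == maxRetries) {
                    // step 4: retries exhausted, surface the failure to the caller
                    throw new IllegalStateException("LocationService unavailable", e);
                }
                Thread.sleep(delay.toMillis());       // wait before the next attempt
                delay = delay.multipliedBy(2);        // step 3: increase the waiting period
            }
        }
        throw new IllegalStateException("unreachable");
    }

    // Hypothetical stand-in for the real HTTP call to LocationService.
    static String callLocationService() {
        throw new RuntimeException("HTTP 503 Service Unavailable");
    }
}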

Exponential back-off can be implemented with reactor-extra as follows:

.retryWhen(Retry.onlyIf(this::is5xxServerError)          // retry only on 5xx responses
        .exponentialBackoff(ofSeconds(1), ofSeconds(10)) // first back-off 1s, capped at 10s
        .retryMax(3))                                    // give up after 3 retries

The Retry pattern does not help when the downstream service keeps faulting for a longer period of time: we would keep retrying and simply exhaust the retry limit. For such longer-lasting faults there is the Circuit Breaker pattern, which I will detail in another article. Till then, have a good week ahead!
