Adding retry policies to your software is an easy way to increase resiliency. This is especially useful when making HTTP requests, or doing anything else that has to reach out across the network.
If at first you don’t succeed, try, try again. In Go code, that translates to:
The `retry` function calls itself recursively, counting down attempts and sleeping twice as long each time (i.e. exponential backoff). This technique works well until a good number of clients start their retry loops at roughly the same time, which could happen if a lot of connections get dropped at once. The retry attempts would then be in sync with one another, creating what is known as the Thundering Herd problem. To prevent this, we can add some randomness by inserting the following lines before we call `time.Sleep`:
```go
jitter := time.Duration(rand.Int63n(int64(sleep)))
sleep = sleep + jitter/2
```
The improved, jittery version:
There are two options for stopping the retry loop before all the attempts are made:

1. The function succeeds, returning `nil`.
2. The function returns an error wrapped in a `stop` type, signaling that further retries are futile.
Choose option #2 when an error occurs for which retrying would be futile. Consider most 4XX HTTP status codes: they indicate that the client has done something wrong, and subsequent retries, without any modification to the request, will result in the same response. In this case we still want to return an error, so we wrap the error in the `stop` type. The actual error returned by the `retry` function will be the original, non-wrapped error. This allows for later checks like `err == ErrUnauthorized`.
Take a look at the following implementation for retrying an HTTP request. Note: in this case, only 2 additional lines are needed to add the retry policy to the existing `DeleteThing` function (lines 14 and 34).
[June 1, 2017] Edited to add the jitter example