Simple Golang Retry Function

Nick Stogner | May 2017

Adding retry policies to your software is an easy way to increase resiliency. This is especially useful when making HTTP requests or doing anything else that has to reach out across the network.

If at first you don’t succeed, try, try again. In Go code, that translates to:

The retry function recursively calls itself, counting down attempts and sleeping for twice as long each time (i.e. exponential backoff). This technique works well until many clients start their retry loops at roughly the same time, which could happen if a lot of connections get dropped at once. The retry attempts would then be in sync with each other, creating what is known as the Thundering Herd problem. To prevent this, we can add some randomness by inserting the following lines before we call time.Sleep:

jitter := time.Duration(rand.Int63n(int64(sleep)))
sleep = sleep + jitter/2

The improved, jittery version:

The function passed to retry has two options for stopping the loop before all the attempts are made:

  1. Return nil
  2. Return a wrapped error: stop{err}

Choose option #2 when retrying would be futile. Consider most 4XX HTTP status codes: they indicate that the client has done something wrong, and subsequent retries without any modification to the request will produce the same response. In this case we still want to return an error, so we wrap it in the stop type. The error actually returned by the retry function is the original, unwrapped error, which allows for later checks like err == ErrUnauthorized.

Take a look at the following implementation for retrying an HTTP request. Note that adding the retry policy to the existing DeleteThing function takes only two additional lines: the one that opens the retry call and the one that closes it.

At upgear we found this function to be so useful that we included a variation of it in our Go kit on GitHub.


[June 1, 2017] Edited to add the jitter example