Simple Golang Retry Function

Nick Stogner | May 2017

Adding retry policies in your software is an easy way to increase resiliency. This is especially useful when making HTTP requests or doing anything else that has to reach out across the network.

If at first you don’t succeed, try, try again. In go code, that translates to:

	func retry(attempts int, sleep time.Duration, fn func() error) error {
	if err := fn(); err != nil {
	if s, ok := err.(stop); ok {
	// Return the original error for later checking
	return s.error
	}

	if attempts--; attempts > 0 {
	time.Sleep(sleep)
	return retry(attempts, 2*sleep, fn)
	}
	return err
	}
	return nil
	}

	type stop struct {
	error
	}

view raw retry.go hosted with ❤ by GitHub

The retry function recursively calls itself, counting down attempts and sleeping for twice as long each time (i.e. exponential backoff). This technique works well until the situation arises where a good number of clients start their retry loops at roughly the same time. This could happen if a lot of connections get dropped at once. The retry attempts would then be in sync with each other, creating what is known as the Thundering Herd problem. To prevent this, we can add some randomness by inserting the following lines before we call time.Sleep:

jitter := time.Duration(rand.Int63n(int64(sleep)))
sleep = sleep + jitter/2

The improved, jittery version:

	func init() {
	rand.Seed(time.Now().UnixNano())
	}

	func retry(attempts int, sleep time.Duration, f func() error) error {
	if err := f(); err != nil {
	if s, ok := err.(stop); ok {
	// Return the original error for later checking
	return s.error
	}

	if attempts--; attempts > 0 {
	// Add some randomness to prevent creating a Thundering Herd
	jitter := time.Duration(rand.Int63n(int64(sleep)))
	sleep = sleep + jitter/2

	time.Sleep(sleep)
	return retry(attempts, 2*sleep, f)
	}
	return err
	}

	return nil
	}

	type stop struct {
	error
	}

view raw retry_jitter.go hosted with ❤ by GitHub

There are two options for stopping the retry loop before all the attempts are made:

Return nil
Return a wrapped error: stop{err}

Choose option #2 when an error occurs where retrying would be futile. Consider most 4XX HTTP status codes. They indicate that the client has done something wrong and subsequent retries, without any modification to the request will result in the same response. In this case we still want to return an error so we wrap the error in the stop type. The actual error that is returned by the retry function will be the original non-wrapped error. This allows for later checks like err == ErrUnauthorized.

Take a look at the following implementation for retrying a HTTP request. Note: In this case, there are only 2 additional lines needed for adding in the retry policy to the existing DeleteThing function (lines 14 and 34).

	// DeleteThing attempts to delete a thing. It will try a maximum of three times.
	func DeleteThing(id string) error {
	// Build the request
	req, err := http.NewRequest(
	"DELETE",
	fmt.Sprintf("https://unreliable-api/things/%s", id),
	nil,
	)
	if err != nil {
	return fmt.Errorf("unable to make request: %s", err)
	}

	// Execute the request
	return retry(3, time.Second, func() error {
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
	// This error will result in a retry
	return err
	}
	defer resp.Body.Close()

	s := resp.StatusCode
	switch {
	case s >= 500:
	// Retry
	return fmt.Errorf("server error: %v", s)
	case s >= 400:
	// Don't retry, it was client's fault
	return stop{fmt.Errorf("client error: %v", s)}
	default:
	// Happy
	return nil
	}
	})
	}

view raw retry_http.go hosted with ❤ by GitHub

Notes

[June 1, 2017] Edited to add the jitter example