HTTP in Swift, Part 12: Retrying

Part 12 in a series on building a Swift HTTP framework:

  1. HTTP in Swift, Part 1: An Intro to HTTP
  2. HTTP in Swift, Part 2: Basic Structures
  3. HTTP in Swift, Part 3: Request Bodies
  4. HTTP in Swift, Part 4: Loading Requests
  5. HTTP in Swift, Part 5: Testing and Mocking
  6. HTTP in Swift, Part 6: Chaining Loaders
  7. HTTP in Swift, Part 7: Dynamically Modifying Requests
  8. HTTP in Swift, Part 8: Request Options
  9. HTTP in Swift, Part 9: Resetting
  10. HTTP in Swift, Part 10: Cancellation
  11. HTTP in Swift, Part 11: Throttling
  12. HTTP in Swift, Part 12: Retrying
  13. HTTP in Swift, Part 13: Basic Authentication
  14. HTTP in Swift, Part 14: OAuth Setup
  15. HTTP in Swift, Part 15: OAuth
  16. HTTP in Swift, Part 16: Composite Loaders

Most networking libraries have the ability to automatically send a request again if the received response wasn’t quite what the client was looking for. Let’s add that to our library as well.

The Setup

Recall the API of our HTTPLoader:

open class HTTPLoader {

    /// Load an HTTPTask that will eventually call its completion handler
    func load(task: HTTPTask)

    /// Reset the loader to its initial configuration
    func reset(with group: DispatchGroup)
}

Remember that this defines a way to load tasks and a way to “start over”. Loading a task has the contract that a request will eventually come back with a response.

There’s nothing in the API contract that says a request can only be loaded once. You can (and we will) have a loader that executes a single request multiple times and then picks the best response to pass back out. That’s what our loader will do.

Retrying requests is something that’s intimately tied to each individual request, so we’re going to start by first creating the per-request option value to specify how a single request wants to behave.

The Option to Retry

The decision to retry a request comes down to a single question: “given this response, should I retry the request, and if so, how long should I wait before doing so?” We can express this as a protocol:

public protocol HTTPRetryStrategy {
    func retryDelay(for result: HTTPResult) -> TimeInterval?
}

Our “strategy” (or policy) for retrying examines the HTTPResult we got from the previous invocation of the request and returns a TimeInterval (a length of time, in seconds) to wait before sending the request again. Returning nil means “don’t retry”, and returning 0 means “try again immediately”.

We can immediately imagine several different kinds of strategies, such as retrying immediately, always waiting the same amount of time, or waiting an exponentially increasing length of time:

public struct Backoff: HTTPRetryStrategy {
    public static func immediately(maximumNumberOfAttempts: Int) -> Backoff
    public static func constant(delay: TimeInterval, maximumNumberOfAttempts: Int) -> Backoff
    public static func exponential(delay: TimeInterval, maximumNumberOfAttempts: Int) -> Backoff
}
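
These three cases differ only in how they compute the delay for a given attempt, so they can share a single delay-computing closure. Here’s an uncompiled, untested sketch of how Backoff might look; because the protocol method is non-mutating, the attempt counter lives in a small reference type (which also means each request should get its own Backoff value). The stand-in types at the top are simplified versions of what the series defined earlier, included only so the sketch is self-contained:

```swift
import Foundation

// Simplified stand-ins for types defined earlier in the series,
// included here only so this sketch compiles on its own.
public struct HTTPResult { /* Result<HTTPResponse, HTTPError> in the real framework */ }
public protocol HTTPRetryStrategy {
    func retryDelay(for result: HTTPResult) -> TimeInterval?
}

public struct Backoff: HTTPRetryStrategy {
    // The protocol method is non-mutating, so the attempt counter lives in a
    // reference type; each request therefore needs its own Backoff value.
    private final class Counter { var attempts = 0 }

    private let counter = Counter()
    private let maximumNumberOfAttempts: Int
    private let delayForAttempt: (Int) -> TimeInterval

    private init(maximumNumberOfAttempts: Int, delay: @escaping (Int) -> TimeInterval) {
        self.maximumNumberOfAttempts = maximumNumberOfAttempts
        self.delayForAttempt = delay
    }

    public static func immediately(maximumNumberOfAttempts: Int) -> Backoff {
        Backoff(maximumNumberOfAttempts: maximumNumberOfAttempts) { _ in 0 }
    }

    public static func constant(delay: TimeInterval, maximumNumberOfAttempts: Int) -> Backoff {
        Backoff(maximumNumberOfAttempts: maximumNumberOfAttempts) { _ in delay }
    }

    public static func exponential(delay: TimeInterval, maximumNumberOfAttempts: Int) -> Backoff {
        Backoff(maximumNumberOfAttempts: maximumNumberOfAttempts) { attempt in
            // 1×, 2×, 4×, 8×, … the base delay
            delay * pow(2.0, Double(attempt - 1))
        }
    }

    public func retryDelay(for result: HTTPResult) -> TimeInterval? {
        counter.attempts += 1
        // the initial load counts as an attempt, so stop once they're all used up
        guard counter.attempts < maximumNumberOfAttempts else { return nil }
        return delayForAttempt(counter.attempts)
    }
}
```

The exact counting semantics (does “maximum number of attempts” include the initial load?) are an assumption here; the important part is that the strategy is a value the request carries, not something the loader has to special-case.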

Since this is a protocol, we can also imagine custom implementations that provide more dynamic behavior. For example, the Twitter API says:

• On queries that are rate limited (those that return an HTTP 429 status code), you must inspect the x-rate-limit-reset header and retry only at or after the time indicated.

• On queries that result in an HTTP 503 Service Unavailable status code, you must inspect the retry-after header and retry only after the time indicated.

Many HTTP services provide similar options. If you’re sending too many requests too quickly, or the information you want isn’t available yet, the service will often indicate in a response header how long you should wait before trying again. Thus you could write a custom retry strategy that implements the specific behavior of the API you’re targeting:

struct TwitterRetryStrategy: HTTPRetryStrategy {
    func retryDelay(for result: HTTPResult) -> TimeInterval? {
        // TODO: are there other scenarios to consider?
        guard let response = result.response else { return nil }

        switch response.statusCode {

            case 429: 
                // look for the header that tells us when our limit resets
                guard let retryHeader = response.headers["x-rate-limit-reset"] else { return nil }
                guard let resetTime = TimeInterval(retryHeader) else { return nil }
                let resetDate = Date(timeIntervalSince1970: resetTime)
                let timeToWait = resetDate.timeIntervalSinceNow
                guard timeToWait >= 0 else { return nil }
                return timeToWait

            case 503:
                // look for the header that tells us how long to wait
                guard let retryHeader = response.headers["retry-after"] else { return nil }
                return TimeInterval(retryHeader)

            default:
                return nil
        }
    }
}

Warning: This code is uncompiled and untested and is here for illustrative purposes only.

With these strategies defined, we need a formal HTTPRequestOption type to declare that it can be attached to a request:

public enum RetryOption: HTTPRequestOption {
    // by default, HTTPRequests do not have a retry strategy, and therefore do not get retried
    public static var defaultOptionValue: HTTPRetryStrategy? { nil }
}

extension HTTPRequest {    
    public var retryStrategy: HTTPRetryStrategy? {
        get { self[option: RetryOption.self] }
        set { self[option: RetryOption.self] = newValue }
    }
}
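
With the option in place, attaching a strategy to a request becomes a one-line affair. A hypothetical example, assuming the HTTPRequest construction API from earlier in the series (the host and path here are illustrative):

```swift
var request = HTTPRequest()
request.host = "api.example.com"
request.path = "/status"

// this request will be retried with an exponentially increasing delay,
// for a maximum of three total attempts
request.retryStrategy = Backoff.exponential(delay: 2, maximumNumberOfAttempts: 3)
```

Requests that never set retryStrategy keep the default value of nil and pass through the retrying loader untouched.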

The Loader

The loader we create to handle this will be our most complicated loader so far. My personal implementation is about 200 lines of code, and is too long to fully list in this post. I’ll highlight the key parts of it, though.

  1. All HTTPTasks received via the load(task:) method are duplicated before being passed on to the next loader in the chain. This is because each task should be executed only once, so multiple invocations of a request require multiple tasks.
  2. We’ll need a way to remember which “duplicated” task corresponds to an original task.
  3. We’ll need a way to keep a list of all the tasks that are waiting to be retried, and the time at which they want to be started.
  4. Therefore we’ll need some sort of Timer-like mechanism to keep track of “when should the next task be started”.
  5. Cancellation will be a bit tricky, because the original task will be cancelled, but we’ll need a way to see that happening and forward the cancellation command on to any duplications.
  6. Don’t forget about resetting.

With all of this in mind, my implementation looks roughly like this:

// TODO: make all of this thread-safe
public class Retry: HTTPLoader {
    // the original tasks as received by the load(task:) method
    private var originalTasks = Dictionary<UUID, HTTPTask>()

    // the times at which specific tasks should be re-attempted
    private var pendingTasks = Dictionary<UUID, Date>()

    // the currently-executing duplicates
    private var executingAttempts = Dictionary<UUID, HTTPTask>()

    // the timer for notifying when it's time to try another attempt
    private var timer: Timer?
    
    public override func load(task: HTTPTask) {
        let taskID = task.id
        // we need to know when the original task is cancelled
        task.addCancelHandler { [weak self] in
            self?.cleanupFromCancel(taskID: taskID)
        }
        
        attempt(task)
    }
    
    /// Immediately attempt to load a duplicate of the task
    private func attempt(_ task: HTTPTask) {
        // overview: duplicate this task and 
        // 1. Create a new HTTPTask that invokes handleResult(_:for:) when done
        // 2. Save this information into the originalTasks and executingAttempts dictionaries

        let taskID = task.id        
        let thisAttempt = HTTPTask(request: task.request, completion: { [weak self] result in
            self?.handleResult(result, for: taskID)
        })
        
        originalTasks[taskID] = task
        executingAttempts[taskID] = thisAttempt
        
        super.load(task: thisAttempt)
    }
    
    private func cleanupFromCancel(taskID: UUID) {
        // when a task is cancelled:
        // - the original task is removed
        // - any executing attempt must be cancelled
        // - any pending task must be removed AND explicitly failed
        //   - this is a task that was stopped at this level, therefore
        //     this loader is responsible for completing it

        // TODO: implement this
    }
    
    private func handleResult(_ result: HTTPResult, for taskID: UUID) {
        // schedule the original task for retrying, if necessary
        // otherwise, manually complete the original task with the result

        executingAttempts.removeValue(forKey: taskID)
        guard let originalTask = originalTasks.removeValue(forKey: taskID) else { return }
            
        if let delay = retryDelay(for: originalTask, basedOn: result) {
            pendingTasks[taskID] = Date(timeIntervalSinceNow: delay)
            rescheduleTimer()
        } else {
            originalTask.complete(with: result)
        }
    }
    
    private func retryDelay(for task: HTTPTask, basedOn result: HTTPResult) -> TimeInterval? {
        // we do not retry tasks that were cancelled or stopped because we're resetting
        // TODO: return nil if the result indicates the task was cancelled
        // TODO: return nil if the result indicates the task failed because of `.resetInProgress`
        
        let strategy = task.request.retryStrategy
        guard let delay = strategy?.retryDelay(for: result) else { return nil }
        return max(delay, 0) // don't return a negative delay
    }
    
    private func rescheduleTimer() {
        // TODO: look through `pendingTasks` find the task that will be retried soonest
        // TODO: schedule the timer to fire at that time and call `fireTimer()`
    }
    
    private func fireTimer() {
        // TODO: get the tasks that should've started executing by now and attempt them
        // TODO: reschedule the timer
    }
    
    public override func reset(with group: DispatchGroup) {
        // This loader is done resetting when all its tasks are done executing

        for task in originalTasks.values {
            group.enter()
            task.addCompletionHandler { group.leave() }
        }
        
        super.reset(with: group)
    }
}
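
To make those timer TODOs slightly more concrete, here is an uncompiled, untested sketch of what the two method bodies might look like. It assumes the private properties above, it assumes that handleResult(_:for:) keeps a to-be-retried task reachable in originalTasks (so we can find it again when the timer fires), and it defers the thread-safety work flagged at the top of the class:

```swift
private func rescheduleTimer() {
    timer?.invalidate()
    timer = nil

    // find the earliest date at which any pending task wants to be retried
    guard let soonest = pendingTasks.values.min() else { return }

    let interval = max(soonest.timeIntervalSinceNow, 0)
    timer = Timer.scheduledTimer(withTimeInterval: interval, repeats: false) { [weak self] _ in
        self?.fireTimer()
    }
}

private func fireTimer() {
    let now = Date()

    // pull out every task whose retry time has arrived…
    let readyIDs = pendingTasks.filter { $0.value <= now }.map { $0.key }

    for taskID in readyIDs {
        pendingTasks.removeValue(forKey: taskID)
        // …and attempt it again, if it's still around
        if let originalTask = originalTasks[taskID] {
            attempt(originalTask)
        }
    }

    // there may be more pending tasks further in the future
    rescheduleTimer()
}
```

A single non-repeating Timer aimed at the soonest pending date is enough here, since every state change that matters (a new pending task, a fired retry) goes through rescheduleTimer() again.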

This rough outline illustrates the principle of the “automatically retrying” loader. As requests come in, they’re saved off to the side and duplicates are forwarded on down the chain. As the duplicates complete, the loader examines the response and figures out what it should do with it. If the request’s retry strategy indicates it should try again, then it enqueues the task for a future date. If not, it takes the result for the duplicate request and pretends it was the original response all along.

The Chain

The Retry loader is the first loader we’ve created where its placement in the chain affects the chain’s overall behavior. Let’s consider a scenario where we have two loaders: a Retry loader and a Throttle loader:

let throttle = Throttle()
throttle.maximumNumberOfRequests = 1

let retry = Retry()

Now let’s suppose we want to execute two tasks, taskA and taskB, and let’s also imagine that taskA will be retried up to 3 times before ultimately failing, and that taskB will succeed.

let taskA: HTTPTask = ...
let taskB: HTTPTask = ...

let chain1 = throttle --> retry --> ...
let chain2 = retry --> throttle --> ... 

If the throttling loader is placed before the retrying loader, then the limitation of “max 1 request” happens before a request can be retried. Therefore if chain1 loads taskA and then taskB, the order of execution will always be: A (attempt 1), A (attempt 2), A (attempt 3), B. If there are long delays between attempts of taskA, then taskB could wait a very long time before it’s ever attempted.

On the other hand, if chain2 loads taskA and then taskB, the order of execution is indeterminate. It could be A (attempt 1), B, A (attempt 2), A (attempt 3), and B gets a chance to execute much sooner.

The “right” order is entirely dependent on your desired behavior, but I would suggest that throttling will likely be one of the final loaders in the chain, so that the chain doesn’t inadvertently starve incoming requests.
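
In other words, I’d generally reach for an ordering in the spirit of chain2, with throttling near the end. A hypothetical chain, reusing the --> operator from Part 6 (the loader names here are illustrative, not part of the framework):

```swift
// retried attempts pass through the throttle like any other request,
// so a long-delayed retry never blocks newly-submitted work
let chain = resetGuard --> retry --> throttle --> urlSessionLoader
```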


In the next post, we’ll take our first look at authentication, using basic access authentication.

