HTTP in Swift, Part 1: An Intro to HTTP
Part 1 in a series on building a Swift HTTP framework:
- HTTP in Swift, Part 1: An Intro to HTTP
- HTTP in Swift, Part 2: Basic Structures
- HTTP in Swift, Part 3: Request Bodies
- HTTP in Swift, Part 4: Loading Requests
- HTTP in Swift, Part 5: Testing and Mocking
- HTTP in Swift, Part 6: Chaining Loaders
- HTTP in Swift, Part 7: Dynamically Modifying Requests
- HTTP in Swift, Part 8: Request Options
- HTTP in Swift, Part 9: Resetting
- HTTP in Swift, Part 10: Cancellation
- HTTP in Swift, Part 11: Throttling
- HTTP in Swift, Part 12: Retrying
- HTTP in Swift, Part 13: Basic Authentication
- HTTP in Swift, Part 14: OAuth Setup
- HTTP in Swift, Part 15: OAuth
- HTTP in Swift, Part 16: Composite Loaders
- HTTP in Swift, Part 17: Brain Dump
- HTTP in Swift, Part 18: Wrapping Up
For a while now I’ve had a series of blog posts floating around in my head on how to build an HTTP stack in Swift. The idea started last spring with Rob Napier’s blog posts on protocols, and matured last summer and fall while I was working at WeWork on an internal Swift framework.
So, with my newfound blogging powers, I think it’s time to tackle this problem. Over the course of several posts, I’ll walk through a number of topics related to HTTP and how we can fit them together into a great Swift framework. We’ll end up with a framework that supports things like:
- automatically retrying requests
- throttling to a customizable maximum number of concurrent requests
- specifying custom server environments for an entire connection stack, or for a single request
- automatically cancelling in-flight requests when the stack is reset
- basic mesh networking
- implementing OAuth (or any sort of authenticated request)
- multi-part form uploads
- easy mocking of responses
- … and a whole lot more I’m glossing over
We’ll be building all of this on top of URLSession. And along the way, we’ll be addressing a lot of the feedback I provided to Apple engineers about the state of the built-in networking stack. (Awesomely, in the past 2 years a couple of those items have already been addressed.)
I will preempt any questions by saying: I will not be publishing the actual code for this framework online. The purpose of these posts is to describe the problems and solutions, but implementing them will be left up to you. I will have small embedded snippets to illustrate concepts, but largely the entire thing will be left as an “exercise for the reader”. There’s just no substitute for sitting down and writing the code yourself.
But… before we get started in Swift, we need to answer a really fundamental question.
What on earth is HTTP?
HTTP, or the “Hypertext Transfer Protocol”, is a specification for how two different systems can communicate with each other. As its name implies, the classic versions of the protocol (1.0 and 1.1) are text-based. This is great for humans, because text is nicely readable, and it’s good for machines, because text compresses well (header compression is a major part of the HTTP/2 specification).
HTTP defines a very specific “request/response” model. You send out a request for information, and you get a response back. One request, one response.
A request has the following format:
- A request line
- Zero-or-more header value lines
- A blank line
- An optional body
Request Lines
A request line looks something like:
GET /api/ HTTP/1.1
The first part of the line is the request method. We’re most familiar with GET and POST, but other common HTTP methods include HEAD, PUT, DELETE, and so on. In reality, the HTTP spec itself does not place a limit on the value of the request method, which allows specs like WebDAV (which powers CalDAV and CardDAV) to add their own methods like COPY, LOCK, PROPFIND, and so on.
This leads us to the simplest of problems with a lot of Swift networking frameworks. It’s really common to see something like this:
public enum HTTPMethod: String {
    case get = "GET"
    case post = "POST"
    case put = "PUT"
    case delete = "DELETE"
}
Since the HTTP spec allows for any single-word method, defining this value in Swift as an enum is incorrect: an enum only allows a fixed, finite set of cases, but the set of possible “single word” values is unbounded. A better implementation is to use a struct:
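public struct HTTPMethod: Hashable {
    public static let get = HTTPMethod(rawValue: "GET")
    public static let post = HTTPMethod(rawValue: "POST")
    public static let put = HTTPMethod(rawValue: "PUT")
    public static let delete = HTTPMethod(rawValue: "DELETE")

    public let rawValue: String

    // Explicit public initializer so code outside this module can define its own methods.
    public init(rawValue: String) {
        self.rawValue = rawValue
    }
}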
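To make the benefit concrete, here is a minimal sketch of how a client of the struct could add WebDAV methods without touching the original type. The struct is repeated in shortened form so the snippet compiles on its own; the propfind and lock constants and the requestLine(method:path:) helper are names invented for this example, not part of any framework.

```swift
// Shortened copy of the struct above, so this snippet stands alone.
public struct HTTPMethod: Hashable {
    public static let get = HTTPMethod(rawValue: "GET")
    public static let post = HTTPMethod(rawValue: "POST")

    public let rawValue: String

    public init(rawValue: String) {
        self.rawValue = rawValue
    }
}

// Hypothetical WebDAV additions, declared in an extension by the client.
extension HTTPMethod {
    static let propfind = HTTPMethod(rawValue: "PROPFIND")
    static let lock = HTTPMethod(rawValue: "LOCK")
}

// Hypothetical helper: builds the first line of an HTTP/1.1 request.
func requestLine(method: HTTPMethod, path: String) -> String {
    "\(method.rawValue) \(path) HTTP/1.1"
}

let line = requestLine(method: .propfind, path: "/calendars/")
```

With an enum, adding PROPFIND would mean editing the enum itself. With the struct, it is just another static constant.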
After the method comes the path, which should be readily recognizable as the .path of a URL (followed by the query string, if the URL has one).
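As a rough sketch of how this maps onto Foundation, the helper below derives that part of the request line from a URL. The name requestTarget(for:) is made up for this example. It assumes the common case where the request carries the path plus any query string, and it falls back to "/" when the URL has no path at all.

```swift
import Foundation

// Hypothetical helper: returns the path (plus query, if any) that would
// follow the method on the request line.
func requestTarget(for url: URL) -> String? {
    guard let components = URLComponents(url: url, resolvingAgainstBaseURL: true) else {
        return nil
    }
    // An empty path is sent as "/" on the wire.
    let path = components.percentEncodedPath.isEmpty ? "/" : components.percentEncodedPath
    if let query = components.percentEncodedQuery {
        return path + "?" + query
    }
    return path
}

let target = requestTarget(for: URL(string: "https://swapi.dev/api/")!)
```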
Finally, there’s a bit about which version of the HTTP spec you’re using. HTTP/2 is commonly used these days, but it transmits the same information as compressed binary frames, so I’ll use version 1.1, whose plain-text format is much easier to read.
Headers
Headers are a series of key-value pairs, in a familiar key-value-looking format:
Host: swapi.dev
Connection: close
User-Agent: Paw/3.1.10 (Macintosh; OS X/10.15.5) GCDHTTPRequest
There are a number of header fields commonly used with HTTP, some of them defined by the specs, and some agreed upon by convention. Like with request methods, there’s no limit to the possible names of a header, nor much of a limit on the possible values.
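Here is a small sketch of reading header lines like the ones above into a dictionary. The parseHeaderLine(_:) function is a name invented for this example, and it ignores edge cases such as repeated headers. One detail worth knowing is that header names are case-insensitive, so the sketch lowercases them before storing.

```swift
import Foundation

// Hypothetical helper: splits "Name: value" at the first colon and trims
// whitespace around the value. Returns nil when there is no colon.
func parseHeaderLine(_ line: String) -> (name: String, value: String)? {
    guard let colon = line.firstIndex(of: ":") else { return nil }
    let name = String(line[..<colon])
    let value = line[line.index(after: colon)...].trimmingCharacters(in: .whitespaces)
    return (name, value)
}

let headerLines = [
    "Host: swapi.dev",
    "Connection: close",
    "User-Agent: Paw/3.1.10 (Macintosh; OS X/10.15.5) GCDHTTPRequest",
]

// Header names are case-insensitive, so store them lowercased for lookup.
var headers: [String: String] = [:]
for line in headerLines {
    if let header = parseHeaderLine(line) {
        headers[header.name.lowercased()] = header.value
    }
}
```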
The Body
Any HTTP request may include a body. Yes, even a GET request is allowed to include a body. However, most servers will only interpret the body of a request based on which request method is used, so it’s likely that even if you did include a body with a GET request, the server would ignore it.
The body itself is just raw binary data. How it’s interpreted by the server depends entirely on conventions agreed upon between the server engineers and the client engineers.
Putting a request together
If we want to execute a basic HTTP request to the Star Wars API, we would construct an HTTP request that looks like this:
GET /api/ HTTP/1.1
Host: swapi.dev
Connection: close
User-Agent: Paw/3.1.10 (Macintosh; OS X/10.15.5) GCDHTTPRequest
We can see the request line, 3 header lines, and an empty line at the end. It’s important to note that every line in an HTTP request ends with two characters: \r\n (carriage return + line feed). This becomes more apparent when we look at the hex representation of this request (newlines inserted for readability):
474554202F6170692F20485454502F312E310D0A
486F73743A2073776170692E6465760D0A
436F6E6E656374696F6E3A20636C6F73650D0A
557365722D4167656E743A205061772F332E312E313020284D6163696E746F73683B204F5320582F31302E31352E35292047434448545450526571756573740D0A
0D0A
0D and 0A are the hexadecimal representations of \r and \n, respectively. The empty line is how an HTTP server knows where the headers stop and the body begins.
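To see where those bytes come from, here is a minimal sketch that assembles a body-less request by hand and hex-encodes it. The serializeRequest(requestLine:headers:) function is a name invented for this example, and the User-Agent header is left out to keep the output short.

```swift
import Foundation

// Hypothetical helper: joins the request line and header lines with CRLF,
// then appends the blank line that ends the header section. No body is sent.
func serializeRequest(requestLine: String, headers: [(name: String, value: String)]) -> Data {
    var lines = [requestLine]
    lines += headers.map { "\($0.name): \($0.value)" }
    let head = lines.joined(separator: "\r\n") + "\r\n\r\n"
    return Data(head.utf8)
}

let requestData = serializeRequest(
    requestLine: "GET /api/ HTTP/1.1",
    headers: [
        (name: "Host", value: "swapi.dev"),
        (name: "Connection", value: "close"),
    ]
)

// Uppercase hex, two digits per byte, in the same style as the dump above.
let hex = requestData.map { String(format: "%02X", $0) }.joined()
```

The resulting string starts with the same bytes as the first line of the dump and ends with 0D0A0D0A: the terminator of the last header line followed by the empty line.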
As an aside, if you find yourself working with network APIs a lot, I highly recommend using Lucky Marmot’s fantastic app Paw. It is an indispensable tool for analyzing and tweaking network calls.
Another thing worth pointing out here is that Host: header. While it’s required by the spec, it doesn’t determine where the request is sent. That’s because the host part of the URL has already been used, before the request is ever sent, by the DNS resolution process as part of the whole “what server am I actually connecting to” flow. The Host header tells the server which site the client asked for, which helps while debugging and lets the server do further internal routing. For example, you could imagine a server for all of GitHub Pages using the Host header to know which set of pages you actually want.
So, that’s the request.
What about the response?
The response we get back is almost identical in structure to the request. It has:
- The response line (the spec calls it the “status line”)
- Zero or more header value lines
- An empty line
- An optional body
So, that request to the Star Wars API will have a response that looks like this:
HTTP/1.1 200 OK
Server: nginx/1.16.1
Date: Sat, 27 Jun 2020 19:13:53 GMT
Content-Type: application/json
Transfer-Encoding: chunked
Connection: close
Vary: Accept, Cookie
X-Frame-Options: SAMEORIGIN
ETag: "57e8c3fe1ac5cb74e15b96dc98767ce6"
Allow: GET, HEAD, OPTIONS
Strict-Transport-Security: max-age=15768000

{"people":"http://swapi.dev/api/people/","planets":"http://swapi.dev/api/planets/","films":"http://swapi.dev/api/films/","species":"http://swapi.dev/api/species/","vehicles":"http://swapi.dev/api/vehicles/","starships":"http://swapi.dev/api/starships/"}
Like with the request, all lines are separated by \r\n, and the empty line is how we know where the headers stop and the body begins. The only other difference is that the first line now has an integer status code and a textual description of the code’s meaning.
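As a quick illustration, the first line of a response can be pulled apart with a couple of splits. This is only a sketch: parseStatusLine(_:) is a name invented for this example, and it does no validation beyond checking that the code is an integer.

```swift
// Hypothetical helper: splits a line such as "HTTP/1.1 200 OK" into its parts.
// The description may contain spaces, so only the first two spaces split.
func parseStatusLine(_ line: String) -> (version: String, code: Int, reason: String)? {
    let parts = line.split(separator: " ", maxSplits: 2, omittingEmptySubsequences: false)
    guard parts.count >= 2, let code = Int(parts[1]) else { return nil }
    let reason = parts.count == 3 ? String(parts[2]) : ""
    return (String(parts[0]), code, reason)
}

let status = parseStatusLine("HTTP/1.1 404 Not Found")
```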
Like with request methods, there’s no real limit on the status code values and message. The HTTP spec defines many that we’re familiar with, such as 200 (OK) and 404 (Not Found), as well as the allowed range of numbers (100 … 599). But… nothing stops a server from returning a 666 Diabolical code, or a 42 Mostly Harmless code. You could even return a 200 👍 response. But please don’t. 😅
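This is the same situation we saw with request methods, so the same struct-based shape fits. The HTTPStatus type below is a hypothetical sketch for illustration, including the isSuccess convenience, which assumes the usual convention that 2xx codes mean success.

```swift
// Hypothetical type: wraps any integer status code, known or not.
public struct HTTPStatus: Hashable {
    public static let ok = HTTPStatus(rawValue: 200)
    public static let notFound = HTTPStatus(rawValue: 404)

    public let rawValue: Int

    public init(rawValue: Int) {
        self.rawValue = rawValue
    }

    // True for codes in the 200 to 299 range.
    public var isSuccess: Bool {
        (200..<300).contains(rawValue)
    }
}

// An off-spec code is still representable, unlike with an enum.
let diabolical = HTTPStatus(rawValue: 666)
```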
The body, again, is raw binary data and it’s up to the recipient to correctly interpret it. The Star Wars API body is JSON-encoded text, but a body can just as easily be the raw bytes of a .jpg file, or the XML bytes of an RSS feed, or the text that describes an HTTP live streaming playlist.
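For example, here is a minimal sketch of interpreting a body as JSON. It uses a shortened copy of the Star Wars API body from above and assumes the response is a flat object of string values. The bytes only become a dictionary because we decided to treat them as JSON.

```swift
import Foundation

// A shortened copy of the response body, turned into raw bytes.
let json = #"{"people":"http://swapi.dev/api/people/","films":"http://swapi.dev/api/films/"}"#
let body = Data(json.utf8)

// Decoding fails (and yields nil here) if the bytes are not the JSON we expect.
let index = try? JSONDecoder().decode([String: String].self, from: body)
let peopleURL = index?["people"]
```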
Putting it together
That’s an overview of HTTP requests and responses. This simple model describes the fundamental communications used by most of the internet. It doesn’t cover things like web sockets or push notifications, but this is used by every website and most network APIs. Knowing the details of how this works continues to be one of the most valuable and consistently useful things I know and use as a software engineer.
In the next post, we’ll take a look at how we move from this specification to the basic structures we’ll need for our framework.