Simplifying Swift framework development

I’ve developed a handy trick when writing frameworks in Swift that makes the overall process a little bit nicer, and it’s just adding a single file to your framework.

Let’s say you’re building CoreAwesome.framework for inclusion in your app, or for publishing to GitHub, or whatever. There are a couple of things you end up doing a lot:

First, you end up having lots of import OtherFramework statements scattered throughout your .swift files, so that each part of your framework has access to the functionality provided by its dependencies (whether system frameworks or anything else). I find that sort of repetition (import Foundation, anyone?) pretty annoying.

Second, you don’t always have an easy way of getting access to the Bundle that corresponds to the framework. You need this particularly if you’re loading resources out of that bundle.

So, based on these two things, I’ve developed this pattern, which is kind of like a precompiled header for a Swift framework:

When I create CoreAwesome.framework, I get CoreAwesome.h, and that’s about it. So I immediately add CoreAwesome.swift at the top level next to the .h, and put this in it:

// CoreAwesome.swift
@_exported import Foundation
@_exported import DependencyA
@_exported import DependencyB

public let CoreAwesome = Bundle(for: CoreAwesomeMarker.self)

private class CoreAwesomeMarker { }

First, there’s this weird @_exported thing. The underscore indicates we need to be a bit wary of it, because it’s not an attribute you’re really supposed to use. But if you do…

@_exported makes an imported module visible to the entire module into which it’s been imported. This means you don’t have to import Dependency in every file where you need it. You just @_exported import that dependency once, and you’re good to go in any file in that module.

This is especially nice if DependencyA defines public operators, which aren’t always imported the same way that symbols are. That’s a topic for another day.

Second, I define a public constant that is the name of the framework, and whose value is the Bundle for that framework. I use the class-based lookup (i.e., find the bundle that declares this class), because it’s one of the few convenient Bundle initializers that doesn’t return a Bundle?, and thus I don’t have to deal with unwrapping. And I use a dedicated marker class so that the lookup stays resilient as the rest of the framework’s functionality changes.

With the constant in-hand, I can easily load resources:

let resource = CoreAwesome.url(forResource: "Foo", withExtension: "plist")

This reads pretty naturally, and the entire file makes developing my framework just a little bit easier.


Update: I neglected to mention that it was Joe Fabisevich who clued me in to the @_exported trick. Thanks, Joe!!

Update #2: Both Harlan Haskins and Kevin Ballard pointed out that the constant to access the framework’s bundle will conflict with a module-qualified declaration. Like, if you have two modules that declare a Foo, then you need to disambiguate which one you want by doing Module1.Foo vs Module2.Foo. However, if Module1 is the name for a Bundle instance, this breaks.
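To make the conflict concrete, here’s a contrived sketch (the SuperAwesome module and the Widget type are hypothetical):

// Both CoreAwesome and SuperAwesome declare a public type named Widget (hypothetical).
import CoreAwesome
import SuperAwesome

// Module-qualified names are the usual fix for the ambiguity…
let theirs = SuperAwesome.Widget()

// …but the equivalent on the CoreAwesome side no longer works, because CoreAwesome
// now resolves to the Bundle constant rather than the module:
// let ours = CoreAwesome.Widget()   // error: value of type 'Bundle' has no member 'Widget'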

Solving this problem is left as an exercise for the reader. 😉

Reading your own entitlements

When you’re writing an iOS or macOS app, you typically don’t need to dynamically know what your own entitlements are. However, there are a couple of rare circumstances when it could be Nice To Have.

I recently came across one of those situations. Like most developers, I have a set of core libraries I maintain that I use in my apps. These libraries tend to contain all of the common pieces of code I’ve found to be helpful. It’s like my own private SDK.

For example, I have a Sandbox type that represents a set of on-disk locations:

public class Sandbox {

    public static let currentProcess: Sandbox

    public let documents: AbsolutePath
    public let caches: AbsolutePath
    public let support: AbsolutePath
    public let temporary: AbsolutePath

    public init(documents: AbsolutePath, caches: AbsolutePath, support: AbsolutePath, defaults: UserDefaults)
    public convenience init?(groupIdentifier: String)
}

I got to thinking that it’d be nice to add a static default property on Sandbox that would contain the “default” Sandbox. In a situation where my app doesn’t use a shared group container, the default sandbox would be the current process’s sandbox. If I do have one (or more) shared group containers, then the default sandbox would be the first one listed. This is similar to how CKContainer.default behaves. You could imagine wanting similar behavior if you have a wrapper around the Keychain APIs, for example.
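The list of group containers my app can use lives in its entitlements, so if I could read those at runtime, the property might look something like this minimal sketch (the entitlements() helper is hypothetical; getting at that data is what the rest of this post is about):

// A minimal sketch, assuming a hypothetical entitlements() helper that returns
// the app's parsed entitlements as a dictionary.
extension Sandbox {

    public static let `default`: Sandbox = {
        // The shared group containers this app is entitled to use, if any.
        let groups = entitlements()?["com.apple.security.application-groups"] as? [String] ?? []
        if let first = groups.first, let shared = Sandbox(groupIdentifier: first) {
            return shared
        }
        // No group containers (or the lookup failed): fall back to the process's own sandbox.
        return .currentProcess
    }()
}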

However, entitlements end up getting embedded inside your app binary. They are not a copied-in resource, but are rather a blob of data stuck in your executable file.

After lots of googling and asking friends, I eventually found two resources that were exceptionally helpful:

1️⃣ The first was this StackOverflow.com answer by Cédric Luthi. In it, he shows how to use the dyld APIs to find your executable image, and then iterate through the load commands defined in that image until you find the one you’re looking for. In his case, he wanted the LC_UUID command. In order to read your own entitlements, you want the LC_CODE_SIGNATURE command.

Once you’ve found the LC_CODE_SIGNATURE command, you know roughly where in your executable file the entitlements are located. However, the data it points to in the __LINKEDIT segment has a largely undocumented format, and it wasn’t until Daniel Jalkut suggested I find the codesign source that I made any progress.

2️⃣ After some targeted googling, I found this source code on opensource.apple.com. This file appears to be a decent amount of the source for the codesign utility, but the really awesome bit is that lc_load_sig function. There, finally, is how to poke around in that special __LINKEDIT data and interpret what’s going on.

Armed with these two pieces of data, we can now build some code that will read the entitlements blob out of your own executable (and once you have the blob, you can run it through NSPropertyListSerialization to parse and introspect it):


#import <Foundation/Foundation.h>
#import <mach-o/dyld.h>
#import <mach-o/loader.h>
#import <arpa/inet.h> // for ntohl()
/*
 * Structure of an embedded-signature MultiBlob (called a SuperBlob in the codesign source)
 */
typedef struct __BlobIndex {
    uint32_t type;                   /* type of entry */
    uint32_t offset;                 /* offset of entry */
} CS_Blob;

typedef struct __MultiBlob {
    uint32_t magic;                  /* magic number */
    uint32_t length;                 /* total length of SuperBlob */
    uint32_t count;                  /* number of index entries following */
    CS_Blob index[];                 /* (count) entries */
    /* followed by Blobs in no particular order as indicated by offsets in index */
} CS_MultiBlob;

extern NSData *EntitlementsData(void) {

    // iterate through the headers to find the executable, since only the executable has the entitlements
    const struct mach_header *executableHeader = NULL;
    for (uint32_t i = 0; i < _dyld_image_count() && executableHeader == NULL; i++) {
        const struct mach_header *header = _dyld_get_image_header(i);
        if (header->filetype == MH_EXECUTE) { executableHeader = header; }
    }

    if (executableHeader == NULL) { return nil; }

    // find if it's a 64-bit executable or not
    BOOL is64bit = executableHeader->magic == MH_MAGIC_64 || executableHeader->magic == MH_CIGAM_64;
    uintptr_t cursor = (uintptr_t)executableHeader + (is64bit ? sizeof(struct mach_header_64) : sizeof(struct mach_header));

    // iterate through the dyld commands to find the "LC_CODE_SIGNATURE" command
    const struct segment_command *segmentCommand = NULL;
    for (uint32_t i = 0; i < executableHeader->ncmds; i++, cursor += segmentCommand->cmdsize) {
        segmentCommand = (struct segment_command *)cursor;
        if (segmentCommand->cmd != LC_CODE_SIGNATURE) { continue; }

        const struct linkedit_data_command *dataCommand = (const struct linkedit_data_command *)segmentCommand;

        // jump to the offset specified by the command
        uintptr_t dataStart = (uintptr_t)executableHeader + dataCommand->dataoff;
        CS_MultiBlob *multiBlob = (CS_MultiBlob *)dataStart;
        if (ntohl(multiBlob->magic) != 0xfade0cc0) { return nil; }

        // iterate through the blobs in this segment until we find the one with the appropriate magic value
        uint32_t count = ntohl(multiBlob->count);
        for (int i = 0; i < count; i++) {
            uintptr_t blobBytes = dataStart + ntohl(multiBlob->index[i].offset);
            uint32_t blobMagic = ntohl(*(uint32_t *)blobBytes);
            if (blobMagic != 0xfade7171) { continue; }

            // the first 4 bytes are the magic
            // the next 4 are the length
            // after that is the encoded plist
            uint32_t blobLength = ntohl(*(uint32_t *)(blobBytes + 4));
            return [NSData dataWithBytes:(const void *)(blobBytes + 8) length:(blobLength - 8)];
        }
    }

    return nil;
}
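As mentioned above, the blob is just a property list. Here’s a rough sketch of decoding it from Swift, assuming EntitlementsData() is exposed to Swift (via a bridging header or a module map):

import Foundation

// A minimal sketch: parse the entitlements blob (from the Objective-C function above)
// into a dictionary we can introspect.
func currentEntitlements() -> [String: Any]? {
    guard let data = EntitlementsData() else { return nil }
    let plist = try? PropertyListSerialization.propertyList(from: data, options: [], format: nil)
    return plist as? [String: Any]
}

// For example: list the app group containers this process is entitled to use.
let groups = currentEntitlements()?["com.apple.security.application-groups"] as? [String]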

Update

It turns out that the code above only works when you’re deploying to a device. If you’re just building for simulator, then the entitlements actually go in a different location; they’re in the __TEXT segment in a section called __entitlements, and they’re the sole contents of that section. This means reading them is very easy:

NSData *ReadEntitlementsFromTEXT(const struct mach_header *executable) {
    uint32_t dataOffset;
    uint64_t dataLength;

    BOOL is64bit = executable->magic == MH_MAGIC_64 || executable->magic == MH_CIGAM_64;
    if (is64bit) {
        const struct section_64 *section = getsectbynamefromheader_64((const struct mach_header_64 *)executable, "__TEXT", "__entitlements");
        if (section == NULL) { return nil; }
        dataOffset = section->offset;
        dataLength = section->size;
    } else {
        const struct section *section = getsectbynamefromheader(executable, "__TEXT", "__entitlements");
        if (section == NULL) { return nil; }
        dataOffset = section->offset;
        dataLength = (uint64_t)section->size;
    }

    uintptr_t dataStart = (uintptr_t)executable + dataOffset;
    return [NSData dataWithBytes:(const void *)dataStart length:dataLength];
}

Misusing enums

This is a response to Matt Diephouse’s article, which is itself a response to John Sundell’s article. You should go read these first.

Matt starts off by saying:

Earlier this week, John Sundell wrote a nice article about building an enum-based analytics system in Swift. He included many fine suggestions, but I believe he’s wrong about one point: that enums are the right choice.

Therefore, I’ll start similarly:

Earlier this week, Matt Diephouse wrote a nice article about building a struct-based analytics system in Swift. He included many fine suggestions, but I believe he’s wrong about one point: that structs are the right choice.

As the examples in both articles show, analytic events can have different payloads of information that they’re going to capture. Matt uses a metadata property (a Dictionary<String, String>) to capture this, but this approach imposes a significant burden at the callsite. In order to properly fill out an AnalyticEvent struct, the callsite must know how to destructure the current model/event information and package it up inside the metadata property.

This could be fine if you’re dealing with a very small app, but given that we’re at the point where we’re putting in analytics, the chances that this app is small are pretty remote.

Additionally, it’s often not the case that analytics are defined at the application level. They’re typically brought in as part of an underlying framework. This means that the top-most levels of abstraction in your app (the app itself) have to know intimate implementation details about how levels far below (the raw analytic communication protocol) work and expect their data to be formatted. Now, it’s possible that the metadata is a very loose bucket of “anything goes here”, but that still requires a lot of extra code at the callsite.

And when you’re putting in analytics, you want them to be as minimally invasive as possible, because they’re not the point of your app.

So, in my opinion, the better approach would be to define your analytics by a protocol, like this:

protocol AnalyticEvent {
    var name: String { get }
    var payload: Dictionary<String, String> { get }
}

The analytics capture mechanism would then only accept values of type AnalyticEvent (the protocol), and not define concrete types itself.

Here are some of the benefits we get by taking this approach:

Semantically appropriate event definition

Each layer of your app can define its own types (whether an enum, class, or struct) that conform to the AnalyticEvent protocol. Perhaps your Networking layer defines events that capture how often it has to hit the network versus how often it has to hit the cache. It can define those in its own custom NetworkingEvent: AnalyticEvent type.
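For example, the networking layer’s events might look something like this sketch (the names here are illustrative, not from either article):

// Illustrative sketch: a networking layer defines its own event type.
enum NetworkingEvent: AnalyticEvent {
    case cacheHit(resource: String)
    case networkFetch(resource: String)

    var name: String {
        switch self {
        case .cacheHit: return "networking.cache-hit"
        case .networkFetch: return "networking.fetch"
        }
    }

    var payload: Dictionary<String, String> {
        switch self {
        case .cacheHit(let resource), .networkFetch(let resource):
            return ["resource": resource]
        }
    }
}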

Minimized callsites

By having each layer define its own type, it can create minimized initializers for its custom AnalyticEvent that would be impractical to do with an enum, and cumbersome with a struct. For example, if I had a UserSessionEvent: AnalyticEvent, I could create an initializer that takes a LoginFailureReason as the parameter, and then the initializer turns it into the privately-known name and payload.
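Sketched out (LoginFailureReason is assumed to be a type the session layer already defines):

// Illustrative sketch: the callsite only needs to know about LoginFailureReason.
struct UserSessionEvent: AnalyticEvent {
    let name: String
    let payload: Dictionary<String, String>

    init(failure: LoginFailureReason) {
        // The privately-known name and payload format live here, not at the callsite.
        self.name = "session.login-failure"
        self.payload = ["reason": String(describing: failure)]
    }
}

// At the callsite, hypothetically:
// analytics.track(UserSessionEvent(failure: .invalidPassword))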

Type checking events

I could create a whole bunch of extensions on an AnalyticEvent struct to provide the custom initializer that is contextually appropriate for each callsite, but that explodes the number of initializers I have to sort through when creating an AnalyticEvent. The autocomplete would show me every possible initializer for every possible event type, which is a total pain.

On the other hand, by requiring a custom type that adopts the AnalyticEvent protocol, I can narrow my focus down to only the autocompletion results that are possible for a UserSessionEvent or a NetworkingEvent or an AwesomeCarouselWidgetInteractionEvent, etc.

With this sort of type-checking in place, refactoring these events also becomes easy. I can search the codebase for just NetworkingEvent to see every place where that type of event is getting generated.

Easy extensibility

Since AnalyticEvent is a protocol, adding a new kind of event is trivial. I don’t have to add a new case to an enum. I don’t have to litter stringly-typed event names throughout my code. I don’t have to add another extension to a struct. The protocol makes it easy to isolate concerns to their respective layers.

So, next time you reach for an enum, ask yourself whether you’re going to switch over it. If you’re not, you’re probably better off with a protocol instead.


EDITED TO ADD

It should go without saying that I really appreciate both Matt and John discussing these patterns in a public forum. In no way should my comments be construed as any sort of personal attack or judgement of Matt or John.

Level up your debugging skills

Finding the root cause of an error in your app can often feel very intimidating, whether you’re brand-new to programming or you’ve been writing code for decades. Debugging problems can be extremely time consuming. Where do you start looking? How do you know if what you think is the problem is actually the problem?

While there is no one-size-fits-all approach to debugging, there is a major guiding principle that can be extremely helpful in determining the root cause of an error, and that is to apply the scientific method to your analysis. The Scientific Method is a process we use to learn new things, analyze why things happen, and correct misconceptions. In simple terms, the steps involved are:

  1. Observe
  2. Ask questions
  3. Hypothesize
  4. Test

Observation

There are many ways to observe problems in your app. Maybe it crashed. Maybe it didn’t do something you were expecting it to. Maybe your users or QA testers observed some incorrect behavior. Maybe you saw something yourself. Regardless of how it happened, debugging always starts with an observed issue.

Ask Questions

Questions form the basis of debugging (and indeed, all learning in general). Depending on your goals, deadlines, or other constraints and circumstances, your questions may range from “why is this happening and how do I fix the root cause?” to “how do I hide the symptoms of this issue?”

There are no wrong questions here. Some questions may ultimately lead to irrelevant information, but the process of asking these questions helps train your mind to ask better questions in the future.

It’s also possible that your observations lead you to areas where you don’t have the required domain knowledge to even know what questions to ask. In situations like this, you should talk to other developers and ask “what questions should I be asking?”. Try to not ask for the answer, but instead ask for the question. Getting an answer gives you a single point of reference. But getting a question gives you a trajectory for future learning.

Hypothesize

Once you’ve formed a question, next comes a very fun part: you get to make up an answer! The answer you invent can be anything, as long as it is testable. For example, you could hypothesize that the reason your app crashes is because the Moon was exactly at its orbital apogee. This is unlikely to be the root cause of your app’s crash (unless you’re writing an astronomy app?), but the point is that your hypothesis can be entirely made up.

Over time, experience will teach you to recognize patterns. For example, experienced Objective-C developers have long recognized that an EXC_BAD_ACCESS crash is likely an error related to memory management. Swift developers know what a crash due to force-unwrapping nil looks like. The more practice you get at debugging, the easier and quicker this process will become.

Test

After you’ve formed a hypothesis, now you get to perform an experiment. In order to analyze the hypothesis, you need to devise a way to either prove or disprove your theory. Sometimes, this is really trivial: run your app again, set a breakpoint, and check to see if your variable is nil when you try to unwrap it.

Sometimes the process is a lot more complex. This is where the developer tools can be invaluable.

At the most fundamental level, we have the ability to print messages to the console. This is often referred to as “caveman debugging”, because it’s using the simplest and most fundamental tools. Sometimes this is good enough. But print() and NSLog() can be tedious tools to work with.

Fortunately, Xcode provides some really useful debugging tools beyond print() and NSLog(), like breakpoints and watchpoints. The debug gauges in the Debug Navigator can help you observe the state of your app. If you see memory usage continuously increasing, then you have Observed Something and may need to consider Asking More Questions. You also have more advanced tools available, like the LLDB console, everything in Instruments, and even hyper-specialized tools like dtrace.

These tools help you analyze the results of your experiment. The experiment you construct should ideally follow proper procedures and allow you to test your variables in isolation, as well as having a control.

Ideally, these experiments and analytic tools help you prove your hypothesis correct. If they do, then it’s time to start figuring out how to solve the problem. Often, testing proves our hypothesis false. We prove that what we thought was the issue wasn’t actually the problem. When this happens, we go back to step one: we’re still observing the problem, which means we need to ask new questions, form new hypotheses, and devise new tests to examine the theories.

And of course, these tests you devise should ideally be captured in the unit tests of your app, to help you guarantee that the problem doesn’t resurface in the future as other code changes!

Parting Thoughts

This pattern of Observe-Question-Hypothesize-Test is extremely powerful. It doesn’t always work for every kind of problem (such as problems that are inherently not reproducible), but when it’s applicable it is an excellent way to organize your thoughts and know that you’re on the right path towards solving your app’s problems and becoming a better and wiser developer.

So the next time you’re stuck on a bug and don’t know how to proceed, consider applying the scientific method.

A Better MVC, Part 4: Future Directions

Part 4 in a series on “fixing” Model-View-Controller:

  1. The Problems
  2. Fixing Encapsulation
  3. Fixing Massive View Controller
  4. Future Directions

There are other ways you can apply these principles to writing more maintainable apps.

View Controller-based Cells

One that I’m really interested in and am actively researching is how to use view controllers as cell content.

Cells can be really complicated things, architecturally. If you have any sort of dynamic content in a cell, you’re often faced with weird situations where you wonder “where does this logic belong?”. Do you put image loading in your view subclass? Allow your cell to handle complex business logic? Tack on additions to the list’s delegate protocol so you can message things back out (which then just complicates your list view controller)?

Using view controllers as cells helps answer these questions. It is natural to place this sort of code in the view’s controller, and the fact that the view happens to be a cell doesn’t really make a difference to the underlying principle.

Small finite lists

It’s really easy to make a view controller a cell when you have a list of a finite size. You have your “ListViewController”, and you give it an array of view controllers. It adds them as children, then pulls out the appropriate one when it’s time to show a cell, and sticks the view controller’s view inside the cell’s contentView.
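A minimal sketch of that idea (the API here is my own shorthand, not a framework’s):

import UIKit

// Sketch: a finite list whose cell content comes from child view controllers.
final class ListViewController: UITableViewController {

    private let contentControllers: [UIViewController]

    init(contentControllers: [UIViewController]) {
        self.contentControllers = contentControllers
        super.init(style: .plain)
        // Small, finite list: pre-allocate and add every child up front.
        contentControllers.forEach { addChild($0); $0.didMove(toParent: self) }
    }

    required init?(coder: NSCoder) { fatalError("init(coder:) has not been implemented") }

    override func viewDidLoad() {
        super.viewDidLoad()
        tableView.register(UITableViewCell.self, forCellReuseIdentifier: "Cell")
    }

    override func tableView(_ tableView: UITableView, numberOfRowsInSection section: Int) -> Int {
        return contentControllers.count
    }

    override func tableView(_ tableView: UITableView, cellForRowAt indexPath: IndexPath) -> UITableViewCell {
        let cell = tableView.dequeueReusableCell(withIdentifier: "Cell", for: indexPath)
        cell.contentView.subviews.forEach { $0.removeFromSuperview() }

        // Pull out the appropriate view controller and stick its view inside the cell's contentView.
        let content: UIView = contentControllers[indexPath.row].view
        content.frame = cell.contentView.bounds
        content.autoresizingMask = [.flexibleWidth, .flexibleHeight]
        cell.contentView.addSubview(content)
        return cell
    }
}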

You can then apply the same principles of “flow view controllers” and “mini view controllers” to the content of the cell, and use child view controllers to manage portions or variations of the cell. I used to work on a Mac app where I used this approach, and I could sometimes get upwards of 15 view controllers in a single cell. Granted, the cells were pretty complicated in what they could do, but none of these view controllers was longer than 100 lines of code. It made debugging issues trivially easy.

At this level of granularity in a finite list, you also probably aren’t very worried about view lifecycle callbacks, because you can pre-allocate everything.

Huge finite lists

Once you start moving past the point where you can pre-allocate everything, you have two main approaches you can take. The first approach (“huge finite lists”) is like what the current UITableView and UICollectionView API offer: you know up-front how many items are in a section. In this case, your ListViewController would likely have a data source API similar to UITableView’s, where it progressively asks for view controllers and then uses the underlying will-display/did-end-displaying callbacks to drive each child’s lifecycle.

At this level you might also want to start thinking about view controller reuse, but I would probably avoid thinking about that unless I measured and determined that view controller deallocation and initialization was an actual bottleneck.

Infinite lists

Then there’s the problem of infinity. A great example of this is the main screen of the Reddit app. You can scroll forever, and the content will just keep loading and loading… there’s no way to know up-front how much content there is. With this, you’ll be looking at loading in “slices” of the infinite view controller sequence, or taking a page (haha) out of UIPageViewController’s book and using the “doubly-linked list” sort of API (“what’s the view controller after A?”, “what’s the view controller after B?”, etc).

You’ll still have the underlying callbacks to help manage view lifecycle, and you’ll also still have to consider view controller reuse as an option.

Boilerplate

As you get into this style of programming, you’ll find that you end up developing a decent amount of boilerplate around embedding view controllers. That is to be expected, because the view controller containment APIs tend to offer the bare minimum to do what you need.

As you find situations where the API can be improved, please request that these improvements be made in the system frameworks.

Conclusion

There’s a lot to digest in these posts. Some people may scream in horror at the idea of “view controllers as cells”, but it’s an idea worth exploring.

So, the huge TL;DR of this is:

  • Decompose your UI using view controllers
  • Use view controllers to manage sequence
  • View controllers don’t have to fill the screen

I’d love to hear your thoughts on this topic. Hit me up on Twitter or via email and let’s chat!

A Better MVC, Part 3: Fixing Massive View Controller

Part 3 in a series on “fixing” Model-View-Controller:

  1. The Problems
  2. Fixing Encapsulation
  3. Fixing Massive View Controller
  4. Future Directions

The principle behind fixing Massive View Controller is to unlearn a concept that’s inadvertently drilled into new developers’ heads, and to internalize its replacement:

1 View Controller ≠ 1 screen of content

From our very first iOS apps, we’re taught the idea that 1 view controller == 1 screen of content. We see this in every simple “make a list and push a detail view” app. However, as our apps grow in complexity, so do our screens of content, and the default notion that 1 view controller == 1 screen of content quickly leads to Massive View Controller.

We can save ourselves from Massive View Controller by realizing that a view controller “controls a view” and that view doesn’t have to fill the screen.

Example: The WWDC App

Disclaimer: While I used to be the lead engineer on the WWDC app, I have not seen the code in quite some time and do not know if it is actually implemented this way. I am simply describing how I would build the screen if I were to build it today.

Here’s a screenshot of the WWDC app:

WWDC app session details page

On this one screen, there are five main pieces of content:

  1. The video
  2. The title
  3. The description
  4. The contextual actions
  5. The related content

There is a lot going on here, and if this were to be implemented as a single view controller, it would be a massive view controller.

So, let’s not do that.

Instead, we’re going to make each one of those 5 areas its own view controller, all contained within the SessionDetailsViewController. The outer details view controller will own the general layout of the screen, as represented by some empty container views. These will all be in a UIScrollView that manages its content size through Auto Layout.
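The embedding itself is just the standard view controller containment dance. A small helper like this (my own convenience, not a UIKit API) keeps SessionDetailsViewController tidy:

import UIKit

extension UIViewController {

    // Convenience for the standard containment dance (my own helper, not a UIKit API).
    func embed(_ child: UIViewController, in container: UIView) {
        addChild(child)
        child.view.translatesAutoresizingMaskIntoConstraints = false
        container.addSubview(child.view)
        NSLayoutConstraint.activate([
            child.view.topAnchor.constraint(equalTo: container.topAnchor),
            child.view.bottomAnchor.constraint(equalTo: container.bottomAnchor),
            child.view.leadingAnchor.constraint(equalTo: container.leadingAnchor),
            child.view.trailingAnchor.constraint(equalTo: container.trailingAnchor),
        ])
        child.didMove(toParent: self)
    }
}

// In SessionDetailsViewController.viewDidLoad(), hypothetically:
// embed(SessionVideoViewController(session: session), in: videoContainer)
// embed(SessionTitleViewController(session: session), in: titleContainer)
// …and so on for the description, actions, and related content.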

The SessionVideoViewController would be in charge of loading up the appropriate poster frame for the video and responding to the user tapping the play button. When the user taps that button, the view controller will delegate out that the user intends to watch the video. The parent SessionDetailsViewController can then pass that intention up the chain until it arrives at a semantically appropriate level for that intention to be translated into the corresponding WatchVideoOperation.

The SessionTitleViewController and SessionDescriptionViewController will be pretty simple, since all they’ll have to do is observe the model object for changes and update the labels. There’s also no interaction to delegate back out.

The SessionActionsViewController would basically be a UIStackView of buttons. The buttons would be created based on inspecting the model object, and interacting with the buttons delegates back out the corresponding intention: “toggle favorite”; “leave feedback”; “begin download”; etc.

Finally, the RelatedSessionsViewController would be a side-scrolling collection view. Tapping on a related session would delegate back out that the user wants to view the session. The SessionDetailsViewController, as we learned in the previous post, would not be the one to perform that action, but would instead relay the intention up to a more semantically appropriate level (such as the view controller that owns the UINavigationController).

At the end of this exercise, we end up with more view controllers (six instead of one), but each one is relatively small. The video view controller loads a poster frame. The title and description view controllers observe the model for changes. The actions and related content view controllers are a little more complex, but each one is focused on a very specific set of actions, and neither is onerous to understand.

Example: The Reddit App

Disclaimer: I have no idea how the Reddit app is built.

For this example, let’s take a look at the Reddit app.

The Reddit app

There are a bunch of interesting things going on here, but I want to focus on the top part of this post screen that contains the actual post. A cursory survey through the app shows that there are several different styles that this post content can take, in addition to having a title:

  • Text
  • Link
  • Animated gif
  • Static image
  • Video

It would be crazy to try and build the entire post page as a single view controller. At the very least, you’d want a “post content” view controller, and a “comments” view controller. But you can go further.

Imagine that the top post content is a PostContentViewController. You still have a large view controller, as it has to handle each of these five different kinds of post content (showing all text vs loading a web preview vs animating a gif vs showing an image vs playing an inline video vs linking out to an external video…).

So instead, make your PostContentViewController a “flow” view controller, and then have a different view controller for each kind of content. When you have a TextPostContentViewController, you never have to worry about dealing with loading callbacks. A LinkPostContentViewController only has to deal with loading a preview. A GIFPostContentViewController only ever has to load a gif. It’s easy, and your PostContentViewController just has to pick the right one, embed it, and then handle the odd delegation of user intent.

By combining both principles (view controllers as flow and small view controllers), you can easily decompose your UI into small, manageable, testable, isolated, and grokkable chunks.

A Better MVC, Part 2: Fixing Encapsulation

Part 2 in a series on “fixing” Model-View-Controller:

  1. The Problems
  2. Fixing Encapsulation
  3. Fixing Massive View Controller
  4. Future Directions

In order to fix the encapsulation violation we saw earlier, we need to understand a pretty simple principle:

In general, a view controller should manage either sequence or UI, but not both.

A view controller that manages sequence is one that I jokingly call a “Manager View Controller”, because 1) “manager” and “controller” aren’t overloaded enough already and 2) it still has the acronym “MVC”. In reality, this is a variation on the “Coordinator” pattern that has captured some of our collective imagination.

The idea behind the Coordinator pattern (or flow controllers or whatever you call them) is that, in order to maintain encapsulation, you need a higher-level object to decide what comes next. It “controls the flow” in your app.

Where I tend to diverge from the “traditional” flow controller implementation is that I believe these sorts of controllers should really just be view controllers higher up in the parent view controller chain. This saves you from having to hack a new kind of responder chain object into UIViewController, and it means you don’t end up with a third parallel hierarchy of control to keep in sync with the other two (the view hierarchy and the view controller hierarchy).

Using container view controllers as sequence “coordinators” makes a whole lot of things really easy. For example, consider a screen where you want to load a piece of remote content. But while it’s loading, you want a spinner to show up. If the content fails to load, you want an error screen and a “try again” button.

Implementing this as a single view controller would leave you in a place where you’re on the road to Massive View Controller. You’d have a view controller that’s probably hitting the network and then trying to rationalize which of the three different states it should be in (Error, Loading, or Success), then is responsible for making sure the right set of UI elements are visible and getting the model information into the proper set of outlets.

Instead, abstract out the sequence from the UI. The sequence is the owner of the flow between the three states, and each of the states is its own view controller.

The ShowRemoteContentViewController (the owner of the sequence) has an empty view, and it embeds the proper content view controller depending on which state it should be in:

  • Show the ErrorViewController if your networking object reported back an error. The ErrorViewController delegates back out when the user taps on a “Try Again” button, which causes the delegate (the ShowRemoteContentViewController) to transition to the loading state while hitting the network
  • The LoadingViewController is an empty view controller that shows a loading indicator. It is literally zero lines of code, unless you want a simple init() method, in which case it is 4.
  • The MessageViewController is the view controller that handles the “success” state. It basically populates its UI from the injected model object, and delegates back out when the user taps a “Load Another” button.

A simple iOS project (Xcode 9.1, Swift 4) showing this in action is available here: iOS Architectures.zip.
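The core of that idea, condensed into a rough sketch (the State enum, the Message model type, the initializers, and the ErrorViewControllerDelegate protocol are my own shorthand here, not necessarily how the sample project spells them):

import UIKit

// Sketch: the flow controller owns the state and swaps child view controllers around.
final class ShowRemoteContentViewController: UIViewController, ErrorViewControllerDelegate {

    enum State {
        case loading
        case error(Error)
        case loaded(Message)
    }

    private var current: UIViewController?

    private var state: State = .loading {
        didSet { transition(to: state) }
    }

    private func transition(to state: State) {
        let next: UIViewController
        switch state {
        case .loading:
            next = LoadingViewController()
        case .error(let error):
            let errorVC = ErrorViewController(error: error)
            errorVC.delegate = self
            next = errorVC
        case .loaded(let message):
            next = MessageViewController(message: message)
        }

        // Standard containment: remove the old child, embed the new one.
        current?.willMove(toParent: nil)
        current?.view.removeFromSuperview()
        current?.removeFromParent()

        addChild(next)
        next.view.frame = view.bounds
        next.view.autoresizingMask = [.flexibleWidth, .flexibleHeight]
        view.addSubview(next.view)
        next.didMove(toParent: self)
        current = next
    }

    // ErrorViewControllerDelegate
    func errorViewControllerDidTapTryAgain(_ errorViewController: ErrorViewController) {
        state = .loading
        // …kick off the request again, then set .loaded(…) or .error(…) when it finishes.
    }
}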

We’ve taken something that was really complicated (a single object managing UI and state and flow) and turned it into something eminently understandable; no view controller is longer than 100 lines of code. Each one is isolated from everything else. A good use of delegation means that testing is trivial: we load up the UI, fake a tap on the button, and assert that the delegate method is invoked. There’s almost no cognitive load to understand any individual view controller. Even the “more complicated” ShowRemoteContentViewController is simple, because it’s just flipping back and forth between a couple of different child view controllers in reaction to some delegate method invocations or the network loading. It’s all really simple, and there are no violations of encapsulation.

The same pattern holds true when you’re dealing with more structured UI. If we imagine now a list UI, where tapping a list item brings up a detail view, we can apply this same principle.

First off, we don’t want the list knowing about the detail view controller or how to show it. So, we simply make it delegate back out to someone else what the user is intending to do. The list UI receives model objects from “the outside”, so that’s what it should send back. Therefore, in our implementation of didSelectRowAtIndexPath:, we simply translate the index path into the corresponding model object, and then message the delegate what the user intention was: listViewController:userDidSelectModelObject: → “The user wants to look at this model object”.

So, who then is the delegate? You could argue that it might be the direct parentViewController, but in this case, the parent is likely a UINavigationController. The UINavigationController is already in charge of maintaining a stack of view controllers with its own UI (the navigation bar), so this doesn’t seem like a good candidate. Instead, let’s put the UINavigationController inside a container view controller, which we’ll call the ListFlowViewController.

The ListFlowViewController creates and embeds the plain UINavigationController in itself and gives it the root screen (the list view controller). Then, it makes itself the delegate of the list, because it’s the ListFlowViewController that created the list and (likely) knows what the model objects mean. Then when the user selects an item in the list, the ListFlowViewController receives the corresponding model object, knows how to turn it into the detail screen, and gives that screen to the UINavigationController to push. We again preserve proper encapsulation principles.
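Sketched out, that relationship looks something like this (Item and ItemDetailViewController are placeholder names for whatever your model and detail screen actually are):

import UIKit

// The list only reports intent; it never builds or shows the next screen.
protocol ItemListViewControllerDelegate: AnyObject {
    func listViewController(_ list: ItemListViewController, userDidSelect item: Item)
}

final class ItemListViewController: UITableViewController {
    weak var delegate: ItemListViewControllerDelegate?
    var items: [Item] = []

    override func tableView(_ tableView: UITableView, didSelectRowAt indexPath: IndexPath) {
        // Translate the index path into the model object and report the user's intent.
        delegate?.listViewController(self, userDidSelect: items[indexPath.row])
    }
}

// The flow controller owns the navigation controller and knows what a selection means.
final class ListFlowViewController: UIViewController, ItemListViewControllerDelegate {
    private let navigation = UINavigationController()

    override func viewDidLoad() {
        super.viewDidLoad()
        let list = ItemListViewController()
        list.delegate = self
        navigation.viewControllers = [list]

        addChild(navigation)
        navigation.view.frame = view.bounds
        navigation.view.autoresizingMask = [.flexibleWidth, .flexibleHeight]
        view.addSubview(navigation.view)
        navigation.didMove(toParent: self)
    }

    func listViewController(_ list: ItemListViewController, userDidSelect item: Item) {
        navigation.pushViewController(ItemDetailViewController(item: item), animated: true)
    }
}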

This same pattern also holds true on iPad (or regular width phones) where you deal with UISplitViewController. Having a container view controller own the split view means you have a natural place for the “master” view controller to message about the user intent, and to handle how to appropriately display the detail view controller. This same container view controller could also decide to entirely eschew a split view controller if the app transitions back to a compact width size class.

The huge advantage of this approach is that system features come free. Trait collection propagation is free. View lifecycle callbacks are free. Safe area layout margins are generally free. The responder chain and preferred UI state callbacks are free. And future additions to UIViewController are also free.

In exchange, you have to suffer through cleaner code, smaller view controllers, and a few more delegate protocols, which unfortunately just make your code more isolated and testable. How on earth will you survive?? 🙃

A Better MVC, Part 1: The Problems

Part 1 in a series on “fixing” Model-View-Controller:

  1. The Problems
  2. Fixing Encapsulation
  3. Fixing Massive View Controller
  4. Future Directions

I recently ran across a great article by @radiantav called “Much ado about iOS architecture”. It addresses a topic that has been on my mind a lot. I gave a talk about it at the recent Swift by Northwest conference called “A Better MVC”. These blog posts attempt to capture the main points of my talk.

I apologize beforehand that there aren’t really any pictures in this. They’d definitely make the subject matter clearer, but I don’t feel like spending the time to make them. You’ll just have to use your imagination.

The “Problems” with MVC

The main reason that people decry MVC is that they tend to run afoul of two major problems when using it:

  1. MVC, as taught by Apple sample code, encourages you to violate encapsulation principles, which ends up leading to spaghetti code.
  2. Without proper discipline, your view controllers end up being huge, leading to the joke that “MVC” means “Massive View Controller”.

Violating Encapsulation

It’s pretty common to come across a UITableViewDelegate method that looks something like this:

func tableView(_ tableView: UITableView, didSelectRowAt indexPath: IndexPath) {
    let maybeItem = query?.object(at: indexPath)
    guard let item = maybeItem else { return }
    let detail = MyCustomDetailViewController(item: item)
    show(detail, sender: self)
}

There are two huge encapsulation violations that are occurring here.

The first is this line:

let detail = MyCustomDetailViewController(item: item)

The principles of encapsulation indicate that a thing should only really “know” about objects that it contains, and then only about one layer of abstraction down. However, this line, which constructs a subsequent screen-full of information, requires the view controller to know about the sequence of data flow in its parent abstraction.

In other words, this list isn’t just a list, but must also be aware of the context in which it’s used.

The second violation is similar to the first:

show(detail, sender: self)

Again, this violates the whole “don’t know your context” principle. And, it should be noted, these critiques hold true whether you’re manually creating and showing a detail view controller, or invoking a segue and using prepareForSegue:sender:.

Massive View Controller

It is sadly pretty common to come across a view controller that is thousands of lines long. Our view controllers quickly become cluttered with networking callbacks, delegate methods, data source methods, IBActions, trait collection reactions, model observation, and of course, our business logic. Heaven forbid if you’re doing any sort of progressive loading and want to swap in an indicator to show while things are loading… that will usually add in a couple hundred lines of code as you deal with swapping views around, managing yet another stateful property, and so on.

We end up with this situation because we don’t apply enough discipline to our view controllers. They’re convenient places to dump code. But they quickly become unmaintainable, fragile, and inherently untestable. Who wants to be the poor soul who has to debug some weird state going wrong in a 5,000+ line file?

Working around MVC

In my experience, these are pretty much the two fundamental reasons why developers find other architectural patterns so appealing, and why we turn to them to compensate. We reach for MVVM, or React, or MVP, or FRP, or VIPER, or insert-new-architecture-here.

To be clear, there is nothing inherently wrong with any of these patterns. They all solve problems in unique and interesting ways, and it is good to study them and learn the principles they teach. However, I’ve found that building apps based on these patterns tends to pay negative dividends in the long run.

All of these patterns tend to be “strangers” to UIKit and AppKit. Because of this difference, you end up with additional hurdles to clear when things change.

When your team members change, you have additional work to teach new developers about not just the business logic of your app, but often a whole new architectural pattern. This requires more up-front investment, and a longer lead time before team members can start being productive.

When the operating system changes, you have additional work to try and shoehorn new features into your architecture’s concepts. You have to plumb through size classes, safe area layout margins, dynamic type, localization, locale changes, preferred attributes (status bar style, etc), lifecycle events, state preservation and restoration, and who knows what else when new paradigms get inevitably added as iOS updates each year.

When your requirements change, you can sometimes be caught in the situation of having to fork someone else’s library, wait for them to catch up, or hope that they’ll accept your pull request. I won’t go into the pros and cons of third-party dependencies here, but an architectural dependency has even more “gotchas” than a normal dependency, because of the way it can underpin your entire app.

Each step of the way, you may be celebrating that your code is clean and concise. However, you drift further and further away from what the system frameworks are providing, which means adopting new system features requires more code to bring them to where you are.

Wouldn’t it be nice if you could just get it for free? Wouldn’t it be awesome if you could just use UIKit and AppKit and have it all “just work”? Wouldn’t it be nice to live in a world where you thought 300 lines in a view controller was excessively long?

In the next post, we’ll look at how we can fix the first problem.

Edit distance and edit steps

Recently I was working on a little app for myself to help me keep track of some information on my computer. The app is powered by an NSMetadataQuery (essentially a Spotlight search) that reports back everything it finds.

One of the interesting things about NSMetadataQuery is that after it has done its initial “gathering” of the results, all further updates are reported as a single array of results. Did something get added to the results? Here’s a new array of the state of all results now. Did something get removed? Here’s another array.

This behavior is fine if you’re just throwing the information into a tableView and then calling reloadData(), but it would be really nice if you could compare the two arrays of results (the “before” and “after” arrays) and then perform the necessary insert, remove, move, and reload calls. Not only would this be more efficient (since you don’t have to re-build every cell in the table), but you also get to have control over the animations.

A naïve approach would be to start going through the two arrays and figure out what has been inserted, what has been deleted, and so on. And while this approach will work, it will also result in a whole bunch of extraneous calls. An item may be in both arrays, but you might think it has moved because it’s at a different position in the new array, so you generate a “move” call. However, with this approach it’s really hard to detect that it has only moved because something was inserted before it, which means the move call is ultimately redundant.

We can do better.

Edit Distances

There’s a pretty well-known algorithm out there for determining how similar (or dissimilar) two strings are, called the Levenshtein algorithm. In a nutshell, it tells you how many steps you need to take in order to transform one word into another word. So if you have the word “sword”, it takes two steps to turn it into “words”: one step to delete the “s” from the front, and another step to append the “s” to the end.

Implementing the Levenshtein algorithm is fairly straight-forward for anyone who can readily translate pseudocode into their language of choice. The algorithm is on the Wikipedia page.

Looking at the implementation of the Levenshtein algorithm reveals a way that we can potentially use this: A “string” (from the algorithm’s point-of-view) is nothing more than an indexable collection of characters, and the only processing that happens on the characters themselves is an equality check.

In other words, the Levenshtein algorithm can easily be generalized to work on any CollectionType of Equatables. Suddenly, it looks like just the thing we need to implement more-efficient array diffing.

One downside of the Levenshtein algorithm is that it only returns an Int. It only tells us how many steps we would need, but not what the actual steps are. For that, we’ll turn to a specific implementation of the Levenshtein algorithm, called the Wagner-Fischer algorithm.

Fundamentally, W-F is exactly the same as Levenshtein. The only difference is that instead of being recursive, it uses a doubly-nested for loop, and memoizes previous computations into an n by m matrix. Levenshtein simply re-computes the information it needs. Put in CS terms, we sacrifice the storage efficiency of L (which has no storage/caching) for the computational efficiency of W-F (cache information → fewer computations). I prefer the W-F approach. Memory is pretty cheap, and unless we’re dealing with absurdly large collections, this probably won’t be an issue (but it’s something to be aware of).

However, again we’re stuck with the obvious limitation that we’re still only computing the edit distance (the number of edit steps), and not the steps themselves. However, there’s a key realization here: the number returned by W-F is the number of steps required to transform one collection into another. If we had the actual steps themselves, then steps.count would necessarily be equal to what W-F would typically return. So if we could modify W-F to compute the list of steps instead of the count of steps, then the final array’s .count would still be equal to the result of the original algorithm!

In other words, we want to have a matrix that holds arrays of steps, instead of just integers, and we’re guaranteed to get the same result.

Picking Apart the Algorithm

At first glance, it can be a bit difficult to conceptualize how W-F and L actually work. To understand it, let’s take it apart.

First, W-F and L work on the premise of three basic character transformations:

  1. deletion
  2. insertion
  3. substitution

Each of these operations can be easily understood:

  • transforming “A” into “” (an empty string) requires 1 deletion
  • transforming “AB” into “” requires 2 deletions
  • transforming a string of n characters into the empty string requires n deletions

We can visualize these rules by laying them out in tables:

String   Steps to empty string
“”       0
A        1
B        2
C        3
D        4
  • transforming an empty string into “a” requires 1 insertion
  • transforming an empty string into “ab” requires 2 insertions
  • transforming an empty string into a string of n characters requires n insertions

This can also be visualized in a table:

String                    “”   a   b   c   d
Steps from empty string    0   1   2   3   4
  • transforming “A” into “a” requires 1 substitution
  • transforming “AB” into “ab” requires 2 substitutions
  • etc.

This is where things get interesting. We visualize this by combining the previous two tables into a matrix:

     “”   a   b   c   d
“”
A
B
C
D

and filling in what we know so far:

     “”   a   b   c   d
“”    0   1   2   3   4
A     1
B     2
C     3
D     4

Here, let’s stop a minute and examine the matrix to look for some patterns. Running vertically down the left is the word we are starting with. Running horizontally across the top is the word we’re going to. Moving vertically from A to "" is a deletion (because we have to delete characters to get back to the empty string). So we can infer that whenever we move vertically across the matrix, we are deleting characters. Similarly, moving horizontally is an insertion. The only other direction to move is diagonally down from the top left, which is a substitution.

Armed with this, we can now deduce an algorithm. When figuring out what goes in position (i, j) in the matrix, we need to check three different values: the one immediately to the left, the one immediately above, and the one diagonally to the upper left. (We’re filling the matrix from the top-left corner towards the bottom-right corner, so those are the only positions that will be filled)

If the character from the old string and the character from the new string are the same, then we aren’t going to be changing anything, and we want to take the value from the diagonal position ((i-1, j-1)) and use that for (i, j).

If the characters are different, then we want to minimize the changes needed to get to this position. So we’ll find the smallest of the three possible values; that represents the “shortest path” to get here via one of its predecessors. Once we’ve found that, we add 1 to that value to arrive at the value for the current position.

Then rinse and repeat until the matrix is filled. The value in the final bottom-right position is the minimal edit distance.
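As a reference point, here’s a minimal sketch (in modern Swift) of the distance-only version of what was just described, generalized to any two collections of Equatable elements; the step-tracking variant is the exercise the rest of this post sets up:

// A minimal sketch of the distance-only Wagner-Fischer algorithm,
// generalized to any two collections of Equatable elements.
func editDistance<Old: Collection, New: Collection>(from old: Old, to new: New) -> Int
    where Old.Element: Equatable, Old.Element == New.Element {

    let oldItems = Array(old)
    let newItems = Array(new)

    // matrix[i][j] is the edit distance from the first i items of `old`
    // to the first j items of `new`.
    var matrix = [[Int]](repeating: [Int](repeating: 0, count: newItems.count + 1),
                         count: oldItems.count + 1)

    // Transforming a prefix of length i into the empty collection takes i deletions…
    for i in 0...oldItems.count { matrix[i][0] = i }
    // …and transforming the empty collection into a prefix of length j takes j insertions.
    for j in 0...newItems.count { matrix[0][j] = j }

    if oldItems.isEmpty || newItems.isEmpty {
        return matrix[oldItems.count][newItems.count]
    }

    for i in 1...oldItems.count {
        for j in 1...newItems.count {
            if oldItems[i - 1] == newItems[j - 1] {
                // Same value: no edit needed, so take the diagonal value.
                matrix[i][j] = matrix[i - 1][j - 1]
            } else {
                // Otherwise: 1 + the cheapest of deletion (above), insertion (left),
                // or substitution (diagonal).
                matrix[i][j] = 1 + min(matrix[i - 1][j],      // deletion
                                       matrix[i][j - 1],      // insertion
                                       matrix[i - 1][j - 1])  // substitution
            }
        }
    }

    return matrix[oldItems.count][newItems.count]
}

// editDistance(from: "sword", to: "words")  // 2
// editDistance(from: "ABCD", to: "abcd")    // 4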

Edit Steps

That’s how the algorithm works, but like I said, we don’t want the edit distance. We want the edit steps.

Let’s go back to that matrix:

     “”   a   b   c   d
“”    0   1   2   3   4
A     1
B     2
C     3
D     4

Do you remember what those numbers mean? This will help:

     “”            a              b              c              d
“”   0             1 insertion    2 insertions   3 insertions   4 insertions
A    1 deletion
B    2 deletions
C    3 deletions
D    4 deletions

Remember: moving horizontally is an insertion. Moving vertically is a deletion. Moving diagonally is a substitution. So what goes in that (A, a) position? First, look at the three squares around it and find the one with the smallest number of edits. It’s the diagonal one, right? So since we’re going to move diagonally, we’re going to append a substitution:

     “”            a                b              c              d
“”   0             1 insertion      2 insertions   3 insertions   4 insertions
A    1 deletion    1 substitution
B    2 deletions
C    3 deletions
D    4 deletions

We should be able to fill out the rest of the table now (d = deletion, i = insertion, s = substitution):

     “”    a        b        c        d
“”   0     1i       2i       3i       4i
A    1d    1s       1i, 1s   2i, 1s   3i, 1s
B    2d    1d, 1s   2s       1i, 2s   2i, 2s
C    3d    2d, 1s   1d, 2s   3s       1i, 3s
D    4d    3d, 1s   2d, 2s   1d, 3s   4s

In the final position, we see that the optimal edit steps are 4 substitutions. And if we were also keeping track of what the substitutions actually are, then we could very nicely print out a list of steps like this:

ABCD -> abcd:
    at index 0 replace with a
    at index 1 replace with b
    at index 2 replace with c
    at index 3 replace with d

So, that’s the basics of the algorithm. But we’re not done yet. When it comes to reloading tableViews, we also care about moves.

Fortunately, this is trivial to add to this algorithm. When the algorithm finishes, we know that we’ve already got the fewest number of steps needed to get from one string to the other. So all we need to do is look through the array of steps to see if we ever delete and insert the same value. If we do, we remove both steps and replace them with a single move.

That allows us to take something like:

abcdefgh -> agbcdefh:
    insert g at index 1
    delete g at index 6

And turn it into:

abcdefgh -> agbcdefh:
    move g from index 6 to 1

We’re reducing our already-minimal steps, so we know that this is just as efficient as the earlier result.

So, that’s the Wagner-Fischer algorithm, modified to report actual edit steps and not just the edit distance. And like I said earlier, the only comparison you’re doing on these two collections is checking for equality, so it is a very simple process to genericize this to work on any two CollectionTypes of Equatables.

The Next Step

Write the code! I’ve laid out everything you need, and the Wikipedia articles also have very helpful pseudocode.

The Next Next Step

Here’s a fun follow-up to try on your own:

Sometimes, the values you’re modeling have properties that can change over time, but that do not fundamentally affect the equality of the object. A naïve example would be a Tweet model value having an Int property to represent the number of retweets, and a second Int to represent the number of likes.1 The Int values can change, but the tweet is fundamentally the same tweet.

How would you modify the algorithm to still return the same number of insertions, deletions, and substitutions (and maybe moves), but also tell you which values just need to be reloaded?


1 Yes, I know you wouldn’t actually want to write an app this way, but it’s an easy example to get the point across.