The Laws of Core Data

In my conversations with developers, a common theme I hear is that “Core Data is hard” or “Core Data is buggy” or “I could never get it to work right and gave up on it”.

I’ve spent a lot of time using Core Data and thought I’d share my “Laws of Core Data”. These are a set of rules I’ve developed over time on how to use Core Data in such a way that it is almost entirely painless. When I follow these rules, I almost never have any problems using it.

The Laws

  1. Do not use Core Data as if it were a database

    It’s common to hear developers talk about and treat Core Data as if it were a database. They see that it’s powered by SQLite, and think it’s functionally equivalent.

    It is not.

    Core Data is an “object graph and persistence framework”, which is basically like a fancy kind of object-relational mapping. That means it is a whole bunch of code to help you maintain a graph (ie, a “network” of related pieces of data with a defined organization) of objects and then persist them in some fashion.

    It does not necessarily mean you have tables with rows of data. It does not necessarily mean that you have the ability to join across data types. It does not necessarily mean that it’s even stored as a file on your disk.

    Some things that Core Data can do beyond most databases:

    • make sure bidirectional relationships are properly hooked up
    • use custom data validation rules
    • use custom data migration logic
    • store specific attributes outside the primary store location
    • serialize custom attribute types as Data
    • index content using Spotlight
    • automatic schema and data migration

    Having these abilities means you can have Core Data take care of a lot more logic for you than if you were using a traditional database.
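
    As a rough illustration of the first bullet, here is a minimal sketch of Core Data keeping a bidirectional relationship hooked up. The Owner and Pet entities, their relationship names, and both function names are hypothetical, and the model is built in code only so the example is self-contained.

```swift
import CoreData

// Hypothetical two-entity model (Owner <->> Pet), built in code for this sketch.
func makeOwnerPetModel() -> NSManagedObjectModel {
    let owner = NSEntityDescription()
    owner.name = "Owner"
    let pet = NSEntityDescription()
    pet.name = "Pet"

    let pets = NSRelationshipDescription()      // Owner.pets (to-many)
    pets.name = "pets"
    pets.destinationEntity = pet
    pets.minCount = 0
    pets.maxCount = 0                           // 0 means "no upper bound"
    pets.deleteRule = .cascadeDeleteRule

    let petOwner = NSRelationshipDescription()  // Pet.owner (to-one)
    petOwner.name = "owner"
    petOwner.destinationEntity = owner
    petOwner.minCount = 0
    petOwner.maxCount = 1
    petOwner.deleteRule = .nullifyDeleteRule

    // Declaring the inverses is what lets Core Data keep both sides in sync.
    pets.inverseRelationship = petOwner
    petOwner.inverseRelationship = pets

    owner.properties = [pets]
    pet.properties = [petOwner]

    let model = NSManagedObjectModel()
    model.entities = [owner, pet]
    return model
}

// Sets only Pet.owner, then reports how many pets the owner ends up with.
func petCountAfterAssigningOwner() throws -> Int {
    let coordinator = NSPersistentStoreCoordinator(managedObjectModel: makeOwnerPetModel())
    _ = try coordinator.addPersistentStore(ofType: NSInMemoryStoreType,
                                           configurationName: nil, at: nil, options: nil)
    let context = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
    context.persistentStoreCoordinator = coordinator

    var count = 0
    context.performAndWait {
        let owner = NSEntityDescription.insertNewObject(forEntityName: "Owner", into: context)
        let pet = NSEntityDescription.insertNewObject(forEntityName: "Pet", into: context)
        pet.setValue(owner, forKey: "owner")    // only one side is set by hand
        context.processPendingChanges()
        count = (owner.value(forKey: "pets") as? Set<NSManagedObject>)?.count ?? 0
    }
    return count
}
```

    Nothing in the sketch touches Owner.pets directly; the other side of the relationship is filled in by the framework.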

  2. Do not use Core Data as if it were a SQLite wrapper

    This is very much related to the first law, but is a bit more specific, and it has to do with how Core Data persists data. It is exceptionally rare to find a Core Data implementation that does not use SQLite as the persistence layer, but it does happen.

    Out-of-the-box, Core Data natively supports 4 different ways to “persist” data:

    • As a SQLite file
    • As an XML file
    • As a binary file
    • As an in-memory representation

    In addition to these, Core Data also allows you to create your own persistence mechanism, by subclassing either NSAtomicStore or NSIncrementalStore. So, if you wanted, you could make Core Data save things to a git repository, or to CloudKit, or to MySQL or PostgreSQL, or to your own custom backend… Several years ago I created a framework to access the stackoverflow.com API, and networking was done via a custom Core Data store that translated Core Data requests into API calls. It was weird, but it worked.

    Core Data does not have to be just SQLite. In fact, modeling your schema as if it were SQLite (or some other RDBMS variant) is a sure sign you’re “doing it wrong”. Setting up custom things like artificial foreign keys or join tables is almost never necessary and is almost always wrong.
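
    To make the point about interchangeable persistence concrete, here is a small sketch in which the same saving code runs against whichever built-in store type is chosen. The Note entity and the function names are hypothetical; only the in-memory and SQLite types are used, since the XML store is not available on every platform.

```swift
import CoreData

// Hypothetical one-entity model, built in code so no model file is needed.
func makeNoteModel() -> NSManagedObjectModel {
    let title = NSAttributeDescription()
    title.name = "title"
    title.attributeType = .stringAttributeType

    let note = NSEntityDescription()
    note.name = "Note"
    note.properties = [title]

    let model = NSManagedObjectModel()
    model.entities = [note]
    return model
}

// Builds a context whose coordinator is backed by the requested store type.
func makeContext(storeType: String, at url: URL?) throws -> NSManagedObjectContext {
    let coordinator = NSPersistentStoreCoordinator(managedObjectModel: makeNoteModel())
    _ = try coordinator.addPersistentStore(ofType: storeType,
                                           configurationName: nil, at: url, options: nil)
    let context = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
    context.persistentStoreCoordinator = coordinator
    return context
}

// Saves one Note and returns how many Notes the store then holds.
// Nothing in here knows or cares how the data is persisted.
func saveNoteAndCount(in context: NSManagedObjectContext) throws -> Int {
    var result: Result<Int, Error> = .success(0)
    context.performAndWait {
        let note = NSEntityDescription.insertNewObject(forEntityName: "Note", into: context)
        note.setValue("hello", forKey: "title")
        result = Result {
            try context.save()
            return try context.count(for: NSFetchRequest<NSManagedObject>(entityName: "Note"))
        }
    }
    return try result.get()
}
```

    Passing NSInMemoryStoreType with a nil URL, or NSSQLiteStoreType with a file URL, changes where the data ends up but not the code that works with the objects.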

  3. Your NSManagedObjectContext is your “stack”

    Typically one of the first things that developers do when creating a Core Data stack is to create a “DataStack” object that encapsulates loading up the model, creating the store coordinator, and then creating the main NSManagedObjectContext. That “stack” object then gets passed around as your “Core Data manager” object by which you get the context you need. iOS 10.0 and macOS 10.12 added the concept of an NSPersistentContainer, which does a lot of this for you.

    Having a single object to load up your model and everything is great. But you don’t need to pass it around.

    It’s usually passed around in order to have easy access to making a new context or accessing the model. That is all unnecessary. If you do decide to pass Core Data objects around your app, then all you need is the NSManagedObjectContext (“MOC”).

    Your MOC has an NSPersistentStoreCoordinator (“PSC”) property, which itself has an NSManagedObjectModel (“MOM”, aka the schema). So from a single MOC, you can get any information you need about your schema, where things are being saved, what format they’re being saved in, the configuration for the persistent stores, etc.
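
    A short sketch of that walk from MOC to PSC to MOM, using nothing but the context. The function name describeStack(of:) is made up for illustration.

```swift
import CoreData

// Describes the stack behind a context, starting from the context alone.
func describeStack(of context: NSManagedObjectContext) -> [String] {
    guard let coordinator = context.persistentStoreCoordinator else { return [] }

    var lines: [String] = []

    // MOC -> PSC -> MOM: the schema is reachable from the context.
    let entityNames = coordinator.managedObjectModel.entities.compactMap { $0.name }.sorted()
    lines.append("entities: \(entityNames.joined(separator: ", "))")

    // MOC -> PSC -> stores: so are the format and location of each store.
    for store in coordinator.persistentStores {
        lines.append("store: \(store.type) at \(store.url?.path ?? "(no URL)")")
    }
    return lines
}
```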

    If you decide you need to create a new, one-off MOC, it’s easy to do so with your existing MOC:

     let existingContext: NSManagedObjectContext = ...
     let newContext = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
     newContext.persistentStoreCoordinator = existingContext.persistentStoreCoordinator
     // that's it
    

    You don’t need to pass around a “stack” object.

    (Creating new contexts like this isn’t ideal, because of another law further down)

  4. Never ever ever ever ever use an NSManagedObject outside its context’s queue

    Violating this law is the biggest source of bugs when it comes to Core Data. Offhand I’d guess that more than 90% of the pain developers experience with Core Data is because of this.

    Core Data tries to be efficient; it typically doesn’t like to load up more data than you need, which means there are times when you ask it for data (like an object property) and it doesn’t have it handy. When this happens, it has to go load the data from its store (which might not even be a local file on disk!) before it can respond to you.

    This is called “faulting”. The marker value internally kept by a managed object is a “fault”, and the process of “fulfilling” (ie, retrieving the data) the fault is “faulting”.

    Here’s the thing: Core Data has to be safe. It has to synchronize these faulting calls with other accesses of the persistent store, and it has to do it in a way that isn’t going to interfere with other calls to fault in data. The way it does that is by expecting that all calls to fault in data happen safely inside one of its queues.

    Every managed object “belongs” to a particular MOC (more on this in a minute), and every MOC has a DispatchQueue that it uses to synchronize its internal logic about loading data from its persistentStoreCoordinator.

    If you use an NSManagedObject from outside the MOC’s queue, then the calls to fault in data are not properly synchronized and protected, which means you’re susceptible to race conditions.

    So, if you have an NSManagedObject, the only safe place to use it is from inside a call to perform or performAndWait on its MOC, like so:

     let object: NSManagedObject = ...
     var propertyValue: PropertyType!
     // managedObjectContext is optional (it is nil once the object has been deleted)
     object.managedObjectContext?.performAndWait {
         propertyValue = object.property
     }
     ... 
    

    Using your own DispatchQueue or one of the global queues is insufficient. The managed object has to be accessed from the queue that is controlled by the MOC, and the way to do that is with the perform and performAndWait methods.
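
    One way to make that habit harder to forget is a small helper that always hops onto the owning context’s queue before reading. This is only a sketch: safeValue(forKey:) is a hypothetical name, not a Core Data API, and it returns nil if the object no longer has a context or the value is not of the requested type.

```swift
import CoreData

extension NSManagedObject {
    // Hypothetical helper: reads a property on the owning context's queue
    // and returns it as a plain value that is safe to use anywhere.
    func safeValue<Value>(forKey key: String) -> Value? {
        guard let context = managedObjectContext else { return nil }
        var result: Value?
        context.performAndWait {
            result = self.value(forKey: key) as? Value
        }
        return result
    }
}
```

    A call site would then look like `let name: String? = person.safeValue(forKey: "name")`, where "name" is whatever attribute the entity actually defines.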

    There is one special case to this, and that is dealing with managed objects that belong to a MOC whose queue is the “main” queue. The DispatchQueue.main queue is bound to the main thread of your app, and so if you’re on the main thread and have a main-thread-object, you can “safely” not use perform calls because you are already inside the context’s queue.

    The only managed object property that is safe to use outside of a queue or pass between queues/threads is the object’s objectID: this is a Core Data-provided identifier unique for that particular object. You can access this property from anywhere, and it is the only way to “transfer” a managed object from one context to another:

     let objectInContextA: NSManagedObject = ...
     let objectID = objectInContextA.objectID
     let contextB: NSManagedObjectContext = ...
     contextB.perform {
         let objectInContextB = contextB.object(with: objectID)
         // objectInContextB is now a separate *instance* from the original object,
         // but both are backed by the same data in the persistent store
     }
    

    I will add here that it is really unfortunate we have to care about this. It’s not hard to imagine a world where managed objects deal with this sort of stuff automatically. However, this is what happens when we’re dealing with a framework that is over 14 years old and is based on another framework (EOF) that is 24 years old. The problem of “binary compatibility” is a blog post for another day.

  5. Do not use NSManagedObject as if it were an NSObject

    This is a generalization of the previous law. Because of the weirdness around faulting and queue access, it’s my opinion that NSManagedObject shouldn’t actually be a subclass of NSObject. When we see an NSObject in our code, we have assumptions about how it works with regard to memory management, multi-threaded access, and behavior. NSManagedObject breaks enough of these rules that it probably shouldn’t be an NSObject, but should be its own root class.

    So, forget that it’s an NSObject. It doesn’t really behave like one, and you shouldn’t use it as if it were.

  6. You usually don’t need parent-child contexts

    One of the more esoteric features of Core Data is the ability to have relationships between contexts: you can have a MOC that is not actually backed by the NSPersistentStoreCoordinator, but is instead backed by another MOC. This has some really interesting implications, but in general: you don’t need this.

    The ability to have a child MOC is neat in some corner cases. Let’s review the core functionality of MOCs in order to understand those cases:

    • A MOC loads objects from its source (a PSC or another MOC)
    • A MOC saves objects to its source (a PSC or another MOC)
    • MOCs enforce graph integrity when they save

    That’s really the core pieces. So, you would want a child MOC if:

    • You only want to load objects that are already loaded in another MOC
    • You want to save objects to another MOC, but not necessarily the PSC
    • You want to enforce graph integrity without persisting the objects

    As you can see, when you deal with child contexts, you’re really dealing with transient (non-persisted) objects. You’re fundamentally changing loading and saving behavior.

    The times when you need this are pretty rare. You would typically want this for something like a complex sub-graph creation flow, where along each step of the flow, you need to enforce relationship integrity, but don’t want to actually save it to the persistent store until the flow is complete. And if the flow is cancelled, you don’t want any of it to be saved at all. You could do that by having a child context, doing all the flow steps in the child context, and saving it up to a parent context, but you can still delete the child context if the user aborts.

    They’re kind of like transactions in normal database systems. You can start importing or editing a bunch of data, and if something goes wrong or is cancelled, you can roll back the changes.
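
    A minimal sketch of that transaction-like flow, assuming a hypothetical helper named editInScratchContext(parent:edits:). The edits closure returns false to signal that the user cancelled.

```swift
import CoreData

// Runs `edits` in a throwaway child of `parent`.
// If `edits` returns true, the changes are pushed up one level into `parent`
// (not into the persistent store). If it returns false, the child is simply
// dropped and `parent` never sees the changes.
func editInScratchContext(parent: NSManagedObjectContext,
                          edits: (NSManagedObjectContext) -> Bool) throws {
    let child = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
    child.parent = parent

    var saveError: Error?
    child.performAndWait {
        guard edits(child) else { return }   // cancelled: nothing is saved
        do {
            try child.save()                 // goes to the parent MOC only
        } catch {
            saveError = error
        }
    }
    if let saveError = saveError { throw saveError }
}
```

    Persisting the result still requires a separate save() on the parent context, which is the extra step discussed next.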

    Parent/child contexts are usually advocated for something like “load some data in the background, and then saving it pushes it to the main queue context”. That can work, but it does mean that in order to persist your data, you actually have to save two contexts instead of just one, because save()-ing a context only pushes the data up one level. For a child context, the data only goes to the parent context, not all the way up to the PSC. In my opinion, using a child context like this is unnecessarily complicated.

    For general, non-transactional usage, I think it’s better to have two contexts (one for the main thread, one for the background) that both link directly to the PSC. Importation of data is done on the background context, and when it saves, the main queue context listens for the NSManagedObjectContextDidSave notification and merges in the changes with the mergeChanges(fromContextDidSave:) method to update its internally-held objects. Even that step might be unnecessary if the context has automaticallyMergesChangesFromParent set to true.
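
    The manual version of that merge could be sketched roughly like this. MergeObserver is a hypothetical class name; both contexts are assumed to be attached directly to the same PSC.

```swift
import CoreData

// Hypothetical helper: forwards saves made on `background` into `main`.
final class MergeObserver {
    private var token: NSObjectProtocol?

    init(from background: NSManagedObjectContext, into main: NSManagedObjectContext) {
        token = NotificationCenter.default.addObserver(
            forName: .NSManagedObjectContextDidSave,
            object: background,          // only listen to this one context
            queue: nil
        ) { note in
            // The notification arrives on the saving context's queue,
            // so hop onto the receiving context's queue before merging.
            main.perform {
                main.mergeChanges(fromContextDidSave: note)
            }
        }
    }

    deinit {
        if let token = token {
            NotificationCenter.default.removeObserver(token)
        }
    }
}
```

    The observer has to be kept alive for as long as the merging should continue, for example as a property on whatever object owns the two contexts.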

  7. Keep your main queue context read-only

    If you’re building an app that is reading information out of Core Data, displaying it to the user, and allowing minimal edits, then in my experience it’s best to keep the main queue context a “read-only” context.

    Having strict rules around which contexts are readable vs writable makes it much easier to reason about when parts of your UI should be reloaded: commands to update the UI come from a single direction (from your model towards your UI). If you allow mutation of stored information, then that can be encapsulated as a sort of “request for mutation”, sent off to the controller for this part of your model, and executed there. Not performing the mutation on a Core Data object directly makes it easier to debug where changes are coming from (the data import step? editing in the UI? something else?), because you have a single point of entry.
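
    One possible shape for such a “request for mutation”, sketched under assumptions: PersonMutation, PersonStore, and the "name" attribute are all hypothetical, and the writing context here is a private-queue context attached directly to the PSC.

```swift
import CoreData

// Hypothetical mutation requests the UI is allowed to send.
enum PersonMutation {
    case rename(objectID: NSManagedObjectID, newName: String)
    case delete(objectID: NSManagedObjectID)
}

// Hypothetical controller: the single point of entry for writes.
final class PersonStore {
    private let writer: NSManagedObjectContext

    init(coordinator: NSPersistentStoreCoordinator) {
        writer = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
        writer.persistentStoreCoordinator = coordinator
    }

    // Applies the mutation on the writer's own queue and reports the outcome.
    func apply(_ mutation: PersonMutation, completion: @escaping (Error?) -> Void) {
        writer.perform { [writer] in
            do {
                switch mutation {
                case .rename(let objectID, let newName):
                    let person = try writer.existingObject(with: objectID)
                    person.setValue(newName, forKey: "name")
                case .delete(let objectID):
                    let person = try writer.existingObject(with: objectID)
                    writer.delete(person)
                }
                try writer.save()
                completion(nil)
            } catch {
                writer.rollback()        // leave the writer clean on failure
                completion(error)
            }
        }
    }
}
```

    The UI only ever hands over an objectID and plain values, so the main queue context never has to be written to.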

    If you follow the next law as well, then this law becomes very simple to enforce.

  8. Use an abstraction layer

    This is more along the lines of “general good advice” than anything specific to Core Data, but here it is:

    It’s generally a smart thing to hide the fact that you’re using Core Data from the rest of your app. This isn’t because you’re “ashamed” of it and need to obscure it (😉), but is more because of the fact that Core Data objects carry a decent amount of baggage with them that the rest of your app shouldn’t have to know about (see earlier point about how objects bring along the entire stack).

    When you pass managed objects or contexts around your app, the temptation to just reach inside an object and pull out the PSC or the MOM or whatever and use it becomes too high. Don’t do that. Avoid violating the Law of Demeter and have a proper controller object that you can ask for what you need.

    You could hide a managed object behind a protocol, but that also makes it easy to forget the law about queue usage.

    In my opinion, you should keep the details of graph integrity and persistence to a confined part of your app, and data should only get out via custom-purpose struct values (or something like them).

    As a rudimentary example of what this might look like, you could do something like this:

     protocol ManagedObjectInitializable {
         init(managedObject: NSManagedObject)
     }
        
     class ModelController {
         func fetchObjects<T>(completion: @escaping (Array<T>) -> Void)
             where T: ManagedObjectInitializable {
             ... 
         }
     }
        
     struct Person: ManagedObjectInitializable {
         let firstName: String
         let lastName: String
         ...
     }
    

    There are many different ways you could abstract out the details of Core Data, each with their pros and cons. But hiding Core Data like this from the rest of your app is a huge step along the road to proper encapsulation and “need-to-know” information hiding.
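
    For illustration, here is one guess at how the elided parts of such an outline could be filled in. The entityName requirement, the attribute keys, the empty-string fallbacks, and the decision to swallow fetch errors are all assumptions made for this sketch.

```swift
import CoreData

protocol ManagedObjectInitializable {
    static var entityName: String { get }   // assumption: added for this sketch
    init(managedObject: NSManagedObject)
}

struct Person: ManagedObjectInitializable {
    static let entityName = "Person"
    let firstName: String
    let lastName: String

    // Copies plain values out of the managed object; the struct keeps
    // no reference to Core Data afterwards.
    init(managedObject: NSManagedObject) {
        firstName = managedObject.value(forKey: "firstName") as? String ?? ""
        lastName = managedObject.value(forKey: "lastName") as? String ?? ""
    }
}

final class ModelController {
    private let context: NSManagedObjectContext

    init(coordinator: NSPersistentStoreCoordinator) {
        context = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
        context.persistentStoreCoordinator = coordinator
    }

    // Fetches on the context's queue and converts to structs there,
    // so no managed object ever leaves this class.
    func fetchObjects<T: ManagedObjectInitializable>(completion: @escaping ([T]) -> Void) {
        context.perform { [context] in
            let request = NSFetchRequest<NSManagedObject>(entityName: T.entityName)
            let objects = (try? context.fetch(request)) ?? []   // errors become "no results" here
            completion(objects.map(T.init(managedObject:)))
        }
    }
}
```

    A caller picks the result type through the closure signature, as in `controller.fetchObjects { (people: [Person]) in ... }`, and never sees a managed object or a context.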

  9. Use Core Data as a local cache

    These days it’s pretty common to use apps that sync data between devices or are powered by a server backend. Very rarely do you find apps that produce and consume data that is only local to the device.

    Because of this, I find that it’s nice to use Core Data as a local cache. With Core Data hidden behind an abstraction layer, it’s easily integrated into a model layer from which I request data. The model layer looks in Core Data, and if the data is there, returns that. If it’s not, the data is fetched, saved into Core Data, and then returned.

    Using Core Data in this manner means that if I ever have schema conflicts (ie, I’ve updated my app with a new schema version, and the persisted data from the old version is no longer compatible with the new version), I don’t really have any qualms about just nuking the entire persistent store and starting over. Of course, I can go through the process of performing a migration and dealing with manually shuffling data around to be in the new format, but I don’t have to. That’s huge and saves me a ton of work.
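
    A sketch of what “nuking the store” could look like, assuming a SQLite store and a hypothetical function named addCacheStore(to:at:). It treats any failure to open the store as a reason to discard it, which is only reasonable when everything in the store can be fetched again.

```swift
import CoreData

// Adds a SQLite store at `url`. If the existing file cannot be opened with
// the coordinator's current model, the file is destroyed and recreated empty.
func addCacheStore(to coordinator: NSPersistentStoreCoordinator, at url: URL) throws {
    do {
        _ = try coordinator.addPersistentStore(ofType: NSSQLiteStoreType,
                                               configurationName: nil, at: url, options: nil)
    } catch {
        // The cache is disposable: throw the old data away and start over.
        try coordinator.destroyPersistentStore(at: url, ofType: NSSQLiteStoreType, options: nil)
        _ = try coordinator.addPersistentStore(ofType: NSSQLiteStoreType,
                                               configurationName: nil, at: url, options: nil)
    }
}
```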

    There is a big “gotcha” with this approach though: Core Data works best when you have the entire data set to query against. Because Core Data cares a lot about validation and graph integrity, it doesn’t work as well as a cache if parts of your data are missing. You can account for that in your schema, but that can also complicate how you use it. So if you’re going to use Core Data as a local cache, it’s best if you can use it against a whole part of your data set.

So, those are my “laws” for using Core Data. When I follow these laws, I almost never have any problems with race conditions, corrupted data, “sadness like the death of optimism”, or data integrity. It all just works, and tends to work really, really well.

I hope you’ll give it a shot.


A special thanks to Cole Joplin, Tom Harrington, and Soroush Khanlou for proof-reading and providing feedback.

