Protect mutable state with Swift actors

submitted by
Style Pass
2021-06-09 14:30:04

Data races occur when two separate threads concurrently access the same mutable state. They are trivial to construct, but are notoriously hard to debug. Discover how you can stop these data races in their tracks with Swift actors, which help synchronize access to data in your code. Discover how actors work and how to share values between them. Learn about how actor isolation affects protocol conformances. And finally, meet the main actor, a new way of ensuring that your code always runs on the main thread when needed. To get the most out of this session, we recommend first watching “Meet async/await in Swift.”

♪ Bass music playing ♪

Dario Rexin: Hi, my name is Dario Rexin, and I am an engineer on the Swift team here at Apple.

Today, my colleague Doug and I will talk about actors in Swift and how they are utilized to protect mutable state in concurrent Swift applications.

One of the fundamentally hard problems with writing concurrent programs is avoiding data races.

Data races occur when two separate threads concurrently access the same data and at least one of those accesses is a write.

Data races are trivial to construct but are notoriously hard to debug.

Here's a simple counter class with one operation that increments the counter and returns its new value.

Let's say we go ahead and try to increment from two concurrent tasks.

This is a bad idea.

Depending on the timing of the execution, we might get 1 and then 2, or 2 and then 1.

This is expected, and in both cases, the counter would be left in a consistent state.

But because we've introduced a data race, we could also get 1 and 1 if both tasks read a 0 and write a 1.

Or even 2 and 2 if the return statements happen after both increment operations.

Data races are notoriously hard to avoid and debug.

They require nonlocal reasoning because the data accesses causing the race might be in different parts of the program.

And they are nondeterministic because the operating system's scheduler might interleave the concurrent tasks in different ways each time you run your program.

Data races are caused by shared mutable state.

If your data doesn't change or it isn't shared across multiple concurrent tasks, you can't have a data race on it.

One way to avoid data races is to eliminate shared mutable state by using value semantics.

With a variable of a value type, all mutation is local.

Moreover, "let" properties of value-semantic types are truly immutable, so it's safe to access them from different concurrent tasks.

Swift has been promoting value semantics since its inception because they make it easier to reason about our programs, and the same properties make them safe to use in concurrent programs.

In this example, we create an array with some values.

Next, we assign that array to a second variable.

Now we append a different value to each copy of the array.

When we print both arrays at the end, we see that both copies contain the values that the array was initialized with, but each appended value is only present in the respective copy we appended them to.
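The array walkthrough above can be sketched in a few lines. The variable names and the specific values are illustrative, not taken from the session's slides.

```swift
// Create an array, then assign it to a second variable.
// Because Array has value semantics, array2 is an independent copy.
var array1 = [1, 2]
var array2 = array1

// Append a different value to each copy.
array1.append(3)
array2.append(4)

print(array1) // [1, 2, 3]
print(array2) // [1, 2, 4]
```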

The majority of types in Swift's standard library have value semantics, including collection types like dictionary, or as in this example, array.

Now that we have established that value semantics solve all of our data races, let's go ahead and make our counter a value type by turning it into a struct.

We also have to mark the increment function as mutating, so it can modify the value property.

When we now try to modify the counter, we get a compiler error because the counter is a let constant, which prevents us from mutating it.

Now, it seems very tempting to just change the counter variable to a var to make it mutable.

But that would leave us, again, with a race condition because the counter would be referenced by both concurrent tasks.

Luckily, the compiler has us covered and does not allow us to compile this unsafe code.

We can instead assign the counter to a local mutable variable inside each concurrent task.

When we execute our example now, it will always print 1 for both concurrent tasks.
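A minimal sketch of that last step, assuming the tasks are created with `Task.detached` (the function name `incrementLocalCopies` is made up for illustration). Each task mutates its own copy, so there is no race, but the two increments never see each other.

```swift
struct Counter {
    var value = 0

    // 'mutating' is required because the method changes a stored property.
    mutating func increment() -> Int {
        value += 1
        return value
    }
}

// Hypothetical helper: runs two concurrent tasks, each with a local copy.
func incrementLocalCopies() async -> (Int, Int) {
    let counter = Counter()

    let first = Task.detached { () -> Int in
        var localCounter = counter   // independent copy inside this task
        return localCounter.increment()
    }
    let second = Task.detached { () -> Int in
        var localCounter = counter   // another independent copy
        return localCounter.increment()
    }

    let a = await first.value
    let b = await second.value
    return (a, b)                    // both are 1: race-free, but not shared
}
```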

But even though our code is now race-free, the behavior is not what we want anymore.

This goes to show that there are still cases where shared mutable state is required.

When we have shared mutable state in a concurrent program, we need some form of synchronization to ensure that concurrent use of our shared mutable state won't cause data races.

There are a number of primitives for synchronization, from low-level tools like atomics and locks to higher-level constructs like serial dispatch queues.

Each of these primitives has its various strengths, but they all share the same critical weakness: they require careful discipline to use exactly correctly, every single time, or we'll end up with a data race.
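To make the "careful discipline" point concrete, here is a rough sketch of a lock-protected counter. The `LockedCounter` type and the `incrementManyTimes` helper are assumptions for illustration and were not shown in the session. The code is correct only because every access to `value` remembers to take the lock; nothing in the compiler enforces that.

```swift
import Foundation

// '@unchecked Sendable' is a promise to the compiler that we synchronize
// manually. Forgetting the lock in any one method would reintroduce a race.
final class LockedCounter: @unchecked Sendable {
    private let lock = NSLock()
    private var value = 0

    func increment() -> Int {
        lock.lock()
        defer { lock.unlock() }
        value += 1
        return value
    }

    var current: Int {
        lock.lock()
        defer { lock.unlock() }
        return value
    }
}

// Hypothetical helper: increments the same counter from many child tasks.
func incrementManyTimes(_ count: Int) async -> Int {
    let counter = LockedCounter()
    await withTaskGroup(of: Void.self) { group in
        for _ in 0..<count {
            group.addTask { _ = counter.increment() }
        }
    }
    return counter.current
}
```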

This is where actors come in.

Actors are a synchronization mechanism for shared mutable state.

An actor has its own state and that state is isolated from the rest of the program.

The only way to access that state is by going through the actor.

And whenever you go through the actor, the actor's synchronization mechanism ensures that no other code is concurrently accessing the actor's state.

This gives us the same mutual exclusion property that we get from manually using locks or serial dispatch queues, but with actors, it is a fundamental guarantee provided by Swift.

You can't forget to perform the synchronization, because Swift will produce a compiler error if you try.

Actors are a new kind of type in Swift.

They provide the same capabilities as all of the named types in Swift.

They can have properties, methods, initializers, subscripts, and so on.

They can conform to protocols and be augmented with extensions.

Like classes, they are reference types, because the purpose of actors is to express shared mutable state.

In fact, the primary distinguishing characteristic of actor types is that they isolate their instance data from the rest of the program and ensure synchronized access to that data.

All of their special behavior follows from those core ideas.

Here, we've defined our counter as an actor type.

We still have the instance property value for the counter, and the increment method to increment that value and return the new value.

The difference is that the actor will ensure the value isn't accessed concurrently.

In this case, that means the increment method, when called, will run to completion without any other code executing on the actor.

That guarantee eliminates the potential for data races on the actor's state.

Let's bring back our data race example.

We again have two concurrent tasks attempting to increment the same counter.

The actor's internal synchronization mechanism ensures that one increment call executes to completion before the other can start.

So we can get 1 and 2 or 2 and 1 because both are valid concurrent executions, but we cannot get the same count twice or skip any values because the internal synchronization of the actor has eliminated the potential for data races on the actor state.

Let's consider what actually happens when both concurrent tasks try to increment the counter at the same time.

One will get there first, and the other will have to wait its turn.

But how can we ensure that the second task can patiently await its turn on the actor? Swift has a mechanism for that.

Whenever you interact with an actor from the outside, you do so asynchronously.

If the actor is busy, then your code will suspend so that the CPU you're running on can do other useful work.

When the actor becomes free again, it will wake up your code -- resuming execution -- so the call can run on the actor.

The await keyword in this example indicates that the asynchronous call to the actor might involve such a suspension.
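A sketch of the actor version of the counter and the two concurrent increments, assuming the tasks are created with `Task.detached`. The helper name `incrementConcurrently` is invented for this example.

```swift
actor Counter {
    var value = 0

    // Runs to completion without any other code executing on the actor.
    func increment() -> Int {
        value = value + 1
        return value
    }
}

// Hypothetical helper: increments one counter from two concurrent tasks.
func incrementConcurrently() async -> [Int] {
    let counter = Counter()

    // Calls from outside the actor are asynchronous, hence 'await'.
    let first = Task.detached { await counter.increment() }
    let second = Task.detached { await counter.increment() }

    let a = await first.value
    let b = await second.value
    // The order may be 1 then 2 or 2 then 1, so sort for a stable result.
    return [a, b].sorted()
}
```

Whichever task reaches the actor first, the sorted result contains each value exactly once.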

Let's stretch our counter example just a bit further by adding an unnecessarily slow reset operation.

This operation sets the value back to 0, then calls increment an appropriate number of times to get the counter to the new value.

This resetSlowly method is defined in an extension of the counter actor type so it is inside the actor.

That means it can directly access the actor's state, which it does to reset the counter value to 0.

It can also synchronously call other methods on the actor, such as in the call to increment.

There's no await required because we already know we're running on the actor.

This is an important property of actors.

Synchronous code on the actor always runs to completion without being interrupted.

So we can reason about synchronous code sequentially, without needing to consider the effects of concurrency on our actor state.
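The reset operation described above might look roughly like this. The parameter label `to:` and the guard against negative input are assumptions for this sketch.

```swift
actor Counter {
    var value = 0

    func increment() -> Int {
        value = value + 1
        return value
    }
}

extension Counter {
    // Defined in an extension of the actor, so it is inside the actor.
    func resetSlowly(to newValue: Int) {
        value = 0                          // direct access to actor state
        for _ in 0..<max(0, newValue) {    // assumption: treat negatives as 0
            _ = increment()                // synchronous call, no 'await' needed
        }
    }
}
```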

We have stressed that our synchronous code runs uninterrupted, but actors often interact with each other or with other asynchronous code in the system.

Let's take a few minutes to talk about asynchronous code and actors.

But first, we need a better example.

Here we are building an image downloader actor.

It is responsible for downloading an image from another service.

It also stores downloaded images in a cache to avoid downloading the same image multiple times.

The logical flow is straightforward: check the cache, download the image, then record the image in the cache before returning.

Because we are in an actor, this code is free from low-level data races; any number of images can be downloaded concurrently.

The actor's synchronization mechanisms guarantee that only one task can execute code that accesses the cache instance property at a time, so there is no way that the cache can be corrupted.

That said, the await keyword here is communicating something very important.

Whenever an await occurs, it means that the function can be suspended at this point.

It gives up its CPU so other code in the program can execute, which affects the overall program state.

At the point where your function resumes, the overall program state will have changed.

It is important to ensure that you haven't made assumptions about that state prior to the await that may not hold after the await.

Imagine we have two different concurrent tasks trying to fetch the same image at the same time.

The first sees that there is no cache entry, proceeds to start downloading the image from the server, and then gets suspended because the download will take a while.

While the first task is downloading the image, a new image might be deployed to the server under the same URL.

Now, a second concurrent task tries to fetch the image under that URL.

It also sees no cache entry because the first download has not finished yet, then starts a second download of the image.

It also gets suspended while its download completes.

After a while, one of the downloads -- let's assume it's the first -- will complete and its task will resume execution on the actor.

It populates the cache and returns the resulting image of a cat.

Now the second task has its download complete, so it wakes up.

It overwrites the same entry in the cache with the image of the sad cat that it got.

So even though the cache was already populated with an image, we now get a different image for the same URL.

That's a bit of a surprise.

We expected that once we cache an image, we always get that same image back for the same URL so our user interface remains consistent, at least until we go and manually clear out the cache.

But here, the cached image changed unexpectedly.

We don't have any low-level data races, but because we carried assumptions about state across an await, we ended up with a potential bug.

The fix here is to check our assumptions after the await.

If there's already an entry in the cache when we resume, we keep that original version and throw away the new one.
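Here is a minimal sketch of that fix. The `Image` type and the `downloadImage(from:)` function are hypothetical stand-ins (the stub sleeps briefly and returns a different payload on every call, to imitate the server content changing), and `fetchTwice` exists only to exercise the actor.

```swift
import Foundation

// Hypothetical stand-in for a real image type.
struct Image: Equatable, Sendable {
    let data: Data
}

// Hypothetical download stub: suspends, then returns a fresh payload each time.
func downloadImage(from url: URL) async throws -> Image {
    try await Task.sleep(nanoseconds: 10_000_000)
    return Image(data: Data(UUID().uuidString.utf8))
}

actor ImageDownloader {
    private var cache: [URL: Image] = [:]

    func image(from url: URL) async throws -> Image {
        if let cached = cache[url] {
            return cached
        }

        // Potential suspension point: other tasks can run on the actor here.
        let image = try await downloadImage(from: url)

        // Re-check the assumption made before the await. If another task
        // filled the cache in the meantime, keep that original version.
        if let existing = cache[url] {
            return existing
        }
        cache[url] = image
        return image
    }
}

// Hypothetical helper: requests the same URL from two concurrent child tasks.
func fetchTwice(from url: URL, using downloader: ImageDownloader) async throws -> (Image, Image) {
    async let first = downloader.image(from: url)
    async let second = downloader.image(from: url)
    return try await (first, second)
}
```

Both requests may still start a download, but whichever finishes second discards its result, so callers always see the same image for a URL.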

A better solution would be to avoid redundant downloads entirely.

We've put that solution in the code associated with this video.

Actor reentrancy prevents deadlocks and guarantees forward progress, but it requires you to check your assumptions across each await.

To design well for reentrancy, perform mutation of actor state within synchronous code.

Ideally, do it within a synchronous function so all state changes are well-encapsulated.

State changes can involve temporarily putting our actor into an inconsistent state.

Make sure to restore consistency before an await.

And remember that await is a potential suspension point.

If your code gets suspended, the program and world will move on before your code gets resumed.

Any assumptions you've made about global state, clocks, timers, or your actor will need to be checked after the await.

And now my colleague Doug will tell you more about actor isolation. Doug?

Doug Gregor: Thanks, Dario.

Actor isolation is fundamental to the behavior of actor types.

Dario discussed how actor isolation is guaranteed by the Swift language model, through asynchronous interactions from outside the actor.

In this section, we'll talk about how actor isolation interacts with other language features, including protocol conformances, closures, and classes.

Like other types, actors can conform to protocols so long as they can satisfy the requirements of the protocol.

For example, let's make this LibraryAccount actor conform to the Equatable protocol.

The static equality method compares two library accounts based on their ID numbers.

Because the method is static, there is no self instance and so it is not isolated to the actor.

Instead, we have two parameters of actor type, and this static method is outside of both of them.

That's OK because the implementation is only accessing immutable state on the actor.

Let's extend our example further to make our library account conform to the Hashable protocol.

Doing so requires implementing the hash(into:) operation, which we can do like this.

However, the Swift compiler will complain that this conformance isn't allowed.

What happened? Well, conforming to Hashable this way means that this function could be called from outside the actor, but hash(into:) is not async, so there is no way to maintain actor isolation.

To fix this, we can make this method nonisolated.

Nonisolated means that this method is treated as being outside the actor, even though it is, syntactically, described on the actor.

This means that it can satisfy the synchronous requirement from the Hashable protocol.

Because nonisolated methods are treated as being outside the actor, they cannot reference mutable state on the actor.

This method is fine because it's referring to the immutable ID number.

If we were to try to hash based on something else, such as the array of books on loan, we will get an error because access to mutable state from the outside would permit data races.
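The two conformances can be sketched as follows. The stored property names `idNumber` and `booksOnLoan`, and the minimal `Book` struct, are assumptions based on the description above.

```swift
// Assumed value type for books.
struct Book: Sendable {
    var title: String
}

actor LibraryAccount {
    let idNumber: Int              // immutable, safe to read from outside
    var booksOnLoan: [Book] = []   // mutable, isolated to the actor

    init(idNumber: Int) {
        self.idNumber = idNumber
    }
}

extension LibraryAccount: Equatable {
    // Static, so there is no 'self' and the method is not actor-isolated.
    // It only touches immutable state on both parameters.
    static func == (lhs: LibraryAccount, rhs: LibraryAccount) -> Bool {
        lhs.idNumber == rhs.idNumber
    }
}

extension LibraryAccount: Hashable {
    // 'nonisolated' lets this satisfy the synchronous protocol requirement.
    nonisolated func hash(into hasher: inout Hasher) {
        hasher.combine(idNumber)
        // hasher.combine(booksOnLoan) would be rejected: it reads mutable,
        // actor-isolated state from a nonisolated context.
    }
}
```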

That's enough of protocol conformances.

Let's talk about closures.

Closures are little functions that are defined within one function, that can then be passed to another function to be called some time later.

Like functions, a closure might be actor-isolated or it might be nonisolated.

In this example, we're going to read some pages from each book we have on loan and return the total number of pages we've read.

The call to reduce involves a closure that performs the reading.

Note that there is no await in this call to readSome.

That's because this closure, which is formed within the actor-isolated function "read", is itself actor-isolated.

We know this is safe because the reduce operation is going to execute synchronously, and can't escape the closure out to some other thread where it could cause concurrent access.

Now, let's do something a little different.

I don't have time to read just now, so let's read later.

Here, we create a detached task.

A detached task executes the closure concurrently with other work that the actor is doing.

Therefore, the closure cannot be on the actor or we would introduce data races.

So this closure is not isolated to the actor.

When it wants to call the read method, it must do so asynchronously, as indicated by the await.
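A sketch contrasting the two closures. The body of `readSome` (a fixed page count) is made up, and unlike the version described in the talk, `readLater` here returns the task handle so the result can be observed.

```swift
struct Book: Sendable {
    var title: String
}

actor LibraryAccount {
    let idNumber: Int
    var booksOnLoan: [Book]

    init(idNumber: Int, booksOnLoan: [Book]) {
        self.idNumber = idNumber
        self.booksOnLoan = booksOnLoan
    }

    // Hypothetical: pretend we read ten pages of the given book.
    func readSome(_ book: Book) -> Int {
        10
    }

    func read() -> Int {
        // This closure is formed inside an actor-isolated method and runs
        // synchronously in reduce, so it is actor-isolated: no 'await'.
        booksOnLoan.reduce(0) { total, book in
            total + readSome(book)
        }
    }

    @discardableResult
    func readLater() -> Task<Int, Never> {
        // The detached task's closure runs concurrently with the actor,
        // so it is not isolated and must call read() asynchronously.
        Task.detached {
            await self.read()
        }
    }
}
```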

We've talked a bit about actor isolation of code, which is whether that code runs inside the actor or outside it.

Now, let's talk about actor isolation and data.

In our library account example, we've studiously avoided saying what the book type actually is.

I've been assuming it's a value type, like a struct.

That's a good choice because it means that all the state for an instance of the library account actor is self-contained.

If we go ahead and call this method to select a random book to read, we'll get a copy of the book that we can read.

Changes we make to our copy of the book won't affect the actor and vice versa.

However, if we turn the book into a class, things are a little different.

Our library account actor now references instances of the book class.

That's not a problem in itself.

However, what happens when we call the method to select a random book? Now we have a reference into the mutable state of the actor, which has been shared outside of the actor.

We've created the potential for data races.

Now, if we go and update the title of the book, the modification happens in state that is accessible within the actor.

Because the visit method is not on the actor, this modification could end up being a data race.

Value types and actors are both safe to use concurrently, but classes can still pose problems.

We have a name for types that are safe to use concurrently: Sendable.

A Sendable type is one whose values can be shared across different actors.

If you copy a value from one place to another, and both places can safely modify their own copies of that value without interfering with each other, the type can be Sendable.

Value types are Sendable because each copy is independent, as Dario talked about earlier.

Actor types are Sendable because they synchronize access to their mutable state.

Classes can be Sendable, but only if they are carefully implemented.

For example, if a class and all of its subclasses only hold immutable data, then it can be called Sendable.

Or if the class internally performs synchronization, for example with a lock, to ensure safe concurrent access, it can be Sendable.

But most classes are neither of these, and cannot be Sendable.

Functions aren't necessarily Sendable, so there is a new kind of function type for functions that are safe to pass across actors.

We'll get back to those shortly.

Your actors -- in fact, all of your concurrent code -- should primarily communicate in terms of Sendable types.

Sendable types protect code from data races.

This is a property that Swift will eventually start checking statically.

At that point, it will become an error to pass a non-Sendable type across actor boundaries.

How does one know that a type is Sendable? Well, Sendable is a protocol, and you state that your type conforms to Sendable the same way you do with other protocols.

Swift will then check to make sure your type makes sense as a Sendable type.

A Book struct can be Sendable if all of its stored properties are of Sendable type.

Let's say Author is actually a class, which means it -- and therefore the array of authors -- are not Sendable.

Swift will produce a compiler error indicating that Book cannot be Sendable.

For generic types, whether they are Sendable can depend on their generic arguments.

We can use conditional conformance to propagate Sendable when it's appropriate.

For example, a pair type will be Sendable only when both of its generic arguments are Sendable.

The same approach is used to conclude that an array of Sendable types is itself Sendable.
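These conformances could be written along the following lines. Here `Author` is a struct so that the example compiles; the `Pair` field names and the `requireSendable` helper are illustrative assumptions.

```swift
// If Author were a non-Sendable class, the Book conformance below
// would be rejected by the compiler.
struct Author: Sendable {
    var name: String
}

struct Book: Sendable {
    var title: String
    var authors: [Author]   // an array of Sendable elements is Sendable
}

struct Pair<T, U> {
    var first: T
    var second: U
}

// Conditional conformance: Pair is Sendable only when both arguments are.
extension Pair: Sendable where T: Sendable, U: Sendable {}

// Hypothetical helper that only accepts Sendable values.
func requireSendable<T: Sendable>(_ value: T) -> T {
    value
}
```

Calling `requireSendable(Pair(first: 1, second: "one"))` compiles because `Int` and `String` are both Sendable.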

We encourage you to introduce Sendable conformances to the types whose values are safe to share concurrently.

Use those types within your actors.

Then when Swift begins enforcing Sendable across actors, your code will be ready.

Functions themselves can be Sendable, meaning that it is safe to pass the function value across actors.

This is particularly important for closures where it restricts what the closure can do to help prevent data races.

For example, a Sendable closure cannot capture a mutable local variable, because that would allow data races on the local variable.

Anything the closure does capture needs to be Sendable, to make sure that the closure cannot be used to move non-Sendable types across actor boundaries.

And finally, a synchronous Sendable closure cannot be actor-isolated, because that would allow code to be run on the actor from the outside.

We've actually been relying on the idea of Sendable closures in this talk.

The operation that creates detached tasks takes a Sendable function, written here with the @Sendable in the function type.

Remember our counter example from the beginning of the talk? We were trying to build a value-typed counter.

Then, we tried to go and modify it from two different closures at the same time.

This would be a data race on the mutable local variable.

However, because the closure for a detached task is Sendable, Swift will produce an error here.
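A small sketch of what a Sendable closure may and may not capture. The function name `doubledInBackground` is invented, and the commented-out lines show the kind of code the compiler is expected to reject rather than an exact diagnostic.

```swift
// Hypothetical helper: does some work in a detached task.
func doubledInBackground(_ input: Int) async -> Int {
    // An immutable value of Sendable type is safe to capture.
    let snapshot = input
    let work: @Sendable () -> Int = { snapshot * 2 }

    // Capturing and mutating a local 'var' is not allowed, because two
    // closures could then race on it:
    //
    //     var counter = 0
    //     Task.detached { counter += 1 }   // expected to be rejected
    //     Task.detached { counter += 1 }

    // The operation passed to Task.detached is a @Sendable closure.
    return await Task.detached { work() }.value
}
```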

Sendable function types are used to indicate where concurrent execution can occur, and therefore prevent data races.

Here's another example we saw earlier.

Because the closure for the detached task is Sendable, we know that it should not be isolated to the actor.

Therefore, interactions with it will have to be asynchronous.

Sendable types and closures help maintain actor isolation by checking that mutable state isn't shared across actors, and cannot be modified concurrently.

We've been talking primarily about actor types, and how they interact with protocols, closures, and Sendable types.

There is one more actor to discuss -- a special one that we call the main actor.

When you are building an app, you need to think about the main thread.

It is where the core user interface rendering happens, as well as where user interaction events are processed.

Operations that work with the UI generally need to be performed from the main thread.

However, you don't want to do all of your work on the main thread.

If you do too much work on the main thread, say, because you have some slow input/output operation or blocking interaction with a server, your UI will freeze.

So, you need to be careful to do work on the main thread when it interacts with the UI but get off the main thread quickly for computationally expensive or long-waiting operations.

So, you do work off the main thread when you can and then call DispatchQueue.main.async whenever you have a particular operation that must be executed on the main thread.

Stepping back from the details of the mechanism, the structure of this code looks vaguely familiar.

In fact, interacting with the main thread is a whole lot like interacting with an actor.

If you know you're already running on the main thread, you can safely access and update your UI state.

If you aren't running on the main thread, you need to interact with it asynchronously.

This is exactly how actors work.

There's a special actor to describe the main thread, which we call the main actor.

The main actor is an actor that represents the main thread.

It differs from a normal actor in two important ways.

First, the main actor performs all of its synchronization through the main dispatch queue.

This means that, from a runtime perspective, the main actor is interchangeable with using DispatchQueue.main.

Second, the code and data that needs to be on the main thread is scattered everywhere.

It's in SwiftUI, AppKit, UIKit, and other system frameworks.

It's spread across your own views, view controllers, and the UI-facing parts of your data model.

With Swift concurrency, you can mark a declaration with the main actor attribute to say that it must be executed on the main actor.

We've done that with the checked-out operation here, so it always runs on the main actor.

If you call it from outside the main actor, you need to await, so that the call will be performed asynchronously on the main thread.

By marking code that must run on the main thread as being on the main actor, there is no more guesswork about when to use DispatchQueue.main.

Swift ensures that this code is always executed on the main thread.

Types can be placed on the main actor as well, which makes all of their members and subclasses be on the main actor.

This is useful for the parts of your code base that must interact with the UI, where most everything needs to run on the main thread.

Individual methods can opt out via the nonisolated keyword, with the same rules you're familiar with from normal actors.
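As a rough illustration of placing a type on the main actor, consider the sketch below. The `LoanListModel` type, its members, and the `recordLoans` helper are all hypothetical names chosen for this example.

```swift
// Every member of this type is on the main actor unless it opts out.
@MainActor
final class LoanListModel {
    private(set) var titles: [String]

    init(titles: [String] = []) {
        self.titles = titles
    }

    // Always runs on the main actor, so it can safely update UI state.
    func checkedOut(_ newTitles: [String]) {
        titles.append(contentsOf: newTitles)
    }

    // Opts out of main-actor isolation; it must not touch 'titles'.
    nonisolated func kind() -> String {
        "loan list"
    }
}

// Hypothetical helper that is not on the main actor.
func recordLoans(_ newTitles: [String]) async -> Int {
    // From outside the main actor, each interaction needs 'await'.
    let model = await LoanListModel()
    await model.checkedOut(newTitles)
    return await model.titles.count
}
```

The nonisolated `kind()` method can be called synchronously from anywhere, while `checkedOut` requires `await` unless the caller is already on the main actor.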

By using the main actor for your UI-facing types and operations, and introducing your own actors for managing other program state, you can architect your app to ensure safe, correct use of concurrency.

In this session, we've talked about how actors protect their mutable state from concurrent access, using actor isolation and by requiring asynchronous access from outside the actor to serialize execution.

Use actors to build safe, concurrent abstractions in your Swift code.

In implementing your actors, and in any asynchronous code, always design for reentrancy; an await in your code means the world can move on and invalidate your assumptions.

Value types and actors work together to eliminate data races.

Be aware of classes that don't handle their own synchronization, and other non-Sendable types that reintroduce shared mutable state.

Finally, use the main actor on your code that interacts with the UI to ensure that the code that must be on the main thread always runs on the main thread.

To learn more about how to use actors within your own application, check out our session on updating an app for Swift concurrency.

And to learn more about the implementation of Swift's concurrency model, including actors, check out our "Behind the scenes" session.

Actors are a core part of the Swift concurrency model.

They work together with async/await and structured concurrency to make it easier to build correct and efficient concurrent programs.

We can't wait to see what you build with them.
