Builder with typestate in Rust

The problem

In the previous article I covered the builder pattern. Here is the code snippet that implements UserBuilder for the User structure:

struct User {
    id: String,
    email: String,
    first_name: Option<String>,
    last_name: Option<String>,
}

struct UserBuilder {
    id: String,
    email: String,
    first_name: Option<String>,
    last_name: Option<String>,
}

impl UserBuilder {
    fn new(id: impl Into<String>, email: impl Into<String>) -> Self {
        Self {
            id: id.into(),
            email: email.into(),
            first_name: None,
            last_name: None,
        }
    }

    fn first_name(mut self, first_name: impl Into<String>) -> Self {
        self.first_name = Some(first_name.into());
        self
    }

    fn last_name(mut self, last_name: impl Into<String>) -> Self {
        self.last_name = Some(last_name.into());
        self
    }

    fn build(self) -> User {
        let Self { id, email, first_name, last_name } = self;
        User { id, email, first_name, last_name }
    }
}

impl User {
    fn builder(id: impl Into<String>, email: impl Into<String>) -> UserBuilder {
        UserBuilder::new(id, email)
    }
}

It is expected to be used like this:

let greyblake = User::builder("13", "example@example.com")
    .first_name("Sergey")
    .build();

Notice that the id and email fields are mandatory and have no defaults, so they must be passed to the User::builder() function. Unfortunately, this breaks the elegance of the builder. The names of mandatory fields are not explicitly bound to their values, so it is easy to pass arguments in the wrong order when they have the same type, e.g.:

User::builder("example@example.com", "13")
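To make the risk concrete, here is a minimal sketch of the positional builder from above. It is trimmed to the two mandatory fields, and the Debug derive and main are additions. It shows that swapped arguments compile without any complaint:

```rust
// Positional-argument builder, trimmed to the mandatory fields.
#[derive(Debug)]
struct User {
    id: String,
    email: String,
}

struct UserBuilder {
    id: String,
    email: String,
}

impl UserBuilder {
    fn new(id: impl Into<String>, email: impl Into<String>) -> Self {
        Self { id: id.into(), email: email.into() }
    }

    fn build(self) -> User {
        User { id: self.id, email: self.email }
    }
}

impl User {
    fn builder(id: impl Into<String>, email: impl Into<String>) -> UserBuilder {
        UserBuilder::new(id, email)
    }
}

fn main() {
    // Both arguments are &str, so the compiler cannot tell them apart:
    // the email silently ends up in `id` and the id in `email`.
    let user = User::builder("example@example.com", "13").build();
    println!("{:?}", user);
}
```

The compiler accepts this call. Because both arguments are string slices, nothing ties each value to the field it is meant for.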

Wouldn't it be awesome if we could set all values in the same fashion?

let greyblake = User::builder()
    .id("13")
    .email("greyblake@example.com")
    .first_name("Sergey")
    .build();

At the same time, we'd like to keep the API type-safe: if the builder is misused to construct an invalid user, we want to get a compile error. For example, the following usage should not be allowed, because the id field is missing:

let greyblake = User::builder()
    .email("greyblake@example.com")
    .first_name("Sergey")
    .build();

Can we do something like that? Yes! 🦄

NOTE: The problem can also be solved with newtypes, and using newtypes is generally a very good idea, but today we'll stay focused on the builder.

Naive approach

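If we think about it for a while, our builder can be in one of the following 4 states:

- neither id nor email is set;
- only id is set;
- only email is set;
- both id and email are set.

We can introduce 4 builder types, one per state. For example, UserBuilderWithEmail represents the state where only email is set, and UserBuilderComplete represents the state where both are set.

Together they form a simple state machine with the following flow:

[Figure: UserBuilder state machine]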

Once this is turned into code (see the playground), it works as intended. The following snippet compiles:

let greyblake = User::builder()
    .id("13")
    .email("greyblake@example.com")
    .first_name("Sergey")
    .build();

While this one fails:

let greyblake = User::builder()
    .email("greyblake@example.com")   // <-- id is not specified
    .first_name("Sergey")
    .build();

Error message:

    .build();
     ^^^^^ method not found in `UserBuilderWithEmail`

UserBuilderWithEmail must first be turned into UserBuilderComplete by setting id with .id(). Only then can a user be built.
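For reference, here is a condensed sketch of what such a four-type builder might look like. It is not the playground code. UserBuilderWithEmail and UserBuilderComplete are the names from the error message above, while UserBuilder (nothing set yet) and UserBuilderWithId are assumed names. Only one optional field, first_name, is kept so the duplication stays visible without doubling the length:

```rust
// Naive approach: one concrete type per builder state.
// `UserBuilder` and `UserBuilderWithId` are assumed names for illustration.
#[derive(Debug)]
struct User {
    id: String,
    email: String,
    first_name: Option<String>,
}

struct UserBuilder { first_name: Option<String> }
struct UserBuilderWithId { id: String, first_name: Option<String> }
struct UserBuilderWithEmail { email: String, first_name: Option<String> }
struct UserBuilderComplete { id: String, email: String, first_name: Option<String> }

impl User {
    fn builder() -> UserBuilder {
        UserBuilder { first_name: None }
    }
}

impl UserBuilder {
    fn id(self, id: impl Into<String>) -> UserBuilderWithId {
        UserBuilderWithId { id: id.into(), first_name: self.first_name }
    }
    fn email(self, email: impl Into<String>) -> UserBuilderWithEmail {
        UserBuilderWithEmail { email: email.into(), first_name: self.first_name }
    }
    // Copy 1 of first_name()
    fn first_name(mut self, first_name: impl Into<String>) -> Self {
        self.first_name = Some(first_name.into());
        self
    }
}

impl UserBuilderWithId {
    fn email(self, email: impl Into<String>) -> UserBuilderComplete {
        UserBuilderComplete { id: self.id, email: email.into(), first_name: self.first_name }
    }
    // Copy 2 of first_name()
    fn first_name(mut self, first_name: impl Into<String>) -> Self {
        self.first_name = Some(first_name.into());
        self
    }
}

impl UserBuilderWithEmail {
    fn id(self, id: impl Into<String>) -> UserBuilderComplete {
        UserBuilderComplete { id: id.into(), email: self.email, first_name: self.first_name }
    }
    // Copy 3 of first_name()
    fn first_name(mut self, first_name: impl Into<String>) -> Self {
        self.first_name = Some(first_name.into());
        self
    }
}

impl UserBuilderComplete {
    // Copy 4 of first_name()
    fn first_name(mut self, first_name: impl Into<String>) -> Self {
        self.first_name = Some(first_name.into());
        self
    }
    // build() exists only on the complete state.
    fn build(self) -> User {
        User { id: self.id, email: self.email, first_name: self.first_name }
    }
}

fn main() {
    let greyblake = User::builder()
        .id("13")
        .email("greyblake@example.com")
        .first_name("Sergey")
        .build();
    println!("{:?}", greyblake);
}
```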

Although it works, this approach is not very good. First, there is a lot of boilerplate and duplication: first_name() and last_name() each had to be implemented 4 times, once for every variant of the builder. Second, it does not scale. Every new mandatory field doubles the number of states, so the boilerplate grows exponentially.
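A rough count makes the growth explicit. This is a back-of-the-envelope model, not a measurement. Each mandatory field can be set or not set, so n mandatory fields give 2^n builder types. Every optional setter has to be written once per type:

```rust
// Rough model of the naive approach's duplication.
fn naive_state_count(mandatory_fields: u32) -> u64 {
    // Each mandatory field is either set or not set.
    2u64.pow(mandatory_fields)
}

fn naive_optional_setter_copies(mandatory_fields: u32, optional_fields: u64) -> u64 {
    // Every optional setter is duplicated in every state type.
    naive_state_count(mandatory_fields) * optional_fields
}

fn main() {
    for mandatory in 1..=4 {
        println!(
            "{} mandatory field(s): {} states, {} copies of 2 optional setters",
            mandatory,
            naive_state_count(mandatory),
            naive_optional_setter_copies(mandatory, 2)
        );
    }
}
```

With id and email (2 mandatory fields) and first_name and last_name (2 optional ones), this model gives 4 states and 8 setter bodies. That matches the duplication described above. A third mandatory field would double both numbers.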

Generic builder

To eliminate the duplication, we're going to make the builder generic. In particular, we're going to use a technique called typestate. Let me quote Cliffle here:

Typestates are a technique for moving properties of state (the dynamic information a program is processing) into the type level (the static world that the compiler can check ahead-of-time).
The special case of typestates that interests us here is the way they can enforce run-time order of operations at compile-time.

That's exactly what we need: id() and email() must be called before build() can be called.

Let's redefine our builder to be generic over I and E.

struct UserBuilder<I, E> {
    id: I,
    email: E,
    first_name: Option<String>,
    last_name: Option<String>,
}

I and E are type placeholders that represent the state of the id and email fields, respectively. Field id can either be set to a string or be missing, and the same applies to email. Let's define simple types to reflect this:

// types for `id`
struct Id(String);
struct NoId;

// types for `email`
struct Email(String);
struct NoEmail;
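NoId and NoEmail are unit structs, so they are zero-sized and a "missing" state costs nothing at runtime. The sketch below uses std::mem::size_of to check this on the builder definition from above. It is a quick check, not part of the builder itself:

```rust
use std::mem::size_of;

// State types for `id` and `email`.
struct Id(String);
struct NoId;
struct Email(String);
struct NoEmail;

struct UserBuilder<I, E> {
    id: I,
    email: E,
    first_name: Option<String>,
    last_name: Option<String>,
}

fn main() {
    // Unit structs carry no data, so their size is zero.
    println!("NoId: {} bytes", size_of::<NoId>());
    println!("Id: {} bytes", size_of::<Id>());
    // A builder in the initial state only stores the optional fields.
    println!("UserBuilder<NoId, NoEmail>: {} bytes", size_of::<UserBuilder<NoId, NoEmail>>());
    println!("UserBuilder<Id, Email>: {} bytes", size_of::<UserBuilder<Id, Email>>());
}
```

The exact byte counts depend on the platform. The zero size of the unit structs does not.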

What we want is to define a similar state machine as before, but this time using generics:

[Figure: UserBuilder state machine]

When User::builder() is called, neither id nor email is provided yet, so a value of type UserBuilder<NoId, NoEmail> should be returned.

impl User {
    fn builder() -> UserBuilder<NoId, NoEmail> {
        UserBuilder::new()
    }
}

impl UserBuilder<NoId, NoEmail> {
    fn new() -> Self {
        Self {
            id: NoId,
            email: NoEmail,
            first_name: None,
            last_name: None,
        }
    }
}

When .id() is invoked, regardless of the state of email, we just set the id field and preserve email's type and value without any changes:

//                   +-------- Pay attention ----------+
//                   |                                 |
//                   v                                 |
impl<E> UserBuilder<NoId, E> {  //                     v
    fn id(self, id: impl Into<String>) -> UserBuilder<Id, E> {
        let Self { email, first_name, last_name, .. } = self;
        UserBuilder {
            id: Id(id.into()),
            email,
            first_name,
            last_name
        }
    }
}

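Thanks to generics, this single implementation enables 2 potential transitions:

- UserBuilder<NoId, NoEmail> → UserBuilder<Id, NoEmail>
- UserBuilder<NoId, Email> → UserBuilder<Id, Email>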

Symmetrically we define .email():

impl<I> UserBuilder<I, NoEmail> {
    fn email(self, email: impl Into<String>) -> UserBuilder<I, Email> {
        let Self { id, first_name, last_name, .. } = self;
        UserBuilder {
            id,
            email: Email(email.into()),
            first_name,
            last_name
        }
    }
}
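To see the transitions spelled out, the sketch below annotates every intermediate builder type explicitly. It is trimmed to the two mandatory fields, and main is added for demonstration. Both call orders end up in the same UserBuilder<Id, Email> type:

```rust
struct Id(String);
struct NoId;
struct Email(String);
struct NoEmail;

// Trimmed to the mandatory fields to keep the focus on the transitions.
struct UserBuilder<I, E> {
    id: I,
    email: E,
}

impl UserBuilder<NoId, NoEmail> {
    fn new() -> Self {
        Self { id: NoId, email: NoEmail }
    }
}

// Available whenever `id` is missing, whatever the state of `email`.
impl<E> UserBuilder<NoId, E> {
    fn id(self, id: impl Into<String>) -> UserBuilder<Id, E> {
        UserBuilder { id: Id(id.into()), email: self.email }
    }
}

// Available whenever `email` is missing, whatever the state of `id`.
impl<I> UserBuilder<I, NoEmail> {
    fn email(self, email: impl Into<String>) -> UserBuilder<I, Email> {
        UserBuilder { id: self.id, email: Email(email.into()) }
    }
}

fn main() {
    // Order 1: id first, then email.
    let start: UserBuilder<NoId, NoEmail> = UserBuilder::new();
    let only_id: UserBuilder<Id, NoEmail> = start.id("13");
    let both: UserBuilder<Id, Email> = only_id.email("greyblake@example.com");

    // Order 2: email first, then id.
    let start: UserBuilder<NoId, NoEmail> = UserBuilder::new();
    let only_email: UserBuilder<NoId, Email> = start.email("greyblake@example.com");
    let both_again: UserBuilder<Id, Email> = only_email.id("13");

    println!("{} {}", both.id.0, both_again.email.0);
}
```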

We also need .first_name() and .last_name() in all 4 possible variants, so we implement them once, generically over I and E:

impl<I, E> UserBuilder<I, E> {
    fn first_name(mut self, first_name: impl Into<String>) -> Self {
        self.first_name = Some(first_name.into());
        self
    }

    fn last_name(mut self, last_name: impl Into<String>) -> Self {
        self.last_name = Some(last_name.into());
        self
    }
}

Finally, what remains is to define .build(), and of course, we want to have it only for type UserBuilder<Id, Email>, when both id and email are set:

impl UserBuilder<Id, Email> {
    fn build(self) -> User {
        let Self { id, email, first_name, last_name } = self;
        User {
            id: id.0,
            email: email.0,
            first_name,
            last_name,
        }
    }
}

Let's test it out. The following snippet compiles as expected:

let greyblake = User::builder()
    .id("13")
    .email("greyblake@example.com")
    .first_name("Sergey")
    .build();

While this one does not:

let greyblake = User::builder()
    .id("13")      // <-- email is missing
    .first_name("Sergey")
    .build();

Error:

15 | struct UserBuilder<I, E> {
   | ------------------------ method `build` not found for this
...
93 |         .build();
   |          ^^^^^ method not found in `UserBuilder<Id, NoEmail>`
   |
   = note: the method was found for
           - `UserBuilder<Id, Email>`

As long as descriptive type names (NoEmail and Email) are used, the error message should be enough to understand the cause and figure out the fix. In this example, an email value needs to be set.

See the complete code in the Rust playground.
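For convenience, here is the same builder assembled from the snippets in this article into a single file. The Debug derive and main are additions for demonstration. The chain calls first_name() before either mandatory field to show that the optional setters work in every state:

```rust
// Types that encode whether `id` / `email` have been set.
struct Id(String);
struct NoId;
struct Email(String);
struct NoEmail;

#[derive(Debug)]
struct User {
    id: String,
    email: String,
    first_name: Option<String>,
    last_name: Option<String>,
}

struct UserBuilder<I, E> {
    id: I,
    email: E,
    first_name: Option<String>,
    last_name: Option<String>,
}

impl User {
    fn builder() -> UserBuilder<NoId, NoEmail> {
        UserBuilder::new()
    }
}

impl UserBuilder<NoId, NoEmail> {
    fn new() -> Self {
        Self { id: NoId, email: NoEmail, first_name: None, last_name: None }
    }
}

// `id()` is only available while `id` is still missing.
impl<E> UserBuilder<NoId, E> {
    fn id(self, id: impl Into<String>) -> UserBuilder<Id, E> {
        let Self { email, first_name, last_name, .. } = self;
        UserBuilder { id: Id(id.into()), email, first_name, last_name }
    }
}

// `email()` is only available while `email` is still missing.
impl<I> UserBuilder<I, NoEmail> {
    fn email(self, email: impl Into<String>) -> UserBuilder<I, Email> {
        let Self { id, first_name, last_name, .. } = self;
        UserBuilder { id, email: Email(email.into()), first_name, last_name }
    }
}

// Optional setters are available in every state.
impl<I, E> UserBuilder<I, E> {
    fn first_name(mut self, first_name: impl Into<String>) -> Self {
        self.first_name = Some(first_name.into());
        self
    }

    fn last_name(mut self, last_name: impl Into<String>) -> Self {
        self.last_name = Some(last_name.into());
        self
    }
}

// `build()` exists only once both mandatory fields are set.
impl UserBuilder<Id, Email> {
    fn build(self) -> User {
        let Self { id, email, first_name, last_name } = self;
        User { id: id.0, email: email.0, first_name, last_name }
    }
}

fn main() {
    let greyblake = User::builder()
        .first_name("Sergey")
        .email("greyblake@example.com")
        .id("13")
        .build();
    println!("{:?}", greyblake);
}
```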

Now we are done. And yes, this approach is also not without cons:

So make your trade-offs wisely.

Typed builder

In practice, you should probably consider using the typed-builder crate instead of crafting the builder manually:

use typed_builder::TypedBuilder;

#[derive(Debug, TypedBuilder)]
struct User {
    id: String,
    email: String,
    #[builder(default)]
    first_name: Option<String>,
    #[builder(default)]
    last_name: Option<String>,
}

fn main() {
    let greyblake = User::builder()
        .id("13".into())
        .email("greyblake@example.com".into())
        .first_name(Some("Sergey".into()))
        .build();
    dbg!(greyblake);
}

Thanks to sasik520 for pointing out this crate on Reddit.

Summary

This article showed how typestate can be combined with the builder pattern so that incorrect use of the builder is caught at compile time.

Some of the common use cases for typestate:

The inspiration for this article came from the following resources, which I recommend checking out:

Discussion on Reddit

P.S.

Originally, I was planning to write an article about the Phantom Builder pattern. In essence, a phantom builder is just a very specific case of builder plus typestate in which the state field is never used at runtime (hence the name).
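To illustrate the distinction, here is a minimal sketch of what such a phantom variant could look like. It is an assumption, not the pattern as the author would have presented it. The marker types Missing and Set are hypothetical, the values live in Option fields, and the state is carried only by a zero-sized PhantomData field that is never read:

```rust
use std::marker::PhantomData;

// Hypothetical marker types: uninhabited, used only at the type level.
enum Missing {}
enum Set {}

#[derive(Debug)]
struct User {
    id: String,
    email: String,
}

struct UserBuilder<I, E> {
    id: Option<String>,
    email: Option<String>,
    // Zero-sized: the typestate exists only for the compiler.
    _state: PhantomData<(I, E)>,
}

impl User {
    fn builder() -> UserBuilder<Missing, Missing> {
        UserBuilder { id: None, email: None, _state: PhantomData }
    }
}

impl<E> UserBuilder<Missing, E> {
    fn id(self, id: impl Into<String>) -> UserBuilder<Set, E> {
        UserBuilder { id: Some(id.into()), email: self.email, _state: PhantomData }
    }
}

impl<I> UserBuilder<I, Missing> {
    fn email(self, email: impl Into<String>) -> UserBuilder<I, Set> {
        UserBuilder { id: self.id, email: Some(email.into()), _state: PhantomData }
    }
}

impl UserBuilder<Set, Set> {
    fn build(self) -> User {
        // The type parameters guarantee both fields were set,
        // so these `expect` calls cannot fail.
        User {
            id: self.id.expect("id is set in state <Set, Set>"),
            email: self.email.expect("email is set in state <Set, Set>"),
        }
    }
}

fn main() {
    let user = User::builder().email("greyblake@example.com").id("13").build();
    println!("{:?}", user);
}
```

Unlike the Id/NoId version, every state here has the same runtime layout. Only the type parameters differ.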