Rust Knowledge Refinement

Serhii Potapov February 07, 2021 #rust

Recently I reread The Rust Book. The last time I read the book (more precisely, its first edition) was in 2016. Since then, I have forgotten some things, and others did not exist yet. So, to address my knowledge gaps, I decided to document some discoveries. This blog post is mostly for myself, so I can come back here and quickly refresh my memory. But you may discover something new and interesting as well.

Copy and Drop relation

A type can implement either the Copy trait or the Drop trait, but not both.

If I think about it for a while, it does make sense: if a type needs a custom Drop implementation to clean up its resources correctly, a simple memory copy (the Copy trait) would lead to a state where a resource has multiple owners, while the ownership rules allow only one owner.
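A minimal sketch of the difference, using two hypothetical types (Point and Guard are made up for illustration): assigning a Copy value duplicates it, while assigning a value with a Drop implementation moves it, so the cleanup runs exactly once.

```rust
use std::cell::Cell;

// A Copy type: plain data, duplicated bit-for-bit on assignment.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// A type with custom cleanup. Adding `#[derive(Clone, Copy)]` here
// would be rejected by the compiler (error E0184).
struct Guard<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        // Record that the cleanup ran.
        self.drops.set(self.drops.get() + 1);
    }
}

// Moves a Guard between two bindings and returns how many times `drop` ran.
fn count_drops() -> u32 {
    let drops = Cell::new(0);
    {
        let first = Guard { drops: &drops };
        let _second = first; // a move, not a copy: `first` is no longer usable
    } // `_second` goes out of scope here, so `drop` runs exactly once
    drops.get()
}

fn main() {
    let a = Point { x: 1, y: 2 };
    let b = a; // a copy: both `a` and `b` stay usable
    assert_eq!(a, b);
    assert_eq!(count_drops(), 1);
}
```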

Slices

Type definitions:

&str
&[i32], &[f64], etc.

A slice reference occupies two usize values on the stack (16 bytes on x86-64): one usize stores a pointer to the data, the other stores the length.
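This can be checked with std::mem::size_of. The sketch below compares against size_of::<usize>() rather than a hard-coded 16, so it does not assume a 64-bit target.

```rust
use std::mem::size_of;

fn main() {
    // A slice reference is a "fat" pointer: data pointer + length.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<&[i32]>(), 2 * size_of::<usize>());
    // A reference to a sized value is a single pointer.
    assert_eq!(size_of::<&i32>(), size_of::<usize>());
    println!("&str takes {} bytes", size_of::<&str>());
}
```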

&str slice boundaries

String slice range indices must fall on valid UTF-8 character boundaries. If you attempt to create a string slice in the middle of a multibyte character, your program will panic.

For example, this code runs and prints the Russian letter "П":

let hi = "Привет Мир";
let slice = &hi[0..2]; // first 2 bytes
println!("slice = {}", slice);

But if we replace the second line with let slice = &hi[0..3]; it panics:

thread 'main' panicked at 'byte index 3 is not a char boundary;
it is inside 'р' (bytes 2..4) of `Привет Мир`', src/main.rs:3:1
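When the byte offsets are not known to be safe, the panic can be avoided with str::get, which returns None instead, or with str::is_char_boundary. A small sketch (the prefix helper is hypothetical, added only for illustration):

```rust
// Returns the first `n` bytes of `s` as a slice, or None if byte `n`
// is out of range or falls inside a multibyte character.
fn prefix(s: &str, n: usize) -> Option<&str> {
    s.get(0..n)
}

fn main() {
    let hi = "Привет Мир";
    assert_eq!(prefix(hi, 2), Some("П"));
    assert_eq!(prefix(hi, 3), None); // byte 3 is inside 'р'
    assert!(hi.is_char_boundary(2));
    assert!(!hi.is_char_boundary(3));

    // Iterating by characters sidesteps byte offsets entirely.
    let first_three: String = hi.chars().take(3).collect();
    assert_eq!(first_three, "При");
}
```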

Implicit &String -> &str conversion

Rust is typically very strict about types. However, this code is valid:

let hi: String = String::from("hi");
let slice: &str = &hi;

&hi, which has type &String, gets assigned to a variable of type &str, and that compiles. This works thanks to deref coercion: String implements Deref<Target = str>.

Struct update syntax

Rust supports struct update syntax, similar to the spread syntax in JavaScript. I don't use it often, but it can be very helpful in tests.

let espresso = Product { name: "Espresso".to_string(), price: 2.50 };
let double_espresso = Product { price: 3.50, ..espresso };
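One detail worth knowing: ..espresso moves the remaining non-Copy fields out of the source value. The sketch below assumes a Product definition (the snippet above does not show one) and adds a hypothetical with_price helper that clones first, so the original stays fully usable.

```rust
// Assumed definition; the snippet above does not show the struct itself.
#[derive(Debug, Clone, PartialEq)]
struct Product {
    name: String,
    price: f64,
}

// Hypothetical helper: clone the base so it is not consumed by the update.
fn with_price(base: &Product, price: f64) -> Product {
    Product { price, ..base.clone() }
}

fn main() {
    let espresso = Product { name: "Espresso".to_string(), price: 2.50 };
    let lungo = with_price(&espresso, 2.80);
    assert_eq!(lungo.name, espresso.name); // `espresso` is still intact

    let double_espresso = Product { price: 3.50, ..espresso };
    // `name` (a String) was moved out of `espresso` by the update:
    // println!("{}", espresso.name); // error[E0382]: borrow of moved value
    // `price` was not taken, so it is still readable.
    assert_eq!(espresso.price, 2.50);
    assert_eq!(double_espresso.name, "Espresso");
}
```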

Iterator::collect()

I always used the Iterator::collect() method to build a vector, but collect() can build many other collections: any type that implements the FromIterator trait. For example, the following code produces a hash map of the integers from 1 to 5 and their squares:

let squares: HashMap<i32, i32> = (1..=5).map(|x| (x, x*x)).collect();
println!("{:?}", squares);  // e.g. {3: 9, 4: 16, 1: 1, 5: 25, 2: 4} (HashMap order is not guaranteed)
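A few more targets for collect(), as a sketch. The helper names squares_sorted and parse_all are made up for this example; the FromIterator implementations they rely on are part of the standard library.

```rust
use std::collections::{BTreeMap, HashSet};
use std::num::ParseIntError;

// BTreeMap keeps keys sorted, so its output order is deterministic.
fn squares_sorted(n: i32) -> BTreeMap<i32, i32> {
    (1..=n).map(|x| (x, x * x)).collect()
}

// Collecting into Result<Vec<_>, _> stops at the first Err.
fn parse_all(inputs: &[&str]) -> Result<Vec<i32>, ParseIntError> {
    inputs.iter().map(|s| s.parse::<i32>()).collect()
}

fn main() {
    // HashSet: duplicates are dropped ('l' appears twice in "hello").
    let unique: HashSet<char> = "hello".chars().collect();
    assert_eq!(unique.len(), 4);

    // String: built from an iterator of String pieces.
    let shout: String = ["a", "b", "c"].iter().map(|s| s.to_uppercase()).collect();
    assert_eq!(shout, "ABC");

    assert_eq!(format!("{:?}", squares_sorted(3)), "{1: 1, 2: 4, 3: 9}");
    assert_eq!(parse_all(&["1", "2"]), Ok(vec![1, 2]));
    assert!(parse_all(&["1", "x"]).is_err());
}
```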

impl Trait syntax

impl Trait as a function argument

I would typically define a generic function like this:

fn send<T: Message>(message: &T) { ... }

But Rust also has impl Trait syntax:

fn send(message: &impl Message) { ... }

However, I would still prefer the first option, because it better communicates visually the fact that the function is generic.

impl Trait as a return value

impl Trait can also be used in return position when the returned type is too long to write out by hand. It's typically used to return futures or iterators.

In the following example, the function filter_div3() takes an iterator that produces i32 values, applies an extra filter to it, and returns a new iterator.

fn filter_div3(iter: impl Iterator<Item=i32>) -> impl Iterator<Item=i32> {
    iter.filter(|x| x % 3 == 0)
}

fn main() {
    // Fib is an iterator that produces the fibonacci sequence
    // [3, 21, 144, 987, 6765]
    println!("{:?}", filter_div3(Fib::new()).take(5).collect::<Vec<i32>>());

    // [3, 6, 9, 12, 15]
    println!("{:?}", filter_div3(1..=100).take(5).collect::<Vec<i32>>());
}

This is handy, but it also comes with restrictions: we're not allowed to use the returned value as anything other than an iterator.

E.g. this code does not compile:

println!("{:?}", filter_div3(1..=100));

because the compiler has no guarantee that the value returned by filter_div3() implements the Debug trait.

UPDATE (based on the reddit comment):

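It is possible to use Debug if you change the return type from impl Iterator<Item=i32> to impl Iterator<Item=i32> + Debug. The iter argument then needs the same + Debug bound, because the Filter adapter implements Debug only when the iterator it wraps does.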
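Here is a self-contained sketch of that variant. The Fib type is not shown in this article, so the definition below is an assumption: a simple iterator that starts at 1, 1 and would overflow i32 if driven far enough.

```rust
use std::fmt::Debug;

// Assumed definition of `Fib`: yields 1, 1, 2, 3, 5, 8, ...
#[derive(Debug)]
struct Fib {
    curr: i32,
    next: i32,
}

impl Fib {
    fn new() -> Self {
        Fib { curr: 1, next: 1 }
    }
}

impl Iterator for Fib {
    type Item = i32;

    fn next(&mut self) -> Option<i32> {
        let value = self.curr;
        self.curr = self.next;
        self.next += value; // no overflow handling: fine for a short demo only
        Some(value)
    }
}

// Both the argument and the return type now promise Debug.
fn filter_div3(iter: impl Iterator<Item = i32> + Debug) -> impl Iterator<Item = i32> + Debug {
    iter.filter(|x| x % 3 == 0)
}

fn main() {
    let fibs: Vec<i32> = filter_div3(Fib::new()).take(5).collect();
    assert_eq!(fibs, [3, 21, 144, 987, 6765]);

    // This line did not compile before the `+ Debug` bounds were added.
    println!("{:?}", filter_div3(1..=100));
}
```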

Blanket implementations

The concept was known to me, but I was not aware of the term itself.

From The Rust Book:

Implementations of a trait on any type that satisfies the trait bounds are called blanket implementations.

Examples:

Every type that implements Display automatically gets an implementation of ToString.
If type A implements From<B>, then type B automatically gets an implementation of Into<A>.
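Both standard-library examples can be seen in a short sketch. The Celsius and Fahrenheit types and the Shout trait are hypothetical; Shout shows what writing a blanket implementation of your own might look like.

```rust
use std::fmt;

struct Celsius(f64);
struct Fahrenheit(f64);

impl fmt::Display for Celsius {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:.1}°C", self.0)
    }
}

impl From<Celsius> for Fahrenheit {
    fn from(c: Celsius) -> Self {
        Fahrenheit(c.0 * 9.0 / 5.0 + 32.0)
    }
}

// A blanket implementation of our own: every Display type gets `shout`.
trait Shout {
    fn shout(&self) -> String;
}

impl<T: fmt::Display> Shout for T {
    fn shout(&self) -> String {
        format!("{}!", self).to_uppercase()
    }
}

fn main() {
    // ToString comes for free from Display.
    assert_eq!(Celsius(21.5).to_string(), "21.5°C");

    // Into<Fahrenheit> for Celsius comes for free from From<Celsius>.
    let f: Fahrenheit = Celsius(100.0).into();
    assert_eq!(f.0, 212.0);

    assert_eq!("hi".shout(), "HI!");
}
```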

Lifetime elision rules

The first rule is:

Each parameter that is a reference gets its own lifetime parameter.

// E.g. the following function signature
fn foo(x: &u8, y: &u8)

// Is seen by Rust as
fn foo<'a, 'b>(x: &'a u8, y: &'b u8)

The second rule is:

If there is exactly one input lifetime parameter, that lifetime is assigned to all output lifetime parameters.

fn foo(x: &u8) -> &u8
// Is equivalent to
fn foo<'a>(x: &'a u8) -> &'a u8

The third rule is:

If there are multiple input lifetime parameters, but one of them is &self or &mut self because this is a method, the lifetime of self is assigned to all output lifetime parameters.

fn foo(&self, x: &u8) -> &u8
// Is equivalent to
fn foo<'a, 'b>(&'a self, x: &'b u8) -> &'a u8
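The rules can be seen at work in a small sketch. All names below (first_word, longest, Text) are hypothetical. Note that longest has two reference inputs and no self, so no rule assigns an output lifetime and it must be written out.

```rust
// Rule two: one input lifetime, so the output borrows from `x`.
fn first_word(x: &str) -> &str {
    x.split_whitespace().next().unwrap_or("")
}

// No rule covers the output here, so the lifetime must be explicit.
fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

struct Text {
    content: String,
}

impl Text {
    // Rule three: the result borrows from `self`, not from `prefix`.
    fn strip(&self, prefix: &str) -> &str {
        self.content
            .strip_prefix(prefix)
            .unwrap_or(self.content.as_str())
    }
}

fn main() {
    assert_eq!(first_word("hello world"), "hello");
    assert_eq!(longest("ab", "abc"), "abc");

    let text = Text { content: String::from("foobar") };
    let rest;
    {
        let prefix = String::from("foo");
        rest = text.strip(&prefix);
    } // `prefix` is dropped here, but `rest` only borrows from `text`
    assert_eq!(rest, "bar");
}
```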

Tests

Passing options to cargo test

We can pass options directly to cargo test, for example:

cargo test --help

Or we can pass options to a binary that cargo test runs:

cargo test -- --help

I often use

cargo test -- --nocapture

to see the output printed by my tests.

Using Result<T, E> in tests

Test functions can return a Result<T, E>:

#[cfg(test)]
mod tests {
    #[test]
    fn it_works() -> Result<(), String> {
        if 2 + 2 == 4 {
            Ok(())
        } else {
            Err(String::from("two plus two does not equal four"))
        }
    }
}
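One case where the Result form may pay off is a test that calls several fallible functions: the ? operator then replaces a chain of unwrap() calls. A sketch, with a hypothetical parse_port function as the code under test:

```rust
use std::num::ParseIntError;

// Hypothetical function under test.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    s.trim().parse()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn parses_port() -> Result<(), ParseIntError> {
        // `?` fails the test with the error instead of panicking via unwrap().
        let port = parse_port(" 8080 ")?;
        assert_eq!(port, 8080);
        Ok(())
    }
}

fn main() {
    println!("{:?}", parse_port("8080"));
}
```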

However, I find it too cumbersome.

Running tests filtered by name

E.g. run all tests that contain "pattern" in their name:

cargo test pattern

Ignoring specific tests

We can mark tests with #[ignore] to ignore them.

#[test]
#[ignore]
fn test_foo() { ... }

To run only the tests that are marked as ignored:

cargo test -- --ignored

Shared behavior for integration tests

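By convention, integration tests keep their shared behavior in a module named common (tests/common/mod.rs). Files in subdirectories of tests/ are not compiled as separate test crates, so the module does not show up in the test output. Read more in the Rust book: submodules in integration tests.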

Closures

Closures can capture values from their environment in three ways, which directly map to the three ways a function can take a parameter: taking ownership, borrowing mutably, and borrowing immutably. These are encoded in the three Fn traits as follows:

FnOnce trait

FnOnce consumes the variables it captures from its enclosing scope, known as the closure’s environment. To consume the captured variables, the closure must take ownership of these variables and move them into the closure when it is defined. The Once part of the name represents the fact that the closure can’t take ownership of the same variables more than once, so it can be called only once.

FnMut trait

FnMut can change the environment because it mutably borrows values.

Fn trait

Fn borrows values from the environment immutably.
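A sketch of the three traits side by side. The helper functions call, call_mut, and call_once are hypothetical; each accepts only closures that implement the named trait.

```rust
fn call<F: Fn() -> usize>(f: F) -> usize {
    f() + f() // Fn closures can be called repeatedly through a shared reference
}

fn call_mut<F: FnMut()>(mut f: F) {
    f();
    f(); // FnMut closures can be called repeatedly, but need `mut` access
}

fn call_once<F: FnOnce() -> String>(f: F) -> String {
    f() // FnOnce closures are consumed by the call
}

fn main() {
    let name = String::from("Ferris");

    // Borrows `name` immutably: implements Fn (and so FnMut and FnOnce).
    let len = || name.len();
    assert_eq!(call(len), 12);

    // Borrows `counter` mutably: implements FnMut and FnOnce, but not Fn.
    let mut counter = 0;
    call_mut(|| counter += 1);
    assert_eq!(counter, 2);

    // Moves `name` out when called: implements only FnOnce.
    let consume = move || name;
    assert_eq!(call_once(consume), "Ferris");
}
```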

Smart pointers

References VS Pointers

From The Rust Book, chapter 15:

In Rust, which uses the concept of ownership and borrowing, an additional difference between references and smart pointers is that references are pointers that only borrow data; in contrast, in many cases, smart pointers own the data they point to.

Deref coercion

The Deref and DerefMut traits customize the behavior of the dereference operator *.

As mentioned earlier, deref coercion is implicit and happens on function calls and variable assignments. What I find interesting is that Rust may perform multiple deref coercions in a row to get the necessary type.

Consider the following code:


use std::ops::Deref;

struct A(i32);
struct B(A);
struct C(B);

impl Deref for A {
    type Target = i32;

    fn deref(&self) -> &i32 {
        println!("deref A to i32");
        &self.0
    }
}

impl Deref for B {
    type Target = A;

    fn deref(&self) -> &A {
        println!("deref B to A");
        &self.0
    }
}

impl Deref for C {
    type Target = B;

    fn deref(&self) -> &B {
        println!("deref C to B");
        &self.0
    }
}

fn print_number(number: &i32) {
    println!("number = {}", number);
}

fn main() {
    let c: C = C(B(A(13)));
    print_number(&c);
}

i32 is wrapped by type A, which is wrapped by B, which is wrapped by C. All three wrapper types implement Deref.

When we call print_number(), which expects &i32, and pass &C as the argument, the Rust compiler implicitly calls c.deref().deref().deref(), performing this chain of conversions:

&C -> &B -> &A -> &i32

The output of the program above is:

deref C to B
deref B to A
deref A to i32
number = 13

Rc and reference cycles

Use Rc when it's not possible to determine at compile-time which part of the program will finish using the data last.

Reference Cycles Can Leak Memory:

Rust’s memory safety guarantees make it difficult, but not impossible, to accidentally create memory that is never cleaned up (known as a memory leak).
Creating reference cycles is not easily done, but it’s not impossible either.
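The usual way to avoid such a cycle is to make one direction of the relationship a Weak reference. The sketch below is modeled on the tree example from chapter 15 of The Rust Book; the build_tree helper is an addition for illustration.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    value: i32,
    parent: RefCell<Weak<Node>>,      // weak: does not keep the parent alive
    children: RefCell<Vec<Rc<Node>>>, // strong: a parent owns its children
}

// Builds a two-node tree and returns (leaf, branch).
fn build_tree() -> (Rc<Node>, Rc<Node>) {
    let leaf = Rc::new(Node {
        value: 3,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(vec![]),
    });
    let branch = Rc::new(Node {
        value: 5,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(vec![Rc::clone(&leaf)]),
    });
    // Child -> parent link is weak, so no strong cycle is formed.
    *leaf.parent.borrow_mut() = Rc::downgrade(&branch);
    (leaf, branch)
}

fn main() {
    let (leaf, branch) = build_tree();
    assert_eq!(branch.children.borrow().len(), 1);
    assert_eq!(Rc::strong_count(&leaf), 2);   // `leaf` + `branch.children`
    assert_eq!(Rc::strong_count(&branch), 1); // the weak link is not counted
    assert_eq!(Rc::weak_count(&branch), 1);

    let parent = leaf.parent.borrow().upgrade();
    assert_eq!(parent.map(|p| p.value), Some(5));

    drop(branch); // last strong reference gone: the parent node is freed
    assert!(leaf.parent.borrow().upgrade().is_none());
    assert_eq!(Rc::strong_count(&leaf), 1);
}
```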

Concurrency

Reading from a receiver with a for in loop

I typically used receiver.recv() to read messages from a receiver. But std::sync::mpsc::Receiver implements IntoIterator, meaning that one can use a for in loop, which is much handier:

let (tx, rx) = std::sync::mpsc::channel();
// ... send messages through tx, possibly from other threads ...
for message in rx {
    // process message; the loop ends once all senders are dropped
}
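A complete sketch with a producer thread (the collect_messages function is hypothetical). The loop blocks while waiting for each message and ends once every Sender has been dropped.

```rust
use std::sync::mpsc;
use std::thread;

// Spawns a producer thread and collects everything it sends.
fn collect_messages(count: u32) -> Vec<u32> {
    let (tx, rx) = mpsc::channel();

    let producer = thread::spawn(move || {
        for i in 0..count {
            tx.send(i).expect("receiver should still be alive");
        }
        // `tx` is dropped when the closure returns, which closes the channel.
    });

    let mut received = Vec::new();
    for message in rx {
        received.push(message);
    }

    producer.join().expect("producer thread panicked");
    received
}

fn main() {
    assert_eq!(collect_messages(3), vec![0, 1, 2]);
}
```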

Mutex and interior mutability

I hadn't thought of Mutex in terms of interior mutability, but Mutex<T> provides interior mutability, as the Cell family does.
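In the sketch below (parallel_count is a hypothetical helper), the counter binding is never declared mut, yet the value inside changes: lock() takes &self and hands out a guard that allows mutation.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Increments a shared counter once from each of `threads` threads.
fn parallel_count(threads: usize) -> usize {
    // Note: `counter` is not declared `mut`.
    let counter = Arc::new(Mutex::new(0));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                // `lock` needs only a shared reference to the Mutex.
                let mut guard = counter.lock().unwrap();
                *guard += 1;
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    assert_eq!(parallel_count(8), 8);
}
```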

Object safety and traits

Object safety is required for trait objects.

You can only make object-safe traits into trait objects. Some complex rules govern all the properties that make a trait object-safe, but in practice, only two rules are relevant. A trait is object-safe if all the methods defined in the trait have the following properties:

The return type isn't Self
There are no generic type parameters.

For example, it's not allowed to have Box<dyn Clone> because Clone::clone() returns Self, and therefore Clone is not object-safe.

let clonable: Box<dyn Clone> = Box::new(555i32);

Compilation error:

error[E0038]: the trait `Clone` cannot be made into an object
 --> src/main.rs:4:19
  |
4 |     let clonable: Box<dyn Clone> = Box::new(555i32);
  |                   ^^^^^^^^^^^^^^ `Clone` cannot be made into an object
  |
  = note: the trait cannot be made into an object because it requires `Self: Sized`
  = note: for a trait to be "object safe" it needs to allow building a vtable to allow the call to be resolvable dynamically;

You'll find more about Object Safety in the Rust Reference.
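For contrast, here is a sketch of a trait that can be used as a trait object. The Shape trait and its implementors are hypothetical. It also shows one of the finer rules from the Rust Reference: a method that returns Self can stay in the trait if it is bounded by where Self: Sized, at the cost of not being callable on the trait object.

```rust
trait Shape {
    fn area(&self) -> f64;

    // Returns Self, so it is opted out of trait objects with `Self: Sized`.
    fn scaled(&self, factor: f64) -> Self
    where
        Self: Sized;
}

struct Square {
    side: f64,
}

struct Rect {
    width: f64,
    height: f64,
}

impl Shape for Square {
    fn area(&self) -> f64 {
        self.side * self.side
    }
    fn scaled(&self, factor: f64) -> Self {
        Square { side: self.side * factor }
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.width * self.height
    }
    fn scaled(&self, factor: f64) -> Self {
        Rect { width: self.width * factor, height: self.height * factor }
    }
}

// Dynamic dispatch: each call to `area` goes through the vtable.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let shapes: Vec<Box<dyn Shape>> = vec![
        Box::new(Square { side: 2.0 }),
        Box::new(Rect { width: 2.0, height: 3.0 }),
    ];
    assert_eq!(total_area(&shapes), 10.0);
    assert_eq!(Square { side: 2.0 }.scaled(2.0).area(), 16.0);
    // shapes[0].scaled(2.0); // error: `scaled` cannot be called on a trait object
}
```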

Patterns and matching

Pattern matching in Rust is very powerful, and I have realized that I usually use only about half of its capabilities.

Ignoring values when destructuring

.. is used to ignore values we're not interested in:

let (x, y, ..) = (1, 2, 3, 4, 5, 6);
let Person { name, .. } = Person { name: "Peter", age: 24 };

Multiple match patterns, ranges, guards, and bindings

| is used to match multiple alternatives
Ranges can be used as a pattern
if defines an extra match guard
@ binds the matched value to a variable, so it can be used in an extra test

Example:

let x = 16;

match x {
    2 | 4..=6                 => println!("2 or 4-6"),
    z @ 10..=20 if z % 3 == 0 => println!("Divisible by 3 within range 10-20"),
    _                         => println!("Something else")
}

Refutability

There are two kinds of patterns: refutable and irrefutable.

Irrefutable patterns

Patterns that match any possible value passed are irrefutable.

Example:

let x = 7;

There is nothing that can go wrong with that pattern.

Refutable patterns

Patterns that can fail to match for some possible value are refutable.

Example:

if let Some(x) = option

If option were None, the pattern above would not match.

Function parameters, let statements, and for loops can only accept irrefutable patterns, because the program cannot do anything meaningful when values don't match.

Unsafe

I have to be honest: in the four years I have used Rust for my side projects, I have never felt the need to use unsafe. However, it's good to have at least a shallow understanding of it.

Unsafe superpowers

Dereference a raw pointer
Call an unsafe function or method
Access or modify a mutable static variable
Implement an unsafe trait
Access fields of unions

Raw pointers

Unsafe Rust has two new types called raw pointers: *const T and *mut T. Unlike references and smart pointers, raw pointers:

Are allowed to ignore the borrowing rules by having both immutable and mutable pointers or multiple mutable pointers to the same location
Aren't guaranteed to point to valid memory
Are allowed to be null
Don't implement any automatic cleanup
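A small sketch of these properties (the bump function is hypothetical). Creating raw pointers is safe; only dereferencing them requires an unsafe block. Both pointers are derived from the same mutable reference, which keeps the aliasing well-defined.

```rust
// Increments the value through one raw pointer and reads it through another.
fn bump(value: &mut i32) -> i32 {
    let writer: *mut i32 = value;
    let reader: *const i32 = writer; // a second pointer to the same location

    // SAFETY: both pointers come from a live `&mut i32`, so they are
    // non-null, aligned, and valid, and nothing else accesses the value here.
    unsafe {
        *writer += 1;
        *reader
    }
}

fn main() {
    let mut num = 5;
    assert_eq!(bump(&mut num), 6);
    assert_eq!(num, 6);

    // A null raw pointer may exist, but it must never be dereferenced.
    let nothing: *const i32 = std::ptr::null();
    assert!(nothing.is_null());
}
```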

Const VS immutable static var

Constants and immutable static variables might seem similar, but a subtle difference is that values in a static variable have a fixed address in memory. Using the value will always access the same data. Constants, on the other hand, are allowed to duplicate their data whenever they're used.

Unions

Reading a union field requires unsafe, because Rust cannot know which field currently holds valid data. Unions are primarily used for compatibility with C.
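A minimal sketch with a hypothetical IntOrFloat union: initializing a field is safe, while reading one needs an unsafe block. Here the same four bytes are written as an f32 and read back as a u32.

```rust
#[repr(C)]
union IntOrFloat {
    int: u32,
    float: f32,
}

// Returns the raw bit pattern of an f32 by reading the other union field.
fn float_bits(value: f32) -> u32 {
    let u = IntOrFloat { float: value }; // initializing a union is safe
    // SAFETY: both fields are 4 bytes wide and every bit pattern is a valid u32.
    unsafe { u.int }
}

fn main() {
    assert_eq!(float_bits(1.0), 0x3f80_0000);
    // The safe standard-library equivalent gives the same answer.
    assert_eq!(float_bits(1.0), 1.0f32.to_bits());
}
```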

Advanced Types

Thunk

Thunk is just a new term that I hadn't heard before. From Wikipedia:

In computer programming, a thunk is a subroutine used to inject an additional calculation into another subroutine. Thunks are primarily used to delay a calculation until its result is needed, or to insert operations at the beginning or end of the other subroutine.

Never type

Rust has the never type ! (known in some other languages as the empty type). It can be used as the return type of functions that never return, e.g. one that runs an endless loop:

fn run() -> ! {
    loop {
        // do something
    }
}
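The never type also explains why match arms of seemingly different types are accepted: expressions such as continue and panic! have type !, which coerces to any other type. A sketch with hypothetical helper functions:

```rust
// A function that never returns normally: it always panics.
fn fail(reason: &str) -> ! {
    panic!("fatal: {}", reason)
}

// Returns the first input that parses as a number.
fn first_number(inputs: &[&str]) -> Option<i32> {
    for input in inputs {
        let n: i32 = match input.parse() {
            Ok(n) => n,
            Err(_) => continue, // `continue` has type `!`, so the arm fits i32
        };
        return Some(n);
    }
    None
}

fn parse_or_fail(s: &str) -> i32 {
    match s.parse() {
        Ok(n) => n,
        Err(_) => fail("not a number"), // `!` coerces to i32 here as well
    }
}

fn main() {
    assert_eq!(first_number(&["x", "7", "9"]), Some(7));
    assert_eq!(first_number(&["x", "y"]), None);
    assert_eq!(parse_or_fail("42"), 42);
}
```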

Dynamically Sized Types

Dynamically sized types (DSTs) are types whose size is known only at runtime, not at compile time.

For example, str (not &str) is a DST because the length of a string cannot be known at compile time.

The same applies to traits: every trait object type (dyn Trait) is a DST.

We could try to implement a function whose argument is any value that implements the Debug trait:

fn debug(arg: dyn std::fmt::Debug) {
    println!("{:?}", arg);
}

But it will not compile. Rust tells us explicitly that the size of dyn Debug is not known at compile time:

1 | fn debug(arg: dyn std::fmt::Debug) {
  |          ^^^ doesn't have a size known at compile-time
  |
  = help: the trait `Sized` is not implemented for `(dyn Debug + 'static)`

The error message mentions the Sized trait, which is used to determine whether a particular type's size is known at compile time.

In fact, whenever there is a generic function like this one:

fn generic<T>(t: T) {
    // --snip--
}

Rust sees it as:

fn generic<T: Sized>(t: T) {
    // --snip--
}

If the Sized restriction needs to be relaxed, a developer must explicitly use ?Sized. Let's say we want a function that is generic over T but takes a reference instead of the actual value. Because the size of a reference is always known at compile time, the T: Sized restriction is not needed:

fn generic<T: ?Sized>(t: &T) {
    // --snip--
}

This way, the generic function can also be called with T = str.
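A runnable sketch of the same idea: the hypothetical size_in_bytes function accepts references to sized and unsized types alike, and std::mem::size_of_val reports the size at runtime.

```rust
use std::mem::size_of_val;

// `T: ?Sized` lets T be a DST such as `str` or `[u8]`.
fn size_in_bytes<T: ?Sized>(t: &T) -> usize {
    size_of_val(t)
}

fn main() {
    assert_eq!(size_in_bytes("hello"), 5);          // T = str
    assert_eq!(size_in_bytes(&[1u8, 2, 3][..]), 3); // T = [u8]
    assert_eq!(size_in_bytes(&42u64), 8);           // T = u64, still allowed
}
```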

Advanced Functions and Closures

Fn (trait) and fn (function pointer) are different things. Generally, prefer function interfaces that use the traits Fn, FnMut, and FnOnce over the fn type, because traits give more flexibility.
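The flexibility shows up with closures that capture their environment: they implement the Fn traits but cannot be converted to an fn pointer. A sketch with hypothetical helpers:

```rust
fn apply_fn_pointer(f: fn(i32) -> i32, x: i32) -> i32 {
    f(x)
}

fn apply_closure(f: impl Fn(i32) -> i32, x: i32) -> i32 {
    f(x)
}

fn double(x: i32) -> i32 {
    x * 2
}

fn main() {
    let offset = 10;

    // Plain functions and non-capturing closures coerce to `fn`.
    assert_eq!(apply_fn_pointer(double, 4), 8);
    assert_eq!(apply_fn_pointer(|x| x + 1, 4), 5);

    // The trait-based version accepts those and capturing closures too.
    assert_eq!(apply_closure(double, 4), 8);
    assert_eq!(apply_closure(|x| x + offset, 4), 14);

    // apply_fn_pointer(|x| x + offset, 4); // error: expected fn pointer, found closure
}
```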

Macros

Briefly, macros can be divided into the following categories:

declarative (macro_rules!)
procedural
- custom #[derive]
- attribute-like macros, e.g. #[route(GET, "/")]
- function-like macros

Function-like macros were a discovery for me. In terms of usage, they are very similar to macro_rules!, but they allow implementing parsers for completely custom syntax. I think function-like macros must be a very good fit for DSLs.

For example, this can be totally valid Rust code:

deutsch!(Was soll das sein?);

Raw identifiers

Raw identifiers are the syntax that lets you use keywords where they wouldn’t normally be allowed. You use a raw identifier by prefixing a keyword with r#.

For example, normally it's not possible to define a function named match() because match is a keyword used for pattern matching. However, with raw identifiers one can work around it:

fn r#match(needle: &str, haystack: &str) -> bool {
    haystack.contains(needle)
}

Summary

With this article, I just wanted to polish my Rust knowledge. However, if you have discovered something new, I am glad. The article itself is a derivative of The Rust Book, which I encourage you to (re)read.
