
Rust Knowledge Refinement

Serhii Potapov February 07, 2021 #rust

Recently I reread The Rust Book. The last time I read it (more precisely, its first edition) was in 2016. Since then, I have forgotten some things, and others did not exist yet. To address my knowledge gaps, I decided to document some discoveries. This blog post is mostly for myself, so I can come back here and quickly refresh my memory. But you may discover something new and interesting as well.

Copy and Drop relation

A type can implement either the Copy trait or the Drop trait, but not both.

If I think about it for a while, it does make sense: if a type needs a custom Drop implementation to clean up its resources correctly, a plain memory copy (the Copy trait) would lead to a state where a resource has multiple owners, while the ownership rules allow only one.
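A minimal sketch of the difference, using two hypothetical types (Point and Guard are made up for this illustration). Point is plain data and can be Copy; Guard has a Drop implementation, so the compiler would reject a Copy derive on it with error E0184.

```rust
// Plain data: a bitwise copy is safe, so the type can be Copy.
#[derive(Debug, Clone, Copy)]
struct Point {
    x: i32,
    y: i32,
}

// Hypothetical resource guard. Adding `#[derive(Clone, Copy)]` here
// would be rejected (E0184) because the type implements Drop.
struct Guard {
    id: u32,
}

impl Drop for Guard {
    fn drop(&mut self) {
        println!("releasing resource {}", self.id);
    }
}

fn main() {
    let a = Point { x: 1, y: 2 };
    let b = a; // copied: `a` stays usable
    println!("{:?} {:?} {}", a, b, a.x + b.y);

    let g = Guard { id: 7 };
    let moved = g; // moved: `g` can no longer be used
    drop(moved); // the cleanup code runs exactly once
}
```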

Slices

Type definitions:

A slice reference occupies two usize values on the stack (16 bytes on x86-64): one usize stores a pointer to the data, the other stores the length.
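The size claim can be checked with std::mem::size_of. This is only a small sanity check, written against usize so that it does not assume a particular platform:

```rust
use std::mem::size_of;

fn main() {
    let word = size_of::<usize>();

    // A reference to a sized value is a single pointer.
    assert_eq!(size_of::<&u8>(), word);

    // Slice references carry a pointer plus a length.
    assert_eq!(size_of::<&str>(), 2 * word);
    assert_eq!(size_of::<&[u64]>(), 2 * word);

    println!("usize = {} bytes, &str = {} bytes", word, size_of::<&str>());
}
```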

&str slice boundaries

String slice range indices must fall on valid UTF-8 character boundaries. If you attempt to create a string slice in the middle of a multibyte character, your program will panic.

For example, this code will execute and print the Russian letter "П":

let hi = "Привет Мир";
let slice = &hi[0..2]; // first 2 bytes
println!("slice = {}", slice);

But if we replace the second line with let slice = &hi[0..3]; it panics:

thread 'main' panicked at 'byte index 3 is not a char boundary;
it is inside 'р' (bytes 2..4) of `Привет Мир`', src/main.rs:3:1
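When the indices are not known to be valid, the panic can be avoided. The sketch below uses str::get, which returns an Option instead of panicking, and str::is_char_boundary to probe a single byte index:

```rust
fn main() {
    let hi = "Привет Мир";

    // `get` returns None instead of panicking on a bad boundary.
    assert_eq!(hi.get(0..2), Some("П"));
    assert_eq!(hi.get(0..3), None);

    // `is_char_boundary` checks a single byte index.
    assert!(hi.is_char_boundary(2));
    assert!(!hi.is_char_boundary(3));

    // To cut by characters rather than bytes, iterate over chars.
    let first_three: String = hi.chars().take(3).collect();
    println!("{}", first_three); // При
}
```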

Implicit &String -> &str conversion

Rust is typically very strict about types, however, this code is valid:

let hi: String = String::from("hi");
let slice: &str = &hi;

&hi with type &String gets assigned to a variable with type &str and that compiles. This is due to an exception called Deref coercion.

Struct update syntax

Rust supports the struct update syntax, similar to the one that JavaScript has. I don't happen to use it often, but it could be very helpful in tests.

let espresso = Product { name: "Espresso".to_string(), price: 2.50 };
let double_espresso = Product { price: 3.50, ..espresso };
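One detail worth keeping in mind: the fields taken from the base value are moved, not copied, unless their types are Copy. A self-contained sketch, assuming a Product struct with a String name and an f64 price (the original does not show the definition):

```rust
#[derive(Debug, Clone)]
struct Product {
    name: String,
    price: f64,
}

fn main() {
    let espresso = Product { name: "Espresso".to_string(), price: 2.50 };

    // `name` is moved out of `espresso` here because String is not Copy.
    let double_espresso = Product { price: 3.50, ..espresso };
    println!("{:?}", double_espresso);

    // `espresso.price` (an f64, which is Copy) is still readable,
    // but `espresso.name` and `espresso` as a whole are not.
    println!("{}", espresso.price);

    // Cloning the base keeps the original fully usable.
    let latte = Product { name: "Latte".to_string(), price: 3.00 };
    let big_latte = Product { price: 4.00, ..latte.clone() };
    println!("{:?} {:?}", latte, big_latte);
}
```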

Iterator::collect()

I always used the Iterator::collect() method to build a vector, but collect() can actually build many other collections, namely any type that implements the FromIterator trait. For example, the following code produces a hash map of integers from 1 to 5 and their squares:

use std::collections::HashMap;

let squares: HashMap<i32, i32> = (1..=5).map(|x| (x, x * x)).collect();
println!("{:?}", squares);  // e.g. {3: 9, 4: 16, 1: 1, 5: 25, 2: 4} (HashMap order is arbitrary)
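A few more collect() targets from the standard library, as a rough sketch. The target is chosen purely by the type annotation:

```rust
use std::collections::{BTreeMap, HashSet};

fn main() {
    // BTreeMap keeps keys sorted, so the printed order is deterministic.
    let squares: BTreeMap<i32, i32> = (1..=5).map(|x| (x, x * x)).collect();
    println!("{:?}", squares); // {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

    // A set drops duplicates.
    let unique: HashSet<i32> = vec![1, 2, 2, 3, 3, 3].into_iter().collect();
    assert_eq!(unique.len(), 3);

    // Characters can be collected into a String.
    let shout: String = "rust".chars().map(|c| c.to_ascii_uppercase()).collect();
    assert_eq!(shout, "RUST");

    // An iterator of Results collects into a Result, stopping at the first Err.
    let parsed: Result<Vec<i32>, _> = "1 2 3".split(' ').map(str::parse::<i32>).collect();
    assert_eq!(parsed, Ok(vec![1, 2, 3]));
}
```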

impl Trait syntax

impl Trait as a function argument

I would typically define a generic function like this:

fn send<T: Message>(message: &T) { ... }

But Rust also has impl Trait syntax:

fn send(message: &impl Message) { ... }

However, I would still prefer the first option, because it better communicates visually the fact that the function is generic.

impl Trait as a return value

impl Trait can also be used to return values when a returned type is too long to write it manually. It's typically used to return futures or iterators.

In the following example, the function filter_div3() takes an iterator that produces i32 values, applies an extra filter to it, and returns a new iterator.

fn filter_div3(iter: impl Iterator<Item=i32>) -> impl Iterator<Item=i32> {
    iter.filter(|x| x % 3 == 0)
}

fn main() {
    // Fib is an iterator that produces the fibonacci sequence
    // [3, 21, 144, 987, 6765]
    println!("{:?}", filter_div3(Fib::new()).take(5).collect::<Vec<i32>>());

    // [3, 6, 9, 12, 15]
    println!("{:?}", filter_div3(1..=100).take(5).collect::<Vec<i32>>());
}

This is handy, but it also comes with restrictions. We are not allowed to use the returned value as anything other than an iterator.

E.g. this code does not compile:

println!("{:?}", filter_div3(1..=100));

because the compiler has no guarantee that the value returned by filter_div3() implements the Debug trait.

UPDATE (based on the reddit comment):

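It is possible to use Debug if you change the return type from impl Iterator<Item=i32> to impl Iterator<Item=i32> + Debug. The iter argument then needs the same + Debug bound, because the filter adapter is only Debug when the iterator it wraps is.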
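A sketch of the Debug-enabled variant. Note that in this version the argument also carries a + Debug bound, since the standard Filter adapter implements Debug only when the wrapped iterator does:

```rust
use std::fmt::Debug;

// Both the input and the output advertise Debug.
fn filter_div3(iter: impl Iterator<Item = i32> + Debug) -> impl Iterator<Item = i32> + Debug {
    iter.filter(|x| x % 3 == 0)
}

fn main() {
    // Debug-prints the adapter itself, not the items it would yield.
    println!("{:?}", filter_div3(1..=100));

    let first: Vec<i32> = filter_div3(1..=100).take(5).collect();
    assert_eq!(first, vec![3, 6, 9, 12, 15]);
}
```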

Blanket implementations

The concept was known to me, but I was not aware of the term itself.

From The Rust Book:

Implementations of a trait on any type that satisfies the trait bounds are called blanket implementations.

Examples:
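A minimal sketch of a blanket implementation. The Shout trait is hypothetical and exists only for this illustration; the standard library uses the same technique to give every Display type a to_string() method.

```rust
use std::fmt::Display;

// Hypothetical trait, used only for this illustration.
trait Shout {
    fn shout(&self) -> String;
}

// Blanket implementation: every type that implements Display gets Shout.
impl<T: Display> Shout for T {
    fn shout(&self) -> String {
        format!("{}!", self).to_uppercase()
    }
}

fn main() {
    assert_eq!("hello".shout(), "HELLO!");
    assert_eq!(42_i32.shout(), "42!");

    // The standard library does the same for ToString.
    assert_eq!(3.5_f64.to_string(), "3.5");
}
```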

Lifetime elision rules

The first rule is:

Each parameter that is a reference gets its own lifetime parameter.

// E.g. the following function signature
fn foo(x: &u8, y: &u8)

// Is seen by Rust as
fn foo<'a, 'b>(x: &'a u8, y: &'b u8)

The second rule is:

If there is exactly one input lifetime parameter, that lifetime is assigned to all output lifetime parameters.

fn foo(x: &u8) -> &u8
// Is equivalent to
fn foo<'a>(x: &'a u8) -> &'a u8

The third rule is:

If there are multiple input lifetime parameters, but one of them is &self or &mut self because this is a method, the lifetime of self is assigned to all output lifetime parameters.

fn foo(&self, x: &u8) -> &u8
// Is equivalent to
fn foo<'a, 'b>(&'a self, x: &'b u8) -> &'a u8
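The rules can be seen at work in a small compilable sketch. All names here (first_word, Config, name_or, longest) are made up for the illustration. The last function shows the case where no rule applies and the lifetime has to be spelled out.

```rust
// Rule 2: one input lifetime, so the output borrows from `s`.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

struct Config {
    name: String,
}

impl Config {
    // Rule 3: the output borrows from `self`, not from `_fallback`.
    fn name_or(&self, _fallback: &str) -> &str {
        &self.name
    }
}

// Two inputs and no `self`: elision gives up, so the lifetime must be explicit.
fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

fn main() {
    assert_eq!(first_word("hello world"), "hello");

    let config = Config { name: String::from("prod") };
    assert_eq!(config.name_or("dev"), "prod");

    assert_eq!(longest("ab", "abc"), "abc");
}
```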

Tests

Passing options to cargo test

We can pass options directly to cargo test, for example:

cargo test --help

Or we can pass options to a binary that cargo test runs:

cargo test -- --help

I often use

cargo test -- --nocapture

to see the output printed in my tests.

Using Result<T, E> in tests

Test functions can be defined to return Result<T, E> type:

#[cfg(test)]
mod tests {
    #[test]
    fn it_works() -> Result<(), String> {
        if 2 + 2 == 4 {
            Ok(())
        } else {
            Err(String::from("two plus two does not equal four"))
        }
    }
}

However, I find it too cumbersome.
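One case where the Result form may pay off is the ? operator, which lets a test bail out on the first error without unwrap() calls. A sketch, where parse_sum is a hypothetical helper under test:

```rust
use std::num::ParseIntError;

// Hypothetical helper under test.
fn parse_sum(a: &str, b: &str) -> Result<i32, ParseIntError> {
    Ok(a.parse::<i32>()? + b.parse::<i32>()?)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn sums_two_numbers() -> Result<(), ParseIntError> {
        // `?` fails the test on Err, no unwrap() needed.
        let sum = parse_sum("2", "2")?;
        assert_eq!(sum, 4);
        Ok(())
    }
}

fn main() {
    println!("{:?}", parse_sum("2", "2")); // Ok(4)
}
```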

Running tests filtered by name

E.g. run all tests that contain "pattern" in their name:

cargo test pattern

Ignoring specific tests

We can mark tests with #[ignore] to ignore them.

#[test]
#[ignore]
fn test_foo() { ... }

To run only the tests that are marked as ignored:

cargo test -- --ignored

Shared behavior for integration tests

There is a special module, common (tests/common/mod.rs), where integration tests should keep their shared behavior. Read more in the Rust book: submodules in integration tests.

Closures

Closures can capture values from their environment in three ways, which directly map to the three ways a function can take a parameter: taking ownership, borrowing mutably, and borrowing immutably. These are encoded in the three Fn traits as follows:

FnOnce trait

FnOnce consumes the variables it captures from its enclosing scope, known as the closure’s environment. To consume the captured variables, the closure must take ownership of these variables and move them into the closure when it is defined. The Once part of the name represents the fact that the closure can’t take ownership of the same variables more than once, so it can be called only once.

FnMut trait

FnMut can change the environment because it mutably borrows values.

Fn trait

Fn borrows values from the environment immutably.
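A compact sketch of the three traits side by side. The call_* helpers are hypothetical and only exist to pin each closure to one trait bound:

```rust
// Hypothetical helpers, one per trait bound.
fn call_fn(f: impl Fn() -> usize) -> usize {
    f()
}

fn call_fn_mut(mut f: impl FnMut()) {
    f();
    f();
}

fn call_fn_once(f: impl FnOnce() -> String) -> String {
    f()
}

fn main() {
    let text = String::from("hello");

    // Fn: only reads `text`, so it could be called any number of times.
    let len = call_fn(|| text.len());
    assert_eq!(len, 5);

    // FnMut: mutably borrows `counter` and changes it on every call.
    let mut counter = 0;
    call_fn_mut(|| counter += 1);
    assert_eq!(counter, 2);

    // FnOnce: gives `text` away when called, so it can run only once.
    let owned = call_fn_once(move || text);
    assert_eq!(owned, "hello");
}
```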

Smart pointers

References VS Pointers

From The Rust Book, chapter 15:

In Rust, which uses the concept of ownership and borrowing, an additional difference between references and smart pointers is that references are pointers that only borrow data; in contrast, in many cases, smart pointers own the data they point to.

Deref coercion

Traits Deref and DerefMut are responsible for dereferencing pointers.

As mentioned earlier, deref coercion is implicit and happens on function invocations or variable assignments. What I find interesting is that Rust may perform multiple deref coercions to get the necessary type.

Consider the following code:


use std::ops::Deref;

struct A(i32);
struct B(A);
struct C(B);

impl Deref for A {
    type Target = i32;

    fn deref(&self) -> &i32 {
        println!("deref A to i32");
        &self.0
    }
}

impl Deref for B {
    type Target = A;

    fn deref(&self) -> &A {
        println!("deref B to A");
        &self.0
    }
}

impl Deref for C {
    type Target = B;

    fn deref(&self) -> &B {
        println!("deref C to B");
        &self.0
    }
}

fn print_number(number: &i32) {
    println!("number = {}", number);
}

fn main() {
    let c: C = C(B(A(13)));
    print_number(&c);
}

i32 is wrapped by type A, which is wrapped by B, which is wrapped by C. All the 3 wrapper types implement Deref.

When we call print_number(&i32) passing &C as an argument, the Rust compiler implicitly calls c.deref().deref().deref(), performing this chain of conversions:

&C -> &B -> &A -> &i32

Eventually, the output of that little program above is:

deref C to B
deref B to A
deref A to i32
number = 13

Read more about implicit Deref coercions in The Rust Book.

Rc and reference cycles

Use Rc when it's not possible to determine at compile-time which part of the program will finish using the data last.

Reference Cycles Can Leak Memory:
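A common way to avoid such a cycle is to make one direction of the link a Weak reference. The sketch below uses a hypothetical Node type, loosely modeled on the tree example from chapter 15 of the book: a parent owns its children through Rc, while a child points back through Weak.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical tree node: children are owned, the parent link is weak.
struct Node {
    name: String,
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

fn main() {
    let leaf = Rc::new(Node {
        name: String::from("leaf"),
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(vec![]),
    });
    let branch = Rc::new(Node {
        name: String::from("branch"),
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(vec![Rc::clone(&leaf)]),
    });

    // The weak link back to the parent does not increase the strong count.
    *leaf.parent.borrow_mut() = Rc::downgrade(&branch);

    assert_eq!(Rc::strong_count(&branch), 1);
    assert_eq!(Rc::weak_count(&branch), 1);
    assert_eq!(Rc::strong_count(&leaf), 2);
    assert_eq!(branch.children.borrow().len(), 1);

    // A Weak must be upgraded before use; it yields None once the parent is gone.
    let parent = leaf.parent.borrow().upgrade();
    println!("{} -> {:?}", leaf.name, parent.map(|p| p.name.clone()));
}
```

Had both directions used Rc, the two strong counts would never reach zero and neither node would be freed.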

Concurrency

Reading from a receiver with for in loop

I typically used receiver.recv() to read messages from a receiver. But std::sync::mpsc::Receiver implements IntoIterator, meaning that one can use a for in loop, which is much handier:

let (tx, rx) = std::sync::mpsc::channel();
for message in rx {
    // process message; the loop ends once every sender (tx) is dropped
}
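A complete version of the idea, as a sketch with one producer thread. The loop over the receiver ends on its own once the sender is dropped and the queue is drained:

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();

    let producer = thread::spawn(move || {
        for i in 1..=3 {
            tx.send(i).unwrap();
        }
        // `tx` is dropped here, which closes the channel.
    });

    // Blocks for each message and stops when the channel is closed.
    let mut received = Vec::new();
    for message in rx {
        received.push(message);
    }

    producer.join().unwrap();
    assert_eq!(received, vec![1, 2, 3]);
}
```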

Mutex and interior mutability

I haven't thought of Mutex in terms of interior mutability, but Mutex<T> provides interior mutability, as the Cell family does.
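A short sketch of what that means in practice: the binding below is not declared mut, and every thread only holds a shared handle, yet the value inside the Mutex is modified.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // `counter` is not declared `mut`, yet the value inside can change.
    let counter = Arc::new(Mutex::new(0));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                // lock() returns a guard that allows mutation through a shared reference.
                *counter.lock().unwrap() += 1;
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    assert_eq!(*counter.lock().unwrap(), 4);
}
```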

Object safety and traits

Object safety is required for Trait Objects.

You can only make object-safe traits into trait objects. Some complex rules govern all the properties that make a trait object-safe, but in practice, only two rules are relevant. A trait is object-safe if all the methods defined in the trait have the following properties: the return type isn't Self, and there are no generic type parameters.

For example, it's not allowed to have Box<dyn Clone> because Clone::clone() returns Self and therefore is not object-safe.

let clonable: Box<dyn Clone> = Box::new(555i32);

Compilation error:

error[E0038]: the trait `Clone` cannot be made into an object
 --> src/main.rs:4:19
  |
4 |     let clonable: Box<dyn Clone> = Box::new(555i32);
  |                   ^^^^^^^^^^^^^^ `Clone` cannot be made into an object
  |
  = note: the trait cannot be made into an object because it requires `Self: Sized`
  = note: for a trait to be "object safe" it needs to allow building a vtable to allow the call to be resolvable dynamically;

You'll find more about Object Safety in the Rust Reference.

Patterns and matching

Pattern matching in Rust is very powerful, and I have realized that usually, I use only about a half of its capabilities.

Ignoring values in destructuring

.. is used to ignore values we're not interested in:

let (x, y, ..) = (1, 2, 3, 4, 5, 6);
let Person { name, .. } = Person { name: "Peter", age: 24 };
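A runnable version with a few more variations. The Person definition is an assumption, since the original snippet does not show it:

```rust
// Assumed definition; the snippet above does not show it.
struct Person {
    name: &'static str,
    age: u8,
}

fn main() {
    // `..` skips the rest of a tuple.
    let (x, y, ..) = (1, 2, 3, 4, 5, 6);
    assert_eq!((x, y), (1, 2));

    // `..` may also sit in the middle to grab both ends.
    let (first, .., last) = (1, 2, 3, 4, 5, 6);
    assert_eq!((first, last), (1, 6));

    // In a struct pattern, `..` ignores the remaining fields.
    let peter = Person { name: "Peter", age: 24 };
    let Person { name, .. } = peter;
    assert_eq!(name, "Peter");

    // `_` ignores a single value.
    let Person { age, name: _ } = peter;
    assert_eq!(age, 24);
}
```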

Multiple match patterns, ranges, guards, and bindings

Example:

let x = 16;

match x {
    2 | 4..=6                 => println!("2 or 4-6"),
    z @ 10..=20 if z % 3 == 0 => println!("Divisible by 3 within range 10-20"),
    _                         => println!("Something else")
}
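The same match as a testable function, assuming the first arm is meant to cover 2 and the range 4..=6 (as its message suggests). classify is a hypothetical name:

```rust
fn classify(x: u32) -> &'static str {
    match x {
        // Multiple patterns joined with `|`, one of them a range.
        2 | 4..=6 => "2 or 4-6",
        // `z @` binds the matched value so the guard can use it.
        z @ 10..=20 if z % 3 == 0 => "divisible by 3 within range 10-20",
        _ => "something else",
    }
}

fn main() {
    assert_eq!(classify(2), "2 or 4-6");
    assert_eq!(classify(5), "2 or 4-6");
    assert_eq!(classify(12), "divisible by 3 within range 10-20");
    // 16 is in range but fails the guard, so it falls through.
    assert_eq!(classify(16), "something else");
    assert_eq!(classify(21), "something else");
}
```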

Refutability

There are two kinds of patterns: refutable and irrefutable.

Irrefutable patterns

Patterns that match any possible value passed are irrefutable.

Example:

let x = 7;

There is nothing that can go wrong with that pattern.

Refutable patterns

Patterns that can fail to match for some possible value are refutable.

Example:

if let Some(x) = option

If option were None, the pattern above would not match.

Function parameters, let statements, and for loops can only accept irrefutable patterns, because the program cannot do anything meaningful when values don't match.

Unsafe

I have to be honest: in the 4 years I have used Rust for my side projects, I have never felt the need to use unsafe. However, it's good to have at least a shallow understanding of it.

Unsafe superpowers

Raw pointers

Unsafe Rust has two new types called raw pointers: *const T, *mut T. Different from references and smart pointers, raw pointers:

Const VS immutable static var

Constants and immutable static variables might seem similar, but a subtle difference is that values in a static variable have a fixed address in memory. Using the value will always access the same data. Constants, on the other hand, are allowed to duplicate their data whenever they're used.

Unions

Reading a union field requires unsafe. Unions are primarily used for compatibility with C.

Advanced Types

Thunk

Thunk is just a term that I hadn't heard before. From Wikipedia:

In computer programming, a thunk is a subroutine used to inject an additional calculation into another subroutine. Thunks are primarily used to delay a calculation until its result is needed, or to insert operations at the beginning or end of the other subroutine.

Never type

Rust has the never type ! (known in some other languages as the empty type). It can be used as the return type of functions that never return. E.g. a function with an endless loop:

fn run() -> ! {
    loop {
        // do something
    }
}
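The never type also shows up in everyday code: expressions such as panic!() and continue have type !, which coerces to any other type. That is why the match arms below type-check. A small sketch, with parse_port as a hypothetical helper:

```rust
// Hypothetical helper: returns the port or aborts with a panic.
fn parse_port(s: &str) -> u16 {
    match s.parse::<u16>() {
        Ok(port) => port,
        // panic! has type `!`, which coerces to u16, so both arms agree.
        Err(_) => panic!("invalid port: {}", s),
    }
}

fn main() {
    let mut sum = 0;
    for n in 1..=4 {
        if n % 2 == 0 {
            // `continue` has type `!` as well.
            continue;
        }
        sum += n;
    }
    assert_eq!(sum, 4);
    assert_eq!(parse_port("8080"), 8080);
}
```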

Dynamically Sized Types

Dynamically sized types (DST) are types whose size is known only at runtime and is not known at compile time.

For example, str (not &str) is a DST because the size of a string cannot be known at compile time.

The same applies to trait objects: every dyn Trait type is a DST.

We could try to implement a function that takes an argument implementing the Debug trait by value:

fn debug(arg: dyn std::fmt::Debug) {
    println!("{:?}", arg);
}

But it will not compile. Rust tells us explicitly that the size of dyn Debug is not known at compile time:

1 | fn debug(arg: dyn std::fmt::Debug) {
  |          ^^^ doesn't have a size known at compile-time
  |
  = help: the trait `Sized` is not implemented for `(dyn Debug + 'static)`

In the error message, the compiler mentions the Sized trait. The Sized trait is used to determine whether or not a particular type's size is known at compile time.

In fact, whenever there is a generic function like this one:

fn generic<T>(t: T) {
    // --snip--
}

Rust sees it as:

fn generic<T: Sized>(t: T) {
    // --snip--
}

If the Sized restriction needs to be relaxed, a developer must explicitly use ?Sized. Let's say we want to have a function generic over T, but as an argument, we are passing reference instead of actual value. Because the size of a reference is always known at compile time, T: Sized restriction is not wanted:

fn generic<T: ?Sized>(t: &T) {
    // --snip--
}

This way the function can be generic over str.
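A runnable sketch of the relaxed bound. describe is a hypothetical function; the Debug bound is added only so that the body has something to do with the value.

```rust
use std::fmt::Debug;

// T may be unsized because it is only used behind a reference.
fn describe<T: ?Sized + Debug>(t: &T) -> String {
    format!("{:?}", t)
}

fn main() {
    // T = str
    assert_eq!(describe("hi"), "\"hi\"");

    // T = [i32]
    let numbers = vec![1, 2, 3];
    assert_eq!(describe(&numbers[..]), "[1, 2, 3]");

    // Sized types still work: T = i32
    assert_eq!(describe(&7), "7");
}
```

Without ?Sized, the first two calls would be rejected, because str and [i32] do not implement Sized.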

Advanced Functions and Closures

Fn (trait) and fn (function pointer) are different things. Generally prefer using function interfaces with traits Fn, FnMut, FnOnce instead of fn type, because traits give more flexibility.
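The flexibility difference can be sketched as follows (apply_ptr, apply_generic, and double are hypothetical names). A fn pointer accepts plain functions and non-capturing closures, while an Fn bound also accepts closures that capture their environment.

```rust
// Accepts only function pointers (plain functions and non-capturing closures).
fn apply_ptr(f: fn(i32) -> i32, x: i32) -> i32 {
    f(x)
}

// Accepts anything callable, including closures that capture variables.
fn apply_generic(f: impl Fn(i32) -> i32, x: i32) -> i32 {
    f(x)
}

fn double(x: i32) -> i32 {
    x * 2
}

fn main() {
    let offset = 10;

    assert_eq!(apply_ptr(double, 4), 8);
    // A non-capturing closure coerces to a fn pointer.
    assert_eq!(apply_ptr(|x| x + 1, 4), 5);
    // apply_ptr(|x| x + offset, 4); // does not compile: the closure captures `offset`

    assert_eq!(apply_generic(double, 4), 8);
    assert_eq!(apply_generic(|x| x + offset, 4), 14);
}
```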

Macros

Briefly, macros can be divided into the following categories: declarative macros (macro_rules!) and three kinds of procedural macros (custom #[derive] macros, attribute-like macros, and function-like macros).

Function-like macros were a discovery for me. In terms of usage they are very similar to macro_rules!, but they make it possible to implement parsers for a completely custom syntax. I think function-like macros must be a very good fit for DSLs.

For example, this can be totally valid Rust code:

deutsch!(Was soll das sein?);

Raw identifiers

Raw identifiers are the syntax that lets you use keywords where they wouldn’t normally be allowed. You use a raw identifier by prefixing a keyword with r#.

For example, normally it's not possible to define a function named match() because match is a keyword used for pattern matching. However, with raw identifiers one can work around it:

fn r#match(needle: &str, haystack: &str) -> bool {
    haystack.contains(needle)
}
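Note that the call site needs the r# prefix as well. A small self-contained sketch:

```rust
fn r#match(needle: &str, haystack: &str) -> bool {
    haystack.contains(needle)
}

fn main() {
    // Calling the function requires the same r# prefix.
    assert!(r#match("foo", "foobar"));
    assert!(!r#match("baz", "foobar"));
}
```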

Summary

With this article, I just wanted to polish my Rust knowledge. However, if you have discovered something new, I am glad. The article itself is a derivative of The Rust Book, which I encourage you to (re)read.
