Why ownership/borrowing in Rust Programming is hard

Working with pure functions is simple: you pass arguments and get a result, and no side effects happen. If, on the other hand, a function does have side effects, like mutating its arguments or global objects, it’s harder to reason about. But we’ve gotten used to those too: if you see something like player.set_speed(5) you can be reasonably certain that it’s going to mutate the player object in a predictable way (and maybe send some signals somewhere, too).

Rust’s ownership/borrowing system is hard because it creates a whole new class of side effects.

Simple example

Consider this code:

let point = Point {x: 0, y: 0};
let result = is_origin(point);
println!("{}: {}", point, result);

Nothing in the experience of most programmers would prepare them for point suddenly becoming unusable after being passed to is_origin()! The compiler won’t let you use it on the next line. This is the side effect I’m talking about: something has happened to the argument, but it is not the kind of thing you’ve seen in other languages.

Here it happens because point gets moved (instead of being copied) into the function, so the function becomes responsible for destroying it and the compiler prevents you from using it after that point. The way to fix it is to either pass the argument by reference or to teach the type how to copy itself (by implementing Copy). It makes total sense once you’ve learned about “move by default”. But these things tend to jump out at you in a seemingly random fashion while you’re doing some innocent refactoring or, say, adding logging.
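
Both fixes can be sketched in a few lines. The Point definition and the function names below are assumptions made for illustration, since the original definitions are not shown:

```rust
// Hypothetical Point type; deriving Copy is what "teaches it to copy itself".
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Fix 1: borrow the point instead of taking ownership of it.
fn is_origin_by_ref(point: &Point) -> bool {
    point.x == 0 && point.y == 0
}

// Fix 2: keep the by-value signature. Because Point is Copy,
// the caller's value is copied into the function rather than moved.
fn is_origin_by_value(point: Point) -> bool {
    point.x == 0 && point.y == 0
}

// Both calls leave `point` usable afterwards.
fn describe(point: Point) -> String {
    let by_ref = is_origin_by_ref(&point);
    let by_value = is_origin_by_value(point);
    format!("{:?}: {} {}", point, by_ref, by_value)
}
```

Without the Copy derive, the call to is_origin_by_value() would move point and the format! line would be rejected, which is the situation described above.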

Complicated example

Consider a parser that takes some bits of data from an underlying lexer and maintains some state:

struct Parser {
    lexer: Lexer,
    state: State,
}

impl Parser {

    fn consume_lexeme(&mut self) -> Lexeme {
        self.lexer.next()
    }

    pub fn next(&mut self) -> Event {

        let lexeme = self.consume_lexeme(); // read the next lexeme

        if lexeme == SPECIAL_VALUE {
            self.state = State::Closed      // update state of the parser
        }
    }
}

The seemingly unnecessary consume_lexeme() is just a convenience wrapper around a somewhat longer string of calls that I have in the actual code.

The lexer.next() returns a self-sufficient lexeme by copying data from the lexer’s internal buffer. Now, we want to optimize it so lexemes would only hold references into that data and avoid copying. We change the method declaration to:

pub fn next<'a>(&'a mut self) -> Lexeme<'a>

The 'a thingy effectively says that the lifetime of a lexeme is now tied to the lifetime of the lexer reference on which we call .next(). It can’t live all by itself but depends on data in the lexer’s buffer. The 'a just spells it out explicitly here.
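
To make the lifetime tie concrete, here is a minimal sketch. The buffer layout and the whitespace-splitting logic are assumptions for illustration, not the actual lexer:

```rust
// Hypothetical lexer that owns its input and hands out whitespace-separated words.
struct Lexer {
    buffer: String,
    pos: usize,
}

// A lexeme that borrows its text from the lexer's buffer instead of copying it.
#[derive(Debug, PartialEq)]
struct Lexeme<'a> {
    text: &'a str,
}

impl Lexer {
    fn new(input: &str) -> Lexer {
        Lexer { buffer: input.to_string(), pos: 0 }
    }

    // The returned lexeme cannot outlive the `&'a mut self` borrow it came from.
    fn next<'a>(&'a mut self) -> Lexeme<'a> {
        let rest = &self.buffer[self.pos..];
        let trimmed = rest.trim_start();
        // Skip leading whitespace, then take everything up to the next whitespace.
        let start = self.pos + (rest.len() - trimmed.len());
        let len = trimmed.find(char::is_whitespace).unwrap_or(trimmed.len());
        self.pos = start + len;
        Lexeme { text: &self.buffer[start..start + len] }
    }
}

fn first_two_lengths(input: &str) -> (usize, usize) {
    let mut lexer = Lexer::new(input);
    let first = lexer.next();
    let first_len = first.text.len(); // last use of `first`
    // If `first` were still used after the next line, the second call to
    // `lexer.next()` would be rejected: the lexer is still mutably borrowed.
    let second = lexer.next();
    (first_len, second.text.len())
}
```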

And now Parser::next() stops working:

error: cannot assign to `self.state` because it is borrowed [E0506]
       self.state = State::Closed
       ^~~~~~~~~~~~~~~~~~~~~~~~~~

note: borrow of `self.state` occurs here
       let lexeme = self.consume_lexeme();
       ^~~~

In plain English, Rust tells us that as long as we have lexeme available in this block of code it won’t let us change self.state — a different part of the parser. And this does not make any sense whatsoever!

The culprit here is the consume_lexeme() helper. Although it only actually needs self.lexer, to the compiler we say that it takes a reference to the entire parser (self). And because it’s a mutable reference, the compiler won’t let anyone else touch any part of the parser lest they change the data that lexeme currently depends on.

So here we have this nasty side effect again: though we didn’t change actual types in the function signature and the code is still sound and should work correctly, a different ownership dynamic suddenly doesn’t let it compile anymore.

Even though I understood the problem in general it took me no less than two days until it all finally clicked and the fix became obvious.

Rusty fix

Changing consume_lexeme() to accept a reference to just the lexer instead of the whole parser has fixed the problem but the code looked a bit non-idiomatic, having changed from a dot-method notation into a plain function call:

let lexeme = consume_lexeme(&mut self.lexer); // want self.<something-something> instead
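
A self-contained sketch of this fix follows. The Lexer internals, the "END" marker and the usize return value are stand-ins invented for illustration. Note also that with current compilers the conflict only shows up when the lexeme is still used after the assignment, which is why the sketch reads lexeme.text at the end:

```rust
// Hypothetical stand-ins for the article's types.
struct Lexer {
    buffer: String,
    pos: usize,
}

struct Lexeme<'a> {
    text: &'a str,
}

#[derive(Debug, PartialEq)]
enum State {
    Open,
    Closed,
}

struct Parser {
    lexer: Lexer,
    state: State,
}

impl Lexer {
    // Return the next space-separated word, borrowed from the buffer.
    fn next(&mut self) -> Lexeme<'_> {
        let start = self.pos;
        let end = self.buffer[start..]
            .find(' ')
            .map_or(self.buffer.len(), |i| start + i);
        // Skip the separator, but never move past the end of the buffer.
        self.pos = (end + 1).min(self.buffer.len());
        Lexeme { text: &self.buffer[start..end] }
    }
}

// The helper borrows only the lexer, not the whole parser.
fn consume_lexeme(lexer: &mut Lexer) -> Lexeme<'_> {
    lexer.next()
}

impl Parser {
    // Returns the lexeme length as a stand-in for building an Event.
    fn next(&mut self) -> usize {
        let lexeme = consume_lexeme(&mut self.lexer);
        if lexeme.text == "END" {
            // Allowed: only `self.lexer` is borrowed, `self.state` is free.
            self.state = State::Closed;
        }
        // The lexeme is still in use here, after the assignment.
        lexeme.text.len()
    }
}
```

If consume_lexeme() were a method taking &mut self instead, the assignment to self.state would be rejected for as long as lexeme is alive.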

Luckily Rust actually makes it possible to have it the right way, too. Since in Rust the definition of data fields (struct) is separate from the definition of methods (impl), I can define my own local methods for a struct even if it’s imported from a different module, as long as that module is part of the same crate (for a type from another crate, an extension trait is needed instead):

use lexer::Lexer;

// My Lexer methods. Does *not* affect other uses of Lexer elsewhere.
impl Lexer {
    pub fn consume(&mut self) -> Lexeme { .. }
}

// ...

let lexeme = self.lexer.consume(); // works!
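
As a rough, compilable illustration of the same idea, assume the lexer and the parser live in two modules of one crate. The module layout and the word-based Lexer here are made up for the example:

```rust
mod lexer {
    pub struct Lexer {
        words: Vec<String>,
    }

    impl Lexer {
        pub fn new(input: &str) -> Lexer {
            // Store the words reversed so that `pop` yields them in order.
            Lexer {
                words: input.split_whitespace().rev().map(String::from).collect(),
            }
        }

        pub fn next(&mut self) -> Option<String> {
            self.words.pop()
        }
    }
}

mod parser {
    use super::lexer::Lexer;

    // A second inherent impl block for Lexer, written in the parser module.
    // Without `pub`, `consume` is visible only inside this module.
    impl Lexer {
        fn consume(&mut self) -> String {
            self.next().unwrap_or_default()
        }
    }

    pub struct Parser {
        lexer: Lexer,
    }

    impl Parser {
        pub fn new(input: &str) -> Parser {
            Parser { lexer: Lexer::new(input) }
        }

        pub fn next(&mut self) -> String {
            // Dot-method notation on the field: borrows only `self.lexer`.
            self.lexer.consume()
        }
    }
}
```

Leaving `pub` off the helper keeps it private to the parser module, which is one way to get the "does not affect other uses" property.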

Rust: Principle of Least Privilege?

Rust’s borrow checker is a wonderful thing that forces you into designing code to be more robust. But as it is so unlike anything you’re used to, it takes time to develop a certain knack to work with it efficiently.

I thought that was one of the most fascinating parts – Rust’s borrow-checker enforces the Law of Demeter and Principle of Least Privilege as a side-effect.

Code that takes a full structure when it only needs to operate on a part of the structure is badly designed. It’s not conveying the full information about the data that it actually needs, which means that unexpected dependencies can crop up, implicit in the body of the function, as the code is modified later on. This is behind a lot of long-term maintenance messes; I remember a few open sourced projects at Google to break up “data whales” where a single class had become a dumping ground for all the information needed within a request.

Thing is, we all do it, because taking a reference to a general object and then pulling out the specific parts you need means that you don’t have to change the function signature if the specific parts you need change. This saves a lot of work when you’re iterating quickly and discovering new requirements. You’re trading ease of modification now for difficulty of comprehension later, which is usually the economically wise choice for you but means that the people who come after you will have a mess to untangle.

This makes me think that Rust will be a very poor language for exploratory programming, but a very good one for programming-in-the-large, where you’re building a massive system for requirements that are largely known.

The "Law of Demeter" means something very specific in OO programming, often expressed by the rules:  
  1. You can play with yourself
  2. You can play with your own toys (but you can't take them apart)
  3. You can play with the toys that were given to you.
  4. And you can play with toys you've made yourself.

Put simply, it means that you shouldn’t attempt to destructure or inspect the arguments that were passed to you. If you’re passed a point and need to access point.x and point.y, then you’re a method on the wrong class; you should be a method on Point instead. If you’re passed a file but only need to access file.path, your parameter type is wrong: you should take a filepath instead and let your caller destructure for you. If you need to access foo.bar and foo.baz but foo has 20 data members, you should collect bar and baz into their own sub-structure and pass that in directly, or better yet, make your function a method on the sub-structure. If you need to self.mutate_my_foo(self.access_my_bar()), you should call self.foo.mutate(self.access_my_bar()). And so on – the point is for each function to have the minimal knowledge necessary to complete its task, and any decisions unrelated to that task should be propagated up to higher levels of the program.
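
The file.path case might look like this in Rust. SourceFile and both function names are hypothetical, chosen only to illustrate the point:

```rust
use std::path::{Path, PathBuf};

// Hypothetical record with more fields than the check below needs.
struct SourceFile {
    path: PathBuf,
    contents: String,
}

// Takes only what it needs: a path, not the whole SourceFile.
fn has_rust_extension(path: &Path) -> bool {
    path.extension().map_or(false, |ext| ext == "rs")
}

// The caller does the destructuring, so the borrow checker sees that
// only `file.path` is read while `file.contents` remains free to change.
fn tag_rust_file(file: &mut SourceFile) -> bool {
    let path: &Path = &file.path;
    let is_rust = has_rust_extension(path);
    if is_rust {
        file.contents.push_str("// checked\n");
    }
    is_rust
}
```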

I won’t deny that this is frustrating. The Law of Demeter has been very controversial in OO circles, because it’s so restrictive that pretty much nobody can actually adhere to it without creating so much work for themselves that their project ships late. In forcing your code to always use the minimal set of data necessary, you force yourself to change the code (including many potentially highly-used APIs) every time you add or remove a data dependency, which is usually impractical. The whole category of dependency injection frameworks was invented to automate much of this plumbing work.

But I find it fascinating that Rust’s borrow-checker has basically forced the language down on one side of the tradeoff. It has a bunch of implications for what Rust is good at and what Rust is not good at.