Shared Mutable State in Rust: Is It Really the Root of All Evil?

The author of this article is EPAM Systems Engineer Irine Kokilashvili.

Shared Mutable State in Rust: Is It the Root of All Evil?

Published in Tech matters14 November 20237 min read

Introduction

This article was inspired by Tom Kaitchuck’s YouTube video “Why Rust is a significant development in programming languages.” It is aimed at everyone interested in Rust and software development. I wrote it while learning about the shared mutable state in Rust, and my goal is to share this knowledge.

I extend my heartfelt gratitude to my colleagues Andrey, Ilya, and Anton, who dedicated their time to review and provide valuable feedback for this article.

Garbage collection and memory management

When a computer science problem is successfully resolved, it frequently leads to the emergence of a higher level of abstraction, enabling the development of future programs without a complete understanding of the initial problem. If the problem remains unresolved, it can resurface in a modified manifestation at the next level of abstraction.

Consider the example of memory management and garbage collection. Garbage collection is a process through which the runtime environment automatically identifies and reclaims memory that is no longer needed by the program. This allows us to focus on application logic without having to worry about memory management.

While the introduction of garbage collector alleviated memory-related issues, it introduced its own challenges, such as potential performance issues. It is worth mentioning the hellish troubles that arise when we combine different garbage collector runtimes. A lot of fun is guaranteed, for example, when integrating Java and Go libraries and making them work in harmony within a Python application.

Shared mutable state

A fundamental problem in computer programming is understanding how to manage the shared mutable state. “Shared mutable state” refers to data that can change and that is used in multiple places.

Deep JavaScript defines a shared mutable state as a condition where two or more entities have the capability to modify the same data. If their lifetimes overlap, there is a risk that changes made by one entity could affect the functionality of the other.

Different programming languages have different approaches to solving this problem.

Functional programming languages

Functional programming languages attempt to solve this problem by disallowing mutability, which means that data structures and variables cannot be changed once they are created. This helps prevent issues related to managing the shared mutable state, since different parts of a program cannot modify the same data at the same time.

Functional programming languages also help prevent bugs and make the behavior of a program more predictable. This is because the state of the program is not changing constantly due to multiple threads or processes accessing shared mutable data.

Take a look at this Haskell example, from the Monday Morning Haskell blog post “Immutability is Awesome”:

In this piece of code, we are calling a reverse function of an array of number a. When the function is called, the value of a does not change. Calling the reverse function produces output that is expected, but the original variable is not impacted.

To be clear, Haskell does give you tools to introduce limited mutability, even shared, with StRef, IORef, MVar, TVar, etc., but by design it is a functional language. Haskell's disallowance of shared mutable state eliminates many common issues associated with mutable state, but introduces its own challenges.

Object-oriented languages

Unlike functional programming languages, object-oriented languages have tried to address the shared mutable state issue by encapsulating the state in classes to limit sharing. Java is an object-oriented programming language that tries to address shared mutability this way.

Encapsulation refers to the practice of hiding the internal state of an object and permitting only certain methods to interact with it. This helps limit the sharing of mutable state between different parts of a program and can reduce the risk of conflicts and synchronization issues.

Here is an example in Java, partly generated with help of ChatGPT, that demonstrates encapsulation and the limitation of sharing mutable state:

Example in Java that demonstrates encapsulation and the limitation of sharing mutable state

In this example, the Counter class encapsulates the count state and exposes only methods to increment, decrement, and getCount. The methods are synchronized, which means that only one thread can execute them at a time. This helps prevent synchronization issues when multiple threads try to modify the count simultaneously.

The Main class creates an instance of Counter and calls the increment and decrement methods to modify the count. The getCount method is called to retrieve the current count.

Again, encapsulating the state of the Counter in the class limits the sharing of mutable state solely to the methods defined in the class. Using synchronized methods also helps prevent synchronization issues when multiple threads modify the count at the same time.

The purpose of encapsulation and synchronization in this case is to control and manage access to shared mutable state to ensure that modifications are done safely and consistently.

Even with Java's synchronization mechanisms, concurrency bugs like data races (discussed below) or deadlocks can still occur if safeguards are not used correctly, this is why we should consider careful coding and design practices.

It is important to note that data races are not as severe in Java as in, say, C++, because Java has a strong memory model, meaning that any given thread will never see a torn value. Data races are certainly undesirable in either language, but in Java they have a defined semantic while in C++ they explicitly contribute to undefined behavior.

“Shared mutable state is the root of all evil”

While the above approaches address many of the challenges related to shared mutable state, they may not completely eliminate all underlying issues. As a result, issues can appear in various unexpected ways, such as data races, performance issues, or the need for garbage collection.

1. Data races: Data races are a subset of race conditions. When multiple threads access and modify the same shared mutable state concurrently, it can lead to race conditions, where the final outcome depends on the timing and order of the thread execution. This can cause unexpected behavior, such as incorrect results, data corruption, or crashes.

2. Memory leaks: When shared mutable state is not managed properly, it can lead to memory leaks, where objects are not released when they are no longer needed, leading to wasted memory and potential performance issues.

3. Garbage collection: In languages without manual memory management, garbage collection is necessary to reclaim memory from objects that are no longer referenced by any part of the program. Although necessary, garbage collection can cause hard-to-fix performance issues because shared mutable state can make it difficult to track which objects are still in use, and which can be safely deleted. The creation of many new copies of data structures for updates can also lead to increased pressure.

Shared mutable state in Rust

Although Rust is not the first language to address the issue of shared mutable state, it stands out among languages such as Cyclone and Ada that made similar attempts. Rust's distinctive combination of features, including ownership and borrowing semantics, has boosted widespread adoption of the language in the programming community.

On the ycombinator discussion thread, Sanghyeon Seo explains shared mutable state and Rust:

“Shared mutable state is evil. So, functional programming languages thought: let's have no mutable state. So, Rust thought: let's have no sharing of state. Rust is not a functional programming language. It is a different approach to the same problem.”

Rust's ownership system enforces strict rules about how objects can be accessed and modified, ensuring that there are no data races or other concurrency issues. Additionally, Rust's ownership system ensures efficient memory management without the need for garbage collection. Static analysis and runtime checks are used to ensure that memory is allocated and freed correctly.

Rust has rules that specify that variables are deallocated at the end of every function, preventing them from being used by the caller more than once. Rust uses its borrow checker to create a temporary reference variable to the original value, so that the underlying value is not dropped while the reference is still in use.

“To drop" means to deallocate or release the resources associated with a value. This is a fundamental concept in Rust's memory management, since it ensures that resources are properly released when they are no longer needed.

It should be mentioned that it is possible to achieve a memory leak even in safe Rust code with functions like ‘mem::forget’, ‘Box::leak’, or an unfortunately designed reference cycle between ‘Rc’ smart pointers.

We distinguish between “safe” and “unsafe” code. Safe Rust is designed to be memory-safe, whereas unsafe Rust allows developers to bypass some of the language's safety checks when necessary. A memory leak is not considered unsafe in Rust because it does not lead to undefined behavior.

Andrew Pritchard’s example in his blog post “Shared mutability in rust.” demonstrates how mutability works in Rust:

We can clearly see the concept of borrowing and lifetimes in this code, which ensures safe and controlled access to data. The code defines two structures, P2 and View, and shows how the View structure is used to create a controlled view into the data of a P2 instance.

The lifetime annotations ensure that the references remain valid and within the intended scope. The View instances do not, however, automatically update when the underlying data in P2 changes.

Overall, with Rust, a shared mutable state does not have to be the root of all evil.

Rust disadvantages

Even though Rust excels in many areas, it has some disadvantages.

One complexity is associated with asynchronous programming support, which is quite painless in languages with garbage collection. While Rust has asynchronous programming capabilities, the complexity arises from the ownership and borrowing model, which makes handling async operation much more difficult than in other languages.

Rust also has powerful features to deal with shared memory, but when Rust programs interact with external resources or external systems, we need to handle these aspects independently.

In his blog post, When Rust hurts, Roman Kashitsyn writes: “Remember that paths are raw pointers, even in Rust. Most file operations are inherently unsafe and can lead to data races (in a broad sense) if you do not correctly synchronize file access. For example, as of February 2023, I still experience a six-year-old concurrency bug in rustup.”

Conclusion

Rust's commitment to safety and control makes it a valuable programming language choice for systems programming and applications where memory safety is vital. However, developers must be prepared to navigate the nuances of Rust, such as the ownership and borrowing model, asynchronous code, and unsafe code.

The views expressed in the articles on this site are solely those of the authors and do not necessarily reflect the opinions or views of Anywhere Club or its members.