Interestingly enough, DYNIX and early versions of DYNIX/ptx used compile-time-allocated arrays for almost all of their data structures. You want your kernel to support up to N tasks? Very well, build your kernel to have its array of N task structures. This worked surprisingly well, perhaps because the important concurrent applications of that time had very predictable resource requirements, including numbers of tasks. Nevertheless, as you might expect, this did become quite the configuration nightmare. So why were arrays used in the first place?
To the best of my knowledge, the earliest published complete articulation of the reason appeared in Gamsa et al.'s landmark paper "Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System". The key point is that you cannot protect a dynamically allocated object with a lock located within that object, because the object (and thus the lock) might be freed while some other task is still attempting to acquire that lock. The DYNIX arrays avoided deallocation (or, alternatively, provided a straightforward implementation of type-safe memory), thus allowing these objects to be protected with internal locks. Avoiding the need for global locks or reference counters was key to the performance and scalability prized by Sequent's customers.
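To make the array approach concrete, here is a minimal user-space Rust sketch, not DYNIX code. The names `TaskSlot`, `MAX_TASKS`, `TASKS`, `task_alloc`, `task_free`, and `task_pid_if_live` are all hypothetical and chosen only for illustration. The point it tries to show is that because the slots are never deallocated, a slot's internal lock can always be safely acquired, and the holder then checks under that lock whether the slot still holds the task it was looking for.

```rust
use std::sync::Mutex;

/// Hypothetical task slot; the fields are illustrative only.
#[derive(Debug)]
pub struct TaskSlot {
    pub in_use: bool,
    pub pid: u32,
}

/// Build-time limit on the number of tasks (the "N" in the text).
pub const MAX_TASKS: usize = 8;

// A const item may be repeated in an array initializer even though
// Mutex<TaskSlot> is not Copy.
const EMPTY_SLOT: Mutex<TaskSlot> = Mutex::new(TaskSlot { in_use: false, pid: 0 });

/// The table exists for the life of the program, so each slot's lock
/// is always safe to acquire: slots are recycled, never freed.
pub static TASKS: [Mutex<TaskSlot>; MAX_TASKS] = [EMPTY_SLOT; MAX_TASKS];

/// Claim the first free slot, returning its index, or None if all
/// MAX_TASKS slots are in use.
pub fn task_alloc(pid: u32) -> Option<usize> {
    for (i, slot) in TASKS.iter().enumerate() {
        let mut t = slot.lock().unwrap();
        if !t.in_use {
            t.in_use = true;
            t.pid = pid;
            return Some(i);
        }
    }
    None
}

/// Release a slot. The memory remains a TaskSlot (type-safe memory).
pub fn task_free(idx: usize) {
    let mut t = TASKS[idx].lock().unwrap();
    t.in_use = false;
}

/// Acquire the slot's own lock, then check that the slot still holds
/// the task we expected. No global lock or reference count is needed.
pub fn task_pid_if_live(idx: usize, pid: u32) -> bool {
    let t = TASKS[idx].lock().unwrap();
    t.in_use && t.pid == pid
}
```

The cost is the one described above: `MAX_TASKS` must be chosen when the program is built, and `task_alloc` simply fails once the table is full.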
This strategy worked less well when Sequent added a distributed lock manager because the required number of locks was not predictable, nor was there a useful upper bound. This problem was solved in part by the addition of RCU, which provided a high-performance and scalable means of resolving races between acquiring a given object's lock and deletion (and subsequent freeing) of that same object. Given that DEC Alpha famously had difficulty with RCU, it is only reasonable to ask how Rust will do with it. Or must the concurrency designs of those portions of the Linux kernel that are to be written in Rust be "fitted" to a Rust-language ProcRustean bed? Those who prize Rust's fearless-concurrency goal above all else might reasonably argue that this ProcRustean bed is in fact a most excellent thing. However, some Linux-kernel maintainers (including this one) might in their turn reasonably argue that within the context of some portions of the Linux kernel, a proper level of fear is a very healthy thing. As is the ability to do one's job in a reasonably straightforward manner!
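The race between acquiring an object's lock and deleting that object can be made concrete with a small user-space Rust sketch. Please note that this is not RCU. It is the global-lock-plus-reference-counter baseline that the array approach avoided and that RCU later made unnecessary, shown only because it is the form that safe Rust most readily expresses. The names `Obj`, `Table`, `lookup`, `read_value`, and `remove` are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};

/// Hypothetical dynamically allocated object with an internal lock
/// (the Mutex that wraps it in Table below).
pub struct Obj {
    pub deleted: bool,
    pub value: u64,
}

/// Table guarded by a global reader-writer lock. Each entry is
/// reference counted so that its memory outlives removal for as long
/// as some lookup still holds a reference.
pub struct Table {
    map: RwLock<HashMap<u64, Arc<Mutex<Obj>>>>,
}

impl Table {
    pub fn new() -> Self {
        Table { map: RwLock::new(HashMap::new()) }
    }

    pub fn insert(&self, key: u64, value: u64) {
        let obj = Arc::new(Mutex::new(Obj { deleted: false, value }));
        self.map.write().unwrap().insert(key, obj);
    }

    /// Take the global lock just long enough to bump the reference
    /// count. This is the shared-memory traffic that RCU avoids.
    pub fn lookup(&self, key: u64) -> Option<Arc<Mutex<Obj>>> {
        let map = self.map.read().unwrap();
        map.get(&key).map(Arc::clone)
    }

    /// Acquire the object's own lock, then check whether a concurrent
    /// remove() won the race.
    pub fn read_value(&self, key: u64) -> Option<u64> {
        let obj = self.lookup(key)?;
        let guard = obj.lock().unwrap();
        if guard.deleted { None } else { Some(guard.value) }
    }

    /// Unlink the object, then mark it deleted under its own lock so
    /// that late lock holders can tell that it is gone.
    pub fn remove(&self, key: u64) -> bool {
        let obj = self.map.write().unwrap().remove(&key);
        match obj {
            Some(o) => {
                o.lock().unwrap().deleted = true;
                true
            }
            None => false,
        }
    }
}
```

A caller that obtained a reference from `lookup` before a concurrent `remove` can still safely lock the object, and will then see `deleted` set. Every lookup, however, touches both the global lock and the reference count.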
One way to avoid this ProcRustean bed is to use the Rust unsafe facility, and in fact "unsafe" has been the answer to a disturbingly large number of my questions about Rust. However, use of this facility introduces the possibility of data races, which in turn raises the question of Rust's memory model. Within the Linux kernel, the answer to this question is of course LKMM, or perhaps some reasonable subset of LKMM.
However, in my personal experience, I have most frequently seen Rust being used to rewrite scripts that became performance problems upon being more widely deployed than expected. In some cases, these rewrites greatly improved user experience as well as performance. This means that Rust is heavily used outside of the Linux kernel, which in turn means that LKMM might not be the right answer for Rust in general, though some in the Rust community have come out strongly in favor of extending the ProcRustean bed. But this blog series is focused on Rust for the Linux kernel, so the question of the memory model for Rust in general is out of scope. Again, for Rust in the Linux kernel, some subset of LKMM is clearly the correct memory model.
Many of the following posts in this series cover ways that Rust might work with specific aspects of LKMM, including some wild speculation about how Rust's ownership model might be generalized in a manner similar to the way that the Linux kernel's lockdep checking has been generalized for cross-released locks and for RCU. Readers wishing to learn more about non-ProcRustean concurrency designs are invited to peruse "Is Parallel Programming Hard, And, If So, What Can You Do About It?", hereinafter called "perfbook". Specific chapters and sections of the Second Edition of this book will be cited as appropriate by later posts in this series.
Making this 1990s-style concurrency scale usually involves hashed arrays of locks. These are often deadlock-prone, but there are heavily used techniques that (mostly) avoid the deadlocks. See the section titled "Acquire Needed Locks First" in perfbook for one such technique. However, hashed arrays of locks are prone to scalability problems, especially on multi-socket systems, due to poor locality of reference. See for example Section 10.2.3 ("Hash-Table Performance") for performance results on hash tables, also in perfbook. As a result, most attempts to apply hashed arrays of locks to the Linux kernel resulted in RCU being used instead. The performance and scalability benefits of RCU (and hazard pointers) are shown in Section 10.3 ("Read-Mostly Data Structures"), again in perfbook.
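For readers unfamiliar with hashed arrays of locks, here is a minimal user-space Rust sketch. The names `LockTable`, `NLOCKS`, `index_of`, and `lock_pair` are hypothetical. Note that the deadlock avoidance shown here is a fixed acquisition order by array index, which is a different and simpler technique than the "Acquire Needed Locks First" approach cited above.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::{Mutex, MutexGuard};

/// Number of locks in the table, an arbitrary illustrative choice.
pub const NLOCKS: usize = 16;

/// A fixed array of locks shared by an unbounded number of keys.
pub struct LockTable {
    locks: [Mutex<()>; NLOCKS],
}

impl LockTable {
    pub fn new() -> Self {
        LockTable { locks: std::array::from_fn(|_| Mutex::new(())) }
    }

    /// Hash a key to the index of the lock that covers it. Unrelated
    /// keys may share a lock, and neighboring locks share cache lines,
    /// which is one source of the locality problems noted above.
    pub fn index_of<K: Hash>(key: &K) -> usize {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        (h.finish() % NLOCKS as u64) as usize
    }

    /// Acquire the locks covering two keys, always lowest index first,
    /// so that two threads locking the same pair in opposite argument
    /// order cannot deadlock. If both keys hash to the same lock, it
    /// is acquired only once and the second guard is None.
    pub fn lock_pair<K: Hash>(
        &self,
        a: &K,
        b: &K,
    ) -> (MutexGuard<'_, ()>, Option<MutexGuard<'_, ()>>) {
        let (ia, ib) = (Self::index_of(a), Self::index_of(b));
        let (lo, hi) = if ia <= ib { (ia, ib) } else { (ib, ia) };
        let first = self.locks[lo].lock().unwrap();
        let second = if hi != lo {
            Some(self.locks[hi].lock().unwrap())
        } else {
            None
        };
        (first, second)
    }
}
```

Both locks are released when the returned guards go out of scope. Ordering by index handles only the case where all needed locks are known up front, which is part of why such schemes only "mostly" avoid deadlock.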
But please note that Rust's unsafe code has only limited undefined-behavior unsafe superpowers:
Besides, more recent discussions have focused on memory safety rather than the full gamut of undefined behavior.
History
October 12, 2021: Self-review. Note that some of the comments are specific to earlier versions of this blog post.
October 13, 2021: Add note on memory safety specifically rather than undefined behavior in general.