But what is life without a little sound and fury? Besides, "mere theory" can loom large for those creating the tools required to analyze concurrent software. This post therefore gives a brief introduction to OOTA, and then provides some simple practical solutions, along with some difficult ones.
Here is the canonical OOTA example, involving two threads each having but one statement:
T1: x.store(y.load(mo_relaxed), mo_relaxed); T2: y.store(x.load(mo_relaxed), mo_relaxed);
Believe it or not, according to the C/C++ memory model, even if both x and y are initially 0, after both threads complete it might be that x==y==42!
LKMM avoids OOTA by respecting dependencies, at least when headed by marked accesses. Plus the dependent access must either be marked or be free of data races. For example:
T1: WRITE_ONCE(x, READ_ONCE(y)); T2: WRITE_ONCE(y, READ_ONCE(x));
If x and y are both initially zero, then they are both guaranteed to stay that way!
There has been great quantities of ink spilled on the OOTA problem, but these three C++ working papers give a reasonable overview and cite a few of the more prominent papers:
- P0422R0: Out-of-Thin-Air Execution is Vacuous. This presents a fixed-point analysis that to the best of my knowledge is the first scheme capable of differentiating OOTA scenarios from straightforward reordering scenarios. Unfortunately, this analysis cannot reasonably be applied to automated tooling. This paper also includes a number of references to other work, which unfortunately either impose unnecessary overhead on weakly ordered systems on the one hand or are well outside of what compiler developers are willing to countenance on the other.
- P1916R0: There might not be an elegant OOTA fix. This diabolically clever paper shows how back-propagation of undefined behavior can further complicate the OOTA question.
- P2055R0: A Relaxed Guide to memory_order_relaxed. This paper presents memory_order_relaxed use cases that are believed to avoid the OOTA problem.
As with address, data, and control dependencies, the trivial solution is for Rust to simply promote all READ_ONCE() operations to smp_load_acquire(), but presumably while leaving the C-language READ_ONCE() untouched. This approach increases overhead, but keeps the people building analysis tools happy, or at least happier. Unfortunately, this trivial solution incurs overhead on weakly ordered systems, overhead that is justified only in theory, never in practice.
But if analysis tools are not an issue, then Rust could simply use volatile loads (perhaps even directly using an appropriately Rust-wrapped instance of the Linux kernel's READ_ONCE() primitive), just as the Linux kernel currently does (but keeping architecture-specific requirements firmly in mind). The problem is after all strictly theoretical.