Parallel Programming: November 2017 Update

This USA Thanksgiving holiday weekend features a new release of Is Parallel Programming Hard, And, If So, What Can You Do About It?.

This update includes more formatting and build-system improvements, bibliography updates, and better handling of listings, all courtesy of Akira Yokosawa; numerous fixes and updates from Junchang Wang, Pierre Kuo, SeongJae Park, and Yubin Ruan; a new futures section on quantum computing; updates to the formal-verification section based on recent collaborations; and a full rewrite of the memory-barriers section, which is now its own chapter. This rewrite was of course based on recent work with my partners in memory-ordering crime, Jade Alglave, Luc Maranget, Andrea Parri, and Alan Stern.

As always, git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/perfbook.git will be updated in real time.

I had the privilege of attending CPPCON last month. Michael Wong, Maged Michael, and I presented a parallel-programming overview, in which I covered the "Hardware and its Habits" chapter of Is Parallel Programming Hard, And, If So, What Can You Do About It?.

But the highlight for me was actually sitting in the audience for a pair of talks by people who had implemented RCU in C++.

Ansel Sermersheim presented a two-part talk entitled Multithreading is the answer. What is the question?. The second part of this talk covered lockless containers, and used a variant of RCU to implement a low-overhead libGuarded facility in order to more easily avoid deadlocks. The implementation is similar to the Linux-kernel real-time RCU implementation by Jim Houston and Joe Korty in that the counterpart to rcu_read_unlock() actively registers a quiescent state. Ansel's implementation goes further by also driving callback invocation from rcu_read_unlock(). Now I don't recommend this for a general-purpose RCU implementation due to the possibility of deadlock should a resource need to be held across rcu_read_unlock() and acquired within the callback. However, this approach should work just fine in the case where the callbacks just free memory and the memory allocator does not contain too many RCU read-side critical sections.
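To illustrate that deadlock scenario, here is a minimal userspace sketch, assuming a hypothetical RCU flavor whose rcu_read_unlock() may invoke pending callbacks directly. This is not Ansel's libGuarded code; the rcu_read_lock(), rcu_read_unlock(), and call_rcu() declarations simply stand in for such an implementation:

    #include <pthread.h>
    #include <stdlib.h>

    /* Provided by the hypothetical RCU implementation: */
    void rcu_read_lock(void);
    void rcu_read_unlock(void);   /* May invoke pending callbacks directly. */
    void call_rcu(void *p, void (*cb)(void *));

    static pthread_mutex_t mylock = PTHREAD_MUTEX_INITIALIZER;

    static void free_cb(void *p)
    {
        pthread_mutex_lock(&mylock);     /* Self-deadlock if invoked from the */
        /* ... update bookkeeping ... */ /* rcu_read_unlock() below.          */
        pthread_mutex_unlock(&mylock);
        free(p);
    }

    void retire_element(void *oldp)
    {
        pthread_mutex_lock(&mylock);     /* Resource held across...           */
        rcu_read_lock();
        call_rcu(oldp, free_cb);
        rcu_read_unlock();               /* ...rcu_read_unlock(), which may   */
        pthread_mutex_unlock(&mylock);   /* invoke free_cb() right here.      */
    }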

Fedor Pikus presented a talk entitled Read, Copy, Update, then what? RCU for non-kernel programmers, in which he gave a quite-decent introduction to use of RCU. This introduction included an improved version of my long-standing where-to-use-RCU diagram, which I fully intend to incorporate. I had a number of but-you-could moments, including the usual "put the size in with the array" advice, ways of updating things already exposed to readers, and the fact that RCU really can tolerate multiple writers, along with some concerns about counter overflow. Nevertheless, an impressive amount of great information in a one-hour talk!
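For those wondering about the “put the size in with the array” advice, the idea is to publish the length and the array as a single RCU-protected object, so that readers never see a length from one version paired with data from another. A minimal Linux-kernel-style sketch (my illustration, not code from Fedor's talk; the struct and function names are made up):

    #include <linux/errno.h>
    #include <linux/rcupdate.h>
    #include <linux/slab.h>
    #include <linux/spinlock.h>

    struct myarray {
        struct rcu_head rh;
        int len;
        int data[];     /* Flexible array member: length travels with data. */
    };

    static struct myarray __rcu *curarray;
    static DEFINE_SPINLOCK(update_lock);

    /* Reader: the (len, data) pair is always self-consistent. */
    int read_element(int idx, int *val)
    {
        struct myarray *ap;
        int ret = -EINVAL;

        rcu_read_lock();
        ap = rcu_dereference(curarray);
        if (ap && idx >= 0 && idx < ap->len) {
            *val = ap->data[idx];
            ret = 0;
        }
        rcu_read_unlock();
        return ret;
    }

    /* Updater: publish a new array, free the old one after a grace period. */
    void publish_array(struct myarray *newap)
    {
        struct myarray *oldap;

        spin_lock(&update_lock);
        oldap = rcu_dereference_protected(curarray,
                                          lockdep_is_held(&update_lock));
        rcu_assign_pointer(curarray, newap);
        spin_unlock(&update_lock);
        if (oldap)
            kfree_rcu(oldap, rh);
    }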

It is very good to see more people making use of RCU!

This is “Antifragile”, the fourth and final book in Nassim Taleb's Incerto series, which makes a case for antifragility as a key component of design, taking the art of design one step beyond robustness. An antifragile system is one where variation, chaos, stress, and errors improve the results. For example, within limits, stressing muscles and bones makes them stronger. In contrast, stressing a device made of (say) aluminum will eventually cause it to fail. Taleb gives a lengthy list of examples in Table 1 starting on page 23, some of which seem more plausible than others. One implausible-seeming entry lists rule-based systems as fragile, principles-based systems as robust, and virtue-based systems as antifragile. Although I can imagine a viewpoint where this makes sense, any expectation that a significant swath of present-day society will agree on a set of principles (never mind virtues!) seems insanely optimistic. The table nevertheless provides much good food for thought.

Taleb states that he has constructed antifragile financial strategies using insurance to control downside risks. But he also states on page 6 “Thou shalt not have antifragility at the expense of the fragility of others.” Perhaps Taleb figures that few will shed tears for any difficulties that insurance companies might get into, perhaps he is taking out policies that are too small to have material effect on the insurance company in question, or perhaps his policies are counter to the insurance company's main business, so that payouts to Taleb are anticorrelated with payouts to the company's other customers. One presumes that he has thought this through carefully, because a bankrupt insurance company might not be all that effective at controlling his downside risks.

Appendix I beginning on page 435 gives a graphical summary of the book's main messages. Figure 28 on page 441 is good grist for the mills of those who would like humanity to become an intergalactic species: After all, confining the human race seems likely to limit its upside. (One counterargument would posit that a finite object might have unbounded value, but such counterarguments typically rely on there being a very large number of human beings interested in that finite object, which some would consider to counter this counterargument.)

The right-hand portion of Figure 30 on page 442 illustrates what the author calls local antifragility and global fragility. To see this, imagine that the x-axis represents variation from nominal conditions, and the y-axis represents payoff, with large positive payoffs being highly desired. The right-hand portion shows something not unrelated to the function x^2-x^4, which gives higher payoffs as you move in either direction from x=0, peaking when x reaches one divided by the square root of two (either positive or negative), dropping back to zero when x reaches +1 or -1, and dropping like a rock as one ventures further away from x=0. The author states that this local antifragility and global fragility is the most dangerous of all, but given that he repeatedly stresses that antifragile systems are antifragile only up to a point, this dangerous situation would seem to be the common case. Those of us who believe that life is inherently dangerous should have no problem with this apparent contradiction.
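For the record, here is the arithmetic behind that peak location (my own quick check, not the book's):

    \[
      f(x) = x^2 - x^4, \qquad
      f'(x) = 2x - 4x^3 = 2x\,(1 - 2x^2) = 0
        \;\Rightarrow\; x = 0 \;\text{or}\; x = \pm\tfrac{1}{\sqrt{2}},
    \]
    \[
      f\!\left(\pm\tfrac{1}{\sqrt{2}}\right) = \tfrac{1}{2} - \tfrac{1}{4} = \tfrac{1}{4},
      \qquad f(\pm 1) = 0,
      \qquad f(x) \to -\infty \;\text{as}\; |x| \to \infty.
    \]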

But what does all of this have to do with parallel programming???

Well, how about “Is RCU antifragile?”

One case for RCU antifragility is its batching optimizations, which allow many (as in thousands of) concurrent requests to share the same grace-period computation. Therefore, the heavier the update-side load on RCU, the more efficiently RCU operates.
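In code, this batching is invisible to the caller, which is part of its charm. A sketch in Linux-kernel style (the struct and function names are made up for illustration):

    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct myelem {
        struct rcu_head rh;
        /* ... payload ... */
    };

    static void myelem_free_cb(struct rcu_head *rhp)
    {
        kfree(container_of(rhp, struct myelem, rh));
    }

    /*
     * Each of potentially thousands of concurrent updaters does this,
     * but RCU batches all of the resulting callbacks behind a shared
     * grace-period computation rather than computing one grace period
     * per request, so per-request overhead drops as update load rises.
     */
    void retire_elem(struct myelem *p)
    {
        call_rcu(&p->rh, myelem_free_cb);
    }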

However, load is but one of many aspects of RCU's environment that might be varied. For an extreme example, RCU is exceedingly fragile with respect to small perturbations of the program counter, as Peter Sewell so ably demonstrated, by running emacs, no less. RCU is also fragile with respect to timekeeping anomalies: for example, it can emit false-positive RCU CPU stall warnings if different CPUs have tens-of-seconds disagreements as to the current time. However, the aforementioned bones and muscles are similarly fragile with respect to any number of chemical substances (AKA “poisons”), to say nothing of well-known natural phenomena such as lightning bolts and landslides.

Even when excluding hardware misbehavior such as auto-perturbing program counters and unsynchronized clocks, RCU would still be subject to software aging, and RCU has in fact required multiple interventions from its developers and maintainer in order to keep up with changing hardware, workloads, and usage. One could therefore argue that RCU is fragile with respect to perturbations of time, although the combination of RCU and its developers, reviewers, and maintainer seems to have kept up reasonably well thus far.

On the other hand, perhaps it is unrealistic to evaluate the antifragility of software without including black-hat hackers. Achieving antifragility in that sort of environment is still very much a grand challenge problem, but a challenge that must be faced. Oh, you think RCU is too low-level for this sort of attack? There was a time when I thought so. And then came rowhammer.

So please be careful, and, where possible, antifragile! It is after all a real world out there!!!

We have been making good progress on the next release of Is Parallel Programming Hard, And, If So, What Can You Do About It?, and hope to have a new release out soonish.

In the meantime, for those of you for whom the English text in this book has simply gotten in the way, there is now an alternative:

[Cover image of the Chinese translation of Is Parallel Programming Hard, And, If So, What Can You Do About It?]

On the off-chance that any of you are seriously interested, this is available from
Amazon China, JD.com, Taobao.com, and Dangdang.com. For the rest of you, you have at least seen the picture.  ;–)

The last month or two has seen a lot of work simplifying the Linux-kernel RCU implementation, with more than 2700 net lines of code removed. The remainder of this post lists the user-visible changes, along with alternative ways to get the corresponding job done.

  1. The infamous CONFIG_RCU_KTHREAD_PRIO Kconfig parameter is now defunct, but the rcutree.kthread_prio kernel boot parameter gets the job done. (An example kernel command line appears just after this list.)
  2. The CONFIG_NO_HZ_FULL_SYSIDLE Kconfig parameter has kicked the bucket. There is no replacement because no one was using it. If you need it, revert the -rcu commit tagged by sysidle.2017.05.11a.
  3. The CONFIG_PROVE_RCU_REPEATEDLY Kconfig parameter is no more. There is no replacement because as far as I know, no one has used it for many years. It was a great help in tracking down lockdep-RCU warnings back in the day, but these warnings are now sufficiently rare that finding them one boot at a time is no longer a problem. If you need it, do the obvious hacking on Kconfig and lockdep.c.
  4. The CONFIG_SPARSE_RCU_POINTER Kconfig parameter now rests in peace. There is no replacement because there doesn't seem to be any reason for RCU's sparse checking to be the only such checking that is optional. If you really need to disable RCU's sparse checking, hand-edit the definition as needed.
  5. The CONFIG_CLASSIC_SRCU Kconfig parameter bought the farm. This was only present to handle massive failures of the new Tree/Tiny SRCU implementations, but these appear to be quite reliable and should be used instead of Classic SRCU.
  6. RCU's debugfs tracing is done for. As far as I know, I was the only real user, and I haven't used it in years. If you need it, revert the -rcu commit tagged by debugfs.2017.05.15a.
  7. The CONFIG_RCU_NOCB_CPU_NONE, CONFIG_RCU_NOCB_CPU_ZERO, and CONFIG_RCU_NOCB_CPU_ALL Kconfig parameters have departed. Use the rcu_nocbs kernel boot parameter instead, which can do quite a bit more than those Kconfig parameters ever could.
  8. Tiny RCU's event tracing and RCU CPU stall warnings are now pushing up daisies. The point of Tiny RCU is to be tiny and educational, and these added features were not helping reach either of these two goals. The replacement is to reproduce the problem with Tree RCU.
  9. These changes should matter only to people running rcutorture:

    1. The CONFIG_RCU_TORTURE_TEST_SLOW_PREINIT and CONFIG_RCU_TORTURE_TEST_SLOW_PREINIT_DELAY Kconfig parameters have been entombed: Use the rcutree.gp_preinit_delay kernel boot parameter instead.
    2. The CONFIG_RCU_TORTURE_TEST_SLOW_INIT and CONFIG_RCU_TORTURE_TEST_SLOW_INIT_DELAY Kconfig parameters have given up the ghost: Use the rcutree.gp_init_delay kernel boot parameter instead.
    3. The CONFIG_RCU_TORTURE_TEST_SLOW_CLEANUP and CONFIG_RCU_TORTURE_TEST_SLOW_CLEANUP_DELAY Kconfig parameters have passed on: Use the rcutree.gp_cleanup_delay kernel boot parameter instead.
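For example, pulling several of the above replacements together, a debugging-oriented kernel command line might contain something like the following. This is an illustrative sketch; the authoritative syntax lives in Documentation/admin-guide/kernel-parameters.txt:

    rcutree.kthread_prio=49 rcu_nocbs=1-7 rcutree.gp_preinit_delay=3 rcutree.gp_init_delay=3 rcutree.gp_cleanup_delay=3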
There will probably be a few more simplifications in the near future, but this should be at least enough for one merge window!

With the Linux-kernel v4.13 merge window coming up, it is time to do at least a little heavy-duty testing of the patches destined for v4.14, which had been but lightly tested on my laptop. An overnight run on a larger test machine looked very good—with the exception of scenario TREE01 (defined by tools/testing/selftests/rcutorture/configs/rcu/TREE01{.boot,} in the Linux-kernel source tree), which got no fewer than 190 failures in a half-hour run. In other words, rcutorture saw 190 too-short grace periods in 30 minutes, or about one every ten seconds.

This is not just bad. This is RCU completely and utterly failing to be RCU.

My first action was to re-run the tests on the commits slated for v4.13. You can imagine my relief to see them pass on all scenarios, including TREE01.

Then it was time for bisection. I have been burned many times by false bisections due to RCU's probabilistic failure modes, so I ran 24 30-minute tests on each commit. Fortunately, I could run six in parallel, so that each commit consumed only about two hours of wall-clock time. The bisection converged on a commit that adds a --kconfig argument to the rcutorture scripts, which allows me to do things like force lockdep to run in all scenarios. However, this commit should have absolutely no effect on the inner workings of RCU.
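For example, the new argument permits invocations along these lines. The --kconfig argument is the one added by the commit in question; the remaining flags reflect my usual usage of the scripts and should be taken as an illustration rather than gospel:

    tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 64 --duration 30 \
            --configs "TREE01" --kconfig "CONFIG_PROVE_LOCKING=y"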

OK, perhaps this commit managed to fatally mess up the .config file. But no, the .config files from this commit compare equal to those from the preceding commit. Some additional poking gives me confidence that the kernels being built are also identical. Still, the one fails and the other does not.

The next step is to look very carefully at the console output from the failing runs, most of which contain many complaints about RCU grace periods being too short. Except that one of them also contains RCU CPU stall warnings. In fact, one of the stall warnings lists no fewer than 26 CPUs as stalling the current RCU grace period.

This came as a bit of a surprise, partly because I don't recall ever seeing that many CPUs stalling a single grace period, but mostly because the test was only supposed to use eight CPUs.

A look at the beginning of the console output showed that RCU was inexplicably prepared to deal with 43 CPUs instead of the expected eight. A bit more digging showed that the qemu command used to run the failing test had “-smp 43”, while the qemu command for the successful test instead had “-smp 8”. In both cases, the qemu command also included the kernel boot parameter “maxcpus=8”. And a very stupid bug in the --kconfig change to the scripts turned out to be responsible for the bogus -smp argument.

The next step is to swap the values of qemu's -smp argument. And the failure follows the “-smp 43” setting. This means that it is possible that the RCU failures are due to a latent timing bug in RCU. After all, the test system has only 64 CPUs, and I was running 43*6=258 CPUs worth of tests on it. But running six concurrent rcutorture tests with both -smp and maxcpus set to 43 passes with flying colors. So RCU must be suffering from some other problem.

The next question is exactly what is supposed to happen when qemu and the kernel have very different ideas of how many CPUs there are. The ever-helpful Documentation/admin-guide/kernel-parameters.txt file states that maxcpus= limits not the overall number of CPUs, but rather the number that are brought up at boot time. Another look at the console output confirms that in the failing case, eight CPUs are brought up at boot time. However, the other 35 come online some time after boot, sometimes taking a few minutes to come up. Which explains another anomaly I noticed while bisecting, namely that about half the tests ran 30 minutes without failure, but the ones that failed did so within the first five minutes of the run. Apparently the RCU failures are connected somehow to the late arrival of the extra 35 CPUs.

Except that RCU configured itself for the full 43 CPUs, and RCU is supposed to be able to handle CPUs coming and going. In fact, RCU has repeatedly demonstrated its ability to handle CPUs coming and going for more than a decade. So it is time to enable event tracing on a failure scenario (thank you, Steve!). One of the traces shows that there is no RCU callback connected with the first failure, which points the finger of suspicion at RCU expedited grace periods.

A quick inspection of the expedited code shows missing synchronization for the case where a CPU makes its very first appearance just as an expedited grace period starts. Oh, the leaf rcu_node structure's ->lock is held both when updating the number of CPUs that have ever been seen (which is the rcu_state structure's ->ncpus field) and when updating the bitmasks indicating exactly which CPUs have ever been seen (which is the leaf rcu_node structure's ->expmaskinitnext field), but it drops that lock between those two updates.

This means that the expedited grace period might sample the ->ncpus field, notice the change, and therefore check all the ->expmaskinitnext fields—but before those fields had been updated. Not a problem for this grace period, since the new CPUs haven't yet started and thus cannot yet be running any RCU read-side critical sections, which means that there is no reason whatsoever for this grace period to pay any attention to them. However, the next expedited grace period would again sample the ->ncpus field, see no change, and thus not bother checking the ->expmaskinitnext fields. Thus, this grace period would also ignore the new CPUs, which by this time could be very much alive and running RCU read-side critical sections. Hence the too-short grace periods, and hence them showing up within the first few minutes of the run, during the time that the extra 35 CPUs are in the process of coming online.

The fix is easy: Just move the update of ->ncpus to the same critical section as the update of ->expmaskinitnext. With this fix, rcutorture passes the TREE01 scenario even with bogus -smp arguments to qemu. There is therefore once again a bug in rcutorture: There are still bugs in RCU somewhere, and rcutorture is failing to find them!
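In schematic form, the buggy ordering and the fix look roughly like this. This is a hand-drawn sketch of the pattern described above rather than the actual kernel code; the ->ncpus and ->expmaskinitnext fields are as named earlier, and everything else is stripped down for illustration:

    #include <linux/spinlock.h>

    struct sketch_rcu_node {
        raw_spinlock_t lock;
        unsigned long expmaskinitnext;  /* CPUs ever seen by this leaf. */
    };

    struct sketch_rcu_state {
        int ncpus;                      /* Total number of CPUs ever seen. */
    };

    /* Buggy bringup ordering: the lock is dropped between the two updates. */
    static void cpu_seen_buggy(struct sketch_rcu_state *rsp,
                               struct sketch_rcu_node *rnp, unsigned long mask)
    {
        raw_spin_lock(&rnp->lock);
        rsp->ncpus++;                   /* Expedited GP may sample this...  */
        raw_spin_unlock(&rnp->lock);

        raw_spin_lock(&rnp->lock);      /* ...and scan the masks before     */
        rnp->expmaskinitnext |= mask;   /* this update takes place.         */
        raw_spin_unlock(&rnp->lock);
    }

    /*
     * Fixed ordering: both updates share one critical section, so any
     * grace period observing the new ->ncpus value also observes the
     * corresponding ->expmaskinitnext bits.
     */
    static void cpu_seen_fixed(struct sketch_rcu_state *rsp,
                               struct sketch_rcu_node *rnp, unsigned long mask)
    {
        raw_spin_lock(&rnp->lock);
        rnp->expmaskinitnext |= mask;
        rsp->ncpus++;
        raw_spin_unlock(&rnp->lock);
    }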

Strangely enough, I might never have noticed the bug in expedited grace periods had I not made a stupid mistake in the scripting. Sometimes it takes a bug to locate a bug!

It has been more than two years since I posted my last verification challenge, so it is only natural to ask what, if anything, has happened in the meantime. The answer is “Quite a bit!”

I had the privilege of attending The Royal Society Verified Trustworthy Software Systems Meeting, where I was on a panel on “Verification in Industry”. I also attended the follow-on Verified Trustworthy Software Systems Specialist Meeting, where I presented on formal verification and RCU. There were many interesting presentations (see slides 9-12 of this presentation), the most memorable being a grand challenge to apply formal verification to machine-learning systems. If you think that challenge is not all that grand, you should watch this video, which provides an entertaining demonstration of a few of the difficulties.

Closer to home, in the past year there have been three successful applications of automated formal-verification tools to Linux-kernel RCU:

  1. Lihao Liang applied the C Bounded Model Checker (CBMC) to Tree RCU (draft paper). This was a bit of a tour de force, converting Linux-kernel C code (with a bit of manual preprocessing) to a logic expression, then invoking a SAT solver on that expression. The expression's variables correspond to the inputs to the code, the possible multiprocessor scheduling decisions, and the possible memory-model reorderings. The expression evaluates to true if there exists an execution that triggers an assertion. The largest expression had 90 million boolean variables, 450 million clauses, occupied some tens of gigabytes of memory, and, stunningly, was solved with less than 80 hours of CPU time. Pretty amazing considering that SAT is NP-complete and that two to the ninety millionth power is an excessively large number!!! (A toy example of this style of checking appears just after this list.)
  2. Michalis Kokologiannakis applied Nidhugg to Tree RCU (draft paper). Nidhugg might not make quite as macho an attack on an NP-complete problem as does CBMC, but there is some reason to believe that it can handle larger chunks of the Linux-kernel RCU code.
  3. Lance Roy applied CBMC to Classic SRCU. Interestingly enough, this verification could be carried out completely automatically, without manual preprocessing. This approach is therefore available in my -rcu tree (git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git) on the branch formal.2017.06.07a.
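As a toy illustration of the style of checking involved (my own example, having nothing to do with the RCU code attacked in the papers above), CBMC can take a small concurrent C program and search the possible interleavings for an assertion violation:

    #include <assert.h>
    #include <pthread.h>

    static int counter;

    static void *incr(void *arg)
    {
        (void)arg;
        counter++;              /* Non-atomic: load, add, store. */
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;

        pthread_create(&t1, NULL, incr, NULL);
        pthread_create(&t2, NULL, incr, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        assert(counter == 2);   /* Fails when both threads read zero. */
        return 0;
    }

An invocation along the lines of “cbmc toy.c” (details vary with tool version) reports the interleaving in which both increments read zero, leaving counter equal to one. The verifications above do the same sort of thing, just at mind-boggling scale.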


In all three cases, the tools verified portions of RCU and SRCU as correct, and in all three cases, the tools successfully located injected bugs. (Hey, any of us could write a program consisting of nothing more than printf("Validated\n");, so you do have to validate the verifier!) And yes, I have given these guys trouble about the fact that their tools didn't find any bugs that I didn't already know about, but these results are nevertheless extremely impressive. Had you told me ten years ago that this would happen, I have no idea how I would have responded, but I most certainly would not have believed you.

In theory, Nidhugg is more scalable but less thorough than CBMC. In practice, it is too early to tell.

So what is the status of the first five verification challenges?


  1. rcu_preempt_offline_tasks(): Still open. That said, Michalis found the infamous RCU bug at Linux-kernel commit 281d150c5f88 and further showed that my analysis of the bug was incorrect, though my fixes did actually fix the bug. So this challenge is still open, but the tools have proven their ability to diagnose rather ornate concurrency bugs.
  2. RCU NO_HZ_FULL_SYSIDLE: Still open. Perhaps less pressing given that it will soon be removed from the kernel, but the challenge still stands!
  3. Apply CBMC to something: This is an ongoing challenge to developers to give CBMC a try on concurrent code. And why not also try Nidhugg?
  4. Tiny RCU: This self-directed challenge was “born surmounted”.
  5. Uses of RCU: Lihao, Michalis, and Lance verified some simple RCU uses as part of their work, but this is an ongoing challenge. If you are a formal-verification researcher and really want to prove your tool's mettle, take on the Linux kernel's dcache subsystem!


But enough about the past! Given the progress over the past two years, a new verification challenge is clearly needed!

And this sixth challenge is available on 20 branches whose names start with “Mutation” at https://github.com/paulmckrcu/linux.git. Some of these branches are harmless transformations, but others inject bugs. These bugs range from deterministic failures to concurrent data races to forward-progress failures. Can your tool tell which is which?

If you give any of these a try, please let me know how it goes!

I avoided “The Black Swan” for some years because I was completely unimpressed with the reviews. However, I was sufficiently impressed by a recent Nassim Taleb essay to purchase his “Incerto” series. I have read the first two books so far (“Fooled by Randomness” and “The Black Swan”), and highly recommend both of them.

The key point of these two books is that in real life, extremely low-probability events can have extreme effects, and such events are the black swans of the second book's title. This should be well in the realm of common sense: Things like earthquakes, volcanoes, tidal waves, and asteroid strikes should illustrate this point. A follow-on point is that low-probability events are inherently difficult to predict. This also should be non-controversial: The lower the probability, the less the frequency, and thus the less the experience with that event. And of my four examples, we are getting semi-OK at predicting volcanic eruptions (Mt. St. Helens being perhaps the best example of a predicted eruption), not bad at tidal waves (getting this information to those who need it still being a challenge), and hopeless at earthquakes and asteroid strikes.

Taleb then argues that the increasing winner-takes-all nature of our economy increases the frequency and severity of economic black-swan events, in part by rendering normal-distribution-based statistics impotent. If you doubt this point, feel free to review the economic events of the year 2008. He further argues that this process began with the invention of writing, which allowed one person to have an outsized effect on contemporaries and on history. I grant that modern transportation and communication systems can amplify black-swan events in ways that weren't possible in prehistoric times, but would argue that individual prehistoric people had just as much fun with the black swans of the time, including plague, animal attacks, raids by neighboring tribes, changes in the habits of prey, and so on. Nevertheless, I grant Taleb's point that most prehistoric black swans didn't threaten the human race as a whole, at least with the exception of asteroid strikes.

My favorite quote of the book is “As individuals, we should love free markets because operators in them can be as incompetent as they wish.” My favorite question is implied by his surprise that so few people embrace both sexual and economic freedom. Well, ask a stupid question around me and you are likely to get a stupid answer. Here goes: Contraceptives have not been in widespread use for long enough for human natural selection to have taken much account of their existence. Therefore, one should expect the deep subconscious to assume that sexual freedom will produce lots of babies, and that these babies will need care and feeding. Who will pay for this? The usual answer is “everyone” with consequent restrictions on economic freedom. If you don't like this answer, fine, but please consider that it is worth at least what you are paying for it. ;–)

So what does all of this have to do with parallel programming???

As it turns out, quite a lot.

But first, I will also point out my favorite misconception in the book, namely the notion that NP has all that much to do with incomputability. On the other hand, the real surprise is that the trader-philosopher author would say anything at all about such topics. Furthermore, Taleb would likely point out that in the real world, the distinction between “infeasible to compute” and “impossible to compute” is a distinction without a difference.

The biggest surprise for me personally from these books is that race conditions, one of the most feared categories of bugs, are not black-swan bugs, but are instead white-swan bugs. They are quite random, and very amenable to the Gaussian statistical tools that Taleb so rightly denigrates for black-swan situations. You can even do finite amounts of testing and derive good confidence bounds for the reliability of your software—but only with respect to white-swan bugs such as race conditions. So I once again feel lucky to have the privilege of working primarily on race conditions in concurrent code!
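To give a feel for those confidence bounds (my own back-of-the-envelope illustration, not Taleb's), suppose that race-condition failures arrive as a Poisson process with rate lambda and that a test runs for duration T without a single failure. Then:

    \[
      P(\mbox{no failure in time } T) \;=\; e^{-\lambda T},
      \qquad
      e^{-\lambda T} \le 0.05
      \;\Longleftrightarrow\;
      \lambda \ge \frac{-\ln 0.05}{T} \approx \frac{3}{T}.
    \]

In other words, any failure rate above roughly 3/T would have produced at least one failure with probability at least 95%, so zero failures in time T gives a 95% upper confidence bound of about 3/T. No analogous computation is available for black-swan bugs, which is precisely the problem.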

What is a black-swan bug? One class of such bugs caused me considerable pain at Sequent in the 1990s. You see, we didn't have many single-CPU systems, and we not infrequently produced software that worked only on systems with at least two CPUs. Arbitrarily large amounts of testing on multi-CPU systems would fail to spot such bugs. And perhaps you have encountered bugs that happened only at specific times in specific states, or as they are sometimes called, “new moon on Tuesdays” bugs.

Taleb talks about using mathematics from fractals to turn some classes of black-swan events into grey-swan events, and something roughly similar can be done with validation. We have an ever-increasing suite of bugs that people have injected in the past, and we can make some statements about how likely someone is to make that same error again. We can then use this experience to guide our testing efforts, as I try to do with the rcutorture test suite. That said, I expect to continue pursuing additional bug-spotting methods, including formal verification. After all, the fact that race conditions are not black swans does not necessarily make them easy, particularly in cases, such as the Linux kernel, where there are billions of users.

In short, ignore the reviews of “Fooled by Randomness” and “The Black Swan”, including this one, and go read the actual books. If you only have time to read one of them, you should of course pick one at random. ;–)

During my keynote at the 2017 Multicore World, Mark Moir asked what I would have done differently if I knew then what I know now, with the “then” presumably being the beginning of the RCU effort back in the early 1990s. Because I got the feeling that my admittedly glib response did not fully satisfy Mark, I figured I should try again. So imagine that you traveled back in time to the very end of the year 1993, not long after Jack Slingwine and I came up with read-copy lock (now read-copy update, or just RCU), and tried to pass on a few facts about my younger self's future. The conversation might have gone something like this:

You  By the year 2017, RCU will be part of the concurrency curriculum at numerous universities and will be very well-regarded in some circles.
Me  Nice! That must mean that DYNIX/ptx will also be doing well!

You  Well, no. DYNIX/ptx will disappear by 2005, being replaced by the combination of IBM's AIX and another operating system kernel started as a hobby.
Me  AIX??? Surely you mean Solaris, HP-UX or Ultrix! And I wouldn't say that BSD started as a hobby! It was after all fully funded research.

You  No, Sun Microsystems was acquired by Oracle in 2010, and Solaris was already in decline by that time. IBM's AIX was by then the last proprietary UNIX operating system standing. A new open-source kernel called "Linux" became the dominant OS.
Me  IBM??? But they are currently laying off more people each month than Sequent employs worldwide!!! Why would they even still be in business in 2010?

You  True. But their new CEO, Louis Gerstner, will turn IBM around.
Me  Well, yes, he did just become IBM's CEO, but before that he was CEO of RJR Nabisco. That should work about as well as John Sculley's tenure as CEO of Apple. What does Gerstner know about computers, anyway?

You  He apparently knew enough to get IBM back on its feet. In fact, IBM will buy Sequent, so that you will become an IBM employee on April 1, 2000.
Me  April Fools day? Now I know you are joking!!!

You  No joke. You will become an IBM employee on April 1, 2000, seven years to the day after Louis Gerstner became an IBM employee.
Me  OK, I guess that explains why DYNIX/ptx doesn't make it past 2005. That is really annoying! So the teaching of RCU in universities is some sort of pity play, then?

You  No. Dipankar Sarma will get RCU accepted into Linux in 2002.
Me  I could easily believe that—he is very capable. So what do I do instead?

You  You will take over maintainership of RCU in 2005.
Me  Is Dipankar going to be OK?

You  Of course! He will just move on to other projects. It is just that there will be a lot more work needed on RCU, which you will take on.
Me  What more work could there be? It is a pretty simple mechanism, way simpler than a memory allocator, for example.

You  Well, there will be quite a bit of scalability work needed. For example, you will receive a scalability bug report involving a 512-CPU shared-memory system.
Me  Hmmm... It took Sequent from 1985 to 1997 to get from 30 to 64 CPUs, so that is doubling every 12 years, so I am guessing that I received this bug report somewhere near the year 2033. So what did I do in the meantime?

You  No, you will receive this bug report in 2004.
Me  512-CPU system in 2004??? Well, suspending disbelief, this must be why I will start maintaining RCU in 2005.

You  No, a quick fix will be supplied by a guy named Manfred Spraul, who writes concurrent Linux-kernel code as a hobby. So you didn't do the scalability work until 2008.
Me  Concurrent Linux-kernel coding as a hobby? That sounds unlikely. But never mind. So what did I do between 2005 and 2008? Surely it didn't take me three years to create a highly scalable RCU implementation!

You  You will work with a large group of people adding real-time capabilities to the Linux kernel. You will create an RCU implementation that allowed readers to be preempted.
Me  That makes absolutely no sense! A context switch is a quiescent state, so preempting an RCU read-side critical section would result in a too-short grace period. That most certainly isn't going to help anything, given that a crashed kernel isn't going to offer much in the way of real-time response!

You  I don't know the details, but you will make it work. And this work will be absolutely necessary for the Linux kernel to achieve 20-microsecond interrupt and scheduling latencies.
Me  Given that this is a general-purpose OS, you obviously meant 20 milliseconds!!! But what could RCU possibly be doing that would contribute significantly to a 20-millisecond interrupt/scheduling delay???

You  No, I really did mean sub-20-microsecond latencies. By 2010 or so, even the vanilla non-realtime Linux kernel will easily meet 20-millisecond latencies, assuming the hardware and software are properly configured.
Me  Ah, got it! CPU core clock rates should be somewhere around 50GHz by 2010, which might well make those sorts of latencies achievable.

You  No, power-consumption and heat-dissipation constraints will cap CPU core clock frequencies at about 5GHz in 2003. Most systems will run in the 1-3GHz range even as late as 2017.
Me  Then I don't see how a general-purpose OS could possibly achieve sub-20-microsecond latencies, even on a single-CPU system, which wouldn't have all that much use for RCU.

You  No, this will be on SMP systems. In fact, in 2012, you will receive a bug report complaining of excessively long 200-microsecond latencies on a system running 4096 CPUs.
Me  Come on! I believe that Amdahl's Law has something to say about lock contention on such large systems, which would rule out reasonable latencies, let alone 200-microsecond latencies! And there would be horrible reliability problems with that many CPUs! You wouldn't be able to keep the system running long enough to measure the latency!!!

You  Hey, I am just telling you what will happen.
Me  OK, so after I get RCU to handle insane scalability and real-time response, there cannot be anything left to do, right?

You  Actually, wrong. Energy efficiency becomes extremely important, and you will rewrite the energy-efficiency RCU code more than eight times before you get it right.
Me  Eight times??? You must be joking!!! Seems like it would be better to just waste a little energy. After all, computers don't consume all that much energy, especially compared to industrial and transportation systems.

You  No, that would not work. By 2005, there are quite a few datacenters that are limited by electrical power rather than by floor space. So much so that large data centers open in Eastern Oregon, on the sites of the old aluminum smelters. When you have that many servers, even a few percent of energy savings translates to millions of dollars a year, which is well worth spending some development effort on.
Me  That is an insanely large number of servers!!! How many Linux instances are running by that time, anyway?

You  By the mid-2010s, the number of Linux instances is well in excess of one billion, but no one knows the exact number.
Me  One billion??? That is almost one server for every family in the world! No way!!!

You  Well, most of the Linux instances are not servers. There are a lot of household appliances running Linux, to say nothing of battery-powered hand-held smartphones. By 2017, most of the smartphones will have multiple CPUs.
Me  Why on earth would you need multiple CPUs to make a phone call? And how would you fit multiple CPUs into a hand-held device? And where do you put the battery, in a large backpack or something???

You  No, the entire device, batteries, CPUs and all, will fit easily into your shirt pocket. And these smartphones can take pictures, record video, do video conference calls, find precise locations using GPS, translate among multiple languages, and much else besides. They are really full-fledged computers that fit in your pocket.
Me  A pocket-sized supercomputer??? And how would I possibly go about testing RCU code sufficiently for your claimed billion instances???

You  Interesting question. You will give a keynote at Multicore World in February 2017 in Wellington, New Zealand, describing some of your plans. These plans include the use of formal verification in your regression test suite.
Me  Formal verification of highly concurrent code in a regression test suite??? OK, now I know for sure that you are pulling my leg! It has been an interesting conversation, but I must get back to reality!!!


My 1993 self did not have a very accurate view of 2017, did he? As the old saying goes, predictions are hard, especially about the future! So it is quite wise to take such predictions with a considerable supply of salt.

Another year, another release of Is Parallel Programming Hard, And, If So, What Can You Do About It?!

Updates include:



  1. More formatting and build-system improvements, along with many bibliography updates, courtesy of Akira Yokosawa.
  2. A great many grammar and typo fixes from Akira and SeongJae Park.
  3. Numerous changes and fixes from Balbir Singh, Boqun Feng, Mike Rapoport, Praveen Kumar, and Tobias Klauser.
  4. Added code for concurrent skiplists, in hopes of adding the corresponding text in a later release.
  5. Added a running example to the deferred-processing chapter.
  6. Merged the “Synchronization Primitives” section into “Tools of the Trade”.
  7. Updated control-dependency discussion in memory-barriers section.
As always, git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/perfbook.git will be updated in real time.