Log in

No account? Create an account
TL;DR: Do you know of additional publicly visible production uses of sequnce locking, hazard pointers, or RCU not already called out in the remainder of this blog post?

I am updating the deferred-processing chapter of “Is Parallel Programming Hard, And, If So, What Can You Do About It?” and would like to include a list of publicly visible production uses of sequence locking, hazard pointers, and RCU. I suppose I could also include reference counting, but given that it was well known before I was born, I expect that its list would be way too long to be useful!

The only production use of sequence locking that I am aware of is within the Linux kernel, but I would be surprised if it is not rather widely used. Can you tell me of more publicly visible production sequence-locking uses?

Hazard pointers is used within MongoDB (v3.0 and later) and within Facebook's Folly library, which is used in production at Facebook and perhaps elsewhere as well. It is also implemented by several libraries called out on its Wikipedia page (Concurrent Building Blocks, Concurrency Kit, Atomic Ptr Plus, and libcds). Hazard pointers is also sometimes called “safe memory reclamation” (SMR). Any other production hazard-pointers uses?

RCU is used within the Linux kernel, the FreeBSD kernel, the OpenBSD kernel, Linux Trace Toolkit Next Generation (LTTng), QEMU, Knot DNS, Netsniff-ng, Sheepdog, GlusterFS, and gdnsd. It is also implemented by several libraries, including Userspace RCU, Concurrency Kit, Facebook's Folly library, and libcds. RCU is also called “epochs” (from Keir Fraser), “generations” (from Tornado/K42), “passive serialization” (from IBM zVM), and probably other things as well. Any other production RCU uses?

So what do I mean by “publicly visible”? Open-source projects should qualify, as should scholarly publications regarding proprietary projects. Similarly, “production use” means use for getting some job done, as opposed to research, prototyping, or benchmarking. Not that there is necessarily anything wrong with research, prototyping, or benchmarking, but we are looking for things a little bit further along the hype cycle. ;-)
There has been much ink spilled about innovation over the past decades, but this article from Harvard Business Review is the first one that really rings true with my experiences. The main point of this article is that much prior writing has focused on the fun aspects of innovation, and points out some additional hard work that is absolutely required for meaningful innovation. The authors put forth five maxims, each of which is discussed below.

Tolerance for failure but no tolerance for incompetence. This maxim is the one that rings most true with me: Innovation's progress is often measured in errors per hour, but the errors have to be productive errors that either eliminate classes of potential solutions from consideration or that better approximate a useful solution. And in my experience, extreme competence is required to make the right mistakes, that is, the mistakes that will generate the experience required to eventually arrive at a workable solution.

However, this maxim is also the one that I am most uncomfortable with. The discomfort stems from the choice of the word “incompetence”. After all, what is incompetence? The old apprentice/journeyman/master trichotomy is a useful guide. An apprentice is expected to do useful work if overseen by a journeyman or master. A journeyman is expected to be capable of carrying out a wide range of tasks without guidance. A master is expected to be able to extend the state of the art as needed to complete the task at hand. Clearly, there is a wide gulf between the definition of “incompetence” appropriate for an apprentice on the one hand and a master on the other. The level of competence required for this sort of work is not a function of education, certifications, or seniority, but instead requires a wide range of deep skills and experience combined with a willingness to learn things the hard way, along with a tolerance for the confusion and disorder that usually accompanies innovation. In short, successful innovation requires the team have a fair complement of masters. Yet it makes absolutely no sense to label as “incompetent” an accomplished journeyman, even if said journeyman is a bit uncreative and disorder-intolerant.

All that aside, “Tolerance for failure but no tolerance for non-mastery” doesn't exactly roll off the tongue, and besides which, large projects would have ample room for apprentices and journeymen, for example, our hypothetical accomplished but disorder-intolerant journeyman might be an excellent source of feedback. And in fact, master-only teams tend to be quite small [PDF, paywalled, sorry!]. I therefore have no suggestions for improvement. And wording quibbles aside, this maxim seems to me to be the most important of the five by far.

Willingness to experiment but highly disciplined. Although it is true that sometimes the only way forward is a random walk, it is still critically important to keep careful records of the experiments and their outcomes. It is often the case that last week's complete and utter failure turns out to contain the seeds of this week's step towards success, and sometimes patterns within a depressing morass of failures point the way to eventual success. The article also makes the excellent point that stress-testing ideas early on avoids over-investing in the inevitable blind alleys.

Psychologically safe but brutally candid. We all fall in love with our ideas, and therefore we all need the occasional round of “frank and open” feedback. If nothing else, we should design our experiments (or, in software, our validation suites) to provide that feedback.

Collaboration but with individual accountability. Innovation often requires that individuals and teams buck the common wisdom, but common wisdom often carries the day. Therefore, those individuals and teams must remain open to feedback, and accountability is one good way to help them seek out feedback and take that feedback seriously.

Flat but strong leadership. Most of my innovation has been carried out by very small teams, so this maxim has not been an issue for me. But people wishing to create large but highly innovative teams would do well to read this part of the article very carefully.

In short, this is a great article, and to the best of my knowledge the first one presenting both the fun and hard-work sides of the process of innovation. Highly recommended!


Parallel Programming: December 2018 Update

This weekend features a new release of Is Parallel Programming Hard, And, If So, What Can You Do About It?.

This release features Makefile-automated running of litmus tests (both with herd and litmus tools), catch-ups with recent Linux-kernel changes, a great many consistent-style changes (including a new style-guide appendix), improved code cross-referencing, and a great many proofreading changes, all courtesy of Akira Yokosawa. SeongJae Park, Imre Palik, Junchang Wang, and Nicholas Krause also contributed much-appreciated improvements and fixes. This release also features numerous epigraphs, modernization of sample code, many random updates, and larger updates to the memory-ordering chapter, with much help from my LKMM partners in crime, whose names are now enshrined in the LKMM section of the Linux-kernel MAINTAINERS file.

As always, git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/perfbook.git will be updated in real time.

Oh, and the first edition is now available on Amazon in English as well as Chinese. I have no idea how this came about, but there it is!
Antifragile” was the last volume in Nassim Taleb's Incerto series, but it has lost that distinction with the publication of “Skin in the Game: Hidden Asymmetries in Daily Life”. This book covers a great many topics, but I will focus on only a few that relate most closely to my area of expertise.

Chapter 2 is titled “The Most Intolerant Wins: The Dominance of a Stubborn Minority”. Examples include kosher and halal food, the English language (I plead guilty!!!), and many others besides. In all cases, if the majority is not overly inconvenienced by the strongly expressed needs or desires of the minority, the minority's preferences will prevail. On the one hand, I have no problem eating either kosher or halal food, so would be part of the compliant majority in that case. On the other hand, although I know bits and pieces of several languages, the only one I am fluent in is English, and I have attended gatherings where the language was English solely for my benefit. But there are limits. For example, if I were to attend a gathering in certain parts of (say) rural India or China, English might not be within the realm of possibility.

But what does this have to do with parallel programming???

This same stubborn-minority dominance appears in software, including RCU. Very few machines have more than a few tens of CPUs, but RCU is designed to accommodate thousands. Very few systems run workloads featuring aggressive real-time requirements, but RCU is designed to support low latencies (and even more so the variant of RCU present in the -rt patchset). Very few systems allow physical removal of CPUs while the systems is running, but RCU is designed to support that as well. Of course, as with human stubborn minorities, there are limits. RCU handles systems with a few thousand CPUs, but probably would not do all that well on a system with a few million CPUs. RCU supports deep sub-millisecond real-time latencies, but not sub-microsecond latencies. RCU supports controlled removal and insertion of CPUs, but not surprise removal or insertion.

Chapter 6 is titled Intellectual Yet Idiot (with the entertaining subtext “Teach a professor how to deadlift”), and, as might be expected from the title, takes a fair number of respected intellectual to task, for but two examples, Cass Sunstein and Richard Thaler. I did find the style of this chapter a bit off-putting, but I happened to read Michael Lewis's “The Undoing Project” at about the same time. This informative and entertaining book covers the work of Daniel Kahneman and Amos Tversky (whose work helped to inform that of Sunstein and Thaler), but I found the loss-aversion experiments to be unsettling. After all, what does losing (say) $100 really mean? That I will be sad for a bit? That I won't be able to buy that new book I was looking forward to reading? That I don't get to eat dinner tonight? That I go hungry for a week? That I starve to death? I just might give a very different answer in these different scenarios, mightn't I?

This topic is also covered by Jared Diamond in his most excellent book entitled “The World Until Yesterday”. In the “Scatter your land” section, Diamond discusses how traditional farmers plant multiple small and widely separated plots of land. This practice puzzled anthropologists for some time, as it does the opposite of optimize yields and minimize effort. Someone eventually figured out that because these traditional farmers had no way to preserve food and limited opportunities to trade it, there was no value in producing more food than they could consume. But there was value in avoiding a year in which there was no food, and farming different crops in widely separated locations greatly decreased the odds that all their crops in all their plots would fail, thus in turn minimizing the probability of starvation. In short, these farmers were not optimizing for maximum average production, but rather for maximum probability of survival.

And this tradeoff is central to most of Taleb's work to date, including “Skin in the Game”.

But what does this have to do with parallel programming???

Quite a bit, as it turns out. In theory, RCU should just run its state machine and be happy. In practice, there are all kinds of things that can stall its state machine, ranging from indefinitely preempted readers to long-running kernel threads refusing to give up the CPU to who knows what all else. RCU therefore contains numerous forward-progress checks that reduce performance slightly but which also allow RCU to continue working when the going gets rough. This sort of thing is baked even more deeply into the physical engineering disciplines in the form of the fabled engineering factor of safety. For example, a bridge might be designed to handle three times the heaviest conceivable load, thus perhaps surviving a black-swan event such as a larger-than-expected earthquake or tidal wave.

Returning to Skin in the Game, Taleb makes much of the increased quality of decisions when the decider is directly affected by them, and rightly so. However, I became uneasy about cases where the decision and effect are widely separated in time. Taleb does touch obliquely on this topic in a section entitled “How to Put Skin in the Game of Suicide Bombers”, but does not address this topic in more prosaic settings. One could take a survival-based approach, arguing that tomorrow matters not unless you survive today, but in the absence of a very big black swan, a large fraction of the people alive today will still be alive ten years from now.

But what does this have to do with parallel programming???

There is a rather interesting connection, especially when you consider that Linux-kernel RCU's useful lifespan probably exceeds my own. This is not a new thought, and is in fact why I have put so much energy into speaking and writing about RCU. I also try my best to make RCU able to stand up to whatever comes its way, with varying degrees of success over the years.

However, beyond a certain point, this practice is labeled “overengineering”, which is looked down upon within the Linux kernel community. And with good reason: Many of the troubles one might foresee will never happen, and so the extra complexity added to deal with those troubles will provide nothing but headaches for no benefit. In short, my best strategy is to help make sure that there are bright, capable, and motivated people to look after RCU after I am gone. I therefore intend to continue writing and speaking about RCU. :–)


My return to the IBM mainframe was delayed by my high school's acquisition of a a teletype connected via a 110-baud serial line to a timesharing system featuring the BASIC language. I was quite impressed with this teletype because it could type quite a bit faster than I could. But this is not as good as it might sound, given that I came in dead last in every test of manual dexterity that the school ever ran us through. In fact, on a good day, I might have been able to type 20 words a minute, and it took decades of constant practice to eventually get above 70 words a minute. In contrast, one of the teachers could type 160 words a minute, more than half again faster than the teletype could!

Aside from output speed, I remained unimpressed with computers compared to paper and pencil, let alone compared to my pocket calculator. And given that this was old-school BASIC, there was much to be unimpressed about. You could name your arrays anything you wanted, as long as that name was a single upper-case character. Similarly, you could name your scalar variables anything you wanted, as long as that name was either a single upper-case character or a single upper-case character followed by a single digit. This allowed you to use up to 286 variables, up to 26 of which could be arrays. If you felt that GOTO was harmful, too bad. If you wanted a while loop, you could make one out of IF statements. Not only did IF statements have no else clause, the only thing that could be in the THEN clause was the number of the line to which control would transfer when the IF condition evaluated to true. And each line had to be numbered, and the numbers had to be monotonically increasing, that is, in the absence of control-flow statements, the program would execute the lines of code in numerical order, regardless of the order in which you typed those lines of code. Definitely a step down, even from FORTRAN.

But then the teacher showed the class a documentary movie showing several problems that could be solved by computer. I was unimpressed by most of the problems: Printing out prime numbers was impressive but pointless, and maximizing the volume of a box given limited materials was a simple pencil-and-paper exercise in calculus. But the finite-element analysis fluid-flow problem did get my attention. This featured a rectangular aquarium with a glass divider, so that initially the right-hand half of the aquarium was full of water and the left-hand half was full of air. They abruptly removed the glass divider, causing the water to slosh back and forth. They then showed a video of a computer simulation of the water flow, which matched the actual water flow quite well. There was no way I could imagine doing anything like that by hand, and was thus inspired to continue studying computer programming.

We students therefore searched out things that the computer could do that we were unwilling or unable to. One of my classmates ran the teletype's punch-tape output through its punch-tape reader, thus giving us all great insight as to why teletypes on television shows appeared to be so busy. For some reason, our teacher felt that this project was a waste of both punched tape and paper. He was more impressed with the work of another classmate, who calculated and ASCII-art printed magnetic lines of force. Despite the teletype's use of eight-bit ASCII, its print head was quite innocent of lower-case characters.

I coded up a project that plotted the zeroes of functions of two variables as ASCII art on the teletype. My teacher expressed some disappointment in my brute-force approach to locating the zeroes, but as far as I could see the bottleneck was the teletype, not the CPU. Besides, the timesharing service charged only for connect time, so CPU time was free, and why conserve a zero-cost resource?

I worked around the computer's limited arithmetic using crude multi-precision code with the goal of computing one thousand factorial. In this case, CPU was definitely the bottleneck, especially given my naive multiplication algorithm. The largest timeslot I could reserve on the teletype was an hour, and during that time, the computer was only able to make it to 659 factorial. In contrast, Maxima takes a few tens of milliseconds to compute 1000 factorial on my laptop. What a difference four decades makes!

I wrote my first professional program on this computer, a pro bono effort for a charity fundraiser. This charity was the work of the local branch of the National Honor Society, and the fundraiser was a computer-dating dance. Given that I was 160 pounds (73 kilograms) of computer-geeky social plutonium, I felt the need to consult an expert. The expert I chose was the home-economics teacher, who unfortunately seemed much more interested in working out why I was such a hopeless geek than in helping with matching criteria. I nevertheless extracted sufficient information to construct a simple Hamming-distance matcher. Fortunately most people seemed reasonably satisfied with their computer-chosen dance partners, the most notable exception being a senior girl who objected strenuously to having been matched only with freshmen boys. Further investigation determined that this mismatch was due to a data-entry error. Apparently, even Cupid is subject to Murphy's Law.

I also did my one and (thus far) only stint of white-hat hacking. In those trusting times, the school-administration software printed the user's password in cleartext as it was typed. But it was not necessary to memorize the characters that the user typed. You see, this teletype had what is called a ``HERE IS'' key. When this key was pressed, the teletype would send a 20-character sequence recorded on a mechanical drum located inside the teletype. And the sequence recorded on this particular teletype's mechanical drum was, you guessed it, the password to the school-administration software. I demonstrated this to my teacher, which resulted in the teletype being under continuous guard by a school official until such time as the mechanical drum could be replaced with one containing 20 ASCII NUL characters. (And here you thought that security theater was a recent phenomenon!)

Despite its limitations, my two years with this system were quite entertaining and educational. But then it was time to move on to college.
For the first couple of decades of my life, computers as we know them today were exotic beasts that filled rooms, each requiring the care of a cadre of what were then called systems programmers. Therefore, in my single-digit years the closest thing to a computer that I laid my hands on was a typewriter-sized electromechanical calculator that did addition, subtraction, multiplication, and division. I had the privilege of using this captivating device when helping out with accounting at the small firm at which my mother and father worked.

I was an early fan of hand-held computing devices. In fact, I was in the last math class in my high school that was required to master a slide rule, of which I still have several. I also learned how to use an abacus, including not only addition and subtraction, but multiplication and division as well. Finally, I had the privilege of living through the advent of the electronic pocket calculator. My first pocket calculator was a TI SR-50, which put me firmly on the infix side of the ensuing infix/Polish religious wars.

But none of these qualified as “real computers”.

Unusually for an early 1970s rural-Oregon high school, mine offered computer programming courses. About the only thing I knew about computers were that they would be important in the future, so I signed up. Even more unusually for that time and place, we got to use a real computer, namely an IBM 360. This room-filling monster was located fourteen miles (23 kilometers) away at Chemeketa Community College. As far as I know, this was the closest computer to my home and school. Somehow my math teacher managed to wangle use of this machine on Tuesday and Thursday evenings, and he bussed us there and back.

This computer used punched cards and a state-of-the-art chain lineprinter. We were allowed to feed the card reader ourselves, but operating the lineprinter required special training. This machine's console had an attractive red button labeled EMERGENCY PULL. The computer's operator, who would later distinguish himself by creating a full-motion video on an Apple II, quite emphatically stated that this button should be pulled only in case of a bona fide emergency. He also gave us a simple definition of “emergency” that featured flames shooting out of the top of the computer. I never did see any flames anywhere near the computer, much less shooting out of its top, so I never had occasion to pull that button. But perhaps the manufacturers of certain incendiary laptops should have equipped each of them with an attractive red EMERGENCY PULL button.

Having provided us the necessary hardware training, the operator then gave us a sample card deck. We were to put our program at one specific spot in the deck, and our input data in another. Those of us wishing more information about how this worked were directed to an impressively large JCL manual.

The language of the class was FORTRAN, except that FORTRAN was deemed to difficult an initial language for our tender high-school minds. They therefore warmed us up with assembly language. Not IBM's celebrated Basic Assembly Language (BAL), but a simulated assembly language featuring base-10 arithmetic. After a couple of sessions with the simulated assembly, we moved up to FORTRAN, and even used PL/1 for one of our assignments. There were no error messages: There were instead error numbers that you looked up in a thick printed manual located in the same bookcase containing the JCL manual.

I was surprised by the computer's limitations, especially the 6-to-7 digit limits for single-precision floating point. After all, even my TI SR-50 pocket calculator did ten digits! That said, the computer could also do alphabetic characters (but only upper case) and a few symbols—though the exclamation point was notably missing. The state-of-the-art 029 keypunches were happy to punch an exclamation mark, but alas! It printed as “0” (zero) on the lineprinter.

I must confess that I was not impressed with the computer. In addition to its arithmetic limitations, its memory was quite small. Most of our assignments were small exercises in arithmetic that I could complete much more quickly using paper and pencil. In retrospect, this is not too surprising, given that my early laissez-faire programming methodology invariably resulted in interminable debugging sessions. However, it was quite clear that computers were becoming increasingly important, and I therefore resolved to take the class again the following year.

So, the last time I walked out of that machine room in Spring of 1974, I fully expected to walk back the following Fall. Little did I know that it would be almost 30 years before I would once again write code for an IBM mainframe. Nor did I suspect that it would be more than 15 years before work started on the operating system that was to be running on that 30-years-hence mainframe.

My limited foresight notwithstanding, somewhere in Finland a small boy was growing up.
Core counts keep rising, and that means that the Linux kernel continues to encounter interesting performance and scalability issues. Which is not a bad thing, since it has been fifteen years since the ``free lunch'' of exponential CPU-clock frequency increases came to an abrupt end. During that time, the number of hardware threads per socket has risen sharply, approaching 100 for some high-end implementations. In addition, there is much more to scaling than simply larger numbers of CPUs.

Proposed topics for this microconference include optimizations for mmap_sem range locking; clearly defining what mmap_sem protects; scalability of page allocation, zone->lock, and lru_lock; swap scalability; variable hotpatching (self-modifying code!); multithreading kernel work; improved workqueue interaction with CPU hotplug events; proper (and optimized) cgroup accounting for workqueue threads; and automatically scaling the threshold values for per-CPU counters.

We are also accepting additional topics. In particular, we are curious to hear about real-world bottlenecks that people are running into, as well as scalability work-in-progress that needs face-to-face discussion.

We hope to see you there!
Testing, fuzzing, and other diagnostics have greatly increased the robustness of the Linux ecosystem, but embarrassing bugs still escape to end users. Furthermore, a million-year bug would happen several tens of times per day across Linux's installed base (said to number more than 20 billion), so the best we can possibly do is hardly good enough.

The Testing and Fuzzing Microconference intends to raise the bar with further progress on syzbot/syzkaller, distribution/stable testing, kernel continuous integration, and unit testing. The best evidence of progress in these efforts will of course be the plethora of bug reports produced by these and similar tools!

Join us for an important and spirited discussion!
We are pleased to announce that the RDMA Microconference has been accepted into the 2018 Linux Plumbers Conference!

RDMA (remote direct memory access) is a well-established technology that is used in environments requiring both maximum throughputs and minimum latencies. For a long time, this technology was used primary in high-performance computing, high frequency trading, and supercomputing. For example, the three most powerful computers are based on Linux and RDMA (in the guise of Infiniband).

However, the latest trends in cloud computing (more bandwidth at larger scales) and storage (more IOPS) makes RDMA increasingly important outside of its initial niches. Therefore, clean integration between RDMA and various kernel susbsystems is paramount. We are therefore looking to build on previous years' successful RDMA microconferences, this year discussing our 2018-2019 plans and roadmap.

Topics proposed for this year's event include the interaction between RDMA and DAX (direct access for files), how to solve the get_user_pages() problem (see https://lwn.net/Articles/753027/ and https://lwn.net/Articles/753272/), IOMMU and PCI-E issues, continuous integration, python integration, and Syzkaller testing.
There was a time when I felt that Linux-kernel RCU was too low-level to possibly be the subject of a security exploit, but Rowhammer put paid to that naive notion. And it finally happened earlier this year. Now, I could claim that I did nothing wrong. After all, RCU worked as advertised. The issue was instead that RCU has multiple flavors:


  1. RCU-bh for code that is subject to network-based denial-of-service attacks.
  2. RCU-sched for code that must interact with interrupt/NMI handlers or with preemption-disabled regions of code, and for general-purpose use in CONFIG_PREEMPT=n kernels.
  3. RCU-preempt for general-purpose use in CONFIG_PREEMPT=y kernels.

The real problem was that someone used one flavor in one part of their RCU algorithm, and another flavor in another part. This has roughly the same effect on your kernel's health and well-being as does acquiring the wrong lock. And, as luck would have it, the resulting bug proved to be exploitable. To his credit, Linus Torvalds noted that having multiple RCU flavors was a root cause, and so he asked that I do something to prevent future similar security-exploitable confusion. After some discussion, it was decided that I try to merge the three flavors of RCU into “one flavor to rule them all”.

Which I have now done in the relative privacy of my -rcu git tree (as in “git clone https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git” followed by “git checkout dev”).

So what has this got to do with validation in general or formal verification in particular?

Just this: Over the past few months, I have taken a meataxe to Linux-kernel RCU, which implies the injection of any number of bugs. If you would like your formal-verification tool/methodology to be the first to find a bug in Linux-kernel RCU that I don't already know about, this would be an excellent time to give it a try. And yes, all those qualifiers are necessary, as several groups have used formal-verification tools to find bugs in Linux-kernel RCU that I did already know about.

More generally, given the large number of swings I took with said meataxe, if your formal verification tool cannot find bugs in the current dev version of RCU, you might need to entertain the possibility that your formal verification tool cannot find bugs!

Latest Month

March 2019


RSS Atom
Powered by LiveJournal.com
Designed by Tiffany Chow