You are viewing paulmck

Previous Entry | Next Entry

I had the good fortune to be invited to the Multi-Core Memory Models and Concurrency Theory workshop in Dagstuhl last week. The venue was quite impressive, set in rural Germany in a centuries-old chapel that has been expanded to a full fledged conference center. This workshop brought together a group ranging from concurrency theory to compiler research to software hackers (such as myself) to hardware designers. This was a great opportunity to meet a number of people for the first time and it was also good to renew a number of acquaintances. This posting reviews but a few of the conversations and presentations at the workshop.

Doug Lea is a leading light in Java concurrency. Comparing notes with Doug made me realize just how lucky I am to be working in C on the Linux kernel. Doug must invest considerable time and energy into working around interesting limitations in JVMs and JITs, over and above the task of outsmarting the hardware.

Richard Bornat has long been an icon in the concurrency-theory community, but what impressed me was his call for applying proofs of correctness to the code that programmers really write as they write it. He then proceeded to give Linux kernel code as an example. His approach is a refreshing change from the traditional approach of rewriting the code in some other language (in my case, Promela) and hoping that nothing is lost in translation. It will be quite interesting to see if Richard and his colleagues are able to pull this off. Cynics will no doubt note that because Richard is retired, it is his colleagues who must deliver. ;–)

Sarita Adve and Mark D. Hill are prominent academic hardware architects. It was quite interesting to listen to them debate whether the combination of high scalability and stringent energy-efficiency constraints would force a trend towards weakly ordered systems. However, the resolution of this argument rests in the hands of Stephan Diestelhorst (representing AMD's relatively strongly ordered x86 systems), Richard Grisenthwaite (representing ARM's weakly ordered systems), and Derek Williams (representing IBM's weakly ordered Power systems). Interestingly enough, a number of people attending the workshop wanted sequentially consistent hardware (which would be even more strongly ordered than x86!), but were more than willing to allow compilers to reorder accesses with wild abandon. Only one of the attendees had a rationale for this seeming inconsistency, and that rationale was that there is a dearth of reliable formal memory-ordering models for commodity microprocessors.

It is therefore quite fortunate (or not, depending on your viewpoint) that Peter Sewell, Susmit Sarkar, and a number of their students (represented by Mark Batty, Jade Algave, and Luc Mangaret) are working to establish formal models for the x86, ARM, and PowerPC memory models. They are using a combination of experimentation, perusal of available reference manuals, and discussions with the relevant hardware architects.

Erik Hagersten (of cache-only-memory-architecture (COMA) fame) gave an excellent talk highlighting exactly what sorts of costs hardware designers would and would not be willing to accept. To give you some idea, hardware designers seem to be about as happy to accept extra bits in cache lines as kernel hackers would be to accept global lock acquisitions. Seemingly small increases in transitor count can have serious side effects, because the values held in those transistors must be communicated and operated upon, which can incur surprisingly large energy-efficiency and size penalties.

Sarita Adve, Hans Boehm, Mark Batty, and I worked through a few “interesting” corner cases in the upcoming C++ standard, and I of course ended up explaining RCU to quite a few academics. For the most part, they took it well. A few even requested the explanation of RCU!

Maged Michael and I had a some very interesting discussions on his paper entitled “Laws of Order”, but the paper and discussion merits its own posting.

One of the academics who had been aware of RCU beforehand described it as “beautiful”. This came as rather a shock, actually, albeit a very pleasant one! ;–)


Jan. 18th, 2011 07:36 am (UTC)
Formal verification
> He then proceeded to give Linux kernel code as an example.

That is exactly the idea behind Relacy Race Detector:
It must verify any piece of real production code, because, well, in the end only that makes sense for me as an engineer.

> are working to establish formal models for the x86, ARM, and PowerPC memory models.

Why not take the following route?
Establish formal model for C1x/C++0x (it's already mostly done). Carefully map it to x86, ARM, and PowerPC only once. Let people write in C1x/C++0x. Create verifiers for C1x/C++0x.
And their approach kind of implies that people will manually write their software for x86, ARM, and PowerPC in assembly??? Or there are problems even with mapping C1x/C++0x to real hardware w/o formal models?
Jan. 18th, 2011 03:46 pm (UTC)
Re: Formal verification
It doesn't really matter *what* the C/C++ model says, if it doesn't sensibly match what hardware actually *does*. While people won't manually write much software in assembly, compilers and operating systems have to care about the assembly-level behavior. And if C1x's memory model doesn't offer a "behave exactly like the hardware" mode, then OSes written in C will make sure that C compilers have an "ignore the insane C1x model" mode.
Jan. 18th, 2011 03:54 pm (UTC)
Re: Formal verification
> if it doesn't sensibly match what hardware actually *does*

Is there any indication that it does not sensibly match what hardware actually does? As far as I see, for 99% of developers it does.

> And if C1x's memory model doesn't offer a "behave exactly like the hardware" mode

If extended with a few additional primitives, it does.
Jan. 18th, 2011 06:53 pm (UTC)
Re: Formal verification
We need both.

The C/C++ memory model suffices for a great many developers, particularly those that need to write portable code. The models for the individual systems are critically important for compiler/interpreter/JIT developers, as well as for developers writing low-level architecture-dependent code.