You are viewing paulmck

Previous Entry | Next Entry

The past year has been a busy one for transactional memory (TM). IBM's Blue Gene/Q and Intel's Haswell TSX have joined Sun Rock and Azul's Vega 2 in offering hardware transactional memory (HTM). All of these implementations seem to be focusing on TM in the small, wisely in my opinion, given TM's weak support for many programming-in-the-large use cases, particularly I/O and system calls. A number of TM researchers seem to be coming around to this view, as evidenced by slide 279 of these lecture slides. All of this is to the good, especially given some of the unrealistically high expectations that some people have harbored for TM.

So what do these HTM implementations offer? Perhaps it can best be thought of as an extension of LL/SC or CAS that operate on more than one memory location, with the number being limited by hardware considerations such as cache geometry. As with both LL/SC and CAS, TM can fail, both due to conflicts between concurrently executing transactions and because of hardware events such as exceptions and cache-line conflicts. Interestingly enough, this means that in many cases, hardware transactions must have some other synchronization mechanism (such as locking!) as a fallback. Unfortunately, this means that you cannot count on HTM to simplify deadlock cycles. However, it might well increase performance for some common types of large non-statically-partitionable data structures subject to smallish updates. The canonical example of such a data structure seems to be red-black trees.

One thing that disappoints me personally about all of these mechanisms is their limited support for RCU. Don't get me wrong, you can use RCU with HTM updates. However, bare RCU readers will typically cause HTM updaters to unconditionally abort. Although RCU's purpose is in fact to favor readers, this is entirely too much of a good thing. For ideas for improved interactions between TM and RCU, please see an earlier posting. It is also unclear to what extent HTM helps in cases involving severe real-time-response constraints.

There have also been some interesting software transactional memory (STM) papers. Pankratius and Adl-Tabatabai published a productivity comparison in which the highest productivity was attained by teams that used both STM and locking. Although one can argue that inevitable transactions will subsume the uses of locking in the Pankratius paper, inevitable transactions are normally implemented as a global lock. sharply limiting I/O scalability.

Dragojevic et al. published what is often viewed as a riposte to Cascaval et al.'s Software Transactional Memory: Why Is It Only a Research Toy? in the CACM article entitled Why STM can be more than a Research Toy (see here for a related technical report). The Dragojevic article has rare graphs showing semi-reasonable scalability, with up to 12 times sequential execution rate on 16 CPUs, which is certainly not splendid, but neither is it altogether embarrassing. However, that speedup is the best results from a set of 17 benchmarks, many of which show negative scalability.

But a closer reading shows that Dragojevic's and Cascaval's ideas of exactly what constitutes “software transactional memory” are quite different. Dragojevic's results use a variant of STM that takes hints from the programmer in order to indicate which data is shared or not. This is an interesting hack, and it is hard to argue with the improvement over straight STM, but it does seem to back away from the earlier TM promises of transparent parallelism.

These hints raise some interesting possibilities. Suppose that the programmer also hinted that certain pairs of transactions could never conflict due to accessing disjoint data. One could take this a step further and use names for different disjoint subsets of the data. Such hints could greatly reduce or even remove the need for complex contention managers and associated aborts and rollbacks. But the best thing about this sort of scheme is that it has seen heavy production use with good performance and scalability extending back for decades. It is called “locking”. ;-)


Feb. 12th, 2012 04:03 am (UTC)
HTM Lock elision will still make deadlock avoidance easier because you can hopefully get away with more coarse grained locks. The less locks you have the easier deadlock avoidance is.

A simple implementation may just use a single lock for the whole program as fallback, which by definition has no deadlock potential.

Of course the devil in the details, as in lock regions that frequently or always abort may still need more locks. But it may be good enough

Andi Kleen
Feb. 12th, 2012 05:21 am (UTC)
Re: deadlocks
In some cases, you are quite likely correct. That said, it would have been nice to be able to introduce atomic data structures to simplify programs with more ornate locking designs, such as the Linux kernel.

Can't have everything, I guess! ;-)