You are viewing paulmck


SequentialCaveman
Imagine a software project that has been written and maintained for many years as a single-threaded program. In such an environment, using a singleton object to hand out consecutive transaction IDs is the most straightforward thing in the world: simple, fast, and easily encapsulated. A singleton object is similarly attractive as a central “mailbox” or communications hub. Again, simple, fast, and easily encapsulated.

That is, these approaches remain attractive only until it is time to run on a multicore system, at which point their poor scalability becomes painfully apparent. The following figure illustrates this point, showing the overhead of concurrent atomic increments of a single global variable in a tight loop.

Figure: Atomic increment does not scale well

Perfect scalability would result in a horizontal line in the above figure. The upward-slanting line shows that the more CPUs we add, the slower atomic increment gets. This is the antithesis of good scalability.
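If you want to see this effect on your own hardware, here is a minimal sketch of the kind of tight-loop benchmark described above, written in C11 with POSIX threads. It is not the program used to produce the figure. The helper name `run_atomic_increment_benchmark()`, the iteration counts, and the use of `CLOCK_MONOTONIC` for timing are illustrative assumptions. Each thread repeatedly applies `atomic_fetch_add()` to one shared global counter, so every increment must pull that counter's cache line over to the incrementing CPU.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

/* The single global variable that every thread increments. */
static atomic_long global_counter;

/* Iterations per thread (hypothetical benchmark parameter). */
static long iterations_per_thread;

/* Each thread hammers the shared counter in a tight loop. */
static void *increment_loop(void *arg)
{
	(void)arg;
	for (long i = 0; i < iterations_per_thread; i++)
		atomic_fetch_add(&global_counter, 1);
	return NULL;
}

/*
 * Hypothetical helper: run nthreads concurrent incrementers, print the
 * approximate time per increment as seen by each thread, and return
 * the final counter value.
 */
static long run_atomic_increment_benchmark(int nthreads, long iters)
{
	pthread_t tids[64];
	struct timespec start, end;

	assert(nthreads > 0 && nthreads <= 64);
	atomic_store(&global_counter, 0);
	iterations_per_thread = iters;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (int t = 0; t < nthreads; t++)
		pthread_create(&tids[t], NULL, increment_loop, NULL);
	for (int t = 0; t < nthreads; t++)
		pthread_join(tids[t], NULL);
	clock_gettime(CLOCK_MONOTONIC, &end);

	double elapsed_ns = (end.tv_sec - start.tv_sec) * 1e9 +
			    (end.tv_nsec - start.tv_nsec);
	/*
	 * The threads run concurrently, so each thread's per-increment
	 * cost is roughly the total elapsed time divided by its own
	 * iteration count (thread startup is included, hence "roughly").
	 */
	printf("%d threads: %.1f ns per increment\n",
	       nthreads, elapsed_ns / iters);
	return atomic_load(&global_counter);
}
```

Calling this for 1, 2, 4, and more threads on a multicore machine (compiled with `-pthread`) would typically show the per-increment time climbing as threads are added, as in the figure, although the absolute numbers depend heavily on the hardware.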

Of course, if the transaction ID were used only as a tag, there are a number of easy fixes, for example, assigning ranges of numbers to individual threads so that each thread can generate transaction IDs without the need for costly atomic operations. But what if there is a recovery mechanism that absolutely requires that the IDs be assigned consecutively? Removing the global counter might then require a complete rewrite of what is usually extremely tricky code.
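The range-assignment fix mentioned above might look something like the following minimal sketch. The names `id_allocator`, `ID_RANGE_SIZE`, and `next_transaction_id()` are hypothetical, and the range size of 1024 is arbitrary. Each thread claims a block of IDs with a single atomic operation on a shared counter, then hands out IDs from that block with no atomic operations at all, so the shared counter is touched only once per 1024 IDs. The price is that IDs are unique but no longer consecutive in the order transactions are issued system-wide, which is exactly why this fix fails when a recovery mechanism requires consecutive assignment.

```c
#include <assert.h>
#include <stdatomic.h>

/* Number of IDs a thread claims per trip to the shared counter (arbitrary). */
#define ID_RANGE_SIZE 1024

/* Start of the next unclaimed range: the only shared, atomically updated state. */
static atomic_ulong next_range_start;

/* Per-thread allocator state; each thread owns exactly one of these. */
struct id_allocator {
	unsigned long next;  /* next ID to hand out from the current range */
	unsigned long limit; /* one past the last ID in the current range */
};

static void id_allocator_init(struct id_allocator *a)
{
	a->next = a->limit = 0; /* empty range forces a refill on first use */
}

/* Return a unique transaction ID, touching shared state only on refill. */
static unsigned long next_transaction_id(struct id_allocator *a)
{
	if (a->next == a->limit) {
		/* One atomic operation claims ID_RANGE_SIZE IDs at once. */
		a->next = atomic_fetch_add(&next_range_start, ID_RANGE_SIZE);
		a->limit = a->next + ID_RANGE_SIZE;
	}
	return a->next++;
}
```

In a real program each thread would keep its own allocator, for example in a `_Thread_local struct id_allocator` variable, so that `next_transaction_id()` needs no locking. Two threads calling it alternately would receive IDs such as 0, 1024, 1, 1025, and so on: unique, but not consecutive.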

Worse yet, suppose that this transaction ID were part of some heavily used API. Modifications might then be required in many pieces of code throughout many organizations. It is likely that no one person would be able to find all the relevant code within the confines of a single organization, let alone across multiple organizations.

Oops!!!

One can argue that this is not parallelism's fault. After all, parallelism did not hold a gun to anyone's head and insist that poor design choices be made. But this is cold comfort to anyone tasked with parallelizing such software. Besides which, in the absence of parallelism, these design choices were eminently reasonable.

That said, the figure above shows that atomic increment consumes less than 400 nanoseconds, even on a 16-CPU system. In many cases, 400 nanoseconds is not a large overhead. So, is the use of global singletons really a problem for parallel code, and if so, what can be done about it?

Comments

(Anonymous)
Feb. 13th, 2010 11:36 pm (UTC)
Typos
Here's my pseudo patch:
-The good new is that
+The good news is that

paulmck
Feb. 14th, 2010 08:12 am (UTC)
Re: Typos
Good eyes, fixed!