SequentialCaveman
Imagine a software project that has been written and maintained for many years as a single-threaded program. In such an environment, using a singleton object to hand out consecutive transaction IDs is the most straightforward thing in the world: simple, fast, and easily encapsulated. Similarly, a singleton object is also quite attractive as a central “mailbox” or communications hub. Again, simple, fast, and easily encapsulated.
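For concreteness, a sequential-world transaction-ID generator might look something like the following minimal sketch (the names are illustrative, not taken from any particular project):

    /* Single-threaded world: a trivial singleton transaction-ID source. */
    static unsigned long next_transaction_id;

    unsigned long get_transaction_id(void)
    {
            return ++next_transaction_id;  /* No locking needed: only one thread exists. */
    }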

That is, these approaches are attractive only until it is time to run on a multicore system, at which point their poor scalability will become painfully apparent. The following figure illustrates this point, showing the overhead of concurrent atomic increment of a single global variable in a tight loop.

Atomic increment does not scale well

Perfect scalability would result in a horizontal line in the above figure. The upward-slanting line shows that the more CPUs we add, the slower atomic increment gets. This is the antithesis of good scalability.
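For reference, the sort of code being measured is roughly the following, here sketched with C11 atomics and pthreads; the actual benchmark and its timing harness may differ:

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    #define NTHREADS 4          /* Vary to match the x-axis of the figure. */
    #define GOAL 1000000UL      /* Increments per thread. */

    static atomic_ulong counter;  /* The single global variable. */

    /* Each thread hammers the shared counter in a tight loop; the
     * per-increment cost is the elapsed time divided by GOAL. */
    static void *inc_loop(void *arg)
    {
            for (unsigned long i = 0; i < GOAL; i++)
                    atomic_fetch_add(&counter, 1);
            return NULL;
    }

    int main(void)
    {
            pthread_t tid[NTHREADS];

            for (int i = 0; i < NTHREADS; i++)
                    pthread_create(&tid[i], NULL, inc_loop, NULL);
            for (int i = 0; i < NTHREADS; i++)
                    pthread_join(tid[i], NULL);
            printf("final count: %lu\n", (unsigned long)atomic_load(&counter));
            return 0;
    }

The contention is entirely on the single cache line holding counter: as CPUs are added, that line ping-pongs among them, and each increment waits its turn.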

Of course, if the transaction ID is used only as a tag, there are a number of easy fixes, for example, assigning ranges of numbers to individual threads so that each thread can generate transaction IDs without the need for costly atomic operations (sketched below). But what if there is a recovery mechanism that absolutely requires that the IDs be assigned consecutively? Preserving that guarantee in parallel code might require a complete rewrite of what is usually extremely tricky code.
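One way to realize the range-based fix mentioned above is for each thread to claim a block of IDs with a single atomic operation, then hand them out with no further synchronization. The following is a hedged sketch (the names and block size are illustrative); note that the resulting IDs are unique but not globally consecutive, which is exactly the limitation just noted:

    #include <stdatomic.h>

    #define ID_BLOCK_SIZE 1024UL           /* Illustrative; tune to taste. */

    static atomic_ulong next_block;        /* Next unallocated block of IDs. */

    static __thread unsigned long my_next_id;
    static __thread unsigned long my_block_end;

    unsigned long get_transaction_id(void)
    {
            if (my_next_id == my_block_end) {
                    /* One atomic operation per ID_BLOCK_SIZE IDs, not per ID. */
                    my_next_id = atomic_fetch_add(&next_block, ID_BLOCK_SIZE);
                    my_block_end = my_next_id + ID_BLOCK_SIZE;
            }
            return my_next_id++;
    }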

Worse yet, suppose that this transaction ID was part of some heavily used API. Modifications might then be required in many pieces of code throughout many organizations. It is likely that no one person would be able to find all the relevant code within the confines of a single organization, let alone across multiple organizations.

Oops!!!

One can argue that this is not parallelism's fault. After all, parallelism did not hold a gun to anyone's head and insist that poor design choices be made. But this is cold comfort to anyone tasked with parallelizing such software. Besides which, in the absence of parallelism, these design choices were eminently reasonable.

That said, the figure above shows that atomic increment consumes less than 400 nanoseconds per operation, even on a 16-CPU system. In many cases, 400 nanoseconds is not a large overhead. So, is the use of global singletons really a problem for parallel code, and if so, what can be done about it?

Comments

(Anonymous)
Dec. 24th, 2009 04:47 am (UTC)
timestamp
> But what if there is a recovery mechanism that absolutely requires
> that the IDs be assigned consecutively?

High-resolution timer. Timestamp + ranges could help a bit? It is not serialization but...
paulmck
Dec. 24th, 2009 04:26 pm (UTC)
Re: timestamp
If your system has a fine-grained time source that is synchronized across the system, then you can indeed obtain unique timestamps by concatenating the timestamp with a CPU/thread ID. As you say, this can be quite useful in a number of situations.
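For illustration, such an ID might be formed roughly as follows, assuming a 48-bit synchronized timestamp and a 16-bit CPU ID (both widths being arbitrary choices here):

    /* Unique, but not consecutive: high bits from the clock, low bits per CPU. */
    static inline unsigned long make_unique_id(unsigned long timestamp,
                                               unsigned int cpu_id)
    {
            return (timestamp << 16) | (cpu_id & 0xffffUL);
    }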

However, it does not have the property “that the IDs be assigned consecutively.” In addition, in the sadly common case where the hardware does not provide a synchronized fine-grained timestamp, computing one in software can incur significant overhead.

And if your algorithm can tolerate unsynchronized time, then it can probably also tolerate some of the linked-to solutions.

Nevertheless, you are correct in saying that timestamps can be useful in a number of situations.