You are viewing paulmck

Previous Entry | Next Entry

TMEverywhere
A comparison of transactional memory (TM) and locking by Maged Michael, Josh Triplett, Jonathan Walpole, and myself recently appeared in ACM Operating System Review. This is a journal version of an earlier workshop paper in PLOS 2007.

This paper attempts to compare locking and TM in an even-handed manner, presenting the advantages and disadvantages of each, and showing which situations each is best suited for. And no, it is not the case that TM is uniformly better than locking — or vice versa! That said, locking does have the advantage of being available on current commodity computer systems. And yes, I was the locking advocate among the authors of this paper. Why do you ask? ;–)

Comments

( 21 comments )
dvyukov [google.com]
Sep. 29th, 2010 06:01 pm (UTC)
Yeah, and both will be badly kicked by RCU :)
paulmck
Sep. 29th, 2010 06:25 pm (UTC)
RCU
In the situations RCU handles, it is indeed hard to beat. But I am glad that you said it first. ;&ndash)
dvyukov [google.com]
Sep. 29th, 2010 06:05 pm (UTC)
> Locking makes use of expensive instructions and results in
expensive cache misses [12]. This is particularly damaging
for read-mostly workloads, where locking introduces commu-
nications cache misses into a workload that could otherwise
run entirely within the CPU cache. This can result in severe
performance degradation even in the absence of contention.

Humm... and about seqlocks, distributed locks, biased locks? Not saying about asymmetric distributed locks.
In the end, is any TM out there able to outweigh ~40 cycles for uncontended atomic RMW? I doubt. At least no one that I saw.
paulmck
Sep. 29th, 2010 06:39 pm (UTC)
seqlocks, distributed locks, biased locks, and asymmetric distributed locks
Good point, we should have stated explicitly that we were considering exclusive locks.

I don't think of seqlocks as really being locks, but rather a high-performance but restricted use of software transactional memory. They nevertheless can provide good read-side performance &mdash if the critical-section length is short and updates are infrequent. You are right that biased locks and asymmetric distributed locks allow readers to run in the cache. In fact, it is possible to construct a reader-writer lock such that readers execute neither atomic instructions nor memory barriers in the absence of writers. ;&ndash) However, write-side accesses can be quite painful in both cases.

As near as I can tell, STM is still struggling to come close to the performance of uncontended locking. However, if you are willing to live within the hardware limits, HTM can be quite fast, possibly rivaling an uncontended atomic RMW in some cases. That said, the hardware limits can be profound, and the inability to single-step through transactions can be a bit inconvenient!
dvyukov [google.com]
Sep. 29th, 2010 07:39 pm (UTC)
Re: seqlocks, distributed locks, biased locks, and asymmetric distributed locks
> I don't think of seqlocks as really being locks, but rather a high-performance but restricted use of software transactional memory.

Agree. Just like the graph example I described in another comment. TM has some really good ideas in it. And there is no reason why we can't employ them w/o using full-fledged TM. And we actually do so. And for some substantial amount of time already. Smart hybrid crafted algorithms always wins... performance-wise, not simplicity-wise.
paulmck
Sep. 30th, 2010 12:07 am (UTC)
Re: seqlocks, distributed locks, biased locks, and asymmetric distributed locks
Fair enough!!!
dvyukov [google.com]
Sep. 29th, 2010 06:43 pm (UTC)
Graph updates
> An important TM near-term opportunity is thus update-heavy workloads using large non-partitionable data structures such as high-diameter unstructured graphs.

Two variants are possible:
1. A thread already has pointers to nodes that he must update (and that nodes can't be deleted under it's feet). Then lock affected nodes with backoff, and do whatever you want. It's actually a kind of hand coded micro TM but with right granularity and lower overheads. I think that sort&lock won't do because of potentially removing nodes that linked to our set of nodes (in order to determine which nodes we need to update as well, we first need to lock our set of nodes, and that other will do the same thing, so *potential* deadlocks are unavoidable if we need to remove nodes during operations).
TM will do the same thing but with higher overheads.

2. A thread need to traverse a graph first to determine which nodes to update. In this situation locks must be paired with partial-copy-on-write updated and safe-memory-reclamation. Retries are unavoidable as well, because we first traverse optimistically, and then lock and recheck our assumptions.
And TM will be squashed by readset size, conflicts and retries.

What I am missing?



paulmck
Sep. 29th, 2010 07:24 pm (UTC)
Re: Graph updates
I agree that your #1 should work, and that it is in effect a hand-coded transaction. As such, TM advocates would argue that TM gets the same effect more simply, and with “minimal performance degradation” for STM. They would no doubt also point out the likelihood of higher performance for an HTM that accommodated sufficiently large transactions (that of Azul Systems might well qualify).

Your #2 is a reasonable approach as well, and would work using either hazard pointers or RCU. As you might guess, my preference would be to use RCU. ;&ndash)
dvyukov [google.com]
Sep. 29th, 2010 08:02 pm (UTC)
Re: Graph updates
I have nothing against HTM, it's a great thing, and I am waiting with impatience when I get it on my desk for biotomy :)

You know, I concluded that there are actually two different things hiding behind term TM, and people usually do not differentiate them explicitly, and this confuses me. First is a programming model, i.e. surround arbitrary piece of code with atomic{} and you are done. Second is an extended synchronization primitive (a kind of extended CAS). This is completely different things, intended for different usage scenarios, with different characteristics, and intended to use by completely different people.
As for programming model, I think that even Azul's HTM with software backoff and smart heuristics is still far from ready to prime time. Cliff mentioned that they use TM only for lock elision, so most of their transaction are short, as opposed to "TM as programming model" which encourages larger atomic sections.
And as for "TM as extended synchronization primitive" (inevitably implemented completely in hardware w/o sw backoff). It's a great thing without a doubt and basically w/o any drawbacks. I just need to wait for several years when Intel and AMD will implement it... and then for another 10 years when worldwide hardware park will completely renew...
paulmck
Sep. 30th, 2010 12:08 am (UTC)
Re: Graph updates
From what I can see, the reason many people do not differentiate between these two TM-related concepts is that they fervently wish that they were one and the same. ;–)
mkevac
Sep. 29th, 2010 07:09 pm (UTC)
How can non ACM member read this PDF?
paulmck
Sep. 29th, 2010 07:15 pm (UTC)
Non-ACM-member access to PDF
I suggest one of the following two methods:

  1. Join ACM. (Highly recommended.)
  2. Do a web search with the following string: "mckenney walpole triplett michael grass".

Take your choice! Please let me know if neither of these two methods work for you.

PS: Your image reminds me more of the Guy Fawkes mask in “Vendetta” than a Russian knight, but I guess appearances can be deceiving. ;&mdash)
mkevac
Sep. 29th, 2010 07:26 pm (UTC)
Re: Non-ACM-member access to PDF
Thank you. Well, it's avatar, it doesn't have to resemble me or my nickname :-) Great movie, btw.
dvyukov [google.com]
Sep. 29th, 2010 07:32 pm (UTC)
Re: Non-ACM-member access to PDF
I've just typed "Why the grass may not be greener on the other side: a comparison of locking vs. transactional memory" in Google.
paulmck
Sep. 29th, 2010 07:43 pm (UTC)
Re: Non-ACM-member access to PDF
Glad it worked for you!
paulmck
Sep. 29th, 2010 07:42 pm (UTC)
Re: Non-ACM-member access to PDF
I particularly liked the music in Vendetta, and I must confess that the lead character's initial monologue featuring words beginning with the letter “v” was quite impressive!
(Anonymous)
Sep. 30th, 2010 03:13 am (UTC)
Re: Non-ACM-member access to PDF
> Join ACM. (Highly recommended.)

Seconded, but unfortunately just joining the ACM doesn't give you access to papers in the digital library. You have to pay (quite a bit) extra for that.
paulmck
Sep. 30th, 2010 05:51 pm (UTC)
Re: Non-ACM-member access to PDF
Good point! For me, access to ACM's and IEEE's digital libraries is well worth the cost, but your mileage may vary. If digital-library access is not in your near future, then method #2 might be the right one for you.
(Anonymous)
Oct. 1st, 2010 05:01 am (UTC)
Re: Non-ACM-member access to PDF
And, of course, if you work as a professional or academic researcher, your institution may pay for your subscription to such services, or for a site-wide subscription if they have enough users.

Still obnoxious that they sell our own papers back to us and our peers.
(Anonymous)
Feb. 15th, 2011 09:57 pm (UTC)
This was a nice article to read, thank you for sharing it.
paulmck
Feb. 16th, 2011 12:19 am (UTC)
Thank you, glad you liked it!
( 21 comments )