You are viewing paulmck

Previous Entry | Next Entry

Tux
Reader-writer locks can certainly cause problems if used carelessly, the problems including cache thrashing, write-side contention, and poor scaling. But it is a rare tool that does not have some jobs that it is good for, so it is reasonable to ask what types of jobs favor reader-writer locking.

First, it is important to note that if the reader-writer lock is used sufficiently infrequently, its overhead cannot possibly cause too much trouble. On most systems, if the critical sections are short and if the lock is not acquired more than a few tens of thousands of times per second, there should be no problem. On the other hand, if the lock must be acquired millions of times per second, you are likely to need to do something else. This “something else” might be partitioning the data protected by the reader-writer lock (thus reducing the per-lock acquisition frequency), or it might be some other tool. (Yes, yes, I might be expected to suggest RCU, and I might well do so, but reference counting, exclusive locks, and atomic operations are also possibilities.)

If the reader-writer lock is primarily read-acquired, and if it is read-held for a significant period of time, then reader-writer locking can again work quite well. However, the number of acquisitions per unit time is still important, and this number often rises with increasing numbers of CPUs. This effect can be seen in the following graph:



This graph was generated on a 64-core Power 5 system with two hardware threads per core for 128 hardware threads total. Each thread repeatedly:

  1. read-acquired a pthread_rwlock_t,
  2. took a fixed number of passes through a tight loop, and
  3. released the rwlock_t.

In the “1K” trace, this fixed number of passes was 1,000, in the “10K” it was 10,000, and so on up to the “100M” trace, where the fixed number of passes was 100 million. Each point is computed by taking the ratio of the acquisitions per unit time for N threads divided by N times the acquisitions per unit time for a single thread. Ideal scaling would give 1.0.

As you can see, adding CPUs eventually causes performance to suffer, and the shorter the critical section, the greater the degree of suffering inflicted.

So, reader-writer locking can be the right tool for read-mostly jobs, at least when acquired sufficiently infrequently or if the read-side critical sections are sufficiently long. Just keep in mind that the value of “sufficiently” varies with the number of CPUs!