
Stupid RCU Tricks: Torturing RCU Fundamentally, Parts IV and V

Continuing further into the Linux-kernel Documentation/RCU/Design/Requirements/Requirements.rst file uncovers RCU's final two fundamental guarantees:

 

  1. The common-case RCU primitives are unconditional, and
  2. RCU users can perform a guaranteed read-to-write upgrade.

The first guarantee is trivially verified by inspection of the RCU API. The return types of rcu_read_lock(), rcu_read_unlock(), synchronize_rcu(), call_rcu(), and rcu_assign_pointer() are all void. These API members therefore have no way to indicate failure. Even primitives like rcu_dereference(), which do have non-void return types, will succeed any time a load of their pointer argument would succeed. That is, if you do rcu_dereference(*foop), where foop is a NULL pointer, then yes, you will get a segmentation fault. But this segmentation fault will be unconditional, as advertised!

The second guarantee is a consequence of the first four guarantees, and must be tested not within RCU itself, but rather within the code using RCU to carry out the read-to-write upgrade.

Thus, for these last two fundamental guarantees, there is no test code in rcutorture. So perhaps even rcutorture deserves a break from time to time! ;-)

Stupid RCU Tricks: Torturing RCU Fundamentally, Part III

Even more reading of the Linux-kernel Documentation/RCU/Design/Requirements/Requirements.rst file encounters RCU's memory-barrier guarantees. These guarantees are a bit ornate, but roughly speaking guarantee that RCU read-side critical sections lapping over one end of a given grace period are fully ordered with anything past the other end of that same grace period. RCU's overall approach towards this guarantee is shown in the Linux-kernel Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst file, so one approach would be to argue that these guarantees are proven by a combination of this documentation along with periodic code inspection. Although this approach works well for some properties, the periodic code inspections require great attention to detail spanning a large quantity of intricate code. As such, these inspections are all too vulnerable to human error.

Another approach is formal verification, and in fact RCU's guarantees have been formally verified. Unfortunately, these formal-verification efforts, groundbreaking though they are, must be considered to be one-off tours de force. In contrast, RCU needs regular regression testing.

This leaves rcutorture, which has the advantage of being tireless and reasonably thorough, especially when compared to human beings. Except that rcutorture does not currently test RCU's memory-barrier guarantees.

Or at least it did not until today.

A new commit on the -rcu tree enlists the existing RCU readers. Each reader frequently increments a free-running counter, which can then be used to check memory ordering: If the counter appears to have counted backwards, something is broken. Each reader therefore samples and records a randomly selected reader's counter, and some other randomly selected reader is assigned to check for backwards motion. A flag is set at the end of each grace period, and once this flag is set, that second reader takes another sample of the same counter and compares the two.

Of course, the reality is a bit more involved, and probably will become even more involved as review and testing proceeds. But in the meantime, the interested reader can find the initial state of this rcutorture enhancement here.

The test strategy for this particular fundamental property of RCU is more complex, and likely less effective, than those for the properties described earlier, but life is like that sometimes.

Stupid RCU Tricks: Torturing RCU Fundamentally, Part II

Further reading of the Linux-kernel Documentation/RCU/Design/Requirements/Requirements.rst file encounters RCU's publish/subscribe guarantee. This guarantee ensures that RCU readers that traverse a newly inserted element of an RCU-protected data structure never see pre-initialization garbage in that element. In CONFIG_PREEMPT_NONE=y kernels, this guarantee combined with the grace-period guarantee permits RCU readers to traverse RCU-protected data structures using exactly the same sequence of instructions that would be used if these data structures were immutable. As always, free is a very good price!

However, some care is required to make use of this publish-subscribe guarantee. When inserting a new element, updaters must take care to first initialize everything that RCU readers might access and only then use an RCU primitive to carry out the insertion. Such primitives include rcu_assign_pointer() and list_add_rcu(), but please see The RCU API, 2019 edition or the Linux-kernel source code for the full list.

For their part, readers must use an RCU primitive to carry out their traversals, for example, rcu_dereference() or list_for_each_entry_rcu(). Again, please see The RCU API, 2019 edition or the Linux-kernel source code for the full list of such primitives.

Of course, rcutorture needs to test this publish/subscribe guarantee. It does this using yet another field in the rcu_torture structure:

struct rcu_torture {
  struct rcu_head rtort_rcu;
  int rtort_pipe_count;
  struct list_head rtort_free;
  int rtort_mbtest;
};

This additional field is ->rtort_mbtest, which is set to zero when a given rcu_torture structure is freed for reuse (see the rcu_torture_pipe_update_one() function), and then set to 1 just before that structure is made available to readers (see the rcu_torture_writer() function). For its part, the rcu_torture_one_read() function checks to see if this field is zero, and if so flags the error by atomically incrementing the global n_rcu_torture_mberror counter. As you would expect, any run ending with a non-zero value in this counter is considered to be a failure.

Thus we have an important fundamental property of RCU that nevertheless happens to have a simple but effective test strategy. To the best of my knowledge, this was also the first aspect of Linux-kernel RCU that was subjected to an automated proof of correctness.

Sometimes you get lucky! ;-)

Stupid RCU Tricks: Torturing RCU Fundamentally, Part I

A quick look at the beginning of the Documentation/RCU/Design/Requirements/Requirements.rst file in a recent Linux-kernel source tree might suggest that testing RCU's fundamental requirements is Job One. And that suggestion would be quite correct. This post describes how rcutorture tests RCU's grace-period guarantee, which is usually used to make sure that data is not freed out from under an RCU reader. Later posts will describe how the other fundamental guarantees are tested.

What Exactly is RCU's Fundamental Grace-Period Guarantee?

Any RCU reader that started before the start of a given grace period is guaranteed to complete before that grace period completes. This is shown in the following diagram:

Diagram of RCU grace-period guarantee 1

Similarly, any RCU reader that completes after the end of a given grace period is guaranteed to have started after that grace period started. And this is shown in this diagram:

Diagram of RCU grace-period guarantee 2

More information is available in the aforementioned Documentation/RCU/Design/Requirements/Requirements.rst file.

Whose Fault is This rcutorture Failure, Anyway?

Suppose an rcutorture test fails, perhaps by triggering a WARN_ON() that normally indicates a problem in some other area of the kernel. But how do we know this failure is not instead RCU's fault?

One straightforward way to test RCU's grace-period guarantee would be to maintain a single RCU-protected pointer (let's call it rcu_torture_current) to a single structure, perhaps defined as follows:

struct rcu_torture {
  struct rcu_head rtort_rcu;
  atomic_t rtort_nreaders;
  int rtort_pipe_count;
} *rcu_torture_current;

Readers could then do something like this in a loop:

rcu_read_lock();
p = rcu_dereference(rcu_torture_current);
atomic_inc(&p->rtort_nreaders);
burn_a_random_amount_of_cpu_time();
WARN_ON(READ_ONCE(p->rtort_pipe_count));
rcu_read_unlock();

An updater could do something like this, also in a loop:

p = kzalloc(sizeof(*p), GFP_KERNEL);
q = xchg(&rcu_torture_current, p);
call_rcu(&q->rtort_rcu, rcu_torture_cb);

And the rcu_torture_cb() function might be defined as follows:

static void rcu_torture_cb(struct rcu_head *p)
{
  struct rcu_torture *rp = container_of(p, struct rcu_torture, rtort_rcu);

  accumulate_stats(atomic_read(&rp->rtort_nreaders));
  WRITE_ONCE(rp->rtort_pipe_count, 1);
  burn_a_bit_more_cpu_time();
  kfree(rp);
}

This approach is of course problematic, never mind that one of rcutorture's predecessors actually did something like this. For one thing, a reader might be interrupted or (in CONFIG_PREEMPT=y kernels) preempted between its rcu_dereference() and its atomic_inc(). Then a too-short RCU grace period could result in the above reader doing its atomic_inc() on some structure that had already been freed and allocated as some other data structure used by some other part of the kernel. This could in turn result in a confusing failure in that other part of the kernel that was really RCU's fault.

In addition, the read-side atomic_inc() will result in expensive cache misses that will end up synchronizing multiple tasks concurrently executing the RCU reader code shown above. This synchronization will reduce read-side concurrency, which will in turn likely reduce the probability of these readers detecting a too-short grace period.

Finally, using the passage of time for synchronization is almost always a bad idea, so burn_a_bit_more_cpu_time() really needs to go. One might suspect that burn_a_random_amount_of_cpu_time() is also a bad idea, but we will see the wisdom in it.

Making rcutorture Preferentially Break RCU

The rcutorture module reduces the probability of false-positive non-RCU failures using these straightforward techniques:

  1. Allocate the memory to be referenced by rcu_torture_current in an array, whose elements are only ever used by rcutorture.
  2. Once an element is removed from rcu_torture_current, keep it in a special rcu_torture_removed list for some time before allowing it to be reused.
  3. Keep the random time delays in the rcutorture readers.
  4. Run rcutorture on an otherwise idle system, or, more commonly these days, within an otherwise idle guest OS.
  5. Make rcutorture place a relatively heavy load on RCU.

Use of the array prevents rcutorture's use-after-free accesses from clobbering other kernel subsystems' data structures. Keeping to-be-freed elements on the rcu_torture_removed list increases the probability that rcutorture will detect a too-short grace period, as do the random delays in the readers. Finally, ensuring that most of the RCU activity is done at rcutorture's behest decreases the probability that any too-short grace periods will clobber other kernel subsystems.

The rcu_torture_alloc() and rcu_torture_free() functions manage a freelist of array elements. The freelist is a simple list creatively named rcu_torture_freelist and guarded by a global rcu_torture_lock. Because allocation and freeing happen at most once per grace period, this global lock is just fine: It is nowhere near being any sort of performance or scalability bottleneck.

The rcu_torture_removed list is handled by the rcu_torture_pipe_update_one() function that is invoked by rcutorture callbacks and the rcu_torture_pipe_update() function that is invoked by rcu_torture_writer() after completing a synchronous RCU grace period. The rcu_torture_pipe_update_one() function updates only the specified array element, and the rcu_torture_pipe_update() function updates all of the array elements residing on the rcu_torture_removed list. These updates each increment the ->rtort_pipe_count field. When the value of this field reaches RCU_TORTURE_PIPE_LEN (by default 10), the array element is freed for reuse.

The rcu_torture_reader() function handles the random time delays and leverages the awesome power of multiple kthreads to maintain a high read-side load on RCU. The rcu_torture_writer() function runs in a single kthread in order to simplify synchronization, but it enlists the help of several other kthreads repeatedly invoking the rcu_torture_fakewriter() function in order to keep the update-side load on RCU at a respectable level.

 

This blog post described RCU's fundamental grace-period guarantee and how rcutorture stress-tests it. It also described a few simple ways that rcutorture increases the probability that any failures to provide this guarantee are attributed to RCU and not to some hapless innocent bystander.

The Old Man and His Smartphone, 2020 “See You in September” Episode

The continued COVID-19 situation continues to render my smartphone's location services less than useful, though a number of applications will still beg me to enable it, preferring to know my present location rather than consider my past habits. One in particular does have a “Don't ask me again” link, but it asks each time anyway. Given that I have only ever used one of that business's locations, you would think that it would not be all that hard to figure out which location I was going to be using next. But perhaps I am the only one who habitually disables location services.

Using the smartphone for breakfast-time Internet browsing has avoided almost all flat-battery incidents. One recent exception occurred while preparing for a hike. But I still have my old digital camera, so I plugged the smartphone into its charger and took my digital camera instead. I have previously commented on the excellent quality of my smartphone's cameras, but there is nothing quite like going back to the old digital camera (never mind my long-departed 35mm SLR) to drive that lesson firmly home.

I was recently asked to text a photo, and saw no obvious way to do this. There was some urgency, so I asked for an email address and emailed the photo instead. This did get the job done, but let's just say that it appears that asking for an email address is no longer a sign of youth, vigor, or with-it-ness. Thus chastened, I experimented in a calmer time, learning that the trick is to touch the greater-than icon to the left of the text-message-entry bar, which produces an option to select from your gallery and also to include a newly taken picture.

The appearance of Comet Neowise showcased my smartphone's ability to orient itself and to display the relevant star charts. Nevertheless, my wife expressed confidence in this approach only after seeing the large number of cars parked in the same area that my smartphone and I had selected. I hadn't intended to take a photo of the comet because the professionals do a much better job, especially those who are willing to travel far away from city lights and low altitudes. But here was my smartphone and there was the comet, so why not? The resulting photo was quite unsatisfactory, with so much pixelated noise that the comet was just barely discernible.

It was some days later that I found the smartphone's night mode. This is quite impressive. In this mode, the smartphone can form low-light images almost as well as my eyes can, which is saying something. It is also extremely good with point sources of light.

One recent trend in clothing is pockets for smartphones. This trend prompted my stepfather to suggest that the smartphone is the pocket watch of the 21st century. This might well be, but I still wear a wristwatch.

My refusal to use my smartphone's location services does not mean that location services cannot get me in trouble. Far from it! One memorable incident took place on BPA Road in Forest Park. A group of hikers asked me to verify their smartphone's chosen route, which would have taken them past the end of Firelane 13 and eventually down a small cliff. I advised them to choose a different route.

But I had seen the little line that their smartphone had drawn, and a week or so later found myself unable to resist checking it out. Sure enough, when I peered through the shrubbery marking the end of Firelane 13, I saw an unassuming but very distinct trail. Of course I followed it. Isn't that what trails are for? Besides, maybe someone had found a way around the cliff I knew to be at the bottom of that route.

To make a long story short, no one had found a way around that cliff. Instead, the trail went straight down it. For all but about eight feet of the trail, it was possible to work my way down via convenient handholds in the form of ferns, bushes, and trees. My plan for that eight feet was to let gravity do the work, and to regain control through use of a sapling at the bottom of that stretch of the so-called trail. Fortunately for me, that sapling was looking out for this old man, but unfortunately this looking out took the form of ensuring that I had a subcutaneous hold on its bark. Thankfully, the remainder of the traverse down the cliff was reasonably uneventful.

Important safety tip: If you absolutely must use that trail, wear a pair of leather work gloves!

Stupid RCU Tricks: Enlisting the Aid of a Debugger

Using Debuggers With rcutorture



So rcutorture found a bug, you have figured out how to reproduce it, git bisect was unhelpful (perhaps because the bug has been around forever), and the bug happens to be one of those rare RCU bugs for which a debugger might be helpful. What can you do?

What I have traditionally done is to get partway through figuring out how to make gdb work with rcutorture, then suddenly realize what the bug's root cause must be. At this point, I of course abandon gdb in favor of fixing the bug. As a result, although I have tried to apply gdb to the Linux kernel many times over the past 20 years, I never have actually succeeded in doing so. Now, this is not to say that gdb is useless to Linux-kernel hackers. Far from it! For one thing, the act of trying to use gdb has inspired me to perceive the root cause of a great many bugs, which means that it has served as a great productivity aid. For another thing, I frequently extract Linux-kernel code into a usermode scaffolding and use gdb in that context. And finally, there really are a number of Linux-kernel hackers who make regular use of gdb.

One of these hackers is Omar Sandoval, who happened to mention that he had used gdb to track down a Linux-kernel bug. And without first extracting the code to userspace. I figured that it was time for this old dog to learn a new trick, so I asked Omar how he made this happen.

Omar pointed out that because rcutorture runs in guest OSes, gdb can take advantage of the debugging support provided by qemu. To make this work, you build a kernel with CONFIG_DEBUG_INFO=y (which supplies gdb with additional symbols), provide the nokaslr kernel boot parameter (which prevents kernel address-space randomization from invalidating these symbols), and supply qemu with the -s -S command-line arguments (which causes it to wait for gdb to connect instead of immediately booting the kernel). You then specify the vmlinux file's pathname as the sole command-line argument to gdb. Once you see the (gdb) prompt, the target remote :1234 command will connect to qemu and then the continue command will boot the kernel.

I tried this, and it worked like a charm.

Alternatively, you can now use the shiny new rcutorture --gdb command-line argument in the -rcu tree, which will automatically set up the kernel and qemu, and will print out the required gdb commands, including the path to the newly built vmlinux file.

And yes, I do owe Omar a --drgn command-line argument, which I will supply once he lets me know how to connect drgn to qemu. :-)

In the meantime, the following sections cover a couple of uses I have made of --gdb, mostly to get practice with this approach to Linux-kernel debugging.

Case study 1: locktorture

For example, let's use gdb to investigate a long-standing locktorture hang when running scenario LOCK05:

tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --torture lock \
    --duration 3 --configs LOCK05 --gdb

This will print out the following once the kernel is built and qemu has started:

Waiting for you to attach a debug session, for example:
    gdb /home/git/linux-rcu/tools/testing/selftests/rcutorture/res/2020.08.27-14.51.45/LOCK05/vmlinux
After symbols load and the "(gdb)" prompt appears:
    target remote :1234
    continue

Once you have started gdb and entered the two suggested commands, the kernel will start. You can track its console output by locating its console.log file as described in an earlier post. Or you can use the ps command to dump the qemu command line, looking for the -serial file: argument, which is followed by the pathname of the file receiving the console output.

Once the kernel is sufficiently hung, that is, more than 15 seconds elapses after the last statistics output line (Writes: Total: 27668769 Max/Min: 27403330/34661 Fail: 0), you can hit control-C at gdb. The usual info threads command will show the CPUs' states, here with the 64-bit hexadecimal addresses abbreviated:

(gdb) info threads
  Id   Target Id         Frame 
* 1    Thread 1 (CPU#0 [running]) stutter_wait (title=0xf... "lock_torture_writer")
    at kernel/torture.c:615
  2    Thread 2 (CPU#1 [running]) 0xf... in stutter_wait (
    title=0xf... "lock_torture_writer") at kernel/torture.c:615
  3    Thread 3 (CPU#2 [halted ]) default_idle () at arch/x86/kernel/process.c:689
  4    Thread 4 (CPU#3 [halted ]) default_idle () at arch/x86/kernel/process.c:689

It is odd that CPUs 0 and 1 are in stutter_wait(), spinning on the global variable stutter_pause_test. Even more odd is that the value of this variable is not zero, as it should be at the end of the test, but rather the value two. After all, all paths out of torture_stutter() should zero this variable.

But maybe torture_stutter() is still stuck in the loop prior to the zeroing of stutter_pause_test. A quick look at torture_stutter_init() shows us that the task_struct pointer to the task running torture_stutter() lives in stutter_task, which is non-NULL, meaning that this task still lives. One might hope to use sched_show_task(), but this sadly fails with Could not fetch register "fs_base"; remote failure reply 'E14'.

The value of stutter_task.state is zero, which indicates that this task is running. But on what CPU? CPUs 0 and 1 are both spinning in stutter_wait, and the other two CPUs are in the idle loop. So let's look at stutter_task.on_cpu, which is zero, as in not on a CPU. In addition, stutter_task.cpu has the value one, and CPU 1 is definitely running some other task.

It would be good to just be able to print the stack of the blocked task, but it is also worth just rerunning this test, but this time with the locktorture.stutter module parameter set to zero. This test completed successfully, in particular, with no hangs. Given that no other locktorture or rcutorture scenario suffers from similar hangs, perhaps the problem is in rt_mutex_lock() itself. To check this, let's restart the test, but with the default value of the locktorture.stutter module parameter. After letting it hang and interrupting it with control-C, even though it still feels strange to control-C a kernel:

(gdb)  print torture_rtmutex
$1 = {wait_lock = {raw_lock = {{val = {counter = 0}, {locked = 0 '\000', pending = 0 '\000'}, {
          locked_pending = 0, tail = 0}}}}, waiters = {rb_root = {rb_node = 0xffffc9000025be50}, 
    rb_leftmost = 0xffffc90000263e50}, owner = 0x1 <fixed_percpu_data+1>}

The owner = 0x1 looks quite strange for a task_struct pointer, but the block comment preceding rt_mutex_set_owner() says that this value is legitimate, and represents one of two transitional states. So maybe it is time for CONFIG_DEBUG_RT_MUTEXES=y, but enabling this Kconfig option produces little additional enlightenment.

However, the torture_rtmutex.waiters field indicates that there really is something waiting on the lock. Of course, it might be that we just happened to catch the lock at this point in time. To check on this, let's add a variable to capture the time of the last lock release. I empirically determined that it is necessary to use WRITE_ONCE() to update this variable in order to prevent the compiler from optimizing it out of existence. Learn from my mistakes!

With the addition of WRITE_ONCE(), the next run showed that the last lock operation was more than three minutes in the past and that the transitional lock state still persisted, which provides strong evidence that this is the result of a race condition in the locking primitive itself. Except that a quick scan of the code didn't immediately identify a race condition. Furthermore, the failure happens even with CONFIG_DEBUG_RT_MUTEXES=y, which disables the lockless fastpaths (or the obvious lockless fastpaths, anyway).

Perhaps this is instead a lost wakeup? This would be fortuitous given that there are rare lost-IPI issues, and having this reproduce so easily on my laptop would be extremely convenient. And adding a bit of debug code to mark_wakeup_next_waiter() and lock_torture_writer() show that there is a task that was awakened, but that never exited from rt_mutex_lock(). And this task is runnable, that is, its ->state value is zero. But it is clearly not running very far! And further instrumentation demonstrates that control is not reaching the __smp_call_single_queue() call from __ttwu_queue_wakelist(). The chase is on!

Except that the problem ended up being in stutter_wait(). As the name suggests, this function controls stuttering, that is, periodically switching between full load and zero load. Such stuttering can expose bugs that a pure full-load stress test would miss.

The stutter_wait() function uses adaptive waiting, so that schedule_timeout_interruptible() is used early in each no-load interval, but a tight loop containing cond_resched() is used near the end of the interval. The point of this is to more tightly synchronize the transition from no-load to full load. But the LOCK05 scenario's kernel is built with CONFIG_PREEMPT=y, which causes cond_resched() to be a no-op. In addition, the kthreads doing the write locking lower their priority using set_user_nice(current, MAX_NICE), which appears to be preventing preemption. (We can argue that even MAX_NICE should not indefinitely prevent preemption, but the multi-minute waits that have been observed are for all intents and purposes indefinite.)

The fix (or workaround, as the case might be) is for stutter_wait() to block periodically, thus allowing other tasks to run.

Case study 2: RCU Tasks Trace

I designed RCU Tasks Trace for the same grace-period latency that I had designed RCU Tasks for, namely roughly one second. Unfortunately, this proved to be about 40x too slow, so adjustments were called for.

After those reporting the issue kindly verified for me that this was not a case of too-long readers, I used --gdb to check statistics and state. I used rcuscale, which is a member of the rcutorture family designed to measure performance and scalability of the various RCU flavors' grace periods:

tools/testing/selftests/rcutorture/bin/kvm.sh --torture rcuscale --allcpus \
    --configs TRACE01 --bootargs "rcuscale.nreaders=0 rcuscale.nwriters=10" \
    --trust-make --gdb

Once the (gdb) prompt appears, we connect to qemu, set a break point, and then continue execution:

(gdb) target remote :1234
Remote debugging using :1234
0x000000000000fff0 in exception_stacks ()
(gdb) b rcu_scale_cleanup
Breakpoint 1 at 0xffffffff810d27a0: file kernel/rcu/rcuscale.c, line 505.
(gdb) cont
Continuing.
Remote connection closed
(gdb)

Unfortunately, as shown above, this gets us Remote connection closed instead of a breakpoint. Apparently, the Linux kernel does not take kindly to debug exception instructions being inserted into its code. Fortunately, gdb also supplies a hardware breakpoint command:

(gdb) target remote :1234
Remote debugging using :1234
0x000000000000fff0 in exception_stacks ()
(gdb) hbreak rcu_scale_cleanup
Hardware assisted breakpoint 1 at 0xffffffff810d27a0: file kernel/rcu/rcuscale.c, line 505.
(gdb) cont
Continuing.
[Switching to Thread 12]

Thread 12 hit Breakpoint 1, rcu_scale_cleanup () at kernel/rcu/rcuscale.c:505
505     {

This works much better, and the various data structures may now be inspected to check the validity of various optimization approaches. Of course, as the optimization effort continued, hand-typing gdb commands became onerous, and was therefore replaced with crude but automatic accumulation and display of relevant statistics.

Of course, Murphy being who he is, the eventual grace-period speedup also caused a few heretofore latent race conditions to be triggered by a few tens of hours of rcutorture. These race conditions resulted in rcu_torture_writer() stalls, along with the occasional full-fledged RCU-Tasks-Trace CPU stall warning.

Now, rcutorture does dump out RCU grace-period kthread state when these events occur, but in the case of the rcu_torture_writer() stalls, this state is for vanilla RCU rather than the flavor of RCU under test. Which is an rcutorture bug that will be fixed. But in the meantime, gdb provides a quick workaround by setting a hardware breakpoint on the ftrace_dump() function, which is called when either of these sorts of stalls occur. When the breakpoint triggers, it is easy to manually dump the data pertaining to the grace-period kthread of your choice.

For those who are curious, the race turned out to be an IPI arriving between a pair of stores in rcu_read_unlock_trace() that could leave the corresponding task forever blocking the current RCU Tasks Trace grace period. The solution, as with vanilla RCU in the v3.0 timeframe, is to set the read-side nesting value to a negative number while clearing the .need_qs field indicating that a quiescent state is required. The buggy code is as follows:

if (likely(!READ_ONCE(t->trc_reader_special.s)) || nesting) {
    // BUG: IPI here sets .need_qs after check!!!
    WRITE_ONCE(t->trc_reader_nesting, nesting);
    return;  // We assume shallow reader nesting.
}

Again, the fix is to set the nesting count to a large negative number, which allows the IPI handler to detect this race and refrain from updating the .need_qs field when the ->trc_reader_nesting field is negative, thus avoiding the grace-period hang:

WRITE_ONCE(t->trc_reader_nesting, INT_MIN); // FIX
if (likely(!READ_ONCE(t->trc_reader_special.s)) || nesting) {
    WRITE_ONCE(t->trc_reader_nesting, nesting);
    return;  // We assume shallow reader nesting.
}

This experience of course suggests testing with grace period latencies tuned much more aggressively than they are in production, with an eye to finding additional low-probability race conditions.

Case study 3: x86 IPIs

Tracing the x86 IPI code path can be challenging because function pointers are heavily used. Unfortunately, some of these function pointers are initialized at runtime, so simply running gdb on the vmlinux binary does not suffice. However, we can again set a breakpoint somewhere in the run and check these pointers after initialization is complete:

tools/testing/selftests/rcutorture/bin/kvm.sh --torture scf --allcpus --duration 5 --gdb --configs "NOPREEMPT" --bootargs "scftorture.stat_interval=15 scftorture.verbose=1"

We can then set a hardware-assisted breakpoint as shown above at any convenient runtime function.

Once this breakpoint is encountered:

(gdb) print smp_ops
$2 = {smp_prepare_boot_cpu = 0xffffffff82a13833 , 
  smp_prepare_cpus = 0xffffffff82a135f9 , 
  smp_cpus_done = 0xffffffff82a13897 , 
  stop_other_cpus = 0xffffffff81042c40 , 
  crash_stop_other_cpus = 0xffffffff8104d360 , 
  smp_send_reschedule = 0xffffffff81047220 , 
  cpu_up = 0xffffffff81044140 , 
  cpu_disable = 0xffffffff81044aa0 , 
  cpu_die = 0xffffffff81044b20 , 
  play_dead = 0xffffffff81044b80 , 
  send_call_func_ipi = 0xffffffff81047280 , 
  send_call_func_single_ipi = 0xffffffff81047260 }

This shows that smp_ops.send_call_func_single_ipi is native_send_call_func_single_ipi(), which helps to demystify arch_send_call_function_single_ipi(). Except that this native_send_call_func_single_ipi() function is just a wrapper around apic->send_IPI(cpu, CALL_FUNCTION_SINGLE_VECTOR). So:

(gdb) print *apic
$4 = {eoi_write = 0xffffffff8104b8c0 , 
  native_eoi_write = 0x0 , write = 0xffffffff8104b8c0 , 
  read = 0xffffffff8104b8d0 , 
  wait_icr_idle = 0xffffffff81046440 , 
  safe_wait_icr_idle = 0xffffffff81046460 , 
  send_IPI = 0xffffffff810473c0 , 
  send_IPI_mask = 0xffffffff810473f0 , 
  send_IPI_mask_allbutself = 0xffffffff81047450 , 
  send_IPI_allbutself = 0xffffffff81047500 , 
  send_IPI_all = 0xffffffff81047510 , 
  send_IPI_self = 0xffffffff81047520 , dest_logical = 0, disable_esr = 0, 
  irq_delivery_mode = 0, irq_dest_mode = 0, 
  calc_dest_apicid = 0xffffffff81046f90 , 
  icr_read = 0xffffffff810464f0 , 
  icr_write = 0xffffffff810464b0 , 
  probe = 0xffffffff8104bb00 , 
  acpi_madt_oem_check = 0xffffffff8104ba80 , 
  apic_id_valid = 0xffffffff81047010 , 
  apic_id_registered = 0xffffffff8104b9c0 , 
  check_apicid_used = 0x0 , 
  init_apic_ldr = 0xffffffff8104b9a0 , 
  ioapic_phys_id_map = 0x0 , setup_apic_routing = 0x0 ,
  cpu_present_to_apicid = 0xffffffff81046f50 ,
  apicid_to_cpu_present = 0x0 , 
  check_phys_apicid_present = 0xffffffff81046ff0 , 
  phys_pkg_id = 0xffffffff8104b980 , 
  get_apic_id = 0xffffffff8104b960 , 
  set_apic_id = 0xffffffff8104b970 , wakeup_secondary_cpu = 0x0 , 
  inquire_remote_apic = 0xffffffff8104b9b0 , 
  name = 0xffffffff821f0802 "physical flat"}

Thus, in this configuration the result is default_send_IPI_single_phys(cpu, CALL_FUNCTION_SINGLE_VECTOR). And this function invokes __default_send_IPI_dest_field() with interrupts disabled, which in turn, after some setup work, writes a command word that includes the desired IPI vector to the interrupt command register at offset 0x300 from the APIC base.

To be continued...
RCU

Stupid RCU Tricks: Failure Probability and CPU Count

So rcutorture found a bug, whether in RCU or elsewhere, and it is now time to reproduce that bug, whether to make good use of git bisect or to verify an alleged fix. One problem is that, rcutorture being what it is, that bug is likely a race condition and it likely takes longer than you would like to reproduce. Assuming that it reproduces at all.

How to make it reproduce faster? Or at all, as the case may be?

One approach is to tweak the Kconfig options and maybe even the code to make the failure more probable. Another is to find a “near miss” that is related to and more probable than the actual failure.

But given that we are trying to make a race condition happen more frequently, it is only natural to try tweaking the number of CPUs. After all, one would hope that increasing the number of CPUs would increase the probability of hitting the race condition. So the straightforward answer is to use all available CPUs.

But how to use them? Run a single rcutorture scenario covering all the CPUs, give or take the limitations imposed by qemu and KVM? Or run many instances of that same scenario, with each instance using a small fraction of the available CPUs?

As is so often the case, the answer is: “It depends!”

If the race condition happens randomly between any pair of CPUs, then bigger is better. To see this, consider the following old-school ASCII-art comparison:

+---------------------+
|        N * M        |
+---+---+---+-----+---+
| N | N | N | ... | N |
+---+---+---+-----+---+

If there are n CPUs that can participate in the race condition, then at any given time there are n(n-1)/2 possible races. The upper row has N*M CPUs, and thus N*M*(N*M-1)/2 possible races. The lower row has M sets of N CPUs, and thus M*N*(N-1)/2, which is almost a factor of M smaller. For this type of race condition, you should therefore run a small number of scenarios with each using as many CPUs as possible, and preferably only one scenario that uses all of the CPUs. For example, to make the TREE03 scenario run on 64 CPUs, edit the tools/testing/selftests/rcutorture/configs/rcu/TREE03 file so as to set CONFIG_NR_CPUS=64.
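The pair-counting arithmetic above is easy to check with a quick shell calculation. The sketch below uses N=16 and M=4 purely as an illustrative choice:

```shell
# Count the distinct CPU pairs that could participate in a race among n CPUs.
races() { echo $(( $1 * ($1 - 1) / 2 )); }

# One big 64-CPU scenario versus four 16-CPU scenarios (N=16, M=4).
big=$(races 64)
small=$(( 4 * $(races 16) ))
echo "one 64-CPU scenario:   $big possible races"    # 2016
echo "four 16-CPU scenarios: $small possible races"  # 480
```

The ratio 2016/480 is about 4.2, that is, "almost a factor of M" as claimed.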

But there is no guarantee that the race condition will be such that all CPUs participate with equal probability. For example, suppose that the bug was due to a race between RCU's grace-period kthread (named either rcu_preempt or rcu_sched, depending on your Kconfig options) and its expedited grace period, which at any given time will be running on at most one workqueue kthread.

In this case, no matter how many CPUs were available to a given rcutorture scenario, at most two of them could be participating in this race. In this case, it is instead best to run as many two-CPU rcutorture scenarios as possible, give or take the memory footprint of that many guest OSes (one per rcutorture scenario). For example, to make 32 TREE03 scenarios run on 64 CPUs, edit the tools/testing/selftests/rcutorture/configs/rcu/TREE03 file so as to set CONFIG_NR_CPUS=2 and remember to pass either the --allcpus or the --cpus 64 argument to kvm.sh.

What happens in real life?

For a race condition that rcutorture uncovered during the v5.8 merge window, running one large rcutorture instance instead of 14 smaller ones (very) roughly doubled the probability of locating the race condition.

In other words, real life is completely capable of lying somewhere between the two theoretical extremes outlined above.
RCU

Stupid RCU Tricks: So rcutorture is Not Aggressive Enough For You?

So you read the previous post, but simply running rcutorture did not completely vent your frustration. What can you do?

One thing you can do is to tweak a number of rcutorture settings to adjust the manner and type of torture that your testing inflicts.

RCU CPU Stall Warnings

If you are not averse to a quick act of vandalism, then you might wish to induce an RCU CPU stall warning. The --bootargs argument can be used for this, for example as follows:

tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 3 --trust-make \
    --bootargs "rcutorture.stall_cpu=22 rcutorture.fwd_progress=0"

The rcutorture.stall_cpu=22 says to stall a CPU for 22 seconds, that is, one second longer than the default RCU CPU stall timeout in mainline. If you are instead using a distribution kernel, you might need to specify 61 seconds (as in “rcutorture.stall_cpu=61”) in order to allow for the typical 60-second RCU CPU stall timeout. The rcutorture.fwd_progress=0 has no effect except to suppress a warning message (with stack trace included free of charge) that questions the wisdom of running both RCU-callback forward-progress tests and RCU CPU stall tests at the same time. In fact, the code not only emits the warning message, it also automatically suppresses the forward-progress tests. If you prefer living dangerously and don't mind the occasional out-of-memory (OOM) lockup accompanying your RCU CPU stall warnings, feel free to edit kernel/rcu/rcutorture.c to remove this automatic suppression.

If you are running on a large system that takes more than ten seconds to boot, you might need to increase the RCU CPU stall holdoff interval. For example, adding rcutorture.stall_cpu_holdoff=120 to the --bootargs list would wait for two minutes before stalling a CPU instead of the default holdoff of 10 seconds. If simply spinning a CPU with preemption disabled does not fully vent your ire, you could undertake a more profound act of vandalism by adding rcutorture.stall_cpu_irqsoff=1 so as to cause interrupts to be disabled on the spinning CPU.

Some flavors of RCU such as SRCU permit general blocking within their read-side critical sections, and you can exercise this capability by adding rcutorture.stall_cpu_block=1 to the --bootargs list. Better yet, you can use this kernel-boot parameter to torture flavors of RCU that forbid blocking within read-side critical sections, which allows you to see how they complain about such mistreatment.

The vanilla flavor of RCU has a grace-period kthread, and stalling this kthread is another good way to torture RCU. Simply add rcutorture.stall_gp_kthread=22 to the --bootargs list, which delays the grace-period kthread for 22 seconds. Doing this will normally elicit strident protests from mainline kernels.

Finally, you could starve rcutorture of CPU time by running a large number of them concurrently (each in its own Linux-kernel source tree), thereby overcommitting the CPUs.

But maybe you would prefer to deprive RCU of memory. If so, read on!

Running rcutorture Out of Memory

By default, each rcutorture guest OS is allotted 512MB of memory. But perhaps you would like to have it make do with only 128MB:

tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --trust-make --memory 128M

You could go further by making the RCU need-resched testing more aggressive, for example, by increasing the duration of this testing from the default three-quarters of the RCU CPU stall timeout to (say) seven-eighths:

tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --trust-make --memory 128M \
    --bootargs "rcutorture.fwd_progress_div=8"

More to the point, you might make the RCU callback-flooding tests more aggressive, for example by adjusting the values of the MAX_FWD_CB_JIFFIES, MIN_FWD_CB_LAUNDERS, or MIN_FWD_CBS_LAUNDERED macros and rebuilding the kernel. Alternatively, you could use kill -STOP on one of the vCPUs in the middle of an rcutorture run. Either way, if you break it, you buy it!

Or perhaps you would rather attempt to drown rcutorture in memory, perhaps forcing a full 16GB onto each guest OS:

tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --trust-make --memory 16G

Another productive torture method involves unusual combinations of Kconfig options, a topic taken up by the next section.

Confused Kconfig Options

The Kconfig options for a given rcutorture scenario are specified by the corresponding file in the tools/testing/selftests/rcutorture/configs/rcu directory. For example, the Kconfig options for the infamous TREE03 scenario may be found in tools/testing/selftests/rcutorture/configs/rcu/TREE03.

But why not just use the --kconfig argument and be happy, as described previously?

One reason is that there are a few Kconfig options that the rcutorture scripting refers to early in the process, before the --kconfig parameter's additions have been processed. For example, changing CONFIG_NR_CPUS should therefore be done in the scenario's file rather than via the --kconfig parameter. Another reason is to avoid having to supply the same --kconfig argument for each of many repeated rcutorture runs. But perhaps most important, if you want some scenarios to be built with one Kconfig option and others built with a different Kconfig option, modifying each scenario's file avoids the need for multiple rcutorture runs.

For example, you could edit the tools/testing/selftests/rcutorture/configs/rcu/TREE03 file to change the CONFIG_NR_CPUS=16 to instead read CONFIG_NR_CPUS=4, and then run the following on a 12-CPU system:

tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --trust-make --configs "3*TREE03"

This would run three concurrent copies of TREE03, but with each guest OS restricted to only 4 CPUs.
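If you prefer scripting the edit, something like the following GNU sed one-liner does the trick. It is shown here against a scratch stand-in file, since the actual tools/testing/selftests/rcutorture/configs/rcu/TREE03 path will vary with your source tree:

```shell
# Scratch stand-in for tools/testing/selftests/rcutorture/configs/rcu/TREE03.
cfg=$(mktemp)
printf 'CONFIG_PREEMPT=y\nCONFIG_NR_CPUS=16\n' > "$cfg"

# Force the scenario down to four CPUs (GNU sed in-place edit).
sed -i 's/^CONFIG_NR_CPUS=.*/CONFIG_NR_CPUS=4/' "$cfg"
grep '^CONFIG_NR_CPUS=' "$cfg"   # CONFIG_NR_CPUS=4
```

Point cfg at the real TREE03 file in your tree to apply the change for an actual run.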

Finally, if a given Kconfig option applies to all rcutorture runs and you are tired of repeatedly entering --kconfig arguments, you can instead add that option to the tools/testing/selftests/rcutorture/configs/rcu/CFcommon file.

But sometimes Kconfig options just aren't enough. And that is why we have kernel boot parameters, the subject of the next section.

Boisterous Boot Parameters

We have supplied kernel boot parameters using the --bootargs parameter, but sometimes ordering considerations or sheer laziness motivate greater permanence. Either way, the scenario's .boot file may be brought to bear. For example, the TREE03 scenario's file is located here: tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot.

As of the v5.7 Linux kernel, this file contains the following:

rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30
rcutree.gp_preinit_delay=12
rcutree.gp_init_delay=3
rcutree.gp_cleanup_delay=3
rcutree.kthread_prio=2
threadirqs

For example, the probability of RCU's grace period processing overlapping with CPU-hotplug operations may be adjusted by decreasing the value of the rcutorture.onoff_interval from its default of 200 milliseconds or by adjusting the various grace-period delays specified by the rcutree.gp_preinit_delay, rcutree.gp_init_delay, and rcutree.gp_cleanup_delay parameters. In fact, chasing bugs involving races between RCU grace periods and CPU-hotplug operations often involves tuning these four parameters to maximize race probability, thus decreasing the required rcutorture run durations.

The possibilities for the .boot file contents are limited only by the extent of Documentation/admin-guide/kernel-parameters.txt. And actually not even by that, given the all-too-real possibility of undocumented kernel boot parameters.

You can also create your own rcutorture scenarios by creating a new set of files in the tools/testing/selftests/rcutorture/configs/rcu directory. You can make them run by default (or in response to the CFLIST string to the --configs parameter) by adding its name to the tools/testing/selftests/rcutorture/configs/rcu/CFLIST file. For example, you could create a MYSCENARIO file containing Kconfig options and (optionally) a MYSCENARIO.boot file containing kernel boot parameters in the tools/testing/selftests/rcutorture/configs/rcu directory, and make them run by default by adding a line reading MYSCENARIO to the tools/testing/selftests/rcutorture/configs/rcu/CFLIST file.
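The file manipulations just described might be sketched as follows, again using a scratch directory in place of the real tools/testing/selftests/rcutorture/configs/rcu directory. The MYSCENARIO name and its Kconfig and boot-parameter contents are of course just placeholders:

```shell
# Scratch stand-in for tools/testing/selftests/rcutorture/configs/rcu.
dir=$(mktemp -d)

# Kconfig options for the new scenario (placeholder contents).
printf 'CONFIG_SMP=y\nCONFIG_NR_CPUS=4\n' > "$dir/MYSCENARIO"

# Optional kernel boot parameters for the new scenario.
printf 'rcutorture.onoff_interval=200\n' > "$dir/MYSCENARIO.boot"

# Make the scenario run by default by adding it to CFLIST.
echo MYSCENARIO >> "$dir/CFLIST"
cat "$dir/CFLIST"
```

With the real directory in place of $dir, MYSCENARIO would then run whenever CFLIST is selected, whether by default or via --configs.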

Summary

This post discussed enhancing rcutorture through use of stall warnings, memory limitations, Kconfig options, and kernel boot parameters. The special case of adjusting CONFIG_NR_CPUS deserves more attention, and that is the topic of the next post.
RCU

Stupid RCU Tricks: So you want to torture RCU?

Let's face it, using synchronization primitives such as RCU can be frustrating. And it is only natural to wish to get back, somehow, at the source of such frustration. In short, it is quite understandable to want to torture RCU. (And other synchronization primitives as well, but you have to start somewhere!) Another benefit of torturing RCU is that doing so sometimes uncovers bugs in other parts of the kernel. You see, RCU is not always willing to suffer alone.

One long-standing RCU-torture approach is to use modprobe and rmmod to install and remove the rcutorture module, as described in the torture-test documentation. However, this approach requires considerable manual work to check for errors.

On the other hand, this approach avoids any concern about the underlying architecture or virtualization technology. This means that use of modprobe and rmmod is the method of choice if you wish to torture RCU on (say) SPARC or when running on Hyper-V (this last according to people actually doing this). This method is also necessary when you want to torture RCU on a very specific kernel configuration or when you need to torture RCU on bare metal.

But for those of us running mainline kernels on x86 systems supporting virtualization, the approach described in the remainder of this document will usually be more convenient.

Running rcutorture in a Guest OS

If you have an x86 system (or, with luck, an ARMv8 or PowerPC system) set up to run qemu and KVM, you can instead use the rcutorture scripting, which automates running rcutorture over a full set of configurations, as well as automating analysis of the build products and console output. Running this can be as simple as:

tools/testing/selftests/rcutorture/bin/kvm.sh

As of v5.8-rc1, this will build and run each of nineteen combinations of Kconfig options, with each run taking 30 minutes for a total of 9.5 hours, not including the time required to build the kernel, boot the guest OS, and analyze the test results. Given that a number of the scenarios use only a single CPU, this approach can be quite wasteful, especially on the well-endowed systems of the year 2020.

This waste can be avoided by using the --cpus argument, for example, for the 12-hardware-thread laptop on which I am typing this, you could do the following:

tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 12

This command would run up to 12 CPUs worth of rcutorture scenarios concurrently, so that the nineteen combinations would be run in eight batches. Because TREE03 and TREE07 each want 16 CPUs, rcutorture will complain in its run summary as follows:

 --- Mon Jun 15 10:23:02 PDT 2020 Test summary:
Results directory: /home/git/linux/tools/testing/selftests/rcutorture/res/2020.06.15-10.23.02
tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 12 --duration 5 --trust-make
RUDE01 ------- 2102 GPs (7.00667/s) [tasks-rude: g0 f0x0 ]
SRCU-N ------- 42229 GPs (140.763/s) [srcu: g549860 f0x0 ]
SRCU-P ------- 11887 GPs (39.6233/s) [srcud: g110444 f0x0 ]
SRCU-t ------- 59641 GPs (198.803/s) [srcu: g1 f0x0 ]
SRCU-u ------- 59209 GPs (197.363/s) [srcud: g1 f0x0 ]
TASKS01 ------- 1029 GPs (3.43/s) [tasks: g0 f0x0 ]
TASKS02 ------- 1043 GPs (3.47667/s) [tasks: g0 f0x0 ]
TASKS03 ------- 1019 GPs (3.39667/s) [tasks: g0 f0x0 ]
TINY01 ------- 43373 GPs (144.577/s) [rcu: g0 f0x0 ] n_max_cbs: 34463
TINY02 ------- 46519 GPs (155.063/s) [rcu: g0 f0x0 ] n_max_cbs: 2197
TRACE01 ------- 756 GPs (2.52/s) [tasks-tracing: g0 f0x0 ]
TRACE02 ------- 559 GPs (1.86333/s) [tasks-tracing: g0 f0x0 ]
TREE01 ------- 8930 GPs (29.7667/s) [rcu: g64765 f0x0 ]
TREE02 ------- 17514 GPs (58.38/s) [rcu: g138645 f0x0 ] n_max_cbs: 18010
TREE03 ------- 15920 GPs (53.0667/s) [rcu: g159973 f0x0 ] n_max_cbs: 1025308
CPU count limited from 16 to 12
TREE04 ------- 10821 GPs (36.07/s) [rcu: g70293 f0x0 ] n_max_cbs: 81293
TREE05 ------- 16942 GPs (56.4733/s) [rcu: g123745 f0x0 ] n_max_cbs: 99796
TREE07 ------- 8248 GPs (27.4933/s) [rcu: g52933 f0x0 ] n_max_cbs: 183589
CPU count limited from 16 to 12
TREE09 ------- 39903 GPs (133.01/s) [rcu: g717745 f0x0 ] n_max_cbs: 83002

However, other than these two complaints, this is what the summary of an uneventful rcutorture run looks like.

Whatever is the meaning of all those numbers in the summary???

The console output for each run and much else besides may be found in the /home/git/linux/tools/testing/selftests/rcutorture/res/2020.06.15-10.23.02 directory called out above.

The more CPUs you have, the fewer batches are required:

CPUs   Batches
   1        19
   2        16
   4        13
   8        10
  16         6
  32         3
  64         2
 128         1


If you specify more CPUs than your system actually has, kvm.sh will ignore your fantasies in favor of your system's reality.

Specifying Specific Scenarios

Sometimes it is useful to take one's ire out on a specific type of RCU, for example, SRCU. You can use the --configs argument to select specific scenarios:

tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 12 \
    --configs "SRCU-N SRCU-P SRCU-t SRCU-u"

This runs in two batches, but the second batch uses only two CPUs, which is again wasteful. Given that SRCU-P requires eight CPUs, SRCU-N four CPUs, and SRCU-t and SRCU-u one each, it would cost nothing to run two instances of each of these scenarios other than SRCU-N as follows:

tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 12 \
    --configs "SRCU-N 2*SRCU-P 2*SRCU-t 2*SRCU-u"
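The CPU accounting behind this claim is easy to check: SRCU-P needs eight CPUs, SRCU-N four, and SRCU-t and SRCU-u one apiece, so doubling everything but SRCU-N still fits into two 12-CPU batches:

```shell
# Per-scenario CPU requirements, as given in the text above.
srcu_p=8; srcu_n=4; srcu_t=1; srcu_u=1

# Total CPUs for "SRCU-N 2*SRCU-P 2*SRCU-t 2*SRCU-u".
total=$(( srcu_n + 2*srcu_p + 2*srcu_t + 2*srcu_u ))
echo "total CPUs needed: $total"                          # 24
echo "batches on a 12-CPU system: $(( (total + 11) / 12 ))"  # 2
```

One 12-CPU batch holds SRCU-P plus SRCU-N, and the other holds the remaining SRCU-P plus both SRCU-t and both SRCU-u instances.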

This same notation can be used to run multiple copies of the entire list of scenarios. For example (again, in v5.7), a system with 384 CPUs can use --configs "4*CFLIST" to run four copies of the full set of scenarios as follows:

tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 384 --configs "4*CFLIST"

Mixing and matching is permissible, for example:

tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 384 --configs "3*CFLIST 12*TREE02"

A kvm.sh script that is to run on a wide variety of systems can benefit from --allcpus (expected to appear in v5.9), which acts like --cpus N, where N is the number of CPUs on the current system:

tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --configs "3*CFLIST 12*TREE02"

Build time can dominate when running a large number of short-duration runs, for example, when chasing down a low-probability non-deterministic boot-time failure. Use of --trust-make can be very helpful in this case:

tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 384 --duration 2 \
    --configs "1000*TINY01" --trust-make

Without --trust-make, rcutorture will play it safe by forcing your source tree to a known state between each build. In addition to --trust-make, there are a number of tools such as ccache that can also greatly reduce build times.

Locating Test Failures

Although the ability to automatically run many tens of scenarios can be very convenient, it can also cause significant eyestrain staring through a long “summary” checking for test failures. Therefore, if there are failures, this is noted at the end of the summary, for example, as shown in the following abbreviated output from a --configs "28*TREE03" run:

TREE03.8 ------- 1195094 GPs (55.3284/s) [rcu: g11475633 f0x0 ] n_max_cbs: 1449125
TREE03.9 ------- 1202936 GPs (55.6915/s) [rcu: g11572377 f0x0 ] n_max_cbs: 1514561
3 runs with runtime errors.

Of course, picking the three errors out of the 28 runs can also cause eyestrain, so there is yet another useful little script:

tools/testing/selftests/rcutorture/bin/kvm-find-errors.sh \
    /home/git/linux/tools/testing/selftests/rcutorture/res/2020.06.15-10.23.02

This will run your editor on the make output for each build error and on the console output for each runtime failure, greatly reducing eyestrain. Users of vi can also edit a summary of the runtime errors from each failing run as follows:

vi /home/git/linux/tools/testing/selftests/rcutorture/res/2020.06.15-10.23.02/*/console.log.diags

Enlisting Torture Assistance

If rcutorture produces a failure-free run, that is a failure on the part of rcutorture. After all, there are bugs in there somewhere, and rcutorture failed to find them!

One approach is to increase the duration, for example, to 12 hours (also known as 720 minutes):

tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 12 --duration 720

Another approach is to enlist the help of other in-kernel torture features, for example, lockdep. The --kconfig parameter to kvm.sh can be used to this end:

tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 12 --configs "TREE03" \
    --kconfig "CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y"

The aid of the kernel address sanitizer (KASAN) can be enlisted using the --kasan argument:

tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 12 --kasan

The kernel concurrency sanitizer (KCSAN) can also be brought to bear, but proper use of KCSAN requires some thought (see part 1 and part 2 of the LWN “Concurrency bugs should fear the big bad data-race detector” article) and also version 11 or later of Clang/LLVM (and a patch for GCC has been accepted). Once you have all of that in place, the --kcsan argument invokes KCSAN and also generates a summary as described in part 1 of the aforementioned LWN article. Note again that only very recent compiler versions (such as Clang-11) support KCSAN, so a --kmake "CC=clang-11" or similar argument might also be necessary.

Selective Torturing

Sometimes enlisting debugging aid is the best approach, but other times greater selectivity is the best way forward.

Sometimes simply building a kernel is torture enough, especially when building with unusual Kconfig options (see the discussion of --kconfig above). In this case, specifying the --buildonly argument will build the kernels, but refrain from running them. This approach can also be useful for running multiple copies of the resulting binaries on multiple systems: You can use --buildonly to build the kernels and qemu-cmd scripts, and then run these files on the other systems, given suitable adjustments to the qemu-cmd scripts.

Other times it is useful to torture some specific portion of RCU. For example, one wishing to vent their ire solely on expedited grace periods could add --bootargs "rcutorture.gp_exp=1" to the kvm.sh command line. This argument causes rcutorture to run a stress test using only expedited RCU grace periods, which can be helpful when attempting to work out whether a too-short RCU grace period is due to a bug in the normal or the expedited grace-period code. Similarly, the callback-flooding aspects of rcutorture stress testing can be disabled using --bootargs "rcutorture.fwd_progress=0". It is possible to specify both in one run using --bootargs "rcutorture.gp_exp=1 rcutorture.fwd_progress=0".

Enlisting Debugging Assistance

Still other times, it is helpful to enable event tracing. For example, if the rcu_barrier() event traces are of interest, use --bootargs "trace_event=rcu:rcu_barrier". The trace buffer will be dumped automatically upon specific rcutorture failures. If the failure mode is instead a non-rcutorture-specific oops, use this: --bootargs "trace_event=rcu:rcu_barrier ftrace_dump_on_oops". If it is also necessary to dump the trace buffers on warnings, a (heavy handed) way to achieve this is to use --bootargs "trace_event=rcu:rcu_barrier ftrace_dump_on_oops panic_on_warn".

If you have many tens of rcutorture instances that all decide to flush their trace buffers at about the same time, the combined flushing operations can take considerable time, especially if the underlying system features rotating rust. If only the most recent activity is of interest, specifying a small trace buffer can help: --bootargs "trace_event=rcu:rcu_barrier ftrace_dump_on_oops panic_on_warn trace_buf_size=3k".

If only the oopsing/warning CPU's traces are relevant, the orig_cpu modifier can be helpful: --bootargs "trace_event=rcu:rcu_barrier ftrace_dump_on_oops=orig_cpu panic_on_warn trace_buf_size=3k".

More information on tracing can be found in Documentation/trace, and more on kernel boot parameters in general may be found in kernel-parameters.txt. Given the multi-thousand-line heft of this latter, there is clearly great scope for tweaking your torturing of RCU!

Why Stop at Torturing RCU?

After all, locking can sometimes be almost as annoying as RCU. And it is possible to torture locking:

tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --torture lock

This locktorture stress test does not get as much love and attention as does rcutorture, but it is at least a start.

There are also a couple of RCU performance tests and an upcoming smp_call_function*() stress test that use this same torture-test infrastructure. Please note that the details of the summary output vary from test to test.

In short, you can do some serious torturing of RCU, and much else besides! So show them no mercy!!! :-)
inside

The Old Man and His Smartphone, 2020 Spring Break Episode

Complete draining of my smartphone's battery was commonplace while working from home. After all, given laptops and browsers, to say nothing of full-sized keyboards, I rarely used it. So I started doing my daily web browsing on my smartphone at breakfast, thus forcing a daily battery-level check.

This approach has been working, except that it is quite painful to print out articles my wife might be interested in. My current approach is to email the URL to myself, which is a surprisingly ornate process:

  1. Copy the URL.
  2. Start an email.
  3. Click on the triple dot at the upper right-hand side of the keyboard.
  4. Select the text-box icon at the right.
  5. Select “paste” from the resulting menu, then hit “send”.
  6. Read email on a laptop, open the URL, and print it.

The addition of a control key to the virtual keyboard might be useful to those of us otherwise wondering “How on earth do I type control-V???” Or I could take the time required to figure out how to print directly from my smartphone. But I would not recommend holding your breath waiting.

What with COVID-19 and the associated lockdowns, I have not used my smartphone's location services much, helpful though they were in the pre-COVID-19 days. For example, prior to a business trip to Prague, my wife let me know that she wanted additional copies of a particular local craft item that I had brought back on a prior trip almost ten years ago. Unfortunately, I could not remember the name of the shop, nor were the usual search engines any help at all.

Fortunately, some passers-by mentioned Wenceslas Square, which triggered a vague memory. So I used my smartphone to go to Wenceslas Square, and from there used the old-school approach of wandering randomly. Suddenly, I knew where I was, and sure enough, when I turned to my right, there was the shop! And the craft item was even in the same place within the shop that it had been on my earlier visit!

Of course, the minute I completed my purchase, my smartphone and laptops were full of advertisements for that craft item, including listing any number of additional shops offering it for sale. Therefore, although it is quite clear that the “A” in “AI” stands for “artificial”, I am forced to dispute the usual interpretation of the “I”.

My smartphone also took the liberty of autocomposing its first-ever reply to an email, quite likely because I failed to power it off before laying it down on its screen on a not-quite-flat surface. The resulting email was heavy on the letter “b” and contained lots of emo and angst, perhaps because the word “bad” occurred quite frequently. This draft also included an instance of the name “Bob Dylan”. I will leave any discussion of the morals and immorals of this particular AI choice to the great man's many fans and detractors.

I can only be thankful that the phone left its composition in draft mode, as opposed to actually sending it. In fact, I was so shocked by the possibility that it could well have sent it that I immediately deleted it. Of course, now I wish that I had kept it so I could show it off. As they say, haste makes waste!

However, I did find the following prior effort in my “Drafts” folder. This effort is nowhere near as entertaining as the one I so hastily deleted, but it does give some of the flavor of my smartphone's approach to email autocomposition:
But there is no doubt about the way the bldg will do it in this smartphone a while now that the company is still in its position as the world's most profitable competitor to its android smartphone and its android phone in its own right and will continue its search to make its way through its mobile app market and its customers will have to pay attention for their products to the web and other apps for their customers by clicking the button and using a new app BBC to help you get your phone back in your browser and your browser based phone number and the number one you can click to see you in your browser or the other apps that are compatible or the app you use for your browser or a computer and both have or Google and you will have a lot more to say than the one that is not the only way you could not be in a good mood to get the most of your life and the rest you are in for the next two days and the rest is not a bad for you are you in a good place and the best thing you could be doing to help your family and your friends will have a sense that they can help them get their jobs done in a way that's what you are going through with your work in a good place to work and make them work better and better for their job than you can in a long term way and you are a better parent and you are not going through the process and the process is going through a good job of thinking that you're not a teacher and a teacher who believes that the best thing to be is that your browser will have the number and access of the app you can get to the web and the app is available to users for a while to be sure you can use the internet for a while you are still in a position where I have a few more questions to ask you about being able and the app you have on your computer will have to do not use it as an app you have for a

And so I have one small request. Could those of you wishing for digital assistants please consider the option of being more careful what you wish for?

My smartphone also came in handy during a power outage: The cell towers apparently had backup generators, and my smartphone's battery, though low, was not completely drained. I posted noting my situation and battery state online, which in turn prompted a proud Tesla owner to call attention to the several hundred kilowatt-hours of electrical energy stored in his driveway. Unfortunately for me, his driveway was located the better part of a thousand miles away. However, it did remind me of the single kilowatt hour stored in my conventional automobile's lead-acid battery. But fortunately, the power outage lasted only a few hours, so my smartphone's much smaller battery was sufficient to the cause.

As you would expect, I checked my smartphone's specifications when I first received it and learned that it has eight CPUs, which is not unusual for today's smartphones.

But it only recently occurred to me that the early 1990s DYNIX/ptx system on which I developed RCU had only four CPUs.

Go figure!!!