paulmck (paulmck) wrote,

Stupid RCU Tricks: Failure Probability and CPU Count

So rcutorture found a bug, whether in RCU or elsewhere, and it is now time to reproduce that bug, whether to make good use of git bisect or to verify an alleged fix. One problem is that, rcutorture being what it is, that bug is likely a race condition and it likely takes longer than you would like to reproduce. Assuming that it reproduces at all.

How to make it reproduce faster? Or at all, as the case may be?

One approach is to tweak the Kconfig options and maybe even the code to make the failure more probable. Another is to find a “near miss” that is related to and more probable than the actual failure.

But given that we are trying to make a race condition happen more frequently, it is only natural to try tweaking the number of CPUs. After all, one would hope that increasing the number of CPUs would increase the probability of hitting the race condition. So the straightforward answer is to use all available CPUs.

But how to use them? Run a single rcutorture scenario covering all the CPUs, give or take the limitations imposed by qemu and KVM? Or run many instances of that same scenario, with each instance using a small fraction of the available CPUs?

As is so often the case, the answer is: “It depends!”

If the race condition happens randomly between any pair of CPUs, then bigger is better. To see this, consider the following old-school ASCII-art comparison:

+---------------------+
|        N * M        |
+---+---+---+-----+---+
| N | N | N | ... | N |
+---+---+---+-----+---+

If there are N CPUs that can participate in the race condition, then at any given time there are N*(N-1)/2 possible races. The upper row has N*M CPUs, and thus N*M*(N*M-1)/2 possible races. The lower row has M sets of N CPUs, and thus M*N*(N-1)/2 possible races, which is at least a factor of M smaller. For example, with N=16 and M=4, the single 64-CPU instance allows 2016 possible races, while the four 16-CPU instances together allow only 480. For this type of race condition, you should therefore run a small number of scenarios, each using as many CPUs as possible, and preferably only one scenario that uses all of the CPUs. For example, to make the TREE03 scenario run on 64 CPUs, edit the tools/testing/selftests/rcutorture/configs/rcu/TREE03 file so as to set CONFIG_NR_CPUS=64.
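
As a rough sketch, assuming a 64-CPU host and that you are working from the top level of a Linux kernel source tree, the adjustment and the run might look something like the following (the sed one-liner is merely one convenient way to make the edit):

# Sketch: run a single 64-CPU TREE03 instance.
# Assumes a 64-CPU host and the top of a kernel source tree.
sed -i 's/^CONFIG_NR_CPUS=.*/CONFIG_NR_CPUS=64/' \
    tools/testing/selftests/rcutorture/configs/rcu/TREE03
tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --configs "TREE03"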

But there is no guarantee that the race condition will be such that all CPUs participate with equal probability. For example, suppose that the bug was due to a race between RCU's grace-period kthread (named either rcu_preempt or rcu_sched, depending on your Kconfig options) and its expedited grace period, which at any given time will be running on at most one workqueue kthread.

In this case, no matter how many CPUs were available to a given rcutorture scenario, at most two of them could be participating in this race. It is therefore instead best to run as many two-CPU rcutorture scenarios as possible, give or take the memory footprint of that many guest OSes (one per rcutorture scenario). For example, to make 32 TREE03 scenarios run on 64 CPUs, edit the tools/testing/selftests/rcutorture/configs/rcu/TREE03 file so as to set CONFIG_NR_CPUS=2, and remember to pass either the --allcpus or the --cpus 64 argument to kvm.sh.
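
Concretely, this might look something like the following sketch, again assuming a 64-CPU host and the top of a kernel source tree. The "32*TREE03" repeat syntax assumes a reasonably recent kvm.sh; if yours does not accept it, the scenarios can instead be listed out explicitly in the --configs argument.

# Sketch: run 32 two-CPU TREE03 instances side by side on a 64-CPU host.
sed -i 's/^CONFIG_NR_CPUS=.*/CONFIG_NR_CPUS=2/' \
    tools/testing/selftests/rcutorture/configs/rcu/TREE03
tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 64 --configs "32*TREE03"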

What happens in real life?

For a race condition that rcutorture uncovered during the v5.8 merge window, running one large rcutorture instance instead of 14 smaller ones (very) roughly doubled the probability of locating the race condition.

In other words, real life is completely capable of lying somewhere between the two theoretical extremes outlined above.
Tags: rcu, scalability, stupid rcu tricks