How to make it reproduce faster? Or at all, as the case may be?
One approach is to tweak the Kconfig options and maybe even the code to make the failure more probable. Another is to find a “near miss” that is related to and more probable than the actual failure.
But given that we are trying to make a race condition happen more frequently, it is only natural to try tweaking the number of CPUs. After all, one would hope that increasing the number of CPUs would increase the probability of hitting the race condition. So the straightforward answer is to use all available CPUs.
But how to use them? Run a single rcutorture scenario covering all the CPUs, give or take the limitations imposed by qemu and KVM? Or run many instances of that same scenario, with each instance using a small fraction of the available CPUs?
As is so often the case, the answer is: “It depends!”
If the race condition happens randomly between any pair of CPUs, then bigger is better. To see this, consider the following old-school ASCII-art comparison:
+---------------------+
|        N * M        |
+---+---+---+-----+---+
| N | N | N | ... | N |
+---+---+---+-----+---+
If there are n CPUs that can participate in the race condition, then at any given time there are n*(n-1)/2 possible races. The upper row represents a single scenario with N*M CPUs, and thus N*M*(N*M-1)/2 possible races. The lower row represents M scenarios of N CPUs each, and thus M*N*(N-1)/2 possible races, which is more than a factor of M smaller. Concretely, with M=16 scenarios of N=4 CPUs each, the single 64-CPU scenario has 64*63/2 = 2016 possible racing pairs of CPUs, while the sixteen 4-CPU scenarios have only 16*4*3/2 = 96. For this type of race condition, you should therefore run a small number of scenarios, each using as many CPUs as possible, and preferably a single scenario using all of them. For example, to make the TREE03 scenario run on all 64 CPUs of a 64-CPU system, edit its Kconfig fragment under tools/testing/selftests/rcutorture/ so that it specifies 64 CPUs, as sketched below.
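Here is a minimal sketch of that approach, assuming that the TREE03 fragment lives at tools/testing/selftests/rcutorture/configs/rcu/TREE03 and that your version of kvm.sh accepts the --cpus and --configs arguments (kvm.sh --help will tell you):

    # Assumed path and option value: bump the guest's CPU count in the
    # TREE03 Kconfig fragment, then run a single instance spanning the
    # 64-CPU host.
    sed -i 's/^CONFIG_NR_CPUS=.*/CONFIG_NR_CPUS=64/' \
        tools/testing/selftests/rcutorture/configs/rcu/TREE03
    tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 64 --configs TREE03

Some versions of kvm.sh also provide a --kconfig argument, which can override CONFIG_NR_CPUS without editing the fragment at all.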
But there is no guarantee that the race condition will be such that all CPUs participate with equal probability. For example, suppose that the bug was due to a race between RCU's grace-period kthread (named either rcu_preempt or rcu_sched, depending on your Kconfig options) and RCU's expedited grace period, which at any given time will be running on at most one workqueue kthread.
In this case, no matter how many CPUs are available to a given rcutorture scenario, at most two of them can be participating in this race. It is therefore best to run as many two-CPU rcutorture scenarios as possible, give or take the memory footprint of that many guest OSes (one per rcutorture scenario). For example, to make 32 two-CPU TREE03 scenarios run on a 64-CPU system, edit the TREE03 Kconfig fragment under tools/testing/selftests/rcutorture/ so that each guest gets only two CPUs, as sketched below.
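A minimal sketch of that layout, again assuming the fragment's path and kvm.sh's argument handling; in particular, the "32*TREE03" repetition shorthand is an assumption, and older versions of kvm.sh may instead want TREE03 listed 32 times in the --configs string:

    # Assumed path and syntax: shrink each TREE03 guest to two CPUs, then
    # ask kvm.sh to run 32 such guests concurrently on the 64-CPU host.
    sed -i 's/^CONFIG_NR_CPUS=.*/CONFIG_NR_CPUS=2/' \
        tools/testing/selftests/rcutorture/configs/rcu/TREE03
    tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 64 --configs "32*TREE03"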
What happens in real life?
For a race condition that rcutorture uncovered during the v5.8 merge window, running one large rcutorture instance instead of 14 smaller ones (very) roughly doubled the probability of locating the race condition.
In other words, real life is completely capable of lying somewhere between the two theoretical extremes outlined above.