You are viewing paulmck


My children seem to have decided that I am not putting enough bugs in my software, given that they gave me one of these:


This does raise the question of whether I prefer bugs in my software or in my candy. In this case, the answer is easy: the candy is transparent, so I know what I would be eating, which makes that cricket in the candy far less troublesome than a bug in my software. Of course, if the candy were opaque, this decision might be somewhat more difficult.

In happy contrast to previous lives, these days my software is also transparent: freely available for all to download, use, study, redistribute, and improve. This means that other people can find my bugs (and vice versa), which can be surprisingly helpful. It is often the case that the “simple” matter of identifying the bug is the most difficult step in the process of fixing it.

Bugs can be inserted into code in the process of writing it and in the process of fixing other bugs, and I certainly have inserted plenty of bugs using both of these time-honored mechanisms. However, bugs can also appear spontaneously, as a result of changing workloads and changing requirements. I offer two examples from the current mainline implementation of RCU:

  1. The call_rcu() function has a tunable safety limit (defaulting to 10,000 callbacks per CPU) beyond which RCU takes extreme measures to force the current grace period to complete. This has worked flawlessly for more than five years, but recent patches (which are otherwise eminently reasonable) cause this limit to be exceeded on a routine basis. Poof!!! The call_rcu() function now has a bug! And yes, I sent out a temporary hack, and now have a real fix in mainline.
  2. Both Mathieu Desnoyers and Udo Steinberg have recently argued that the call_rcu() function should execute deterministically when invoked from a process running at real-time priority. And older versions of call_rcu() used to do just that. However, recent concerns with grace-period latency have resulted in my adding non-deterministic grace-period-acceleration code to call_rcu(). So, does call_rcu() have a real-time-response bug or doesn't it? I guess that the final answer is up to me, but the more I think about it, the more reasonable Mathieu's and Udo's position seems. So it seems that a second call_rcu() bug is in the offing.
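The first example can be sketched as a toy userspace model. This is not the kernel's code: every name here (`toy_cpu_queue`, `toy_call_rcu()`, `toy_force_grace_period()`, `TOY_CALLBACK_LIMIT`) is hypothetical, and the "extreme measures" are reduced to a counter. The only thing taken from the description above is the shape of the mechanism: a tunable per-CPU limit, defaulting to 10,000 callbacks, beyond which something drastic happens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout: a userspace toy, not kernel code. */
#define TOY_CALLBACK_LIMIT 10000L /* default from the text: 10,000 per CPU */

struct toy_cpu_queue {
	long queued; /* callbacks still waiting for a grace period */
	long limit;  /* tunable safety limit */
	long forced; /* number of times the limit was exceeded */
};

static void toy_queue_init(struct toy_cpu_queue *q, long limit)
{
	assert(q != NULL && limit > 0);
	q->queued = 0;
	q->limit = limit;
	q->forced = 0;
}

/* Stand-in for "extreme measures to force the grace period to complete". */
static void toy_force_grace_period(struct toy_cpu_queue *q)
{
	q->forced++;
}

/*
 * Queue one callback.  Returns true if this call pushed the queue past
 * the limit and therefore took the (assumed slower) forcing path.
 */
static bool toy_call_rcu(struct toy_cpu_queue *q)
{
	assert(q != NULL);
	q->queued++;
	if (q->queued > q->limit) {
		toy_force_grace_period(q);
		return true;
	}
	return false;
}

/* A grace period has ended, so all queued callbacks have been invoked. */
static void toy_grace_period_end(struct toy_cpu_queue *q)
{
	assert(q != NULL);
	q->queued = 0;
}
```

A queue set up with `toy_queue_init(&q, TOY_CALLBACK_LIMIT)` never takes the forcing path as long as `toy_grace_period_end()` runs before 10,001 callbacks pile up. Nothing in this code has to change for that path to become routine: a workload that queues callbacks faster is enough. The same branch also illustrates the second example, since a caller cannot tell in advance whether a given `toy_call_rcu()` will take the short path or the forcing one.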

All that aside, exactly when did these bugs appear? When the code was last changed? When the new use cases were first articulated? Or when I decided to accept them as bugs?

In the end, I guess it really doesn't matter so long as I fix them.

Comments

paulmck
Oct. 10th, 2009 07:51 pm (UTC)
Re: Broken link
Thank you -- fixed now.