You are viewing paulmck

(no subject)

Hello, Justin!

On #1, please see Registration required, but otherwise freely downloadable.

On to #2.

Practitioners, myself included, tend to use the following debugging techniques for parallel software: assertions, breakpoints, printf(), lightweight event tracing, statistical counters, formal methods, and programmatic combinations of the above. Others will no doubt add their favorite techniques to the mix.

Assertions are straightforward, relatively lightweight when they don't fire (hopefully the common case), and quite helpful in cases where the bug has a low probability of occurring. When debugging user-mode code, I normally run within a debugger in order to inspect global state after the assertion fires.

I often use breakpoints as a lightweight assertion, for example, setting breakpoints in locations that a particular set of inputs should not reach. Single-stepping parallel applications can also be useful, but inter-thread dependencies can make it difficult. That said, there is no substitute for single-stepping the code when running it single-threaded, in part because the vast majority of bugs are not subtle parallel-programming issues, but rather simple mistakes that affect both single-threaded and parallel execuion. (And you do test your code running single-threaded before running it in parallel, don't you?)

Despite its shortcomings, printf() does actually see some use. While it is possible that the bug is profoundly dependent on timing, bugs are often reasonably robust and printf() is profoundly easy to use. So one not-uncommon approach is to insert printf()s, and if that makes the bug morph or even disappear completely, substitute one of the lighter weight (but harder to use) techniques below.

Lightweight event tracing uses a per-thread (or per-CPU) data structure to record the trace events, with the usual example being a ring buffer. This approach is well-suited to escape-action support in current hardware. Ring-buffer overflow can be a problem (especially for lightweight primitives such as RCU) so the following techniques are often used in combination with event tracing.

When using the statistical counter technique, you simply increment per-thread variables at selected points in your code, which also works well with current hardware's escape-action support. These counters can be read out at any convenient point, and the values examined for discrepancies. These discrepancies often point directly at the bug.

Livejournal's length limitations mean that the rest of this discussion is in the next comment. ;-)

Comment Form

No HTML allowed in subject


Notice! This user has turned on the option that logs your IP address when posting. 

(will be screened)