You are viewing paulmck

Tired of ads? Upgrade to paid account and never see ads again!

Previous Entry | Next Entry

Parallel Programming: Bogus Benchmarks

Benchmarking has a checkered history in IT. It is not for nothing that some have paraphrased the famous and well-worn quotation “lies, damned lies, and statistics” to “lies, damned lies, and benchmarks.” This is of course unfair to benchmarks, which have been an important driving force behind improvements to computing technologies ranging from CPUs to compilers to storage arrays to graphics processors to databases. At their best, benchmarks play a number of exceedingly important roles:

  1. Providing a fair framework for comparing competing implementations on performance, price-performance, performance per watt, time to solution, and much else besides.
  2. Focusing competitive energy on improving implementations in ways that matter to users.
  3. Serving as example uses of the technologies being benchmarked.

Benchmarks from the Transaction Processing Council (TPC) and the Standard Performance Evaluation Corporation (SPEC) have been very helpful in the past, and so we can hope that good benchmarks will play an important role in accelerating the evolution of multicore hardware and software. That said, designing benchmarks is a painstaking exercise, not to be undertaken lightly. It is therefore interesting to look carefully at the Fibonacci benchmark that has been used for well over a decade to evaluate parallel work-stealing schedulers. An early paper using this benchmark admits that this is “a terrible way to compute Fibonacci numbers”, but argues that it stresses task creation and synchronization overhead.

A doubly recursive C-language implementation of the Fibonacci function takes about 18 seconds to compute fib(47) on my laptop. (Why fib(47)?) One could imagine this running faster if executed in parallel, although considerable imagination is required. For purposes of comparison, an iterative C-language implementation computes fib(47) in less than 100 nanoseconds, more than 8 orders of magnitude faster. Worse yet, the interpretive computer algebra system Maxima computes the exact 200,000-digit value of fib(1,000,000) in less than 500 milliseconds, which is still much less time than that required for fib(47) by the doubly recursive C-language implementation.

But again, the authors of the aforementioned paper argue that this stresses the thread creation and synchronization overhead. Is this a good justification for use of this benchmark?