Later in this series of postings, we will get back to the question of what lessons from the old Great Software Crisis might be applied to the new Great Parallel Software Crisis. In the meantime, this post will look at what can make parallel programming easy. Just to maintain my reputation for arguing with myself, several upcoming posts will examine things that can make parallel programming difficult.

The easiest problems to parallelize are those whose data is partitionable. In this case, just assign a separate CPU to each partition and stand back! This can be as easy as the following shell script:

./myprogram 1 &
./myprogram 2 &
./myprogram 3 &

Here we run three copies of the program, each working on a different part of the problem. Other examples include the data parallelism used in technical computing and the data locking used in operating systems and server applications.
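To make “data locking” concrete, here is a minimal sketch in C with POSIX threads (the bucket count, the element layout, and the trivial hash function are all made up for illustration): each hash bucket is its own partition protected by its own lock, so threads working on different buckets never communicate.

#include <pthread.h>

#define NBUCKETS 1024   /* number of independent partitions (illustrative) */

struct element {
	struct element *next;
	unsigned long key;
};

struct bucket {
	pthread_mutex_t lock;   /* protects only this bucket */
	struct element *head;   /* elements hashing to this bucket */
};

static struct bucket table[NBUCKETS];

void table_init(void)
{
	for (int i = 0; i < NBUCKETS; i++) {
		pthread_mutex_init(&table[i].lock, NULL);
		table[i].head = NULL;
	}
}

/* Each key maps to exactly one partition. */
static struct bucket *key_to_bucket(unsigned long key)
{
	return &table[key % NBUCKETS];
}

/* Insertion touches only one bucket, so threads inserting into
   different buckets proceed in parallel with no communication. */
void table_insert(struct element *e)
{
	struct bucket *b = key_to_bucket(e->key);

	pthread_mutex_lock(&b->lock);
	e->next = b->head;
	b->head = e;
	pthread_mutex_unlock(&b->lock);
}

The more buckets there are relative to the number of CPUs, the rarer it is for two CPUs to collide on the same lock, which is the data-locking analogue of the shell-script partitioning above.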

Many types of computer-aided design have datasets that are nearly partitionable, a classic case being finite-element analysis. Here, a grid is superimposed on the object being analyzed, and each CPU is assigned a region of the grid. As we make the grid more fine-grained by a factor of N (thus usually making the solution more accurate), the amount of computation per CPU rises as N^3, while the amount of communication between CPUs rises only as N^2. As N rises, the fraction of effort required for communication becomes arbitrarily small, allowing arbitrarily good performance and scalability.
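A quick back-of-the-envelope check (with k and c standing in for hypothetical per-cell computation and per-boundary-cell communication costs, neither of which appears above): each CPU does roughly k*N^3 worth of computation but only c*N^2 worth of communication, so the fraction of time spent communicating is about c*N^2 / (k*N^3 + c*N^2), or roughly (c/k)/N. Doubling the grid resolution therefore roughly halves the relative cost of communication.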

However, as soon as there is communication, we do need to start being careful. Even in the tightest SMP, the overhead of communicating is a good order of magnitude greater than that of computation, and this ratio often amounts to multiple orders of magnitude. Therefore, even a small amount of communication can destroy your software's scalability, regardless of whether that communication stays within a single multicore CPU or crosses a large cluster interconnect. This is why partitioning is a good thing: the more complete the partitioning, the better. Completely partitionable problems are often called “embarrassingly parallel”, though it has never been clear to me why partitioning should embarrass anyone. Perhaps it is a culture thing, but whatever it is, I simply think of it as an embarrassment of performance and scalability riches.
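As a rough illustration of this cost gap (a hypothetical C11 sketch, not something from the original post): funneling event counts through a single shared atomic counter forces every increment to bounce one cache line among all the CPUs, whereas giving each thread its own counter eliminates that communication entirely, deferring it to the rare occasions when a total is actually needed.

#include <stdatomic.h>

#define NTHREADS 64   /* illustrative maximum thread count */

/* Heavy communication: every thread fights over one cache line. */
static atomic_long shared_count;

void count_shared(void)
{
	atomic_fetch_add(&shared_count, 1);
}

/* Fully partitioned: each thread owns its own counter, padded out to
   a cache line so that neighboring counters do not share one. */
struct thread_count {
	long count;
	char pad[64 - sizeof(long)];
};

static struct thread_count counts[NTHREADS];

void count_partitioned(int tid)
{
	counts[tid].count++;   /* no communication at all */
}

/* Communication happens only here, when a total is actually wanted.
   The sum is approximate while other threads are still counting. */
long read_total(void)
{
	long sum = 0;

	for (int i = 0; i < NTHREADS; i++)
		sum += counts[i].count;
	return sum;
}

On most machines the partitioned version scales with the number of threads, while the shared-counter version can actually slow down as threads are added.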

But what about RCU? RCU neither partitions data nor replicates it, so why is it fast?
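As a hint at the answer, here is a minimal sketch (assuming the userspace RCU library, liburcu, and a made-up struct config; the illustration is mine, not code from this post): the read side takes no locks and executes no expensive atomic or memory-barrier instructions, so readers pay almost none of the communication costs discussed above.

#include <urcu.h>      /* userspace RCU library (liburcu); reader threads
                          must also call rcu_register_thread() at start-up */
#include <stdlib.h>

struct config {        /* hypothetical shared data */
	int threshold;
};

static struct config *global_config;   /* assumed initialized elsewhere */

/* Read side: no locks, no atomic instructions, no cache-line bouncing. */
int read_threshold(void)
{
	int t;

	rcu_read_lock();
	t = rcu_dereference(global_config)->threshold;
	rcu_read_unlock();
	return t;
}

/* Update side (assumes a single updater or external serialization):
   publish a new version, wait for pre-existing readers, free the old. */
void update_threshold(int new_threshold)
{
	struct config *newc = malloc(sizeof(*newc));
	struct config *oldc = global_config;

	newc->threshold = new_threshold;
	rcu_assign_pointer(global_config, newc);
	synchronize_rcu();   /* wait for readers that might still see oldc */
	free(oldc);
}

The updater still pays for communication in synchronize_rcu(), but that cost is kept entirely off the read-side fast path.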

Of course, for lazy people like me, the best cases are where the parallelism is done for you, as has long been the case for SQL developers. The database's query optimizer handles the parallelism, so that the developers don't have to. That said, a quick Google search for “SQL performance” should suffice to convince you that even SQL developers have not yet reached Parallel Nirvana.

But can you think of cases where parallelism is easy but useless?

Partitioning and replicating your program's data can be a wonderful path to excellent performance and scalability, but life is not always that easy. The next set of posts will look at some of the things that can make life difficult.

Comments

paulmck
Dec. 13th, 2009 06:40 am (UTC)
My take on points of agreement and otherwise
We both agree that libraries can provide some parallelism to an otherwise sequential application. You are more enthusiastic about this approach than am I, but reasonable people can hold different opinions and outlooks.

We both agree that specialization can ease automation of parallelization and other optimizations, at the expense of limiting what the user can do.

We both agree that Amdahl's law is, if not universal, pretty close to it. The amount of trouble that Amdahl's law causes you will depend on your application. It is quite possible that you have seen a group of applications that are more amenable to library-based scalability than those that I am familiar with. But please note that I wholeheartedly agree that library-based parallelism is valuable and has a strong role to play in the Great Parallel Software Crisis. We might differ on exactly how strong a role, but we do agree that it will play a strong role.

You are more focussed on language features than am I, and again, reasonable people can hold differing opinions and outlooks.

All that aside, regardless of our individual experiences and outlooks, I thank you for your contributions to parallelism in general and wish you the best with VSIPL++!!!
(Anonymous)
Dec. 15th, 2009 01:43 am (UTC)
Re: My take on points of agreement and otherwise
I think we agree on everything. My astonishment that a library could automatically parallelize a large class of computations with no restriction on expressive power, and no appreciable overhead or conceptual leakage, led me to explore what conditions allowed that to be achieved. Maybe you're harder to astonish...
paulmck
Dec. 15th, 2009 07:11 pm (UTC)
Some might indeed argue that my hair color indicates that my supply of astonishment is greatly diminished. ;-)