As some of you know, I have been working on a book on parallel programming. My thought had been to complete it, then announce it. But I finally realized that it really never will be complete, at least not as long as people keep coming up with new parallel-programming ideas, which I most definitely hope will continue for a very long time.
So, here it is!

Comments
I do remember you giving Swathi/me a link to the book you were working on (perfbook.git IIRC) when she wanted to use RCU for DNS related work! This is the same book, right ?!
Based on a quick look at this book, I'd probably pick up a paper copy if a high-quality one was easily available: I still haven't adopted an e-reader, and I hate trying to collect 100+ page documents off the printer. (Note that I have no financial interest in lulu.com or its competitors.)
davew@cs.haverford.edu
I'm working on a flow-based programming system in which cooperating components are the "IC/Integrated Circuit" of its parallel programming model. The components are modeled as cooperating co-routines/green threads/fibers which read and write to each other's I/O channels.
In the simplest implementation there is no actual contention, locking or critical code section because each component reads from its inputs, performs some computation and writes to its outputs. After the component is done it yields the CPU to the scheduler which transfers control to the next component.
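For readers who want something concrete, here is a minimal sketch of that simplest single-threaded case, using Python generators as the co-routines. It is not taken from the system described above: the names source, doubler, run and demo are made up for illustration, and plain deques stand in for the I/O channels. Because only one component runs at a time, no locks are needed.

```python
from collections import deque


def source(out, items):
    """Component with no inputs: writes one item per turn to its output."""
    for item in items:
        out.append(item)
        yield  # hand the CPU back to the scheduler


def doubler(inp, out):
    """Component that drains its input, computes, and writes its output."""
    while True:
        while inp:
            out.append(inp.popleft() * 2)
        yield  # nothing left to do this turn


def run(components, rounds):
    """Round-robin scheduler: resume each live component in turn."""
    live = list(components)
    for _ in range(rounds):
        for comp in list(live):
            try:
                next(comp)
            except StopIteration:
                live.remove(comp)  # component finished; stop scheduling it


def demo():
    a, b = deque(), deque()  # channels: source -> doubler -> result
    run([source(a, [1, 2, 3]), doubler(a, b)], rounds=4)
    return list(b)
```

Calling demo() pushes three items through the two-stage flow. The fixed number of rounds is a simplification; a real scheduler would need some way to detect that every component is idle.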
In actual practice, the cooperative components are distributed among 'n' available threads, with one thread per CPU core. Data transferred between components running concurrently on separate threads is passed through non-locking, cache-coherent FIFOs.
This system has the advantages of simplicity and re-usability of programming (much like UNIX pipes) with relative freedom from cache stalls and the thread rescheduling that happens when a locked critical section is encountered by a thread.
My system is, of course, nothing new. Systems like LINDA started flow-based programming a long time ago.
I was wondering if you were going to be covering flow-based systems within your analysis (obviously, MPI approaches this using message passing).
thx
eris
The closest thing to this approach currently in the book is pipelining, which is a trivial variant of flow-based programming. I don't have any immediate plans to go beyond this at the moment.
Were you interested in writing up a survey of the various flow-based programming efforts, with emphasis on open-source software?
Thanx, Paul
Regards, Paul Morrison
Transistor count per unit area per unit money will approximately double every 18 months.
Thus, your chart in no way substantiates the statement that Moore's Law has ceased to provide benefit. Pure MIPS might have been slightly more useful than the mixed MIPS/clock rate chart you present, seeing as the Core line of CPUs has lower clock speed than the Pentium 4 series almost across the board, but is far more powerful. Still, Moore's law is the very thing that has made multicore possible - aside from dedicating more transistors to a single core to make it faster, chipmakers have enough to dedicate them to an entire second, fourth, or sixth core.
That said, I can see how you might read the current wording in that way. I am considering adding a Quick Quiz raising your point. Does that seem reasonable to you?
I'm not a fan of PDF; it's (generally) not reflowable.
The challenges will be handling the Quick Quizzes and the other script-extracted content. It might be possible to provide scripting that fixed things up for latex2html, but I have not put much thought into this.
Again, I would welcome a good patch that did this.
A quick look brings up pandoc, which can convert a subset of LaTeX to HTML as well as ePub (useful for all those folks with electronic readers, e.g. Nook, iPad).
I don't have the time to check that the formatting works, but here's the result for anyone else as impatient as I am :-)
http://www.bone.id.au/~paul/perfbook.epu
http://www.bone.id.au/~paul/perfbook.mob
Thanks.
the specified lock, and the pthread_mutex_lock()
“releases” the specified lock.
lacks an "un" in the second pthread_mutex_lock()
Regards,
Etienne.
Let me know if there's a way to donate $ for your efforts, as a thank you. =)
No need for $$$, but thank you for asking.
One typo -- on page 154, item 5 in the list should be the consumer picking up the producer's timestamp, yes?
Thank you!
If the former, there's really not a whole lot that needs to be added to the book. It's very clearly written and covers most of what anyone would need to understand parallelism. The only chapter I don't see is one on interconnects. This matters because interconnects define what will (and will not) work in the way of techniques. It's not that the interconnects are directly important, but they do place limits on those things that are.
On the other hand, if it's a book on practical parallel programming, you'd need a section covering the ideas in communication. "Conflicting Visions of the Future" is excellent as-is and might well be the correct place to discuss things like RDMA, but it doesn't feel right for discussing message passing versus networked inter-process communication.
I have absolutely no idea if you'd want to cover instruction-level parallelism (as per Cilk and Cilk++, but also an area a lot of early parallel research looked into). It's such a totally different beast than regular parallelism.
Bottom line, great book and I'd love to see if there's anything I could usefully contribute, but to avoid wasting your time or anyone else's, I'd want to know what would be interesting to you as the author-in-chief and editor-in-chief.
The focus at the moment is primarily on shared memory because such systems are cheap, easy to set up, and readily available, because my own experience has been primarily with shared memory, and because the most likely audience is people using multi-core systems.
When you talk about interconnects, are you thinking in terms of the internal interconnects in shared-memory/multicore systems, or in terms of the message-passing interconnects used in large supercomputing clusters?
"Conflicting Visions of the Future" should be for topics where there is genuine uncertainty and conflict. Possible topics include limits to hardware and software scalability, multicore-computing fads from various groups, and what sorts of parallel-debug assists are required. That said, this book is not to be a marketing vehicle for either vendors or academia.
I of course welcome contributions! One question to ask yourself is "what area am I expert in that a lot of people need to know or will soon need to know?" Thoughts?
In other words, stuff that makes a system almost transparently scalable. Explicit high-level stuff like MPI-2 and Bulk Synchronous Processing get ugly, there's masses of tedious detail, and frankly most of the important stuff (avoiding deadlocks, avoiding simultaneous writes, etc) is stuff you already cover. Some aspects of parallelism are universal.
In the case of Ethernet, for example, there's the question of how to pretend to have shared memory. (Distributed Shared Memory schemes tend to be rare and are often badly written.) There's also the question of whether you can use process migration systems like MOSIX or Kerrighed to do MIMD-style parallel processing more effectively than you could on a multi-core or SMP computer.
Also on Ethernet, you can very easily pass around a lot of data to a lot of machines simultaneously, using Scalable Reliable Multicast. Which, incidentally, MPI does not use. MPI rotates round the list of destinations in a collective call and sends messages on a sequential basis. Implementations of SRM and NORM (NACK-Oriented Reliable Multicast) are widely available for most platforms, Windows and Linux included, so it's vendor-independent. They are not, however, significantly used in the industry, although one would think that MISD-style parallelism would find the mechanism to be extremely valuable.
For many buses, the time it takes to switch a lane from one direction or target to another is absurdly high. The cost of working at such fine detail is also absurdly high, since a lot of parallel processing languages (UPC, Erlang, etc.) are too high-level to let you do much tuning. The question is how to make the best use of time to get the best use out of the system. And that's a hard problem.
"What area am I expert in that a lot of people need to know or will soon need to know?" If I knew that, I'd be rich. :) I'm expert in plenty of areas, I frequently find use for that expertise, but anticipating which areas would be useful for others has always been a tough one for me.
Btw. do you know if there's a way to generate a Table of Contents in the PDF, which would be displayed in the side pane in many PDF viewers? That would be really useful for viewing such a large book.
The switch to pdflatex did enable table of contents, so if you pull from the git archive and do "make", you should see a PDF TOC.
Jin