2012 LPC Call For Microconference Topics

The 2012 Linux Plumbers Conference (LPC) will be held August 29-31 at the Sheraton San Diego, and we hope to see you there!

To that end, the LPC Planning Committee is pleased to announce a call for microconferences. These microconferences are working sessions that are roughly a half day in length, each focused on a specific aspect of the “plumbing” in the Linux system. The Linux system’s plumbing includes kernel subsystems, core libraries, windowing systems, media creation/playback, and so on. For reference, last year’s LPC had tracks on Audio, Bufferbloat and Networking, Cloud, Containers and Cgroups, Desktop, Development Tools, Early Boot and Init Systems, File and Storage Systems, Mobile, Power Management, Scaling, Tracing, Unified Memory Management, and Virtualization.

Please note that submissions to a given microconference should not normally cover finished work. The best submissions are instead problems, proposals, or proof-of-concept solutions that require face-to-face discussions and debate among people from different areas of the Linux plumbing. In other words, the best microconferences are working sessions that turn problems into patches representing solutions.

Leading an LPC microconference can be a fun, exciting, and rewarding activity, but please see here for the responsibilities of a microconference working session leader. If you have an idea for a good LPC microconference, and especially if you would like to lead up a particular microconference, please add it to the LPC wiki here.
The past year has been a busy one for transactional memory (TM). IBM's Blue Gene/Q and Intel's Haswell TSX have joined Sun's Rock and Azul's Vega 2 in offering hardware transactional memory (HTM). All of these implementations seem to be focusing on TM in the small, wisely in my opinion, given TM's weak support for many programming-in-the-large use cases, particularly I/O and system calls. A number of TM researchers seem to be coming around to this view, as evidenced by slide 279 of these lecture slides. All of this is to the good, especially given some of the unrealistically high expectations that some people have harbored for TM.

So what do these HTM implementations offer? Perhaps HTM can best be thought of as an extension of LL/SC or CAS that operates on more than one memory location, with the number of locations limited by hardware considerations such as cache geometry. As with both LL/SC and CAS, transactions can fail, both due to conflicts between concurrently executing transactions and due to hardware events such as exceptions and cache-line conflicts. Interestingly enough, this means that in many cases, hardware transactions must have some other synchronization mechanism (such as locking!) as a fallback. Unfortunately, this in turn means that you cannot count on HTM to eliminate the possibility of deadlock. However, it might well increase performance for some common types of large non-statically-partitionable data structures subject to smallish updates. The canonical example of such a data structure seems to be the red-black tree.

One thing that disappoints me personally about all of these mechanisms is their limited support for RCU. Don't get me wrong, you can use RCU with HTM updates. However, bare RCU readers will typically cause HTM updaters to unconditionally abort. Although RCU's purpose is in fact to favor readers, this is entirely too much of a good thing. For ideas for improved interactions between TM and RCU, please see an earlier posting. It is also unclear to what extent HTM helps in cases involving severe real-time-response constraints.

There have also been some interesting software transactional memory (STM) papers. Pankratius and Adl-Tabatabai published a productivity comparison in which the highest productivity was attained by teams that used both STM and locking. Although one can argue that inevitable transactions will subsume the uses of locking in the Pankratius paper, inevitable transactions are normally implemented using a global lock, sharply limiting I/O scalability.

Dragojevic et al. published what is often viewed as a riposte to Cascaval et al.'s Software Transactional Memory: Why Is It Only a Research Toy? in the CACM article entitled Why STM can be more than a Research Toy (see here for a related technical report). The Dragojevic article has rare graphs showing semi-reasonable scalability, with up to 12 times the sequential execution rate on 16 CPUs, which is certainly not splendid, but neither is it altogether embarrassing. However, that speedup is the best result from a set of 17 benchmarks, many of which show negative scalability.

But a closer reading shows that Dragojevic's and Cascaval's ideas of exactly what constitutes “software transactional memory” are quite different. Dragojevic's results use a variant of STM that takes hints from the programmer indicating which data is shared and which is not. This is an interesting hack, and it is hard to argue with the improvement over straight STM, but it does seem to back away from the earlier TM promises of transparent parallelism.

These hints raise some interesting possibilities. Suppose that the programmer also hinted that certain pairs of transactions could never conflict due to accessing disjoint data. One could take this a step further and use names for different disjoint subsets of the data. Such hints could greatly reduce or even remove the need for complex contention managers and associated aborts and rollbacks. But the best thing about this sort of scheme is that it has seen heavy production use with good performance and scalability extending back for decades. It is called “locking”. ;-)
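For illustration, here is what such “names for disjoint subsets of the data” look like when spelled the old way, namely as per-subset locks (the subset names below are of course invented for this example):

```c
#include <pthread.h>

/* One "name" per disjoint subset of the data. */
static pthread_mutex_t user_stats_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t net_stats_lock = PTHREAD_MUTEX_INITIALIZER;

static long user_stats;
static long net_stats;

/* Updates to different subsets never conflict with each other:
 * no contention manager, no aborts, no rollbacks. */
static void bump_user_stats(void)
{
        pthread_mutex_lock(&user_stats_lock);
        user_stats++;
        pthread_mutex_unlock(&user_stats_lock);
}

static void bump_net_stats(void)
{
        pthread_mutex_lock(&net_stats_lock);
        net_stats++;
        pthread_mutex_unlock(&net_stats_lock);
}
```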

What is the overhead of rcu_read_lock()?

A recent post to LKML stated that the patch in question did not plan to hold any global locks, including rcu_read_lock(), presumably because of concerns about contention or overhead. This blog entry is intended to help address any lingering concerns about rcu_read_lock() contention and overhead.

To be fair, at first glance, rcu_read_lock()'s source code does look a bit scary and slow:

static inline void rcu_read_lock(void)
{       
        __rcu_read_lock();
        __acquire(RCU);
        rcu_lock_acquire(&rcu_lock_map);
}

However, a closer look reveals that both __acquire() and rcu_lock_acquire() compile to nothing in production kernels (CONFIG_PROVE_LOCKING=n). This leaves __rcu_read_lock(), which is compiled differently for different settings of CONFIG_PREEMPT.

Let's start with CONFIG_PREEMPT=n. We have:

static inline void __rcu_read_lock(void)
{       
        preempt_disable();
}

And, again for CONFIG_PREEMPT=n:

#define preempt_disable()               do { } while (0)

The overall result for rcu_read_lock() when CONFIG_PREEMPT=n is therefore as follows:

static inline void rcu_read_lock(void)
{       
}

Similar analysis of rcu_read_unlock() reveals that it is also an empty static inline function for CONFIG_PREEMPT=n. It is to be hoped that this is sufficiently lightweight for most practical purposes. If you find a case where it is too heavyweight, I would be very interested in hearing about it!

That leaves CONFIG_PREEMPT=y, which actually has executable code in its definition of __rcu_read_lock() as follows:

void __rcu_read_lock(void)
{       
        current->rcu_read_lock_nesting++;
        barrier();
}

The first statement is a simple non-atomic increment of a simple int that is located in the task's own task_struct. The barrier in the second statement expands as follows:

#define barrier() __asm__ __volatile__("": : :"memory")

This is an empty asm that generates no code, but that does disable code-motion optimizations that might otherwise move memory references across the barrier() statement. So, the executable code for rcu_read_lock() for CONFIG_PREEMPT=y is as follows:

void rcu_read_lock(void)
{       
        current->rcu_read_lock_nesting++;
}

A similar analysis of rcu_read_unlock() for CONFIG_PREEMPT=y yields the following:

void __rcu_read_unlock(void)
{
        struct task_struct *t = current;

        if (t->rcu_read_lock_nesting != 1)
                --t->rcu_read_lock_nesting;
        else {
                t->rcu_read_lock_nesting = INT_MIN;
                if (unlikely(ACCESS_ONCE(t->rcu_read_unlock_special)))
                        rcu_read_unlock_special(t);
                t->rcu_read_lock_nesting = 0;
        }
}

The common case of a single level of rcu_read_lock() nesting executes the else clause of the first if statement, and only executes the then clause of the second if statement when the RCU read-side critical section was either preempted or ran for several milliseconds.

So in the common case, rcu_read_unlock() for CONFIG_PREEMPT=y executes two tests of task-local variables and two assignments to task-local variables.

This should be sufficiently lightweight for most purposes.

Of course, RCU is intended for read-mostly situations, and RCU updates can have significant overhead, incurring significant latency, CPU overhead, and/or cache misses. That said, there are some special cases where RCU updates can be extremely fast, as shown in Figures 12 and 13 of the supplementary materials to User-Level Implementations of Read-Copy Update. (No, the supplementary materials are not behind a paywall: Click on the “Supplemental Material(PDF)” link.)

Woo-hoo!

User-Level Implementations of Read-Copy Update has appeared in the February 2012 issue of IEEE Transactions on Parallel and Distributed Systems. Many thanks to everyone involved, especially co-authors Jon Walpole, Michel R. Dagenais, Alan Stern (who did the symbolic-logic heavy lifting), and Mathieu Desnoyers (who is the lead author). Mathieu also managed to convince me to go once more into the breach, which was not an easy task given that I received my license to collect RCU rejection notices all the way back in 1995. ;-)

So it does feel very good to see this finally hit print!

My presentation at the Real Time Linux Workshop this past October, titled “On migrate_disable() and Latencies” (presentation), was a bit of a departure for me. This presentation used a Markov model to analyze the behavior of some recent changes to the scheduler for the 3.0 series of -rt kernels.

Although this approach produced some interesting results, one difficulty is that a number of the corresponding scheduler measurements simply do not fit the exponential distribution very well. This issue of course came up during my talk, where an audience member suggested instead using the Erlang distribution. Unfortunately, my only memory of Erlang distributions was of a 1980s operations-research class, where I learned how to use an Erlang distribution, but not why anyone would want to, at least not beyond the professor's vague assurances that it is helpful when modeling telephony networks.

So I answered that I might consider an Erlang distribution, but that I figured that I could match the data quite well by using cascaded Markov-model stages to represent a single state in the higher-level model. The big advantage of this cascading approach is that the math and software remain the same: You simply map additional states into the model.

However, my work reducing RCU's need for scheduler-clock ticks took precedence, so it was only recently that I got time to work out the math for cascaded Markov-model stages. I got the results I expected, but I figured that I should also do a quick review of the Erlang distribution. So I got out my old copy of “Introduction to Operations Research” by Hillier and Lieberman. Imagine my surprise to learn that the Erlang distribution is exactly cascaded Markov-model stages!
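For completeness, here is the correspondence in standard textbook form (restated from the textbooks rather than from my slides): the time to traverse k cascaded stages, each exponentially distributed with rate λ, has the Erlang-k density.

```latex
X = X_1 + X_2 + \cdots + X_k, \qquad X_i \sim \mathrm{Exp}(\lambda)
\quad\Longrightarrow\quad
f_X(x) = \frac{\lambda^k \, x^{k-1} e^{-\lambda x}}{(k-1)!}, \qquad x \ge 0.
```

The mean is k/λ and the variance is k/λ², so adding stages narrows the distribution relative to a single exponential with the same mean, which is one way to fit data that a plain exponential fits poorly.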

So my response to the question during my talk was essentially: “I might try the Erlang distribution, but I am going to try deriving it from scratch first.” Oh well, I would much rather look foolish with the correct answer than to look smart with the wrong answer!

On the other hand, this approach did give me a very good understanding of not only how to use the Erlang distribution, but also how to derive it and why you might want to use it. :-)

The accretion of meaning

Noam Chomsky once wrote “Colorless green ideas sleep furiously” as an example of a sentence that is grammatically correct but nonsensical. However, my daughter Sarah and I were discussing this earlier today, and each of us managed to generate meanings for this phrase.

Sarah:

  • Colorless: Uninspiring to the lumpen proletariat.
  • Green: Pertaining to improving the environment.
  • Sleep: Ignored by mainstream words and actions.
  • Furiously: Angering the idea's proponents.

In other words, uninspiring ideas for improving the environment are ignored by the mainstream, to the fury of their proponents.

Paul:

  • Colorless: Pertaining to transparency, as in window glass and also as in transparent transistors.
  • Sleep: Residing in an idle state, as in idle CPUs.
  • Furiously: Consuming excessive quantities of electrical power.

In other words, if you fail to specify CONFIG_NO_HZ=y when building the Linux kernel for a computer constructed of transparent transistors fabbed onto a pane of glass that controls that pane of glass, this colorless green idea will sleep furiously.

In both cases, the key element is the recently added meaning for the word “green”: As the word “green” has accreted a meaning, so has the sentence “Colorless green ideas sleep furiously.”

However, it turns out that Sarah's and my line of thought has been anticipated here, here, and here, and most probably elsewhere as well.

But that is OK. It turns out that Chomsky himself was also anticipated by Lucien Tesnière, the game cadavre exquis, and possibly also Bertrand Russell. Furthermore, a poet managed to accrete meaning onto Russell's “Quadruplicity drinks procrastination,” as can be seen here. The poet had originally intended to accrete meaning onto Chomsky's sentence, but, finding that others had beat him to it, turned to Russell's sentence instead.

The fact is that thinking up a new-to-you idea can be just as much fun as thinking up a new-to-the-human-race idea. Perhaps there really is nothing new under the sun. Even if there is something new under the sun, the human race is not the only thing under the sun! ;-)
My work on RCU does require a pioneering spirit. For example, there are no classes to guide my efforts other than those I teach, and there are no publications to draw from other than those I write. One saving grace is that I work in the Linux community, which means that people can (and often do!) contribute their thoughts and ideas, many of which are now reflected throughout the Linux kernel's RCU implementation. (Thank you all! You know who you are, and there are too many of you to name you all. If you want the list, the git log command is your friend.) Nevertheless, being too afraid to stray from the beaten path implies being too afraid to work on RCU.

But there are times when the RCU implementation needs a more sane approach. During those times, I must find some other outlet for my insanity: To do otherwise is to break RCU. Fortunately, this time around, an appropriate outlet was readily available in the guise of Ubuntu's new Unity window manager.

I decided to emulate the environment of a typical Linux hacker. This meant installing, configuring, and using Unity without the advice and counsel of my acquaintances within Ubuntu. To those acquaintances who might feel some irritation at the contents of the remainder of this blog entry, I do apologize in advance. However, experiments are invalid unless the environment is properly controlled, and part of the control for this experiment was to isolate myself from such help. I did search the web, including of course ubuntuforums.org.

So here are the issues I encountered, more or less in the order I encountered them:

  1. Right-clicking doesn't give you a way to configure the desktop, aside from setting the background image (pure black for me!). I found the answer in ubuntuforums.org: install and run “ccsm” to make major changes to the desktop configuration.
  2. I work with large numbers of xterms, so that each desktop has nine 24x80 xterms in a three-by-three pattern. (And yes, when screens were smaller, I used a four-by-four pattern.) I use a script creatively named “xterms” to create them, and I expect them to be automatically placed in a non-overlapping three-by-three pattern, which they were under Gnome. But under Unity, many of them will be placed directly on top of each other. The solution is to add a “sleep 0.250” in my script's loop. So Unity appears to be trying to do the right thing, but is a bit slow on the uptake. I learned about this experimentally, mostly because when you query for “tiled” you get advice on how to make your windows never overlap. In contrast, I want my windows to be non-overlapping by default, but if I am (say) debugging rcu_start_gp(), I want to be able to expand the window from 24x80 vertically to use the full screen size, and in that case, I am happy with the resulting overlapping windows.
  3. Unity uses “workspaces” rather than “desktops”. Unfortunately, there doesn't seem to be a way to identify the windows of a given type that have been minimized from a given workspace. So if I have nine xterms in one workspace for my RCU work and another nine in another workspace for working on a paper, clicking the xterm button gets me all 18, shuffled around so that I cannot easily tell which goes with which workspace. Perhaps shift-click should show only those xterms associated with the current workspace!
  4. I tried to work around the above problem by creating multiple desktops via “ccsm”. However, although the additional desktops exist, Unity cannot show you any windows assigned to them. The only way to see them again is to reduce the number of desktops back to 1, which will force them back onto the sole remaining desktop. (This might actually be a useful feature, but...) Worse yet, when you have more than one desktop, you lose the ability to move a given window to some other workspace by right-clicking on it: Instead, you can only move it to other desktops. Longer term, Unity should more gracefully handle multiple desktops. Until then, “ccsm” should not offer to create more than one desktop. And until then, just say “no” to the temptation to create multiple desktops. Use workspaces instead.
  5. My habit of clicking on icons at the lower right corner of my screen to move among desktops died hard, but the window-s hotkey does turn out to be very nice. When you hit window-s, you get a screen that shows you all your workspaces, and you can use the arrow keys to move among them. When you get to your destination workspace, just hit the enter key.
  6. Unfortunately, focus does not always follow you from one workspace to another. A quick window-s;left-arrow;enter;"cd /foo";enter sequence is quite likely to cause the xterm on the previous workspace to change to directory /foo, which can be a bit disconcerting. This really badly needs fixing, as the mental effort to verify that focus has indeed followed me sometimes causes me to forget why I wanted to be in the new workspace in the first place. This can be frustrating. (And yes, I am not as young as I used to be. Then again, neither are you!)
  7. In Gnome, deleting a window (for example, typing exit to a bash prompt) automatically transfers focus to some other window on that same desktop. In contrast, in Unity, deleting a window sometimes transfers focus. Always would be far preferable!
  8. I searched for synaptic in vain, finally learning about the new Ubuntu Software Center icon, which is the shopping-bag icon on the left-hand toolbar. Ubuntu Software Center seems OK, though I am not sure whether or not I would be happy to do without apt-get on the command line. Fortunately, I have both.
  9. The left-hand toolbar did grow on me due to the fact that it seems to track the most heavily used applications. Unfortunately, there is no way to use this toolbar to query for the existence of an application: if there is an instance, it moves to the workspace containing the instance and transfers focus to it, but if there is no instance it creates one. (If there are multiple instances, it displays them all and lets you choose — but without letting you know which instance goes with which workspace.) Again, perhaps a job for shift-click, though there needs to be a way to: (1) query the current workspace and (2) query all workspaces. And a way to see which instances are associated with which workspaces. And a way to see which instances have been minimized.
  10. A natural reaction to this sort of behavior is to fire up “ccsm” and experiment with different settings. Bad move! Doing this has a high probability of fatally confusing Unity. “You too can power-cycle your laptop.”
  11. Unity sometimes loses track of the keyboard. Moving back and forth among workspaces helps it find the keyboard again. Unless the screen is locked, in which case life appears to be quite hard, with power-cycling the only option I have found thus far. Fortunately, this seems to be quite rare, only two sightings in the several weeks that I have been using Unity. Oops, make that three sightings. Hmmm... Four sightings. OK, this problem appears to be triggered by switching to a workspace then immediately hitting shift-F1.
  12. Under Gnome, resizing an xterm displays a handy popup showing how many rows and columns of text the xterm contains as it is being resized. Unity badly needs this feature.
  13. Where did the application and system menus go??? Well, they are gone. Once I got used to it, the replacement works much better for me. Shift-F1 pops up a window that allows you to search for apps, so that shift-F1;chr;enter pops up an instance of the Chromium browser. Shift-F2 works as it does in Gnome, except that it displays the possible completions as icons. Unfortunately, in both cases, Unity doesn't always keep up with touch-typists. Sometimes it re-executes the previous command instead of the one you just typed, even though the display has already updated to show the new icon. This indicates some performance, coordination, and concurrency issues: Unity's right hand does not always know what my left hand is doing! It is therefore not obvious to me that the Unity development and test teams include any touch typists. One way or another, this sort of thing badly needs fixing.
  14. Banshee is quite similar to Rhythmbox. One nice thing about Banshee is that it does not come with a pile of Internet radio stations preconfigured, so that you can easily find the ones you add. (My tastes in music are such that my favorites are never going to appear in any reasonable set of defaults.)
  15. If you push a window beyond the bounds of the sides or bottom of the screen, it ends up spanning multiple workspaces, where the workspaces are connected in a toroidal fashion. This is sort of OK, except that windows that span multiple workspaces (up to four of them!) are not handled gracefully by the left-hand taskbar. Although this behavior was mildly entertaining for a while, I would prefer that the workspaces not be connected.
  16. If you push a window up to the top of the screen, it maximizes. This is almost never what I want — the reason I pushed the window to the top of the screen was to have it appear at the top of the screen, not to have it monopolize the entire screen!!! On the rare occasions when I need to maximize a window, double-clicking the title bar works just fine, thank you!!!
  17. Where did the per-window menus go? Well, they are mostly gone. You can get them back by right-clicking the title bar of a given window, but I am growing to like the alternative, which is to move the mouse to the very top of the screen, causing the per-window menus to appear on the screen's upper bar. This leaves more screen real estate for the rest of the application. However, some applications keep the window-bar menus, and I have no idea why.
  18. So I have a browser in one workspace, and I want one in another workspace. Clicking on the icon on the left-hand bar pops me back to the original workspace: Not what I wanted! Right-clicking on the icon pops up a menu that allows me to create a new browser instance in the current workspace. However, this works only for some applications.
  19. The soffice command does not automatically background itself, in unhappy contrast to Maverick's ooffice command. OK, OK, I can easily type &, but it is still annoying.
  20. If you start soffice from an xterm, it splats on that xterm every time you do “save as”. Yes, I know that you failed to open the file! I wouldn't have done “save as” if the file already existed!
  21. The soffice application occasionally goes into contrarian mode when you resize it. For example, an attempted horizontal expansion of the window might instead be applied vertically. Sometimes soffice simply refuses to let you resize it, which can be especially frustrating if it has decided that it should be so small that there is no room to actually display the document in question. Repeatedly maximizing and unmaximizing the window seems to get it out of this passive-aggressive mode of operation.


So Unity does appear to have some potential, and I intend to keep using it for at least a little while longer. However, it does still need a fair amount of additional work.

And it used to be so simple...

It is something I have done many times under many different operating systems. You write something into the startup scripts, take the system down, then regain control at startup. So I planned to take this approach for my KVM-based rcutorture setup: Mount the image file, deposit a test script in the startup scripts, unmount the image file, start qemu, then analyze the results (either deposited in the filesystem or dumped to the console). What could be easier?

So I manually mounted the file I use as the qemu disk image, my plan being to manually create simple tests as a first step towards automation.

Hmmm... Everything looks very strange and unfamiliar. Ah, yes, this is upstart. No problem, the Internet is but a mouse-click away, right? But wait, the documentation says that upstart and the more-familiar (to old guys like myself, anyway) sysvinit can co-exist on the same system. And what about this new systemd thing that I keep hearing about?

So my tests are going to have to inspect the filesystem and figure out which of these three things is installed. And maybe someone installed one of them, then switched to the other, leaving traces of their earlier choice lying around. How is my script supposed to figure all that out? Of course, if it was just me running the tests, I could simply make sure that a pre-selected boot-up mechanism was installed. But that is silly — anyone anywhere should be able to test RCU. So I need to be able to deal with any of the three boot-up mechanisms. But wait, it is far worse than that: What is my script supposed to do when someone else comes up with a fourth, fifth, or sixth boot-up mechanism? Ouch!!!

It is clearly time to just say “no”! After all, my main test (rcutorture) is coded in the kernel, so it is clearly going to be far easier to translate my driver scripts to Linux kernel code than to code up a script that can outwit all present and future perpetrators of bootup mechanisms. In-kernel coding will allow me to fully control the test using kernel boot parameters, and thus happily ignore which of sysvinit, upstart, systemd, or whatever is installed. Sometimes you just have to get a horse!

The -rcu tree is back on kernel.org

Thanks to the key-signing last week, -rcu is back on kernel.org, though at a slightly different address: git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git


The git archive for “Is Parallel Programming Hard, And, If So, What Can You Do About It?” is once again available from git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/perfbook.git. A big “thank you” to everyone who worked so hard to get kernel.org back!!!

In addition, Lu Yang, Xie Baoyou, and Yu Chen are working on another translation effort, with much of the book already translated to Chinese. A download is said to be available here, though my Chinese is not up to the task of figuring out exactly how to download it, even with the help of Google Translate.

I am hoping that this group and the group reported earlier will continue to make great progress on this!