You are viewing paulmck

Validation: When Fail-Safes Aren't Safe

Fail-safes are a critically important part of our modern society. In fact, they play an indispensable role in permitting non-technical people to make good use of increasingly complex and pervasive technologies. Without fail-safes, the ordinary people in this world would encounter disaster at every turn.

Consider for example the following list of fail-safes:


  1. In order to prevent naive users from destroying their systems, make hard drives difficult to remove. This simple fail-safe should prevent any number of problems involving bent pins, hard drives being inserted into the wrong system, and so on.
  2. One can easily imagine a user hibernating the system, booting up into BIOS, then changing some BIOS setting that invalidates the hibernation image, thus preventing the system from ever resuming. In order to prevent this problem, the system should set a flag when hibernating that causes the BIOS to refuse entry. This flag can be cleared upon resume.
  3. In order to (1) speed up the booting process and (2) prevent someone from injecting a virus into the system via a surreptitiously inserted USB stick, CD-ROM, or DVD-ROM, ensure that BIOS checks for a bootable hard drive before checking USB or CD/DVD-ROM.
  4. If the user tries to shut down in the midst of an installation or upgrade, hibernate instead. This allows the installation or upgrade to carry on from where it left off without the need for a set of complex and error-prone boot-time checks for installation/upgrade progress.


What could possibly go wrong?
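
One way to poke at that question is to model the four fail-safes as a toy state machine and ask which recovery routes remain open after something breaks. The sketch below is only an illustration under stated assumptions: the `Machine` class, its field and method names, and the idea that a hibernation image might fail to resume are all hypothetical and do not describe any real BIOS or installer.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Toy model of the four fail-safes; every name here is hypothetical."""
    installing: bool = False       # an installation or upgrade is in progress
    hibernated: bool = False       # fail-safe 2: flag set when hibernating
    image_resumable: bool = True   # assumption: the hibernation image can be bad
    drive_removable: bool = False  # fail-safe 1: hard drive is difficult to remove

    def shut_down(self) -> str:
        # Fail-safe 4: a shutdown during installation becomes a hibernate.
        if self.installing:
            self.hibernated = True
            return "hibernated"
        return "powered off"

    def enter_bios(self) -> bool:
        # Fail-safe 2: BIOS refuses entry while the hibernation flag is set.
        return not self.hibernated

    def boot_device(self) -> str:
        # Fail-safe 3: the hard drive is always tried before USB or CD/DVD.
        # Assumption: the drive still counts as bootable after a bad hibernate.
        return "hard drive"

    def power_on(self) -> str:
        if self.hibernated:
            if self.image_resumable:
                self.hibernated = False  # fail-safe 2: flag cleared upon resume
                return "resumed"
            return "resume failed"
        return "booted from " + self.boot_device()

    def recovery_options(self) -> list[str]:
        """List the escape routes still open to the user."""
        options = []
        if self.enter_bios():
            options.append("change BIOS settings")
        if self.boot_device() != "hard drive":
            options.append("boot rescue media")
        if self.drive_removable:
            options.append("pull the hard drive")
        return options


# Shut down mid-upgrade, then suppose the hibernation image will not resume.
m = Machine(installing=True, image_resumable=False)
print(m.shut_down())
print(m.power_on())
print(m.recovery_options())
```

In this model, each fail-safe closes off exactly the escape route that would have rescued the user from one of the others, so the final list comes back empty. Whether a real system could reach that state depends on details the model leaves out.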

Comments

tara_li
Jan. 1st, 2011 05:43 am (UTC)
One should be very careful in determining the "safe" way to fail, when designing a fail-safe. A lot of people seem to think "fail-safe" is easy to design. Who knows? One day this century, maybe they'll learn. I doubt it.
paulmck
Jan. 1st, 2011 11:18 am (UTC)
Fail-safes are indeed hard to design
I agree that fail-safes are more difficult than is programming in the assumed absence of failure: more difficult to design, more difficult to validate, more difficult to debug. And I don't expect most humans to ever learn this, any more than I expect most humans to learn how to program even in the absence of failure.

That said, I do expect that experts will learn how to construct effective fail-safes for specific situations, such as for installing and upgrading an operating system. However, it will take some time, and there will be plenty of painful and entertaining mistakes along the way. Just as has been the case for fail-safes in other fields of human endeavor. ;-)