The Craft of Fixing Bugs

Everyone is at some time tasked with troubleshooting a piece of technology. Whether it be a water heater or a weather satellite, there are some basic rules for figuring out what gremlin has stopped the @#$%&*! thing from working properly.

From Kevin Kelly's Cool Tools, here is a terrific aid to "essential technological literacy."

These days debugging is a necessary life skill. Anything high tech has more ways of failing than running. Since failure hides in complexity, you need to be systematic to fix a break in a system. But debugging skills are not taught anywhere.

This book teaches you how to troubleshoot. It is meant for engineers debugging computer programs, but the principles of debugging can easily be applied to any engineered system -- your car, home plumbing, a new gizmo, old laptop, hi-fi system, or anything with many dynamic parts.

The book is easy, with lots of war stories. I learned a lot. Lately I've become the de facto system administrator for the network of seven computers in our household, and these principles have upped my success rate in clearing up the inevitable problems.

What you get: essential technological literacy.

-- KK


Take a look at the Rules (http://www.debuggingrules.com/) and ask yourself how many of them were violated by FEMA during the Katrina debacle. Then ask yourself, "How many am I violating right now?!?"

Also helpful for spousal disputes.

Debugging: The Nine Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems
David J. Agans
2002, 192 pages
$15
Available from Amazon
http://www.amazon.com/exec/obidos/ASIN/0814471684/ref=nosim/kkorg-20

Sample excerpts:

The Rules
Understand the system
Make it fail
Quit thinking and look
Divide and conquer
Change one thing at a time
Keep an audit trail
Check the plug
Get a fresh view
If you didn't fix it, it ain't fixed

Change One Thing at a Time
On nuclear-powered subs, there's a brass bar in front of the control panel for the power plant. When status alarms begin to go off, the engineers are trained to grab the brass bar with both hands and hold on until they've looked at all the dials and indicators, and understand exactly what's going on in the system. What this does is help them overcome the temptation to start "fixing" things, throwing switches and opening valves. These quick fixes confuse the automatic recovery systems, bury the original fault beneath an onslaught of new conditions, and may cause a real, major disaster. It's more effective to remember to do something ("Grab the bar!") than to remember not to do something ("Don't touch that dial!"). So, grab the bar!

Understand the System
You need a working knowledge of what the system is supposed to do, how it's designed, and, in some cases, why it was designed that way. If you don't understand some part of the system, that always seems to be where the problem is. (This is not just Murphy's Law; if you don't understand it when you design it, you're more likely to mess up.)

Make It Fail
So you can tell if you've fixed it. Once you think you've fixed the problem, having a surefire way to make it fail gives you a surefire test of whether you fixed it. If without the fix it fails 100 percent of the time when you do X, and with the fix it fails zero times when you do X, you know you've really fixed the bug.

If You Didn't Fix It, It Ain't Fixed
When you think you've fixed an engineering design, take the fix out. Make sure it's broken again. Put the fix back in. Make sure it's fixed again. Until you've cycled from fixed to broken and back to fixed again, changing only the intended fix, you haven't proved that you fixed it.
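
For software, the fixed-to-broken-and-back cycle can be scripted. What follows is only a rough sketch, not anything from the book: the toy parse_quantity bug, the set_fix switch, and the helper names are all made up for illustration, and it assumes you already have a surefire way to make the thing fail (the "do X" step).

```python
from typing import Callable


def failure_count(reproduce: Callable[[], bool], trials: int) -> int:
    """Run the reproduction step `trials` times; count how often it fails."""
    return sum(1 for _ in range(trials) if not reproduce())


def fix_is_proven(
    reproduce: Callable[[], bool],
    set_fix: Callable[[bool], None],
    trials: int = 20,
) -> bool:
    """Cycle broken -> fixed -> broken -> fixed, toggling only the fix."""
    for fix_enabled in (False, True, False, True):
        set_fix(fix_enabled)
        failures = failure_count(reproduce, trials)
        # Without the fix it must fail every time; with it, never.
        expected = 0 if fix_enabled else trials
        if failures != expected:
            return False
    return True


# Hypothetical toy system: parsing a blank field crashes unless the
# fix (treat blank as zero) is switched on.
state = {"fix": False}


def set_fix(enabled: bool) -> None:
    state["fix"] = enabled


def parse_quantity(text: str) -> int:
    if state["fix"] and text.strip() == "":
        return 0
    return int(text)


def reproduce() -> bool:
    """'Do X': feed the input known to trigger the bug. True means it worked."""
    try:
        parse_quantity("   ")
        return True
    except ValueError:
        return False


proven = fix_is_proven(reproduce, set_fix)
```

If a supposed fix makes no difference to the failure (or the failure never showed up in the first place), fix_is_proven returns False, which is the point of the rule: the symptom going away is not proof unless it comes back when the fix is taken out.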

Ask for Help
There are at least three reasons to ask for help, not counting the desire to dump the whole problem into someone else's lap: a fresh view, expertise, and experience. And people are usually willing to help because it gives them a chance to demonstrate how clever they are.

No matter what kind of help you bring in, when you describe the problem, keep one thing in mind: Report symptoms, not theories. The reason you went to someone else for fresh insight is that your theories aren't getting you anywhere. If you go to someone fresh and lay a theory on her, you drag her right down into the same rut you're in. At the same time, you've probably hidden some key details she needs to know, because your bias says they're not important. So be firm about this. When you ask for help, describe what happened. Describe what you've seen. Describe conditions if you can. Make sure you tell her what's intermittent and what isn't. But don't talk about what you think is the cause of the problem.

Though the terms are often interchanged, there's a difference between debugging and troubleshooting, and there's a difference between this debugging book and the hundreds of troubleshooting guides available today. Debugging usually means figuring out why a design doesn't work as planned. Troubleshooting usually means figuring out what's broken in a particular copy of a product when the product's design is known to be good--there's a deleted file, a broken wire, or a bad part. Software engineers debug; car mechanics troubleshoot. Car designers debug (in an ideal world). Doctors troubleshoot the human body--they never got a chance to debug it. (It took God one day to design, prototype, and release the product; talk about schedule pressure! I think we can forgive priority-two bugs like bunions and male pattern baldness.)

The techniques in this book apply to both debugging and troubleshooting. These techniques don't care how the problem got in there; they just tell you how to find it. So they work whether the problem is a broken design or a broken part. Troubleshooting books, on the other hand, work only on a broken part.