11 Infamous Software Bugs


Grievous Bodily Bugs

Not all bugs can be laughed off. Some of them are fatal. Medical and military software can be especially dangerous when it is not properly tested, as these fatal flaws show.

Patriot Missile Mistiming

During the first Persian Gulf war, Iraqi-fired Scud missiles were the most serious airborne threat to U.S. troops. Once one of these speeding death rockets launched, the U.S.'s best defense was to intercept it with an antiballistic Patriot missile. The Patriot worked a bit like a shotgun, getting within range of an oncoming missile before blasting out a cloud of 1,000 pellets to detonate its warhead.

A Patriot needed to deploy its pellets between 5 and 10 meters from an oncoming missile for the best results. That required split-second timing, which is always tricky with two objects moving very fast toward each other. Even the Patriot's most prominent booster, then-President George H.W. Bush, conceded that one Scud (out of 42 fired) got past the Patriot. The single failure the president acknowledged was at a U.S. base in Dhahran, Saudi Arabia, on Feb. 25, 1991, and it cost 28 soldiers their lives. The fault was traced to a software error.

The Patriot's trajectory calculations revolved around the timing of radar pulses, and they had to be modified to deal with the high speed of modern missiles. A subroutine was introduced to convert clock time more accurately into floating-point figures for calculation. It was a neat kludge, but the programmers did not put the call to the subroutine everywhere it was needed. As a result, some trajectory calculations combined one accurately converted radar-pulse time with one less-precise time, and the mismatch increased the chances of poorly timed deployment.

Apparently, the issue was known, and a temporary fix was in place: Reboot the system every so often to reset the clocks. Unfortunately, the term "every so often" wasn't defined, and that was the problem in late February at Dhahran. The system had been running for 100 hours, and the clocks were off by about a third of a second. A Scud travels half a kilometer in that time, so there was no chance the Patriot could have intercepted it.
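
The arithmetic behind that drift can be sketched in a few lines. The sketch assumes figures from public post-incident accounts rather than from this article: a clock that counts tenths of a second, a register that keeps 23 binary digits after the point and chops off the rest, and a Scud speed of about 1,676 meters per second. The function names are made up for illustration; this is not the Patriot's actual code.

```python
import math
from fractions import Fraction

# Assumptions (not stated in the article): 23 fractional bits, 0.1 s ticks,
# and a Scud speed of roughly 1,676 m/s.
FRACTION_BITS = 23
TICK = Fraction(1, 10)
TICKS_PER_HOUR = 3600 * 10
SCUD_SPEED_M_PER_S = 1676


def truncated_tick(bits=FRACTION_BITS):
    """Return 0.1 chopped (not rounded) to `bits` binary fraction digits."""
    scale = 2 ** bits
    return Fraction(math.floor(TICK * scale), scale)


def clock_drift_seconds(hours_up, bits=FRACTION_BITS):
    """Accumulated timing error after `hours_up` hours without a reboot."""
    error_per_tick = TICK - truncated_tick(bits)
    return float(error_per_tick * hours_up * TICKS_PER_HOUR)


def tracking_error_meters(hours_up):
    """Distance a Scud covers during the accumulated clock drift."""
    return clock_drift_seconds(hours_up) * SCUD_SPEED_M_PER_S


if __name__ == "__main__":
    for hours in (8, 20, 100):
        print(hours, clock_drift_seconds(hours), tracking_error_meters(hours))
```

Under those assumptions, 100 hours of uptime gives a drift of roughly a third of a second and a tracking error of more than 500 meters, which matches the figures above. A system rebooted after eight hours would be off by less than 0.03 second.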

On a side note, some experts did dispute the president's claims of a more than 97% success rate for Patriots vs. Scuds, so it's possible that this bug caused more (but less high-profile) damage than the incident at Dhahran.

Therac-25 Medical Accelerator Disaster

Radiation therapy is a handy tool in the fight against some contained forms of cancer: Beams of electrons zap the bad stuff, and the body disposes of the dead matter. It has a strong success rate, but it depends on accurate aim and focus. That's something that the medical world leaves to machinery. Unfortunately for six patients between 1985 and 1987, the Therac-25 was the machine in question.

The Therac-25 handled two types of therapy: a low-powered direct electron beam and a megavolt X-ray mode, which required shielding and filters and an ion chamber to keep the dangerous beams safely on target. The trouble was that the software that powered the unit was repurposed from the previous model, and it wasn't adequately tested.

If the operators changed the mode of the device too quickly, a race condition occurred: Two sets of instructions were sent, and the first one to arrive set the mode. In six documented cases, this meant that megavolt X-rays were sent, unfiltered and unshielded, toward patients requiring direct electron therapy. At least two of them screamed in pain and tried to run from the room. All of them suffered radiation poisoning, which claimed several lives.
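
A simplified, deterministic model shows how a too-quick edit can leave the screen and the hardware disagreeing. This is a sketch under assumptions, not the Therac-25 code: the `Console` class, its method names and the eight-second setup window are all hypothetical, and real race conditions involve concurrent tasks rather than explicit timestamps.

```python
from dataclasses import dataclass
from typing import Optional

SETUP_SECONDS = 8.0  # illustrative assumption: how long hardware setup stays busy


@dataclass
class Console:
    """Hypothetical operator console with a mode on screen and a mode in hardware."""
    displayed_mode: str = "xray"
    hardware_mode: str = "xray"
    setup_started_at: Optional[float] = None

    def start_setup(self, now: float) -> None:
        # Hardware setup snapshots whatever mode is on screen right now.
        self.setup_started_at = now
        self.hardware_mode = self.displayed_mode

    def edit_mode(self, new_mode: str, now: float) -> None:
        # The screen always updates immediately.
        self.displayed_mode = new_mode
        busy = (
            self.setup_started_at is not None
            and now - self.setup_started_at < SETUP_SECONDS
        )
        # Bug being modeled: an edit that arrives while setup is busy
        # never reaches the hardware.
        if not busy:
            self.hardware_mode = new_mode

    def consistent(self) -> bool:
        return self.displayed_mode == self.hardware_mode


if __name__ == "__main__":
    fast = Console()
    fast.start_setup(now=0.0)
    fast.edit_mode("electron", now=3.0)   # quick correction during setup
    print(fast.displayed_mode, fast.hardware_mode, fast.consistent())
```

In this model, an operator who corrects the mode three seconds into setup sees "electron" on screen while the hardware is still configured for "xray." A safer design would refuse to fire whenever the two disagree.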

The Therac-25, which was recalled in 1987, has become an object lesson in what can go wrong with powerful medical machinery. The code didn't cause overdoses in earlier Therac models because hardware constraints prevented them. Reusing code on a new system without thorough testing is a programming no-no, with good reason.

The new system did deliver error messages during race-condition events, but the codes were cryptic, undocumented and easily overridden -- which is what operators did. With adequate documentation and training, the overdoses would never have happened. Additionally, a smaller bug in the way a flag variable was updated occasionally caused an arithmetic overflow that bypassed safety checks.
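
The overflow bug is easy to reproduce in miniature. The sketch below assumes, following published accounts of the investigation rather than this article, that a one-byte flag was incremented on every pass through a setup loop instead of being set to a fixed nonzero value, and that a zero flag meant "no check needed." The names are hypothetical, and Python's integers are masked to eight bits to imitate the one-byte variable.

```python
BYTE_MASK = 0xFF  # imitate a one-byte variable (assumption about the flag's size)


def passes_that_skip_check(total_passes):
    """Buggy version: the flag is incremented, so it wraps to zero."""
    flag = 0
    skipped = []
    for pass_number in range(1, total_passes + 1):
        flag = (flag + 1) & BYTE_MASK  # wraps from 255 back to 0
        if flag == 0:                  # zero is read as "no check needed"
            skipped.append(pass_number)
    return skipped


def passes_that_skip_check_fixed(total_passes):
    """Fixed version: the flag is set to a constant nonzero value."""
    skipped = []
    for pass_number in range(1, total_passes + 1):
        flag = 1                       # can never wrap around
        if flag == 0:
            skipped.append(pass_number)
    return skipped


if __name__ == "__main__":
    print(passes_that_skip_check(1000))
    print(passes_that_skip_check_fixed(1000))
```

With the increment, one pass in every 256 silently skips the safety check. Assigning a constant removes the wraparound entirely.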

Multidata Systems/Cobalt-60 Overdoses

Unfortunately, the Therac-25 disaster wasn't the last software-related radiation therapy failure. Fifteen years after the Therac-25 incident, a Cobalt-60 machine at Panama's National Cancer Institute overdosed more than two dozen patients with gamma radiation.

As with the Therac-25, the Cobalt-60 system was an accident waiting to happen. Unlike the Therac-25, the Cobalt-60 was an old, overused and undermaintained piece of hardware. The software that ran it was an aftermarket program from Multidata Systems, because the Panamanian hospital could not afford what the machine's manufacturer, Theratronics, charged.

Two of the technicians who operated the Cobalt-60 had quit, leaving the rest to work 16-hour days to keep up with treatments. Very sick patients would sometimes wait four to six hours a day for scheduled treatments.

Overworked and tired technicians requested some software maintenance, but management overlooked their requests. Somewhere along the line, the technicians hit upon a more efficient way to line up the shields that defined the radiation's target. It wasn't in the manual, but it seemed to work. Unfortunately, if you lined up the shields in a particular order, an obscure bug in the Multidata software meant that the patients were overirradiated. Because of massive overwork and undersupervision, the process went on for seven months.

By the time Multidata Systems issued an advisory about a "data entry sequence that creates a self-intersecting shape outline" in mid-2001, it was too late for many patients. The exact death toll is hard to calculate -- these were very sick patients even before their treatment -- but it's a tragic mess-up by any measure.
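
Why a self-intersecting outline is dangerous can be illustrated with ordinary polygon arithmetic. The sketch below is not Multidata's algorithm, which the advisory quoted above does not describe. It uses the textbook shoelace formula and a hypothetical four-point outline to show how the same corner points give different areas depending on entry order, plus a simple check that would flag the bad outline.

```python
def shoelace_area(points):
    """Area from the shoelace formula; only meaningful for simple polygons."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def _cross(a, b, c):
    """Z-component of (b - a) x (c - a); its sign says which side c lies on."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def is_self_intersecting(points):
    """True if two non-adjacent edges properly cross (touching cases ignored)."""
    n = len(points)
    edges = [(points[i], points[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # first and last edges share a vertex
            a, b = edges[i]
            c, d = edges[j]
            if (_cross(a, b, c) * _cross(a, b, d) < 0
                    and _cross(c, d, a) * _cross(c, d, b) < 0):
                return True
    return False


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]    # corners entered in order
    bow_tie = [(0, 0), (1, 0), (0, 1), (1, 1)]   # same corners, crossed order
    print(shoelace_area(square), is_self_intersecting(square))
    print(shoelace_area(bow_tie), is_self_intersecting(bow_tie))
```

Here the bow-tie ordering reports an area of zero even though the outline plainly encloses space. That is the kind of silent miscalculation that validating the entry sequence would catch.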

Osprey Aircraft Crash

Two weeks before Christmas in 2000, a U.S. Marine Corps Osprey, a hybrid airplane and helicopter, suffered a hydraulic system fault that should have been remedied without loss of life. A hydraulic line broke in one of the two engine cases as the Osprey was shifting from airplane to helicopter mode for landing.

According to the Marine Corps major general who presented reports during the investigation of the incident, the trouble was "compounded by a computer software anomaly." The flight-control computer stopped the rotation of the engine pods when it detected the hydraulic failure.

The pilots went through the normal procedure and pressed the primary reset button to re-engage the pods. At this point, both prop rotors went through "significant pitch and thrust changes," which led to a stall. The plane crashed into a marsh and killed all four Marines onboard.

The nature of the software flaw is still hard to track down: Boeing and Bell Helicopter made the Osprey, and Boeing's spokesman said only that changes were made in the software. Requests for details were referred to the government, and as of now, the explanation has not been forthcoming.
