Failure of controls...Spanair crash caused by a Trojan

Published: 2010-08-22
Last Updated: 2010-08-22 01:01:41 UTC
by Rick Wanner (Version: 1)
10 comment(s)

Several readers have pointed us to an article about the preliminary report on the Spanair flight that crashed on takeoff in 2008, killing 154. The article suggests that a Trojan infected a Spanair computer and that this prevented the detection of a number of technical issues with the airplane. The article speculates that if these issues had been detected the plane would not have been permitted to attempt takeoff.

Much about this investigation is still conjecture or unknown at this point, and I will try not to add to the speculation, but the story made me think about the parallels to information security.
In information security we often speak of controls. There are three types of controls: preventive, detective, and corrective. In information security we deal predominantly with preventive and detective controls.

Preventive controls aim at preventing issues before they occur. Some examples of preventive controls are policies, standards of operation, procedures, checklists, segregation of duties and change controls. From an IT point of view, firewalls and intrusion prevention systems are popular technological preventive controls. The airline industry also has procedural and technological controls. Airlines have operating protocols covering most aspects of operations, from when it is safe to fly to how to maintain the equipment. Pilots have pre-flight and in-flight checklists to ensure safe operation of the aircraft. Modern airliners likewise have interlocks and safety systems that attempt to protect the aircraft from mechanical failure or human error.

Detective controls aim to detect an issue when it does occur, or as soon as possible after. In the words of Dr. Eric Cole, a notable SANS instructor, “Prevention is ideal, but detection is a must!” If at all possible we would like to prevent the event from occurring, but if we can’t prevent the event we want to know it happened so we can adequately respond. The obvious IT detective controls are host and network based intrusion detection systems (IDS). Less technological processes such as audits are also detective controls, aimed at detecting and correcting anomalies before they become more serious. Modern airliners also have detective systems to catch events before they become service-affecting. One quote from the article indicates that a failure in a detective control occurred ... “The plane took off with flaps and slats retracted, something that should in any case have ... triggered an internal warning on the plane.”

I am not a pilot, so I cannot speak with authority on how to fly a passenger airliner, but it seems clear to me that this accident was caused by the failure of a number of controls leading to a disastrous outcome. Clearly the Spanair diagnostic system (a detective control) designed to detect anomalies in the airliner's systems failed, possibly due to a Trojan. It also appears the pilots bypassed part of their pre-takeoff checklist, leaving the flaps and slats in a position not recommended for takeoff. As ISC reader Frank pointed out, that is most likely because the pilots had aborted the initial takeoff attempt and then resumed the pre-takeoff checklist (a preventive control) too low in the checklist, missing a significant step. It is also clear that an internal system (a detective control) that should have detected the misconfigured flaps and slats for some reason did not alert the pilots to this condition.

In information security, the stakes are rarely so high as human lives, but failures in controls often lead to unexpected consequences. Examples include a misconfigured firewall rule allowing more permissive access to systems, a false negative in an IDS/IPS system, or a user violating policy by plugging in a personal USB stick. The moral of the story is don’t take your control systems and processes for granted. Audit and test them regularly to ensure they are operating correctly.
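As a rough illustration of what a regular audit of a preventive control might look like, here is a minimal Python sketch that flags overly permissive firewall rules. The rule format, the rule names, the `overly_permissive` helper and the `max_hosts` threshold are all assumptions made up for this example; a real audit would parse the actual export format of your firewall and apply your own policy.

```python
from dataclasses import dataclass
from ipaddress import ip_network
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    # Hypothetical, simplified rule format (an assumption for this sketch).
    name: str
    action: str           # "allow" or "deny"
    source: str           # source range in CIDR notation
    port: Optional[int]   # None means "any port"


def overly_permissive(rules: List[Rule], max_hosts: int = 256) -> List[str]:
    """Return names of allow rules with a source range or port scope broader than expected."""
    findings = []
    for rule in rules:
        if rule.action != "allow":
            continue  # deny rules cannot grant excess access
        network = ip_network(rule.source, strict=False)
        # Flag rules that admit too many source addresses or any port.
        if network.num_addresses > max_hosts or rule.port is None:
            findings.append(rule.name)
    return findings


# Example rule set using documentation address ranges (illustrative only).
RULES = [
    Rule("web-from-office", "allow", "192.0.2.0/24", 443),
    Rule("ssh-from-anywhere", "allow", "0.0.0.0/0", 22),
    Rule("legacy-any-port", "allow", "198.51.100.7/32", None),
    Rule("block-telnet", "deny", "0.0.0.0/0", 23),
]

if __name__ == "__main__":
    for name in overly_permissive(RULES):
        print("review:", name)
```

Run on a schedule against the live rule set, a check like this acts as a detective control over the preventive one: it does not stop a bad rule from being added, but it should surface the rule before someone else finds it.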

-- Rick Wanner - rwanner at isc dot sans dot org - http://rwanner.blogspot.com/

Keywords: controls crash Trojan

Comments

If you don't want to add to the speculation, the heading 'crash caused by a Trojan' doesn't really seem like a good choice, does it?
Comments:

1. There are many interests at stake in this unfortunate accident: professional pilots, maintenance technicians, the airline, rescue operations managers, managers of AENA (government) and ... the worst ... politicians.

2. The computer said to have been infected is the one where incidents are recorded, not actual aircraft equipment. The flaps were not deployed at takeoff, but that has nothing to do with the virus (if the infection is true, and if it is true that the flaps were not deployed).

3. Does anyone really believe that a company keeps its entire ticketing system on an unprotected PC without security measures? Please be serious (SANS has always been serious, yes?). I think someone is interested in dodging responsibility and blaming computer science (or the computer engineers) for this mess.

4. Much is at stake here, and some want to divert attention. The accident investigation report tells a different story than the article leads us to believe, and there is still no final report. Do not take the bait; wait for the final conclusions. Thanks Rick.


Luk.
Where I come from we say the wise is pointing to the moon, but the feeble is looking at the finger.

And so I shall be...

Has anyone else noticed that a computer onboard an aircraft was possibly infected with malware?

No matter how critical (or not) the system, if any of the aircraft processes was dependent on the system's security state, then it should have been adequately protected.

If the story turns out to be true, isn't this where the controls have ultimately failed?
I wouldn't be too surprised if someone forgot to consider malware infection as a possibility. History is full of stories about "closed" systems that were so secure they "didn't need" internal safeguards.

But what struck me as odd was the claim that power to a seemingly critical alarm could fail like that, without raising a red flag anywhere.
That's also what makes me a bit sceptical of the contents of the story.

There's a lot more to this than what that story is telling.
Prontissimo: there was never any mention of an onboard computer being infected.
Other news stories refer to details from the original Spanish articles that also specify that it wasn't. The MSNBC story hints at it ("central computer system").

The plane in question wouldn't have this kind of computer on-board anyway. The MD-80 is a "good, old-fashioned" aircraft, not a modern hi-tech, digital toy.

If you google "flight 5022 crash", there are more articles that have popped up which make a lot more sense of the whole thing.
And if they are correct, then I think it's safe to say that in no way did a trojan crash this plane.
I worked on jet avionics and electrical systems for a quarter century. There are so many systems in place to prevent something like this it's inconceivable that it still happens. Low-tech things like loud horns and warning lights when the power levers are advanced past a certain percentage while the weight-on-wheels switches show the aircraft is on the ground and not in a takeoff configuration. Even on the newer aircraft the computerized systems still have the mechanical low-tech backups. Some of the jets I worked on had mechanical interlocks so you simply couldn't pull a bonehead move like this.

These are closed systems like you've probably never seen before. No Windows, no Linux, no Unix. Almost every piece of equipment uses its own custom processor and language especially suited to its function. All software updates must have FAA approval (in the US) before they can be deployed. Certain systems, like traffic conflict and avoidance warning, are only allowed to run the same specific version on all vendors to assure interoperability. And most systems of this vintage are not updatable unless you remove and replace ROMs.

This one all boils down to inadequate training and a lack of professional behavior. They had to have had ample indications that certain systems were not working, they didn't follow the checklists and they didn't abort when they failed to reach certain speeds at certain points during the takeoff roll.

I don't care what kind of ground-based controls could have alerted them. Just like in the information security field, you can't use technology to fix stupid.
Just as a point of clarification, the news articles out there do not say there was any malware on any system on the aircraft. What was infected was a server back in the data center that tracked all the maintenance issues reported by pilots and notified the maintenance department if there was something that needed to be fixed on an aircraft. The flap warning system had some intermittent issues over the prior week and the infected system did not notify anyone. As with 99% of all aviation accidents, it was not one thing that caused the accident but a chain of events that led to the crash. If any link in that chain had been detected the accident would not have happened: defense in depth. If the infected server had been functioning properly, the warning system would have been fixed and the pilots notified of the incorrect flap setting: accident avoided.
Doubtful. When I was doing accident investigation many, many years ago the rule of thumb was that it took five small incidents to turn into an accident, the point you made. However these folks ignored their company procedures and career-long training for running checklists and they knew or should have known there were other issues with the warning systems. Configuration warning systems are tested, or are supposed to be tested, on each preflight. Even the abnormal acceleration rates they must have experienced didn't clue them in. To think that an ACARS (datalink) message issued maybe a minute before beginning their takeoff roll would have made a difference just doesn't jibe with my experience. The cockpit voice recorder probably would have had them saying "You checked the flaps, right?" and the copilot responding "You heard me when we did the checklist." even if they paid attention to it.
Failure of controls, or even bad software design without proper controls, has affected humans before, just the way Therac-25 did in 1985.
http://en.wikipedia.org/wiki/Therac-25
Therac to me seems like a trojan in an onboard system which as yet this is not. It seems just as likely the computer has an old Pentium 60 on it, you know, the one with the floating-point bug.

Probably just as likely is the pilot pulling the P40 circuit breaker to silence the take-off config warning during single engine taxi to the threshold and for whatever reason forgetting to put it back. There must have been something that kept the pilots from getting the flaps down during the pre-take-off checklists. To me it sounds eerily like a possible rerun of NW 255.
