What Makes Software Safe? (was Concord Loses #3)

From:         bareynol@cca.rockwell.com (Brian A. Reynolds)
Organization: Rockwell Avionics - Collins, Cedar Rapids, IA
Date:         27 Jun 95 01:43:12 


>The following was posted to rec.travel.air.  Is it true?

>>Unlike the Airbus aircraft, the 777's fly-by-wire system uses only one
>>implementation of the software; therefore, if there is a software bug
>>it will not be detected by another computer.

>A very simplistic assessment.  This is potentially true; however, is it any
>less true where different, i.e., multiversion, software is used?  Worldwide,
>software and hardware "experts" disagree on this issue.  European
>airworthiness authorities (JAA, DGAC) appear to favor multiversion software
>implementations, but the FAA gives no safety credit for the use of
>multiversion software.  Academic studies seem to indicate that common mode
>errors are just as likely to occur in multiversion software as in redundant
>systems which use the same software.

A Brief Tutorial on Safe Software

From a safety analysis point of view, one must first define what one is
trying to protect against.  Consider a typical software development process:

	REQUIREMENTS -> SPECIFICATION -> CODE -> COMPILER -> EXECUTABLE CODE

	EXECUTABLE CODE -> MACHINE -> FUNCTION IMPLEMENTATION

Any mistake/error in this chain can only be 'tested out' during development;
it cannot be protected against during use.

In an n-version similar-software program this becomes:

REQUIREMENTS 	-> SPECIFICATION -> CODE -> COMPILER -> EXECUTABLE CODE
	EXECUTABLE CODE -> MACHINE(1) -> FUNCTION IMPLEMENTATION(1)
			-> MACHINE(2) -> FUNCTION IMPLEMENTATION(2)
			-> MACHINE(3) -> FUNCTION IMPLEMENTATION(3)
WHERE FUNCTION IMPLEMENTATION(1)
	== FUNCTION IMPLEMENTATION(2)
	== FUNCTION IMPLEMENTATION(3)

That is, there are n copies of the same code running on different computers.
Typically a voting scheme is thrown in so that the 'odd machine out' is
forced out of the loop (a sketch of such a voter follows the list below).
Note, however, what is common to this scheme:

	a) requirements		b) specification
	c) code			d) compiler
	and finally		e) the executable code.
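
For the curious, here is a minimal sketch of such a 2-out-of-3 voter (in C;
the names and the integer command are my own invention, not taken from any
real flight control system):

	#include <stdio.h>

	/* Exact-match 2-out-of-3 voter.  Returns the majority value and
	   reports the voted-out channel through *odd_machine: 1..3 for
	   the odd machine out, 0 if all agree, -1 if no two agree. */
	int vote(int ch1, int ch2, int ch3, int *odd_machine)
	{
	    if (ch1 == ch2 && ch2 == ch3) { *odd_machine =  0; return ch1; }
	    if (ch1 == ch2)               { *odd_machine =  3; return ch1; }
	    if (ch1 == ch3)               { *odd_machine =  2; return ch1; }
	    if (ch2 == ch3)               { *odd_machine =  1; return ch2; }
	    *odd_machine = -1;            /* total disagreement         */
	    return ch1;                   /* caller must treat as fault */
	}

	int main(void)
	{
	    int odd;
	    int cmd = vote(100, 100, 97, &odd);  /* machine 3 is 'flaky' */
	    printf("command=%d, odd machine=%d\n", cmd, odd);  /* 100, 3 */
	    return 0;
	}

Note that the voter can only out-vote a machine whose output differs; if all
three machines run the same (buggy) executable, they will agree on the wrong
answer and the voter is none the wiser.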

What is being protected against?  A failure (not a design error) within one
of the hardware platforms, such as a 'flaky' transistor which goes bad in
the worst way at the worst time.  What happens if any part of the common
'stuff' is in error?  Well, you have either a latent fault (one waiting to
happen - if it waits long enough it's called a bug :), or one that is more
immediately noticeable, and the system will behave 'in an anomalous manner'
(to use the vernacular of DO-178B).

But consider n-version dissimilar software:

REQUIREMENTS ->	SPECIFICATION(1)-> CODE(1)-> COMPILER(1)-> EXECUTABLE CODE(1)
	     |	EXECUTABLE CODE(1) -> MACHINE(1) -> FUNCTION IMPLEMENTATION(1)
	     |
	     ->	SPECIFICATION(2)-> CODE(2)-> COMPILER(2)-> EXECUTABLE CODE(2)
	     |	EXECUTABLE CODE(2) -> MACHINE(2) -> FUNCTION IMPLEMENTATION(2)
	     |
	     ->	SPECIFICATION(3)-> CODE(3)-> COMPILER(3)-> EXECUTABLE CODE(3)
		EXECUTABLE CODE(3) -> MACHINE(3) -> FUNCTION IMPLEMENTATION(3)

WHERE FUNCTION IMPLEMENTATION(1)
	== FUNCTION IMPLEMENTATION(2)
	== FUNCTION IMPLEMENTATION(3)

This implementation says that a failure (sorry, anomalous behavior) in any
element of any implementation could be detected.  This eliminates 'common
mode' errors of specification translation, compiler errors, machine design
errors (Intel taught us this!), and so forth.  If n-version dissimilar
software is so good (which it is), why isn't it used more often?  Economics.
The above scheme literally triples the cost of a program over its entire
life cycle.
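
One practical note on the '==' above: since the three executables come from
different specifications, compilers, and machines, their outputs will rarely
agree bit-for-bit, floating point especially.  In practice 'agreement' is a
comparison within some tolerance.  A minimal sketch, with the tolerance
value purely made up:

	#include <math.h>

	#define AGREE_EPS 0.001   /* assumed, system-specific tolerance */

	/* Two dissimilar channels 'agree' when their outputs fall within
	   the tolerance; exact equality is too strict across dissimilar
	   code, compilers, and arithmetic units. */
	int channels_agree(double a, double b)
	{
	    return fabs(a - b) <= AGREE_EPS;
	}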

In some instances a system architecture is chosen, for economic reasons,
which uses two identical machines (making that pair n-version similar
software) and one different machine (making the system as a whole n-version
dissimilar software).  But if the dissimilar machine fails (breaks, for
example), then the system defaults to n-version similar, with the latent
faults typical of that architecture!  A sketch of this degraded mode follows.
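
To make that degraded mode concrete, here is a hypothetical sketch of the
selection logic for such a two-identical-plus-one-dissimilar system (the
structure, names, and tolerance are all invented for illustration):

	#include <math.h>

	#define AGREE_EPS 0.001   /* assumed agreement tolerance */

	typedef struct {
	    double out_a, out_b;  /* machines A and B: identical software */
	    double out_c;         /* machine C: dissimilar software       */
	    int    c_healthy;     /* has the dissimilar machine failed?   */
	} channels_t;

	/* Returns 0 and a voted command, or -1 on unresolvable miscompare. */
	int system_output(const channels_t *ch, double *cmd)
	{
	    int ab = fabs(ch->out_a - ch->out_b) <= AGREE_EPS;

	    if (ch->c_healthy) {
	        int ac = fabs(ch->out_a - ch->out_c) <= AGREE_EPS;
	        int bc = fabs(ch->out_b - ch->out_c) <= AGREE_EPS;
	        if (ab || ac) { *cmd = ch->out_a; return 0; } /* majority with A */
	        if (bc)       { *cmd = ch->out_b; return 0; } /* A voted out     */
	        return -1;                                    /* no majority     */
	    }

	    /* Degraded duplex: A and B run the SAME executable, so a common
	       software bug gives the same wrong answer on both, and this
	       comparison waves it straight through (the latent fault). */
	    if (ab) { *cmd = ch->out_a; return 0; }
	    return -1;   /* miscompare, and no third opinion to break the tie */
	}

With C healthy any single bad lane can be voted out; with C gone, the two
surviving lanes can only agree with each other, for better or worse.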

>Dr. Nancy Leveson, a University of Washington professor, has studied this
>issue extensively.

Having sat across the table from Dr. Leveson during the development of
DO-178B (a document the FAA looks on favorably as a guide to how software
should be developed), I can report that she would (and did!) argue that even
an n-version, dissimilar implementation is not demonstrably safe, because of
common errors built into the top-level requirements.  She argues that we, as
humans, will interpret requirements in similar ways, leading to common
misunderstandings which get formalized in dissimilar specifications.  This
would lead to correct software, but an incorrect implementation of the
function.
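
A made-up example of her point: suppose the requirement reads 'alert when
the aircraft drops below 500 feet' and the intent was that 500 feet itself
should alert.  Two teams, coding independently and in different styles, can
still read 'below' the same way:

	/* Version 1, team 1: one-liner. */
	int alert_v1(double alt_ft)
	{
	    return alt_ft < 500.0;   /* 'below' read as strictly less-than */
	}

	/* Version 2, team 2: different style, identical interpretation. */
	int alert_v2(double alt_ft)
	{
	    if (alt_ft >= 500.0)
	        return 0;
	    return 1;
	}

At exactly 500 feet the two versions agree, the voter is satisfied, and
nothing alerts: correct software, incorrect function.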

Some food for thought next time you fly :)

If folks are interested in starting a thread on this, perhaps a more
appropriate group would be sci.engr.safety.  (Moderator, do you agree?)

Brian

******
The views expressed are my own.  I can't speak for the Company.  (I feel
silly each time I have to say this.  It would seem so obvious that I'm
not a Company spokesperson.  Why should I have to keep saying it?  Oh well,
OJ might get off and pigs can fly!)
******