Re-hash of N-version software versus Single version

From:         arch6@inlink.com (Archibald McKinlay)
Date:         14 Aug 95 03:57:25 
Organization: McKinlay & Associates


Safe Software: N-Version vs. Single Version
Arguments for and against N-version (N-Ver) versus single-version (SV)
software are many-faceted but rest on a few, often overstressed, points.

The most commonly stated reason FOR is that N-Ver overcomes a certain error
set, including errors in specification writing and interpretation, and
hardware or compiler error(s).  With no fully documented studies, N-Ver
proponents claim that since this method eliminates these errors, they are
justified in omitting certain error-extraction methods common to
single-version software development and test.

Likewise, N-Ver opponents offer experimental results (Knight, Leveson)
rather than full industry studies, but at least with some public
documentation, showing far less than the claimed reliability and
fault-avoidance results: the residual risk left untouched by N-Ver.

Despite the large additional costs, the documented increase in reliability
is less than 30 percent (Eckhardt), still unproved by more than a couple of
experiments, for a cost equal to (the number of versions) * 1.2, the factor
exceeding one because of the additional management and coordination.
Argument: Certain Error Sets or Errors can be Eliminated by N-Ver and
Single-Ver differently, one better than the other.
Lesson Learned:  N-Ver does deal with a subset of errors not commonly
dealt with in Single-Ver.  This error subset arises in software requirement
specification writing and transition, the software design phase, software
implementation, and the software test phase.  An error subset in system
test is also dealt with using N-Ver.

N-Ver helps find discrepancies in the documentation and in the assumptions
underlying the above phases by testing the same functions, the same way, on
the separate versions.  NOTE: if the test engineer(s) test the same
functions but in a different part of the environment, that is, not using
the same data set, there is only as much chance of finding the discrepancy
in N-Ver as there is in Single-Ver.  Likewise, if testing is the only way
to discover these discrepancy errors, then it follows that the system must
be thoroughly tested to find them.  Such exhaustive testing becomes
impossible in short order.
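
To put numbers on how fast exhaustive testing becomes impossible, consider
a minimal back-of-the-envelope sketch (the input widths and test rate are
assumed figures, chosen only for illustration):

    # Back-of-the-envelope: exhaustive testing of even a tiny interface.
    # Assumed figures: two 32-bit inputs, one million test cases per second.
    SECONDS_PER_YEAR = 60 * 60 * 24 * 365

    input_space = 2 ** 64          # every combination of two 32-bit inputs
    tests_per_second = 1_000_000   # optimistic automated test rate

    years = input_space / tests_per_second / SECONDS_PER_YEAR
    print(f"Exhaustive test time: {years:,.0f} years")  # roughly 585,000 years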

Exhaustive testing in Single-Ver, were it possible, would not find the same
discrepancies, because the interpretation or assumption is singular.  These
specification or interpretive errors would then be found only via reviews
and via testing with or by the user.

So, with exhaustive testing impossible for N-Ver and Single-Ver alike in
most complex systems, and with the alternatives of reviews and user testing
available and germane to software development, there is little economic
reason to invest in one over the other.  Remember the caveat that each
N-Ver version must be tested using the same test data sets to exercise the
one strength of N-Ver over Single-Ver.  This requires a good deal of
discrete oversight and coordination.
Argument: N-Ver explicitly finds differences betwixt implementations of
similar software requirement specifications
Caveat:  When each version within the N versions is tested using the same
data set, these discrepancies can be found.  If the N versions are not
tested <back to back>, as it is sometimes called, then the utility of this
advantage is completely wasted.
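
A minimal sketch of such back-to-back testing, assuming each version is
callable as a function (the three version functions and the data set below
are hypothetical stand-ins, with a seeded specification misreading in
version C):

    # Minimal back-to-back test harness: feed the SAME data set to every
    # version and flag any output discrepancy for investigation.

    def version_a(x): return x * x
    def version_b(x): return x ** 2
    def version_c(x): return x * x if x >= 0 else -(x * x)  # seeded misreading

    versions = {"A": version_a, "B": version_b, "C": version_c}
    test_data = [-3, -1, 0, 1, 2, 7]   # one shared data set, used by ALL versions

    for x in test_data:
        results = {name: fn(x) for name, fn in versions.items()}
        if len(set(results.values())) > 1:          # versions disagree
            print(f"DISCREPANCY on input {x}: {results}")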
Argument:  N-Ver increases reliability of software subsystem
There are multiple mathematical exercises demonstrating that, logically,
N-Version programming can be assumed to result in more reliable solutions.
However, these are theoretical approaches.  Comparison with the real world
finds that a reasonable expectation for N-Ver, within a well-defined
development and test model structurally similar to the single-version
process, is approximately a 1.3-1.5x increase in software reliability.
Caveat: To achieve these numbers, the well-defined, structured model under
which the N-Ver must be developed is one similar to single version.
Modifying this model will modify the increase in reliability.  Hence,
eliminating certain error-extraction processes because N-Ver testing finds
the discrepancies mentioned above is self-defeating in two senses.  First,
postponing fault or error extraction until test delays the extraction, thus
increasing the cost of the fix.  Second, eliminating an error-extraction
process that is not directly and solely related to the unique error subset
N-Ver finds over Single-Ver is wrong, because no other error extraction is
substituted and exhaustive testing is impossible.
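
A sketch of the kind of mathematical exercise referred to above, with
assumed numbers: under the independence assumption a 2-of-3 voter looks
spectacularly better, but the coincident failures Knight/Leveson observed
erase most of that apparent gain (both probabilities below are assumed for
illustration only):

    # Theoretical 2-of-3 voting vs. a correlated-failure reality check.
    from math import comb

    p = 1e-3   # assumed per-demand failure probability of one version

    # Independence assumption: system fails when 2 or 3 of 3 versions fail.
    p_indep = sum(comb(3, k) * p**k * (1 - p)**(3 - k) for k in (2, 3))
    print(f"independent model : {p_indep:.2e}")   # ~3e-6, a huge apparent gain

    # Knight/Leveson-style correlation: coincident failures occur far more
    # often than p**2 because programmers share a common core of errors.
    p_coincident = 1e-4   # assumed probability two versions fail together
    print(f"correlated model  : {p_coincident:.2e}")  # most of the gain is gone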
Argument:  All outside faults, errors, and failures can affect software
Lesson Learned:  Faults injected prior to the writing of the software
requirement specification are no more likely to be found in N-Ver than in
Single-Ver software using similar life cycles.  That is, faults in the
contract regarding processes or evidence, faults in system requirements
capture and writing, or faults in system design are usually unaffected by
N-Ver, or Single-Ver for that matter.  Multiple experiments and real
industry studies have shown that these prior phases account for the bulk
of errors propagated into and from within the software.  Only a very small
percentage of these errors fall within the special N-Ver subset which may
be removed via back-to-back testing.
Argument:  N-Version costs at least N.3*$SV
The costs in N-Ver are not just those of supporting N times as many
developers or suppliers and their development environments.  Costs also
incurred include coordination and travel amongst developers, the multiple
overhead charges duplicated for each developer, and the additional
coordination of differing sets of test plans that nevertheless use the same
test data, without disclosing design aspects to the other developers.  That
is, each developer will have and require a separate overhead charge.  Each
developer will require face-to-face meetings, and therefore travel costs
are N times as much.  Management of tests and test results is very
sensitive in that discrepancies must be investigated without reference to
another version.  One ends up with each developer pointing fingers at the
others for the root cause.  In resolving these problems one cannot, as
acquirer, disclose the one design to another developer and vice versa.
Also, as N-Ver relies on testing, the amount of time in testing will be
longer than for Single-Ver, further exacerbated by the sensitivity in
resolving discrepancies amongst versions.
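
A sketch of the cost arithmetic, reusing the per-version factor of roughly
1.2 quoted earlier; the baseline dollar figure is an assumed illustrative
value, not drawn from any real program:

    # Rough N-version cost model: N development efforts plus the
    # coordination, travel, and duplicated overhead described above.
    single_version_cost = 1_000_000   # assumed baseline ($)
    overhead_factor = 1.2             # per-version factor quoted earlier

    for n in (1, 2, 3):
        total = n * overhead_factor * single_version_cost
        print(f"N={n}: ${total:,.0f}  ({total / single_version_cost:.1f}x baseline)")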
Argument:  N-Version experiments show N-Ver is still susceptible to common
human error (never mind tools, which are also susceptible to human error),
and that versions do not fail independently.
Human error is in all software, whether one is looking at the deliverable
software, the in-house tools, or the off-the-shelf tools.  This includes
compilers.  Caution must also be taken with compilers: the versions should
use languages whose compilers do not come from the same company, since such
compilers do not satisfy N-Ver independence; a vendor's Ada compiler will
involve faults common to its C compiler.
Argument:  The experiments were too small and used less experienced programmers
N-versionites have attacked the Knight and Leveson experiments, saying that
they used only small programs and relatively inexperienced programmers,
typically students.  While statistical significance is a factor, the
commonality of errors and the difference in error sets was also the
important point.  There are, however, other studies using industry-level,
experienced programmers which show an effect parallel to the human error
witnessed in the Knight/Leveson experiments.  That is, each group of
programmers varied only a little in their error <dialect>, while all were
observed to make a core set of errors repeatedly.  This means that N-Ver
proponents, hoping to achieve independence in the error dialect, must
select and nurture the differences between suppliers, to the point of
avoiding same-university-educated programmers in any two suppliers.
Argument:  Complex Byzantine Algorithm and Voter Logic required
The Byzantine argument is actually not totally addressed by N-version, for
two reasons.  While the software can be made independent through separate
developers, the system, usually an aircraft, cannot, due to weight and
space considerations, make all sensors independent.  This, together with
shared power sources, complicates the assertion of independence beyond
human capability to describe mathematically, and therefore to prove.
Voter logic is also a point of contention.  If the voter also uses the
same set of inputs, or a similar specification, the argument of
independence is further, and more strongly than before, undermined.  A
voter using the same inputs is just another version, and if it decides
amongst the other versions it nullifies the independence.  A voter unable
to use independent inputs is not prescient enough to ascertain which of the
versions is correct.  A further Byzantine problem.
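
A minimal majority-voter sketch, assuming exact-match outputs, makes the
dilemma concrete: the voter sees only the versions' outputs, so when they
split with no majority it has no independent basis for naming a winner:

    from collections import Counter

    def majority_vote(outputs):
        """Return the majority output of the N versions, or None on deadlock.

        The voter has only the versions' outputs to work with: if they
        split with no majority, nothing here can say which was correct.
        """
        winner, count = Counter(outputs).most_common(1)[0]
        return winner if count > len(outputs) / 2 else None

    print(majority_vote([42, 42, 41]))   # 42   (simple majority)
    print(majority_vote([42, 41, 40]))   # None (three-way split: voter is blind)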
Lastly in this argument is the added complexity of all these independent
inputs, the versions, and the voter.  It has been affirmed to the point
that we need not quote anyone: increasing complexity increases risk and
decreases reliability.  This is the anathema of N-version: too much
complexity, and the reliability gain in software is negative at the system
level, because the resultant system-level complexity drives down
system-level reliability.  The highest failure rates at the system level
are in fact the sensors.  This means that spending heavily on software to
gain reliability is directly at odds with simplifying the system for
increased reliability.  Software engineers will have to prove convincingly
that this increased system complexity, with its system-level maintenance
and spares requirements, is justified.
Argument:  SW Reliability
The following is based upon my understanding of software reliability from
M. Lyu, JPL, and B. Littlewood, U upon Tyne, Center for Sw Reliability.

IMHO, software reliability has not advanced anywhere near the point of
repeatability and surety that hardware reliability now enjoys.
Measurements of software development and test often involve immeasurable
parameters, or are so invasive as to disrupt some development phases.  This
is strongly put, but from an engineering standpoint we have neither the
tools to measure the software nor the gumption to measure the human, even
if we knew which human to measure and how.  Using many variables and
following incremental versions, each a small percentage of the whole both
functionally and in lines of code, it is now possible to estimate
reliability growth.  This is all that I have seen fully developed and used
at industry levels.  Again, only on software which undergoes little change
under controlled processes, using little or no tool change, including
compiler version updates, with the same people on the project.  If any of
these gross variables change, then the reliability growth model may well
change.  For example, Dr. Lyu's paper regarding JPL software and choosing a
reliability model demonstrated that a serious software house with defined
and repeatable processes, a corporate tool set, relatively fixed personnel
of good quality, a consistent application domain and type, even using very
similar languages, could not count on using the same reliability model from
project to project.  Other studies demonstrate the same, while most seem to
make a particular model fit all cases by increasing the variable or
parameter type or count.
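
As a sketch of the reliability-growth estimation described above, the
following fits the Goel-Okumoto mean-value function mu(t) = a*(1 - exp(-b*t))
to cumulative failure counts.  The weekly counts below are illustrative
stand-ins, not measurements from any real project, and, per Dr. Lyu's
result, even a good fit on one project says nothing about the next:

    # Reliability-growth sketch: fit the Goel-Okumoto mean-value function
    # to cumulative failure counts observed during test.
    import numpy as np
    from scipy.optimize import curve_fit

    def goel_okumoto(t, a, b):
        return a * (1.0 - np.exp(-b * t))

    weeks = np.arange(1, 11)                                  # test weeks
    cum_failures = np.array([5, 9, 12, 15, 17, 18, 19, 20, 20, 21])  # illustrative

    (a, b), _ = curve_fit(goel_okumoto, weeks, cum_failures, p0=(25, 0.1))
    print(f"estimated total faults a = {a:.1f}, detection rate b = {b:.3f}")
    print(f"predicted residual faults: {a - cum_failures[-1]:.1f}")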

This said, and I expect several retorts to it, can we also glibly say that
a reliability measure of N-version software versus single-version software
is a valid comparison?  What if the N-version is on a communication network
while the single version is in a fighter aircraft?  Or the N-Ver is in a
commercial airliner while the single version is Windows 95?

--
Truth arises from disagreement amongst friends, D. Hume (Scotland)
       eine Flucht nach Vorn machen, make a retreat forward
Loved and Missed, so Work Together and Rejoice, Philippians 4:1-13
Archibald McKinlay, VI    Booz
Software Safety Engineering and Management           arch6@inlink.com