4. Reliability

Quality and reliability go hand in hand. Reliability, loosely defined in software terms, is the ability of the software to perform its intended task. Most commonly, this means the stability of the system, i.e. the absence of crashes. If the reliability of the software improves, its quality improves as well. Sections 2 and 3 looked directly at error and fault prevention and detection; all of those methods increase the reliability of the software. This section examines reliability models for fault forecasting and the implementation of fault tolerance.

4.1. Fault prediction

Using a model to estimate the reliability of a module or system before testing can highlight existing problems. The estimate can also indicate how much the testing stage is likely to be held up by the need for debugging.

Section 2.10. described the issue of complexity, and how highly complex systems usually have a greater incidence of errors. One model used to calculate this complexity was the Cyclomatic Complexity Model. This can be used for fault prediction, with the simple equation: 

cyclomatic number = edges – nodes + 2 

The greater the cyclomatic number, the lower the reliability of the system is likely to be.
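As a minimal sketch of the equation above, the cyclomatic number can be computed from a control-flow graph given as lists of nodes and edges. The function name `cyclomatic_number` and the example graph (one if/else followed by one while loop) are assumptions made for illustration, and the formula as written applies to a single connected graph.

```python
def cyclomatic_number(edges, nodes):
    """Return edges - nodes + 2 for a single connected control-flow graph."""
    return len(edges) - len(nodes) + 2


# Hypothetical control-flow graph: one if/else followed by one while loop.
nodes = ["entry", "if", "then", "else", "while", "body", "exit"]
edges = [
    ("entry", "if"),
    ("if", "then"),      # condition true
    ("if", "else"),      # condition false
    ("then", "while"),
    ("else", "while"),
    ("while", "body"),   # loop condition true
    ("body", "while"),   # back edge
    ("while", "exit"),   # loop condition false
]

print(cyclomatic_number(edges, nodes))  # 8 edges - 7 nodes + 2 = 3
```

The result of 3 matches the two decision points in the graph plus one. A straight-line function with no branches gives 1, the lowest possible value.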

Halstead proposes another method for fault prediction. This is based on the following four variables (Dunn, 1994): 

  1. n1 = number of unique operators in a program
  2. n2 = number of unique operands in a program
  3. N1 = total count of the use of all operators
  4. N2 = total count of the occurrences of all operands

The vocabulary of the program is then defined as: 

n = n1 + n2 

and the length defined as: 

N = N1 + N2 

with the volume defined as: 

V = N × log2(n)

These equations can be used to determine the size of the system. From this, the number of defects in the system can be estimated. 

Although these measures are based on knowledge of the code, they have not been proven to be particularly accurate. Care needs to be taken when using them, and they should be used in conjunction with other information to determine the reliability of a system.
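The Halstead quantities defined above can be illustrated with a short sketch. It assumes the program has already been split into operators and operands; here that is done by hand for the single statement `z = x + y * x`, since deciding what counts as an operator is language specific. The function name `halstead` is hypothetical. The sketch stops at the volume, because no formula for turning volume into a defect estimate is given here.

```python
import math


def halstead(operators, operands):
    """Compute Halstead vocabulary, length and volume from token lists."""
    n1 = len(set(operators))   # unique operators
    n2 = len(set(operands))    # unique operands
    N1 = len(operators)        # total operator occurrences
    N2 = len(operands)         # total operand occurrences

    n = n1 + n2                # vocabulary
    N = N1 + N2                # length
    V = N * math.log2(n)       # volume
    return {"n": n, "N": N, "V": V}


# Hand-tokenized example (an assumption): z = x + y * x
operators = ["=", "+", "*"]
operands = ["z", "x", "y", "x"]

result = halstead(operators, operands)
print(result["n"], result["N"], round(result["V"], 2))
```

For this statement the vocabulary is 6 and the length is 7, so the volume is 7 × log2(6), roughly 18.09.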

4.2. Fault tolerance

When an input to the system is illogical, the system needs to cope with it accordingly. In the worst-case scenario, the system accepts the incorrect argument, tries unsuccessfully to process it, and then crashes in an unrecoverable manner. A central concept in fault tolerance is data sanity: all inputs need to be checked to make sure they conform to the specification. It is also useful to give an error message that makes sense to the user, rather than an obscure error code.
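A data sanity check of this kind might look like the following sketch. The specification used here (a whole number between 0 and 100) and the names `InputError` and `parse_percentage` are assumptions chosen for illustration only.

```python
class InputError(ValueError):
    """Raised when an input does not conform to the specification."""


def parse_percentage(raw):
    """Validate input against a hypothetical spec: an integer from 0 to 100."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        # Report the problem in plain language instead of an obscure code.
        raise InputError(
            f"expected a whole number between 0 and 100, got {raw!r}"
        ) from None
    if not 0 <= value <= 100:
        raise InputError(f"value {value} is outside the allowed range 0 to 100")
    return value


for candidate in ("42", "abc", "150"):
    try:
        print(candidate, "->", parse_percentage(candidate))
    except InputError as err:
        print(candidate, "-> rejected:", err)
```

The invalid inputs are rejected at the boundary with a readable message, so they never reach the processing code that might otherwise crash on them.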

If a software component does fail, graceful degradation can improve reliability. This means the entire system does not crash when a component fails. Some functionality might be lost until system maintenance can be performed. This is not an ideal situation, but it is better than having zero functionality until a technician can work out what is wrong.
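One simple way to sketch graceful degradation is to isolate an optional component so that its failure is logged and the rest of the system carries on. The component `fetch_recommendations`, the function `build_page` and the page structure are all hypothetical; a real system would choose which failures are safe to absorb.

```python
import logging

logger = logging.getLogger("app")


def fetch_recommendations(user_id):
    """Hypothetical optional component, simulated here as failing."""
    raise ConnectionError("recommendation service unavailable")


def build_page(user_id):
    """Build the core page even if the optional component fails."""
    page = {"user": user_id, "core_content": "account summary"}
    try:
        page["recommendations"] = fetch_recommendations(user_id)
    except Exception as exc:
        # Record the failure for maintenance and continue without the feature.
        logger.warning("recommendations disabled: %s", exc)
        page["recommendations"] = None
    return page


page = build_page(7)
print(page["core_content"], page["recommendations"])
```

The core content is still delivered, and the logged warning gives maintenance staff a starting point for the repair.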
