
3. Defect Detection

If an adequate prevention programme has been implemented, the number of errors inserted into the code will be dramatically reduced. However, it is reduced, not eliminated. No matter how good the methods used, a certain number of errors will always be inserted (Dunn & Ullman, 1994, suggest as few as 5 per thousand lines of code). For that reason, the prevention programme should be followed by a detection scheme that picks up errors before they become defects in later stages of the design lifecycle.

3.1. Phase containment

If these errors are not contained within the stage in which they were inserted, they will propagate and generate further defects in later stages (Dunn & Ullman, 1994). Fixing defects at these later stages costs considerably more than finding and eliminating them early on: Futrell et al. (2002) estimate that a defect can cost up to 100 times as much to fix late as one eliminated in the early stages. The phase containment effectiveness (PCE) equation gives developers a way to evaluate the success of early defect detection and removal:

$$\mathrm{PCE}_i = \frac{E_i}{E_i + D_i}$$

where:

$E_i$ is the number of errors found in phase $i$

$D_i$ is the number of errors not found in phase $i$, but removed in a later phase $i + n$

If PCE is very close to 1, then containment is proving successful. If PCE is closer to zero, then effort needs to be focused on improved early defect detection mechanisms. 
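As an illustration of the PCE equation, the sketch below computes it per phase from error counts. The phase names and counts are hypothetical, chosen only to show one well-contained phase and one poorly contained phase.

```python
def phase_containment_effectiveness(found_in_phase: int, escaped: int) -> float:
    """PCE_i = E_i / (E_i + D_i).

    found_in_phase: E_i, errors found in the phase where they were inserted.
    escaped:        D_i, errors from that phase only removed in a later phase.
    """
    total = found_in_phase + escaped
    if total == 0:
        raise ValueError("no errors recorded for this phase")
    return found_in_phase / total


# Hypothetical counts: (E_i, D_i) for each phase.
phases = {"design": (45, 5), "coding": (12, 28)}
for name, (e, d) in phases.items():
    print(f"{name}: PCE = {phase_containment_effectiveness(e, d):.2f}")
```

Here the design phase (0.90) would count as successful containment, while the coding phase (0.30) would be a candidate for better early detection.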

3.2. Code reviews

A widely used method of defect detection is code review. Formal code inspections were developed at I.B.M. in the 1970s, and team review of code for correctness later became central to I.B.M.'s ‘Cleanroom’ approach. A review involves a team of people who are familiar with the project but not necessarily part of the code development process. This team (usually three or four people) goes through the code line by line and determines its correctness. The code segment under review (perhaps a function or a module) is kept small: Stavely (1999) suggests reviewing no more than two pages of code an hour. Futrell et al. (2002) give a different view, shown in Figure 3. However, the source of their data is not given, so its validity is open to question.

Figure 3. Finding the optimal code size to review 

Buck (1981) found the optimum review rate to be 125 lines of code per hour. This study, commissioned by I.B.M., would have dealt with traditional second- and third-generation languages, so the figure may not apply to the more expressive, ‘user friendly’ languages common today.
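To turn Buck's rate into a planning figure, a minimal sketch might estimate review time for a module. It assumes that the 125 lines-per-hour rate applies linearly and that every reviewer attends the whole review. The default team of four is taken from the three-to-four-person teams described above, and the function name is hypothetical.

```python
def review_hours(lines_of_code: int,
                 rate_loc_per_hour: float = 125.0,  # Buck (1981) optimum
                 reviewers: int = 4) -> tuple[float, float]:
    """Return (elapsed review hours, total person-hours) for a code review,
    assuming the review rate scales linearly with code size."""
    if rate_loc_per_hour <= 0 or reviewers <= 0:
        raise ValueError("rate and team size must be positive")
    elapsed = lines_of_code / rate_loc_per_hour
    return elapsed, elapsed * reviewers


elapsed, effort = review_hours(1000)
print(f"{elapsed:.1f} h of review, {effort:.1f} person-hours")
```

For a 1,000-line module this gives 8 hours of review, or 32 person-hours for a four-person team. Under the same assumptions, such a module would in practice be split into several short sessions.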

Finding defects is not the only reason to hold reviews: looking for ways to improve the code is also important. If the code can be made less complex and more reusable, its quality improves. Whilst reviews are usually formal events, code walkthroughs are less so, and are more widely used in industry. The lead programmer is responsible for organising the walkthrough (which is not the case in reviews), and a less stringent evaluation process is used. Whichever method is chosen, a high proportion of the errors in the code will be removed in the phase in which they were inserted. Futrell et al. (2002) suggest that 60% to 90% of errors will be picked up in this way, compared with only 25% by automated unit testing.
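A rough way to see what these percentages mean in combination is sketched below. It assumes that a review pass is followed by a unit-test pass, and that each stage removes a fixed fraction of whatever errors reach it (independent stages). That simplification is made here for illustration and is not a claim from Futrell et al. The starting count of 100 errors is hypothetical.

```python
def residual_errors(initial: float, review_removal: float, test_removal: float) -> float:
    """Errors remaining after a review pass and then a unit-test pass,
    assuming each stage removes a fixed fraction of the errors reaching it."""
    after_review = initial * (1 - review_removal)
    return after_review * (1 - test_removal)


# 100 hypothetical errors; review rates span no review and the 60%-90% range.
for review in (0.0, 0.6, 0.9):
    remaining = residual_errors(100, review, 0.25)
    print(f"review removes {review:.0%}: {remaining:.1f} errors remain after unit test")
```

Under these assumptions, unit testing alone leaves 75 of 100 errors. Adding a review that removes 60% to 90% leaves between 30 and 7.5.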

3.3. Proving correctness

As program code is a list of operations that a computer must perform, these operations can be represented mathematically. Therefore, if the specification for a module is known and the code is available to evaluate, the correctness of that code can, in theory, be proven. There are two main types of proof, but neither is used extensively in the software industry. Dunn and Ullman (1994) attribute this mainly to the tedious and error-prone nature of the proofs. In addition, a mathematical proof of correctness does not answer the question ‘are the requirements correct?’, which limits its usefulness.

The first method is called the functional proof. The specification for the module is developed from the requirements, and a mathematical function is derived to express that module. The mathematical function is then compared with the actual code to see whether the outputs are the same. If they are not, there is an error either in the code or in the expression. This points to a major drawback of mathematical proofs: how can we be sure that the expressions derived to check the code are correct in the first place?

The second approach, the algebraic proof, examines each line of the code and converts it into a set of algebraic assertions. The validity of these assertions is then proven through reasoning and algebraic manipulation. This approach is very labour intensive and not particularly conducive to automation.
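To give a feel for the assertions an algebraic proof works with, the sketch below annotates a small, hypothetical routine (integer division by repeated subtraction) with a precondition, a loop invariant and a postcondition. The `assert` statements only check the assertions on the inputs actually executed. The proof itself is the algebraic argument in the comments, which shows that they hold for every valid input.

```python
def divide(a: int, b: int) -> tuple[int, int]:
    """Quotient and remainder of a by b (a >= 0, b > 0) by repeated subtraction,
    annotated with the assertions an algebraic proof would establish."""
    assert a >= 0 and b > 0                  # precondition
    q, r = 0, a
    assert a == q * b + r and r >= 0         # invariant: 0*b + a == a
    while r >= b:
        q, r = q + 1, r - b
        # new q*b + new r == (old q + 1)*b + (old r - b) == old q*b + old r,
        # and old r >= b implies new r >= 0, so the invariant is preserved.
        assert a == q * b + r and r >= 0
    # The loop exits only when r < b; together with the invariant this
    # gives the postcondition that (q, r) are the quotient and remainder.
    assert a == q * b + r and 0 <= r < b
    return q, r


print(divide(17, 5))
```

The function should agree with Python's built-in `divmod` for non-negative `a` and positive `b`. A reviewer constructing the proof would still need to argue separately that the loop terminates, since `r` strictly decreases while staying non-negative.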

This side of defect detection seems to be underused and underdeveloped. If suitable tools can be developed to analyse code automatically for more than just the syntactic errors that today's tools catch, software quality may increase as more errors and defects are found.

3.4. Testing

Whereas code reviews and proofs of correctness are mostly manual tasks that examine individual parts of the code, testing is more automated: it passes inputs into the system and examines the output for correctness. Just as error prevention does not stop all errors from being inserted into the code, detection schemes such as reviews cannot pick up every last error, so testing remains an integral part of the software development process. A poor testing process can lead to latent defects in the final product, damaging both the quality of the product and the reputation of the developer.

There are two classes of tests: static and dynamic. Static tests examine the code for problems without running it on test cases. They pick up syntactic errors as well as incorrect use of data types and improper structure, and may include a measure of complexity. Dynamic testing, on the other hand, runs the code through test cases to determine whether the output is correct. This section briefly covers some of the main methods of testing.
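As a small example of a static test, the sketch below uses Python's standard `ast` module to parse source code without executing it. Parsing alone reports syntax errors. The sketch then computes a simplified McCabe cyclomatic complexity: one plus the number of decision points. The set of nodes counted as decisions is an assumption made for this illustration, and real complexity tools apply more detailed rules. The sketch also counts the whole source text rather than each function separately.

```python
import ast

# Node types treated as decision points in this simplified McCabe count.
_DECISIONS = (ast.If, ast.For, ast.While, ast.IfExp,
              ast.ExceptHandler, ast.comprehension)


def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity of `source` without running it.
    Raises SyntaxError if the code does not parse."""
    tree = ast.parse(source)
    complexity = 1
    for node in ast.walk(tree):
        if isinstance(node, _DECISIONS):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra short-circuit branches.
            complexity += len(node.values) - 1
    return complexity


SAMPLE = """
def classify(xs):
    total = 0
    for x in xs:
        if x > 0 and x < 10:
            total += x
    return total
"""
print(cyclomatic_complexity(SAMPLE))
```

The sample function counts one `for`, one `if` and one `and`, giving a complexity of 4.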

3.4.1. Black box

This test takes a module of code, passes a series of inputs to that module, and examines the results. No knowledge of the code itself is used. If the results agree with the external specification for the module, that iteration of the test is flagged as successful and a new set of inputs is passed into the module. Depending on the size and complexity of the module, it may not be feasible to test every possible input. In that case, the test cases must be designed carefully to pick up as many faults as possible.
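The sketch below illustrates black box testing on a hypothetical leap-year module. The test cases are derived only from the external specification quoted in the comments and never from the function body. One input is chosen per class of behaviour the specification distinguishes (equivalence partitioning), which is one common way of designing cases carefully when inputs cannot be tested exhaustively.

```python
# Module under test. In black box testing its body is treated as unknown.
def is_leap_year(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Hypothetical external specification: a year is a leap year if it is
# divisible by 4, except century years, which must be divisible by 400.
# One representative input per class in the specification, with its expected output.
BLACK_BOX_CASES = [
    (2023, False),  # not divisible by 4
    (2024, True),   # divisible by 4, not a century year
    (1900, False),  # century year not divisible by 400
    (2000, True),   # century year divisible by 400
]


def run_black_box(func, cases):
    """Return (input, expected, actual) for every case that violates the spec."""
    return [(inp, expected, func(inp))
            for inp, expected in cases if func(inp) != expected]


failures = run_black_box(is_leap_year, BLACK_BOX_CASES)
print("all cases passed" if not failures else failures)
```

Because the cases depend only on the specification, the same list can be run unchanged against any other implementation of the module.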

3.4.2. Glass box

Alternatively, the module under test can be viewed with knowledge of its internal structure. Test cases can then be designed so that all, or most, of the paths through the code are traversed. This typically requires far fewer test cases than black box testing to achieve the same level of defect detection. However, a faulty path may still be verified as correct if the test data used to traverse it covers too narrow a range of values.
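The sketch below shows both the strength and the weakness of glass box testing. The module is hypothetical, and its code is instrumented by hand to record which branch each test traverses. Two inputs chosen from the code's structure achieve full branch coverage and both pass. A deliberately planted boundary fault (`<` where the specification implies `<=`) still goes undetected, because the test data never reaches the boundary value.

```python
# Records which branches (paths) of the module under test have been traversed.
covered = set()


def shipping_cost(weight_kg: float) -> float:
    """Hypothetical module. Spec: 10.0 flat rate up to and including 5 kg;
    above 5 kg, 15.0 plus 2.0 per kg over 5. The `<` is a deliberate fault."""
    if weight_kg < 5:  # the spec calls for `<=`
        covered.add("flat")
        return 10.0
    covered.add("surcharge")
    return 15.0 + (weight_kg - 5) * 2.0


# Glass box design: one input per branch gives full branch coverage.
for weight, expected in [(3, 10.0), (8, 21.0)]:
    assert shipping_cost(weight) == expected, weight

print(sorted(covered))   # both paths traversed, and both tests pass
# The boundary value the test data never reached still exposes the fault:
print(shipping_cost(5))  # the spec requires 10.0; this version returns 15.0
```

Adding the boundary value 5 to the test data would expose the fault, which is one reason black box and glass box tests are best used together.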

This type of testing usually focuses on the internal specifications of the module, which are themselves derived from the external specifications. This is one downside of relying on glass box testing alone: it does not help verify that the specifications are correct in the first place. Dunn & Ullman (1994) estimate that between 20% and 33% of faults can be found using either black box or glass box testing, and it is argued that both should be used as part of a defect detection strategy to maximise success.

3.4.3. Seeding

It is useful to know when to stop testing, and seeding is one approach to determining how much testing is enough. Faults are purposely inserted into the code, and testing will detect some proportion of these seeded faults. In its most basic form, the ratio of discovered seeded faults to total seeded faults gives an evaluation of the effectiveness of the testing process. Of course, the types of fault seeded have a great impact on the accuracy of this approach: they need to match the complexity of the real faults inherent in the code.
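The basic seeding ratio is computed in the sketch below. It also includes a common extension of the idea, usually attributed to Mills: if seeded and real faults are equally likely to be found (the same matching assumption the text warns about), the ratio can be used to estimate the total number of real faults, and hence how many remain. All counts are hypothetical.

```python
def seeding_estimates(seeded_total: int, seeded_found: int, real_found: int):
    """Return (test effectiveness, estimated total real faults, estimated
    real faults remaining), assuming seeded and real faults are equally
    likely to be detected."""
    if seeded_total <= 0 or seeded_found <= 0:
        raise ValueError("need at least one seeded fault found")
    effectiveness = seeded_found / seeded_total
    estimated_real_total = real_found / effectiveness
    return effectiveness, estimated_real_total, estimated_real_total - real_found


# Hypothetical run: 20 faults seeded, 15 of them found, plus 30 real faults found.
eff, total, remaining = seeding_estimates(20, 15, 30)
print(f"effectiveness {eff:.0%}, ~{total:.0f} real faults, ~{remaining:.0f} remaining")
```

With 75% of seeded faults found, the 30 real faults found suggest about 40 in total, so roughly 10 may remain. That figure could help decide whether testing should continue.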

3.4.4. Regression testing

Once faults have been identified and corrected, exactly the same tests should be rerun on the corrected module and on all modules linked to it. This determines whether the correction was successful, and also indicates whether any new bugs have been introduced. Configuration management plays an important role here, recording which tests were run on which modules for exactly this purpose.
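The sketch below shows how configuration management records might be used to select a regression suite. It assumes two kinds of record: which modules depend on which, and which tests were previously run against each module. All module and test names are hypothetical. After a module is corrected, the sketch walks the dependency records breadth-first to find every module linked to it, directly or indirectly, and collects their tests.

```python
from collections import deque

# Hypothetical configuration-management records.
# DEPENDENTS maps each module to the modules that use (are linked to) it.
DEPENDENTS = {
    "parser": ["validator", "report"],
    "validator": ["report"],
    "report": [],
}
# TESTS_BY_MODULE records which tests were previously run against each module.
TESTS_BY_MODULE = {
    "parser": ["test_parse_empty", "test_parse_nested"],
    "validator": ["test_rejects_bad_dates"],
    "report": ["test_report_totals"],
}


def regression_suite(changed: str) -> list[str]:
    """Tests to rerun after `changed` is corrected: its own tests plus those
    of every module that depends on it, directly or indirectly."""
    seen = {changed}
    queue = deque([changed])
    while queue:
        module = queue.popleft()
        for dependent in DEPENDENTS.get(module, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(test for m in seen for test in TESTS_BY_MODULE.get(m, []))


print(regression_suite("validator"))
```

A fix to `validator` reruns its own test and the `report` test, but not the `parser` tests, since `parser` does not depend on it.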

3.4.5. Integration testing

After black box and glass box tests have been carried out on module-sized sections of code, these modules are coupled together where appropriate and tested in unison. The interfaces between modules are important here, since some faults only surface when a module receives inputs it did not expect. Good testing at the module level, together with well-defined interfaces, can minimise these faults.
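The sketch below illustrates the kind of interface fault integration testing is meant to catch, using two hypothetical modules. Module A parses a sensor reading and signals a faulty sensor by returning `None`. Module B was unit-tested only with real numbers. Each module passes its own tests, but driving B through A with a faulty reading passes B an input it never expected, and the integration test reports the resulting `TypeError`.

```python
from typing import Optional


def read_temperature(raw: str) -> Optional[float]:
    """Hypothetical module A: parse a reading; None signals a sensor fault."""
    try:
        return float(raw)
    except ValueError:
        return None


def should_alarm(celsius: float, limit: float = 80.0) -> bool:
    """Hypothetical module B: unit-tested only with real numbers."""
    return celsius > limit


def integration_test() -> list[str]:
    """Drive module B through module A, including a faulty sensor reading
    that B's unit tests never supplied. Returns any failures found."""
    failures = []
    for raw in ["25.0", "95.5", "ERR"]:
        try:
            should_alarm(read_temperature(raw))
        except TypeError as exc:
            failures.append(f"{raw!r}: {exc}")
    return failures


print(integration_test())
```

The fix belongs at the interface: either B must define how it handles a missing reading, or A's contract must rule `None` out.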

Once two or more modules have been linked together, they can be tested in a similar manner to single modules. Multiple modules can be built up into subsystems, and subsystems integrated to form the final product.

3.4.6. Alpha and beta testing

Once the quality of the system (in this case, its lack of defects) has reached a suitable level, it is placed into operation. Alpha testing puts the system into operation in-house: a group of developers might use the program for a set period to uncover any problems that did not arise in earlier lifecycle stages. In beta testing, users from the client try out the product. This is a good way to test whether the specifications have been correctly developed from the requirements. However, the end users of the software need to be an integral part of the entire development lifecycle. If beta testing is the first time a user is involved, it is probably too late.

User acceptance and usability tests can fall under alpha and beta testing, although it is common to carry out both before this stage. Acceptance and usability testing involves getting end users to use the (near) final product. These users are then closely monitored (with their permission) and questioned about their experience with the software. Factors such as body language are observed to gauge the quality of the software independently of what users report.
