Testing

Testing is going to be a very important task. There are two major subtasks:
  1. Testing the software for bugs.
  2. Evaluating different implementations of modules.

Testing

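Conceptually, testing is straightforward, but there is a lot of functionality, and a great deal of testing will be required to identify bugs. Testing can be divided into two parts: strategy testing, which covers the Go-playing modules, and non-strategy testing, which covers everything else.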

Examples of non-strategy testing include saving a game file, loading it, and continuing play from that point; or translating messages into a new language and viewing those messages throughout the client.

Strategy testing will take the bulk of the effort, since this is where the vast majority of the project's development effort is taking place. There will be a test directory at the same level as src, containing a parallel package hierarchy. Test cases will be written for the module type, not the module name. Thus, tests would be written for Shape, Life and Death, etc., rather than for specific implementations of these. The test cases will only use the publicly defined methods of these module types. There are two types of test cases: those that are pass-fail, and those that give a graded response. The former are used by the testing task proper, and the latter by the evaluation task described below.
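As a minimal sketch in Java, the two kinds of test case could be written against a module-type interface so that any implementation of that type can be plugged in. Every name here (the `Groups` interface, `PassFailTest`, `GradedTest`, the result records, and the example tests) is hypothetical and not taken from org.moyoman.module.

```java
import java.util.List;

// Hypothetical module-type interface: tests depend only on its public methods,
// never on a particular implementation class.
interface Groups {
    /** Number of liberties of the group containing the stone at (x, y). */
    int libertiesAt(int x, int y);
}

// Pass-fail result, used by the testing task.
record PassFailResult(String name, boolean passed, String detail) {}

// Graded result, used by the evaluation task; score is assumed to lie in [0, 1].
record GradedResult(String name, double score) {}

// A pass-fail test case written for a module type M.
interface PassFailTest<M> {
    PassFailResult run(M module);
}

// A graded test case written for a module type M.
interface GradedTest<M> {
    GradedResult grade(M module);
}

// Example pass-fail test: checks one group's liberties against a known answer.
final class LibertyCountTest implements PassFailTest<Groups> {
    private final int x, y, expected;

    LibertyCountTest(int x, int y, int expected) {
        this.x = x;
        this.y = y;
        this.expected = expected;
    }

    @Override
    public PassFailResult run(Groups module) {
        int actual = module.libertiesAt(x, y);
        String name = "liberties(" + x + "," + y + ")";
        return new PassFailResult(name, actual == expected,
                "expected " + expected + ", got " + actual);
    }
}

// Example graded test: the fraction of the given checks that an implementation passes.
final class LibertyAccuracyTest implements GradedTest<Groups> {
    private final List<LibertyCountTest> checks;

    LibertyAccuracyTest(List<LibertyCountTest> checks) {
        this.checks = List.copyOf(checks);
    }

    @Override
    public GradedResult grade(Groups module) {
        long passed = checks.stream().filter(c -> c.run(module).passed()).count();
        double score = checks.isEmpty() ? 0.0 : (double) passed / checks.size();
        return new GradedResult("liberty accuracy", score);
    }
}
```

Because the tests only see the `Groups` interface, the same test objects can be run unchanged against every implementation of that module type, or even against a stub such as `(x, y) -> 2`.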

The basic method of testing is as follows: for each of a series of modes and game records, computer-computer play is initiated. After each move of the game, the test cases for each module are run. For example, the Groups module can be tested by getting the liberties of each group and comparing them against known results. A great deal of effort would go into both writing these test cases and recording the correct answer to each test case for each move of every test game. In addition, the log file is monitored to determine whether any errors or warnings occur in any of the modules.
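The per-move procedure can be sketched as follows. This is a hypothetical illustration, not the project's harness: it replays a recorded test game rather than driving live computer-computer play, uses a tiny `SketchBoard` with a flood-fill liberty count as a stand-in for a Groups implementation (captures and ko are ignored), and stores the recorded correct answers as a map from `"x,y"` to the expected liberty count after each move.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Minimal stand-in board for this sketch. Captures and ko are deliberately
// ignored, so it is only suitable for positions without captures.
final class SketchBoard {
    static final char EMPTY = '.';
    private final int size;
    private final char[][] cells;

    SketchBoard(int size) {
        this.size = size;
        this.cells = new char[size][size];
        for (char[] row : cells) Arrays.fill(row, EMPTY);
    }

    void play(int x, int y, char color) { cells[y][x] = color; }

    // Flood-fill the group at (x, y) and count its distinct empty neighbours.
    int libertiesAt(int x, int y) {
        char color = cells[y][x];
        if (color == EMPTY) throw new IllegalArgumentException("no stone at " + x + "," + y);
        Set<Integer> seen = new HashSet<>();
        Set<Integer> liberties = new HashSet<>();
        Deque<int[]> todo = new ArrayDeque<>();
        todo.push(new int[] {x, y});
        seen.add(y * size + x);
        int[][] dirs = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        while (!todo.isEmpty()) {
            int[] p = todo.pop();
            for (int[] d : dirs) {
                int nx = p[0] + d[0], ny = p[1] + d[1];
                if (nx < 0 || ny < 0 || nx >= size || ny >= size) continue;
                int key = ny * size + nx;
                if (cells[ny][nx] == EMPTY) liberties.add(key);
                else if (cells[ny][nx] == color && seen.add(key)) todo.push(new int[] {nx, ny});
            }
        }
        return liberties.size();
    }
}

// One move of a recorded test game, plus the recorded correct answers
// (stone "x,y" -> liberties of its group) that should hold after it.
record RecordedMove(int x, int y, char color, Map<String, Integer> expectedLiberties) {}

final class PerMoveHarness {
    // Replays the record, running the liberty checks after every move, and
    // returns a description of every mismatch (an empty list means all passed).
    static List<String> run(int size, List<RecordedMove> record) {
        SketchBoard board = new SketchBoard(size);
        List<String> failures = new ArrayList<>();
        for (int i = 0; i < record.size(); i++) {
            RecordedMove m = record.get(i);
            board.play(m.x(), m.y(), m.color());
            for (Map.Entry<String, Integer> e : m.expectedLiberties().entrySet()) {
                String[] xy = e.getKey().split(",");
                int actual = board.libertiesAt(Integer.parseInt(xy[0]), Integer.parseInt(xy[1]));
                if (actual != e.getValue()) {
                    failures.add("move " + (i + 1) + ", stone " + e.getKey()
                            + ": expected " + e.getValue() + ", got " + actual);
                }
            }
        }
        return failures;
    }
}
```

For example, after Black plays (0,0) and (1,0) on a 5x5 board and White answers at (0,1), the recorded answer for the black group would be 2 liberties; any implementation reporting a different number shows up as a failure tied to that move.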

Another type of testing would be running the program in computer-computer mode, saving the game after each move, and checking the log files for any errors. In addition, the game records would be saved so that a person could go through them manually. If any of the moves appear to be below the program's usual skill level, the game is restored to that move, and the debug windows are used to determine the reason for the poor move.
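Checking the logs for errors and warnings lends itself to automation. The sketch below assumes, purely for illustration, that each log line begins with a level name such as `ERROR` or `WARN`; the real log format and level names may differ, and `LogScanner` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class LogScanner {
    // Assumed log format: "<LEVEL> <message>" per line, e.g. "ERROR Groups: ...".
    // The level names are assumptions; adjust them to the real log format.
    private static final Pattern PROBLEM =
            Pattern.compile("^\\s*(ERROR|SEVERE|WARN|WARNING)\\b.*");

    // Returns "line N: text" for every error or warning line.
    static List<String> problems(List<String> lines) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (PROBLEM.matcher(lines.get(i)).matches()) {
                found.add("line " + (i + 1) + ": " + lines.get(i).trim());
            }
        }
        return found;
    }

    // Convenience overload that reads a log file from disk (UTF-8).
    static List<String> problems(Path logFile) throws IOException {
        return problems(Files.readAllLines(logFile));
    }
}
```

Run after each saved move, a non-empty result marks the point where a game should be restored and inspected in the debug windows.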

Evaluation

Not only does the software have to be tested for bugs, but multiple implementations of modules also need to be evaluated. For each official release, there will be a ranking of module implementations for each standard mode. Currently, the standard modes are Best, meaning the best possible play, and Fast, meaning reasonable play with smaller CPU and memory requirements. A Tournament standard mode may be added in the future.
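One possible way to turn measurements into a per-mode ranking is sketched below. The ranking rule is an assumption, not a project decision: Best ranks purely by score, while Fast first excludes implementations over a hypothetical CPU-time and memory budget. `Measurement`, `ModuleRanking`, and the budget values are illustrative.

```java
import java.util.Comparator;
import java.util.List;

// The two standard modes named in the text.
enum StandardMode { BEST, FAST }

// Hypothetical measurements for one implementation of a module type.
record Measurement(String implementation, double score, long millis, long bytes) {}

final class ModuleRanking {
    // Hypothetical resource budget for Fast mode; real limits would be set by the project.
    static final long FAST_MAX_MILLIS = 1_000;
    static final long FAST_MAX_BYTES = 64L * 1024 * 1024;

    // BEST: rank purely by score (higher first), breaking ties by time.
    // FAST: drop implementations over budget, then rank the rest the same way.
    static List<String> rank(StandardMode mode, List<Measurement> results) {
        Comparator<Measurement> order = Comparator
                .comparingDouble(Measurement::score).reversed()
                .thenComparingLong(Measurement::millis);
        return results.stream()
                .filter(m -> mode == StandardMode.BEST
                        || (m.millis() <= FAST_MAX_MILLIS && m.bytes() <= FAST_MAX_BYTES))
                .sorted(order)
                .map(Measurement::implementation)
                .toList();
    }
}
```

With this rule, a strong but slow implementation can top the Best ranking while not appearing in the Fast ranking at all.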

To perform this evaluation, there would have to be a standard test suite for each module type, which would be run against all of the different implementations to determine the ranking for each mode. For example, for the life and death module, a series of life and death problems could be given to each implementation, and its answers, the time taken, and the memory used would be measured. Unlike the pass-fail tests of the testing task described above, these would be life and death problems of varying difficulty. Each Life and Death module would then be graded on how well it solved the problems, rather than failing the test if it got any problem wrong. Evaluation may also include playing different configurations of the program against each other. Of course, this evaluation would also include making sure that the implementations did not throw exceptions, enter infinite loops, etc.
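A graded run over a problem set might look like the following sketch. The `LifeAndDeath` interface (reduced here to a yes/no `canLive` answer per problem id), the difficulty weighting, and all other names are hypothetical. Each problem runs on a daemon worker thread with a time limit, so a timeout (such as an infinite loop) or an exception costs that problem's points instead of aborting the evaluation. Memory measurement is omitted, since reliable per-call memory figures need more than a few lines of code.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical module-type interface: answers whether the group in a problem can live.
interface LifeAndDeath {
    boolean canLive(String problemId);
}

// A problem with its known answer and a difficulty weight (harder problems count more).
record Problem(String id, boolean answer, int difficulty) {}

// Graded outcome: weighted fraction solved, plus counts of the failure kinds.
record Grade(double score, int timeouts, int exceptions, long totalMillis) {}

final class LifeAndDeathEvaluator {
    static Grade evaluate(LifeAndDeath module, List<Problem> problems, long limitMillis) {
        // Cached pool: a stuck task keeps its thread busy, later tasks get new threads.
        // Daemon threads ensure a stuck solver does not keep the JVM alive.
        ExecutorService pool = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        double earned = 0, possible = 0;
        int timeouts = 0, exceptions = 0;
        long start = System.nanoTime();
        try {
            for (Problem p : problems) {
                possible += p.difficulty();
                Future<Boolean> f = pool.submit(() -> module.canLive(p.id()));
                try {
                    if (f.get(limitMillis, TimeUnit.MILLISECONDS) == p.answer()) {
                        earned += p.difficulty();
                    }
                } catch (TimeoutException e) {
                    timeouts++;
                    f.cancel(true); // requests interruption; a busy loop may ignore it
                } catch (ExecutionException e) {
                    exceptions++;   // the implementation threw while solving
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        } finally {
            pool.shutdownNow();
        }
        long millis = (System.nanoTime() - start) / 1_000_000;
        return new Grade(possible == 0 ? 0.0 : earned / possible, timeouts, exceptions, millis);
    }
}
```

The weighted score gives the graded response the evaluation task needs, while the timeout and exception counts cover the robustness checks in the same pass.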

Responsibility

Volunteers are needed to design and build this test software. Ideally, the organization of the test group would parallel that of the non-test part of the project: one or two people would be responsible for the test software as a whole, designing and building the overall test framework, and one person would be responsible for testing each module. For example, one person would be responsible for testing the Shape module. They would implement their code according to the framework designed by the overall test coordinator, so that when the test framework runs, the test code for the Shape module runs as one part of it.
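The split between the coordinator's framework and the per-module test code could be expressed as a small contract, sketched below with hypothetical names (`ModuleTestSuite`, `TestFramework`). Explicit registration keeps the sketch short; `java.util.ServiceLoader` is one standard way the framework could instead discover suites on the classpath.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical contract that each per-module test author implements.
interface ModuleTestSuite {
    String moduleType();   // e.g. "Shape" or "Life and Death"
    List<String> run();    // failure descriptions; empty means everything passed
}

// Hypothetical framework owned by the overall test coordinator. It knows nothing
// about individual modules beyond the ModuleTestSuite contract.
final class TestFramework {
    private final List<ModuleTestSuite> suites = new ArrayList<>();

    void register(ModuleTestSuite suite) { suites.add(suite); }

    // Runs every registered suite in registration order. An exception in one
    // suite is reported as a failure for that module and does not stop the others.
    Map<String, List<String>> runAll() {
        Map<String, List<String>> report = new LinkedHashMap<>();
        for (ModuleTestSuite s : suites) {
            List<String> failures;
            try {
                failures = s.run();
            } catch (RuntimeException e) {
                failures = List.of("suite threw " + e);
            }
            report.put(s.moduleType(), failures);
        }
        return report;
    }
}
```

The person testing the Shape module would then only write a `ModuleTestSuite` implementation; the coordinator's framework decides when and how it runs.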

Obviously, this is a very large task in itself, which includes designing and implementing a code framework as well as gathering data from other sources, such as life and death problems or joseki dictionaries. The complexity of this task and the amount of code required could easily match that of the code in org.moyoman.module itself. As with the development of the Go-playing code, the quality of the testing effort will depend on having enough volunteers to perform this task.