11.4
Remarks
The genetic algorithm is a general stochastic search method
that has been used successfully in a number of ways for neural network design
[3], [4], [50], [52], [66], [68], [108], [109], [155], [156], [304], [319]. Its main advantages are that it requires very little
problem-specific information and is relatively insensitive to local maxima
(minima). The algorithm itself is relatively easy to implement. Unlike some
other search techniques, it does not require detailed problem-specific knowledge
in order to generate new search candidates. Gradients are not required so the
algorithm can be applied to discontinuous functions or functions that are
defined empirically rather than analytically. It is applicable to mixed problems
containing both continuous and discrete variables.
Its main disadvantage is the amount of processing required to
evaluate and store a large number of different network configurations. Although
the actual bit manipulations take very little time, the user-supplied objective
function must be evaluated many times and this can be very slow with large
networks and large training sets. It is worth noting, however, that the
candidate solutions can be evaluated independently so N parallel processors should give close to a factor
of N reduction in computation time.
Another caveat is that although the algorithm itself needs little
problem-specific information, the efficiency of the search and the quality of
the results depend heavily on how parameters are represented in the bit string.
This is quite problem dependent and use of the algorithm may involve
experimenting with several different representations to find one that works
well.
Although there are claims of convergence to global maxima of the
fitness function, convergence to local maxima is possible with small populations
and aggressive culling of less successful solutions. Also, there are many
parameters to be selected (population size, mutation rate, crossover method,
fitness scaling, etc.) and it is not obvious how these affect the convergence
properties.
Because the main theoretical advantage of the algorithm is
its global optimization property and its main disadvantage is its inefficiency,
it may be useful to use the algorithm in conjunction with more efficient local
search methods. For example, the genetic algorithm might be used to do a
coarsely quantized search to find the region containing the global minimum and
then more efficient methods used to fine tune the solution in this
region.