Overview
One of the basic tasks in network design is to choose an
architecture and weights appropriate for the given problem. The genetic
algorithm (GA) [173], [141] is a general optimization method that has been applied
to many problems, including neural network training. It is well suited to neural
networks because it scales well to large nonlinear problems with multiple local
minima.
As the name implies, the genetic algorithm is based on an analogy
to natural evolutionary mechanisms. Many variations have been investigated, but
the basic idea is competition between alternative solutions and "survival of the
fittest." In this case, fitness is measured by a predefined objective function.
Individuals in a large population have varying traits that affect their
reproductive success in the given environment. Successful individuals live long
enough to mate and pass their traits to their offspring. Offspring inherit
traits from successful parents so they also have a good chance of being
successful. Over many generations, the population adapts to its environment;
disadvantageous traits become rare and the average fitness tends to increase
over time.
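The generational cycle just described can be sketched in a few lines of code. The specific operators below (tournament selection, one-point crossover, per-bit mutation) and all parameter values are illustrative choices, not the only ones in use:

```python
import random

random.seed(0)

def evolve(fitness, n_bits=20, pop_size=30, generations=50,
           crossover_rate=0.7, mutation_rate=0.01):
    """Minimal generational GA on fixed-length bit strings."""
    pop = [[random.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # "Survival of the fittest": each parent is the better of
        # two randomly chosen individuals (tournament selection).
        def select():
            a, b = random.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        next_pop = []
        while len(next_pop) < pop_size:
            p1, p2 = select(), select()
            if random.random() < crossover_rate:
                # Offspring inherit traits from both parents.
                cut = random.randint(1, n_bits - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Rare random bit flips maintain variation in the population.
            child = [bit ^ 1 if random.random() < mutation_rate else bit
                     for bit in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)

# "One-max" toy fitness function: count the 1 bits in the string.
best = evolve(fitness=sum)
print(sum(best))
```

On this toy problem the average fitness of the population rises over the generations, exactly as the analogy suggests; only the fitness function would change for a real application.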
One of the main advantages of the algorithm is that it requires
very little problem-specific information. To apply the method to a specific
problem, all that is needed is a fitness function that evaluates individual
solutions and returns a rating of their quality, or "fitness." The algorithm
itself operates on bit strings containing the "genetic code," that is, the
parameters specifying a particular solution. Aside from the problem-specific
evaluation function, all problems look the same to the algorithm, differing only
in the length of the bit string.
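As one hypothetical example of such a genetic code, each network weight could occupy a fixed number of bits, read as an unsigned integer and scaled linearly into a real-valued range. The 8-bit resolution and the [-5, 5] range below are assumptions chosen for illustration:

```python
def decode(bits, n_params, lo=-5.0, hi=5.0):
    """Decode a bit string into n_params real-valued weights.

    Each parameter occupies len(bits) // n_params bits, interpreted
    as an unsigned integer and mapped linearly onto [lo, hi]. The
    bits-per-weight and range are illustrative, not prescribed.
    """
    k = len(bits) // n_params
    params = []
    for i in range(n_params):
        chunk = bits[i * k:(i + 1) * k]
        value = int("".join(map(str, chunk)), 2)
        params.append(lo + (hi - lo) * value / (2 ** k - 1))
    return params

# A 16-bit string encoding two weights at 8 bits each:
print(decode([0] * 8 + [1] * 8, n_params=2))  # → [-5.0, 5.0]
```

A fitness function for network training would decode the string this way, load the weights into the network, and return something like the negated training error.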
Because the algorithm needs so little problem-specific
information, it is useful for complex problems that are difficult to analyze
correctly. In particular, it does not need gradient information and so can be
used on discontinuous functions and functions that are described empirically
rather than analytically. It can also be used for temporal learning problems in
which evaluation comes at the end of a long sequence of actions with no
intermediate target values. Because it is not a simple hill-climbing method, it is
not easily trapped by local maxima. It also tolerates a certain amount of noise in
the evaluation function.
The algorithm has some of the flavor of simulated annealing in
that many alternative solutions are examined and the search contains an element
of randomness that helps prevent convergence to local maxima. It differs in that it
maintains many candidate solutions rather than just one, combining elements of the
better solutions to generate new candidates. Like simulated
annealing, it is a general optimization method that has applications beyond
neural networks.
The main disadvantage of the method is the amount of
computation needed to evaluate and store a large population of candidate
solutions and converge to an optimum. Other techniques often converge faster
when they can be used.