16.3 Pruning Methods
Pruning algorithms are surveyed in chapter 13. The following paragraphs outline a few main
points. Because the target function is unknown, it is difficult to predict ahead
of time what size network will learn the data without overtraining. Not knowing
the optimum network configuration, one can train many networks and choose the
smallest or least complex one that learns the data. Although simple, this
approach can be inefficient if many networks must be trained before an
acceptable one is found. Even if the optimum size is known, the smallest
networks just complex enough to fit the data may (depending on the learning
algorithm) be sensitive to initial conditions and learning parameters. It may be
hard to tell whether the network is too small to learn the data, whether it is simply
learning very slowly, or whether it is stuck in a local minimum due to an unfortunate
set of initial conditions or parameters. Thus, even if one finds a small network
that will reliably learn the data, there might be a still smaller network that
would work but is very difficult to train.
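The following is a minimal sketch of this train-several-sizes search. It trains
networks of increasing hidden-layer size and keeps the smallest one whose training
error falls below a tolerance. The use of scikit-learn's MLPRegressor, the synthetic
target function, the candidate sizes, and the tolerance are all illustrative
assumptions, not details from the text.

    import numpy as np
    from sklearn.metrics import mean_squared_error
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(200, 2))
    y = np.sin(np.pi * X[:, 0]) * X[:, 1]      # stand-in for the unknown target

    tolerance = 1e-3                            # "learns the data" threshold
    for hidden in (2, 4, 8, 16, 32):            # candidate network sizes
        net = MLPRegressor(hidden_layer_sizes=(hidden,),
                           max_iter=5000, random_state=0)
        net.fit(X, y)
        err = mean_squared_error(y, net.predict(X))
        if err < tolerance:                     # smallest size that fits
            print(f"{hidden} hidden units suffice (training MSE = {err:.2e})")
            break
    else:
        print("no candidate size learned the data to tolerance")

Note that each candidate size is trained only once here; in light of the sensitivity
to initial conditions noted above, several random restarts per size would give a
fairer verdict on whether a given size can learn the data.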
The pruning approach is to train a network that is somewhat larger
than necessary and then remove unnecessary elements. The large initial size
allows the network to learn reasonably quickly with less sensitivity to initial
conditions and local minima while the reduced complexity of the trimmed system
favors improved generalization. In several studies, e.g., [344], [345], pruning techniques produced small
networks that generalized well and whose solutions could not be obtained reliably by
training networks of the reduced size from random initial weights.
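The sketch below illustrates the train-then-prune idea using one simple criterion,
magnitude-based weight pruning (zeroing the weights with the smallest magnitudes);
the text does not prescribe this particular criterion, and the oversized network,
synthetic target, and 30% pruning fraction are illustrative assumptions.

    import numpy as np
    from sklearn.metrics import mean_squared_error
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(1)
    X = rng.uniform(-1.0, 1.0, size=(200, 2))
    y = np.sin(np.pi * X[:, 0]) * X[:, 1]

    # Train a deliberately oversized network.
    net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=5000, random_state=0)
    net.fit(X, y)
    print("before pruning: MSE =", mean_squared_error(y, net.predict(X)))

    # Zero the 30% of weights with the smallest magnitudes in each layer.
    for W in net.coefs_:                    # in-place edit of weight matrices
        threshold = np.quantile(np.abs(W), 0.30)
        W[np.abs(W) < threshold] = 0.0

    print("after pruning:  MSE =", mean_squared_error(y, net.predict(X)))

In practice a pruning algorithm would typically retrain the network briefly after
each removal to let the remaining weights compensate; that step is omitted here
for brevity.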
Although pruning techniques provide a means of simplifying a
network, they must be guided by other criteria to decide how simple the network
should be. That is, external information or theoretical criteria are still needed
to decide when to stop pruning.
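As one concrete example of such an external criterion, the sketch below holds out
a validation set and prunes in steps until validation error begins to rise, then
rolls back the last step. The validation split, the step size, and the reuse of
the magnitude criterion are again illustrative assumptions rather than a method
specified in the text.

    import numpy as np
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(2)
    X = rng.uniform(-1.0, 1.0, size=(300, 2))
    y = np.sin(np.pi * X[:, 0]) * X[:, 1]
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.33,
                                                random_state=0)

    net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=5000, random_state=0)
    net.fit(X_tr, y_tr)

    best_val = mean_squared_error(y_val, net.predict(X_val))
    for step in range(1, 10):               # prune ~10% more weights per step
        saved = [W.copy() for W in net.coefs_]  # snapshot for rollback
        for W in net.coefs_:
            threshold = np.quantile(np.abs(W), 0.10 * step)
            W[np.abs(W) < threshold] = 0.0
        val = mean_squared_error(y_val, net.predict(X_val))
        if val > best_val:                  # validation error rose:
            net.coefs_ = saved              # undo the last step and stop
            break
        best_val = val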