Overview
The following sections summarize some techniques for
initializing weights in sigmoidal networks. The basic motivation is to speed up
learning by choosing better initial solutions. A survey and empirical comparison
of a number of techniques is given by Thimm and Fiesler [368].
There are two clusters of methods. One consists of methods
for choosing parameters controlling the distribution of random initial weights.
The motivation here is to avoid sigmoid saturation problems that cause slow
training. Most of these methods do not use domain-specific information. The
other cluster consists of techniques for initializing the system from an
approximate solution found by another modeling system; common choices include
rule-based systems, decision trees, and nearest-neighbor classifiers. The
motivation here is to reduce training time and the probability of converging
to poor local minima by starting the system near a good solution. An advantage of these
methods, besides faster training, is that they provide ways to embed
domain-dependent information in a network.
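To make the first cluster of methods concrete, one widely used scheme scales the range of a uniform random distribution by the fan-in of each unit, so that net inputs stay small and the sigmoids start out in their roughly linear region. The following NumPy sketch illustrates the idea; the function name and the exact 1/sqrt(fan_in) bound are illustrative choices, not drawn from any single method in the survey:

```python
import numpy as np

def init_weights(fan_in, fan_out, rng=None):
    """Draw initial weights uniformly from [-1/sqrt(fan_in), 1/sqrt(fan_in)].

    Keeping each unit's net input small at the start places the sigmoid
    in its near-linear region, so derivatives stay away from zero and
    training avoids the slow progress caused by saturated units.
    """
    rng = np.random.default_rng() if rng is None else rng
    bound = 1.0 / np.sqrt(fan_in)
    return rng.uniform(-bound, bound, size=(fan_in, fan_out))

# For inputs of roughly unit magnitude, the net input to each unit then
# has a standard deviation on the order of 1, independent of layer width.
W = init_weights(fan_in=100, fan_out=10)
```

Note that this sketch uses no domain-specific information, in contrast to the second cluster of methods, where the initial weights are derived from another model's approximate solution.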