9.2 Vogl's Method (Bold Driver)
Vogl et al. [380] describe an adaptive learning rate method where the
global learning rate η(t) at time t is updated according to
    η(t) = φ η(t−1)    if E(t) < E(t−1)
         = β η(t−1)    if E(t) > E(t−1)                    (9.1)
         = η(t−1)      otherwise
where φ > 1 and β < 1 are constants. Suggested values are φ
= 1.05 and β = 0.7. The name "bold driver" comes from Battiti [27], who suggests the value β = 0.5 based on the idea that an increase in E indicates a minimum has been overstepped and that, on average, it is reasonable to guess the minimum lies halfway between the current and previous weights.
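As a rough illustration, the adaptation rule (9.1) might be coded as follows. This is a minimal sketch, not taken from [380]: it assumes a scalar global learning rate and scalar error values, and uses the suggested constants φ = 1.05 and β = 0.7 as defaults.

```python
def update_learning_rate(eta, error, prev_error, phi=1.05, beta=0.7):
    """Bold-driver adaptation of the global learning rate, as in (9.1).

    The rate grows by phi when the error decreased, shrinks by beta when
    the error increased, and is left unchanged otherwise.
    """
    if error < prev_error:
        return phi * eta    # error went down: take bolder steps
    if error > prev_error:
        return beta * eta   # error went up: back off
    return eta              # error unchanged: keep the current rate
```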
In addition to decreasing the learning rate when the error increases significantly, the previous weight change is retracted and the momentum parameter is reset to α = 0 for the next step. The justification for clearing α is that α > 0 makes the current weight change similar to previous weight changes, whereas the increase in the error indicates the need for a change in direction. The momentum α is restored to its normal value after a successful step is taken.
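The retraction and momentum reset can be sketched together with the rate adaptation as a single training step. This is only an illustration under assumptions not in the text: the weights are a flat NumPy array, compute_error and compute_gradient are hypothetical helpers supplied by the caller, the normal momentum value of 0.9 is illustrative, and the threshold for a "significant" error increase is omitted for simplicity.

```python
import numpy as np

def bold_driver_step(w, prev_dw, eta, alpha, prev_error,
                     compute_error, compute_gradient,
                     phi=1.05, beta=0.7, alpha_normal=0.9):
    """One adaptation step: try a momentum update, then accept or retract.

    On success the learning rate grows and momentum is restored to its
    normal value; on an error increase the step is retracted, the rate
    shrinks, and momentum is cleared for the next step.
    """
    dw = -eta * compute_gradient(w) + alpha * prev_dw   # proposed change
    w_trial = w + dw
    error = compute_error(w_trial)

    if error < prev_error:
        # Successful step: keep the new weights, grow eta, restore momentum.
        return w_trial, dw, phi * eta, alpha_normal, error
    # Error increased: retract the change, shrink eta, clear momentum.
    return w, np.zeros_like(dw), beta * eta, 0.0, prev_error
```

A caller would loop over epochs, feeding the returned weights, weight change, learning rate, momentum, and error back into the next call.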
In [380], learning speed increased by factors of about 2.5 and 30 on two test problems. A similar method without momentum compared unfavorably with conjugate gradient training on parity problems of various sizes in [27]. There it appears to give results similar to normal
back-propagation with an optimally tuned fixed learning rate but without the
need to search for the optimal learning rate.
The method was empirically compared to a number of other
methods on a single test problem by Alpsan et al. [9]. In one case, learning was stopped as soon as all patterns
were correctly classified (all outputs on all patterns correct within a tolerance of 0.1 of the target values). With high momentum, it had about the same
speed as optimally tuned back-propagation, but generalization was not as good.
Generalization was better without momentum, but then learning was much slower
than regular back-propagation. In a second case where convergence criteria
required the outputs to essentially match the target values, the method
converged whereas plain back-propagation did not, but it was not among the
fastest methods. In an earlier test by the same authors, it was said to be
somewhat unstable and no easier to tune than plain back-propagation.