C.1 Jitter: Small-Perturbation Approximation
For small noise amplitudes, the network output y(x + n) can be approximated by the second-order Taylor expansion

\[
y(\mathbf{x}+\mathbf{n}) \;\approx\; y(\mathbf{x}) \;+\; \mathbf{n}^{T}\nabla y \;+\; \tfrac{1}{2}\,\mathbf{n}^{T}\mathbf{H}\,\mathbf{n}
\tag{C.1}
\]

where ∇y is the gradient and H is the Hessian matrix with elements h_ij = ∂²y/(∂x_i ∂x_j).
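The accuracy of this Taylor expansion can be checked numerically. The sketch below is illustrative and not from the text: it uses a hypothetical smooth scalar function in place of a network output, with its gradient and Hessian written out by hand. For a perturbation of size about 10⁻³, the truncation error is third order in the perturbation and therefore far below the retained terms.

```python
import numpy as np

# Hypothetical stand-in for a network output y(x), chosen only so that
# its gradient and Hessian are easy to write down analytically.
def y(x):
    return np.tanh(x[0] + 2.0 * x[1]) + 0.5 * x[0] * x[1]

def grad_y(x):
    s = 1.0 - np.tanh(x[0] + 2.0 * x[1]) ** 2      # sech^2 term
    return np.array([s + 0.5 * x[1], 2.0 * s + 0.5 * x[0]])

def hess_y(x):
    t = np.tanh(x[0] + 2.0 * x[1])
    d2 = -2.0 * t * (1.0 - t ** 2)                 # second derivative of tanh
    return np.array([[d2,             2.0 * d2 + 0.5],
                     [2.0 * d2 + 0.5, 4.0 * d2]])

x = np.array([0.3, -0.2])
rng = np.random.default_rng(0)
n = 1e-3 * rng.standard_normal(2)                  # small perturbation

exact  = y(x + n)
taylor = y(x) + n @ grad_y(x) + 0.5 * n @ hess_y(x) @ n
print(abs(exact - taylor))                         # third order in |n|
```

The residual is of order |n|³, so halving the noise amplitude should cut it by roughly a factor of eight, which is easy to confirm by rescaling n.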
Assuming an even noise distribution with independent components, so that ⟨n_k⟩ = 0 for k odd and ⟨n_i n_j⟩ = σ²δ_ij, taking the expectation of the squared error over the noise, one can write

\[
\left\langle \big(y(\mathbf{x}+\mathbf{n}) - t\big)^{2} \right\rangle
\;\approx\; (y-t)^{2}
\;+\; \sigma^{2}(y-t)\,\mathrm{Tr}(\mathbf{H})
\;+\; \sigma^{2}\lVert\nabla y\rVert^{2}
\;+\; \tfrac{1}{4}\Big[\, m_{4}\sum_{i} h_{ii}^{2}
\;+\; \sigma^{4}\!\sum_{i\neq j}\big(h_{ii}h_{jj} + 2h_{ij}^{2}\big) \Big]
\]

where m₄ is the fourth moment ⟨n⁴⟩. Since m₄ is of order σ⁴, dropping all terms higher than second order in σ gives

\[
\left\langle \big(y(\mathbf{x}+\mathbf{n}) - t\big)^{2} \right\rangle
\;\approx\; (y-t)^{2}
\;+\; \sigma^{2}(y-t)\,\mathrm{Tr}(\mathbf{H})
\;+\; \sigma^{2}\lVert\nabla y\rVert^{2}
\tag{C.2}
\]
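This expansion can be verified by Monte Carlo averaging. In the sketch below, all function choices are illustrative rather than taken from the text: a simple smooth function plays the role of y(x), the squared error is averaged over Gaussian jitter, and the result is compared with the second-order approximation (C.2). For small σ the two agree up to terms of order σ⁴ plus sampling error.

```python
import numpy as np

# Illustrative smooth model output; any twice-differentiable function works.
def y(x0, x1):
    return np.sin(x0) * np.cos(x1)

x = np.array([0.7, 0.4])
t = 0.2
sigma = 0.01
rng = np.random.default_rng(1)

# Monte Carlo estimate of <(y(x+n) - t)^2> with n ~ N(0, sigma^2 I).
N = 1_000_000
noise = sigma * rng.standard_normal((N, 2))
mc = np.mean((y(x[0] + noise[:, 0], x[1] + noise[:, 1]) - t) ** 2)

# Right-hand side of (C.2): (y-t)^2 + sigma^2 (y-t) Tr(H) + sigma^2 |grad y|^2.
y0 = y(x[0], x[1])
grad = np.array([np.cos(x[0]) * np.cos(x[1]),
                 -np.sin(x[0]) * np.sin(x[1])])
lap = -2.0 * y0                       # Laplacian of sin(x0)cos(x1) is -2 y
rhs = (y0 - t) ** 2 + sigma ** 2 * (y0 - t) * lap + sigma ** 2 * grad @ grad
print(mc, rhs)                        # agree up to O(sigma^4) + sampling error
```

Repeating the comparison with a larger σ makes the neglected fourth-order terms visible, which is a direct illustration of why the approximation is restricted to small noise amplitudes.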
When H is assumed to be zero, this reduces to (17.15). The Laplacian term, Tr(H) = ∇²y, omitted in (17.15), can be viewed as an approximate measure of the difference between the average of the field in the surrounding neighborhood and its precise value at the point [100]. The third term in (C.2) is the first-order regularization term in (17.15).
Training with nonjittered data simply minimizes the error at
the training points and puts no constraints on the function at other points. In
contrast, training with jitter minimizes the error while also forcing the
approximating function to have small derivatives and a local average that
approaches the target in the vicinity of each training point.
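This smoothing effect can be seen in a small experiment. The sketch below is an illustration, not the book's procedure: a degree-9 polynomial is fitted by least squares to ten noisy samples, once using the raw points and once using jittered replicas of each point (σ = 0.05, an arbitrary choice). The jittered fit has a markedly smaller mean squared derivative between the training points.

```python
import numpy as np

rng = np.random.default_rng(2)
x_train = np.linspace(-1.0, 1.0, 10)
t_train = np.sin(np.pi * x_train) + 0.3 * rng.standard_normal(10)  # noisy targets

def design(x, degree=9):
    # Polynomial feature matrix: columns x^0, x^1, ..., x^degree.
    return np.vander(x, degree + 1, increasing=True)

# Plain fit: square system, so the polynomial interpolates the noisy
# targets exactly and is unconstrained between the points.
w_plain, *_ = np.linalg.lstsq(design(x_train), t_train, rcond=None)

# Jittered fit: each input is replicated with small Gaussian perturbations
# while its target is held fixed, as in training with jitter.
sigma, reps = 0.05, 200
xj = np.repeat(x_train, reps) + sigma * rng.standard_normal(10 * reps)
tj = np.repeat(t_train, reps)
w_jit, *_ = np.linalg.lstsq(design(xj), tj, rcond=None)

# Mean squared derivative of each fit over a dense grid in [-1, 1].
xs = np.linspace(-1.0, 1.0, 401)
dphi = np.array([k * xs ** (k - 1) if k > 0 else np.zeros_like(xs)
                 for k in range(10)]).T           # derivative basis

def msd(w):
    return np.mean((dphi @ w) ** 2)

print(msd(w_plain), msd(w_jit))   # jittered fit has smaller derivatives
```

The plain fit chases the target noise and oscillates between the samples, while the jittered fit trades a small error at the training points for small derivatives around them, exactly the behavior described above.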