Overview
Noise is usually considered undesirable, something to be eliminated if possible. Many studies
(e.g., [299], [118], [310], [387], [345], [246], [287], [339], [267]) have noted, however, that adding small amounts of noise to input
patterns during training often results in better generalization and fault
tolerance.
A short explanation for these results is that the noise blurs the
data. When random noise is added every time a pattern is presented, the network
never sees exactly that same input twice, even when the same training pattern is
selected, so it cannot simply "memorize" the training data. Averaging over the
noise effectively smooths the target function and prevents the network from
overfitting a limited set of training data. This turns out to be helpful for
generalization because many of the functions that interest people tend to be
smooth.
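As an illustration of the idea, the following sketch (not from the text) adds fresh Gaussian noise to the training inputs each time they are presented. The data set, the noise level sigma, and the placeholder training call are all assumed for the example; they stand in for whatever network and training procedure is actually used.

    # Minimal sketch: input jitter applied anew at every presentation.
    # All names and values here are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical training set: 100 patterns with 8 input features.
    X_train = rng.normal(size=(100, 8))
    y_train = rng.integers(0, 2, size=100)

    sigma = 0.05      # jitter standard deviation (assumed value)
    n_epochs = 50

    for epoch in range(n_epochs):
        # Fresh noise is drawn each epoch, so the same training pattern
        # is never presented with exactly the same input values.
        X_jittered = X_train + rng.normal(scale=sigma, size=X_train.shape)

        # train_one_epoch(model, X_jittered, y_train)  # placeholder for
        #                                              # any training step

Because the noise is redrawn at every presentation rather than fixed once, averaging over many epochs has the smoothing effect described above.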
The following sections examine these ideas in more detail.
The term jitter is used to refer to noise intentionally added to the inputs, in
contrast to undesired, uncontrolled noise from other sources.