4.1 Representational Capability
The representational capability of a network can be defined
as the range of mappings it can implement when the weights are varied. A
particular mapping is within the representational capability of a net if there
is a set of weights for which the net performs the mapping. Theories about
representational capability answer questions like: Given a set of inputs and
associated target outputs, is there a set of weights that allows the network to perform the desired mapping? Is the network capable of generating the desired outputs from the given inputs? In some cases an exact solution may be required, while in others an approximation is acceptable. Such results tell us whether a particular problem is solvable by a given network without necessarily producing the weights that generate the solution. If the answer is negative, we know the network cannot solve the problem, and this saves us the expense of a futile search. If the answer is positive, we know a solution exists, but these results don't guarantee that it will be easy to find or that a chosen
optimization method will be able to find it. Results of this kind provide broad
guidelines in selecting an architecture for a problem. One well-known
representational result, for example, is that single-layer networks are capable
of representing only linearly separable functions.
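As a small illustration of this last result, the sketch below (a toy numpy example, not from the text) gives explicit weights for a single threshold unit that computes the linearly separable AND function, and then searches a coarse weight grid without finding any single unit that reproduces XOR, which is not linearly separable. The grid search only illustrates the standard argument; it is not a proof.

```python
# Sketch: a single linear threshold unit can represent the linearly
# separable AND function, but no choice of weights lets it represent XOR.
import itertools
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
AND = np.array([0, 0, 0, 1])
XOR = np.array([0, 1, 1, 0])

def single_layer(x, w, b):
    """One linear threshold unit: output 1 iff w.x + b > 0."""
    return (x @ w + b > 0).astype(int)

# Explicit weights that realize AND: fire only when both inputs are 1.
print(single_layer(X, np.array([1.0, 1.0]), -1.5))    # -> [0 0 0 1]

# A brute-force search over a coarse weight grid finds no solution for XOR
# (and, since XOR is not linearly separable, none exists at all).
grid = np.linspace(-2, 2, 21)
found = any(
    np.array_equal(single_layer(X, np.array([w1, w2]), b), XOR)
    for w1, w2, b in itertools.product(grid, grid, grid)
)
print(found)    # -> False
```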
Two Hidden Layers Are Sufficient

When designing a layered network, an obvious first question is how many layers to use. Lippmann [247] shows that two hidden layers are sufficient to create
classification regions of any desired shape. In figure
4.2, linear threshold units in the first hidden layer divide the input space
into half-spaces with hyperplanes, units in the second hidden layer AND (form
intersections of) these half-spaces to produce convex regions, and the output
units OR (form unions of) the convex regions into arbitrary, possibly
unconnected, shapes. Given a sufficient number of units, a network can be formed
that divides the input space in any way desired, producing 0 when the input is
in one region and 1 when it is in the other. The boundaries are piecewise
linear, but any smooth boundary can be approximated with enough units.
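The construction of figure 4.2 can be written out directly. The sketch below is a toy numpy version with weights chosen for illustration (not the figure's): the first hidden layer computes half-plane indicators, the second layer ANDs them into two triangles, and the output unit ORs the triangles into a disconnected decision region.

```python
import numpy as np

def step(z):
    return (z > 0).astype(float)

def half_planes(x, W, b):
    """First hidden layer: each unit fires on one side of a hyperplane."""
    return step(x @ W.T + b)

def AND(h, idx):
    """Second hidden layer: fires only if all listed half-plane units fire."""
    return step(h[:, idx].sum(axis=1) - (len(idx) - 0.5))

def OR(*regions):
    """Output unit: fires if any convex region fires."""
    return step(np.sum(regions, axis=0) - 0.5)

# Six half-planes (assumed weights) defining two triangles in the plane:
#   triangle A: x > 0, y > 0, x + y < 1
#   triangle B: x > 2, y > 0, x + y < 3
W = np.array([[ 1,  0], [ 0,  1], [-1, -1],
              [ 1,  0], [ 0,  1], [-1, -1]], dtype=float)
b = np.array([0, 0, 1, -2, 0, 3], dtype=float)

x = np.array([[0.2, 0.2],   # inside triangle A
              [2.2, 0.2],   # inside triangle B
              [1.5, 0.2]])  # in neither (between the triangles)
h = half_planes(x, W, b)
out = OR(AND(h, [0, 1, 2]), AND(h, [3, 4, 5]))
print(out)   # -> [1. 1. 0.]: a disconnected, two-piece decision region
```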
To approximate continuous functions, one can add and subtract
(rather than logically OR) convex regions with appropriate weighting factors, so
two hidden layers are also sufficient to approximate any desired bounded
continuous function [234].
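A one-dimensional sketch of the idea, with assumed interval widths and numpy, is shown below: ANDed half-lines form indicator "bumps" over small intervals, and the output adds and subtracts these bumps weighted by the target value at each interval's center. Finer intervals give a closer approximation.

```python
import numpy as np

def step(z):
    return (z > 0).astype(float)

# Target: a bounded continuous function on [0, 2*pi] (sin as an example).
f = np.sin
edges = np.linspace(0, 2 * np.pi, 41)          # 40 small intervals
centers = 0.5 * (edges[:-1] + edges[1:])

def net(x):
    x = np.atleast_1d(x)[:, None]
    # First hidden layer: threshold units marking each interval edge.
    left  = step(x - edges[:-1])               # x > left edge
    right = step(edges[1:] - x)                # x < right edge
    # Second hidden layer: AND the two half-lines -> indicator of each interval.
    bumps = step(left + right - 1.5)
    # Linear output: add and subtract the bumps, weighted by the target value
    # at each interval's center (negative weights subtract regions).
    return bumps @ f(centers)

x = np.linspace(0.1, 2 * np.pi - 0.1, 5)
print(np.round(net(x), 2))   # piecewise-constant approximation...
print(np.round(f(x), 2))     # ...close to the true sin values
```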
One-Hidden-Layer Nets Can Represent Nonconvex, Disjoint Regions

Units in the second hidden layer of figure
4.2 respond to convex regions in the input space. Because these would be the
outputs in a single-hidden-layer network, it is sometimes mistakenly assumed
that single-hidden-layer networks can recognize only convex decision
regions.
Counter-examples are shown in figures 4.3 through 4.5. Figures 4.3 and 4.4
illustrate nonconvex regions recognized by three-layer (single-hidden-layer) nets. (Figure
4.4 is derived from Weiland and Leighton [407].) Figure
4.5 shows examples of disjoint (unconnected) regions recognizable by
three-layer networks. Other examples can be found in [180], [255], [249].
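Such counter-examples are easy to reproduce numerically. The sketch below is a construction of my own for illustration (not the one in figure 4.4): a single hidden layer of six threshold units feeding one threshold output recognizes an L-shaped, nonconvex region, because the output weights need not implement a pure AND or OR.

```python
import numpy as np

def step(z):
    return (z > 0).astype(float)

# Single hidden layer of six threshold units, one per half-plane:
#   x > 0, x < 2, y > 0, y < 2  (the big square), plus x < 1, y < 1.
W = np.array([[ 1, 0], [-1, 0], [0,  1], [0, -1], [-1, 0], [0, -1]], dtype=float)
b = np.array([0, 2, 0, 2, 1, 1], dtype=float)

# One threshold output unit.  Unequal weights (2,2,2,2,1,1) make it fire only
# when all four square units fire AND at least one of the last two fires,
# i.e. inside the square but outside its upper-right quarter: an L shape.
a = np.array([2, 2, 2, 2, 1, 1], dtype=float)
c = -8.5

def net(x):
    h = step(x @ W.T + b)        # hidden layer: half-plane indicators
    return step(h @ a + c)       # output: single linear threshold unit

pts = np.array([[0.5, 0.5],      # inside the L       -> 1
                [1.5, 0.5],      # inside the L       -> 1
                [1.5, 1.5],      # the cut-out corner -> 0
                [2.5, 0.5]])     # outside the square -> 0
print(net(pts))                  # -> [1. 1. 0. 0.]: a nonconvex region
```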