Overview
The standard multilayer perceptron (MLP) is a cascade of
single-layer perceptrons (figure 4.1). There is a layer of input nodes, a layer of
output nodes, and one or more intermediate layers. The interior layers are
sometimes called "hidden layers" because they are not directly observable from
the system inputs and outputs. Each node has a response $f(\mathbf{w}^T \mathbf{x})$, where $\mathbf{x}$ is the vector of output activations from the preceding layer, $\mathbf{w}$ is a vector of weights, and $f$ is a bounded, nondecreasing nonlinear function such as the sigmoid. Normally, one of the weights acts as a bias by virtue of its
connection to a constant input. Nodes in each layer are fully connected to nodes
in the preceding and following layers. There are no connections between units in
the same layer, connections from one layer back to a previous layer, or
"shortcut" connections that skip over intermediate layers. Although
back-propagation can be applied to more general networks, this is the most
commonly used structure.
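To make the structure concrete, the following is a minimal sketch of the forward pass through such a network (NumPy, the logistic sigmoid, and the random weights are illustrative assumptions, not specifics from the text):

    import numpy as np

    def sigmoid(a):
        # Logistic sigmoid: a bounded, nondecreasing nonlinearity f.
        return 1.0 / (1.0 + np.exp(-a))

    def forward(x, weight_layers):
        # Propagate x through fully connected layers. Each element of
        # weight_layers is a matrix whose rows are the weight vectors w
        # of that layer's nodes; a constant 1 is appended to each layer's
        # input so that one weight per node serves as the bias.
        for W in weight_layers:
            x = np.append(x, 1.0)   # constant input for the bias weight
            x = sigmoid(W @ x)      # node responses f(w^T x)
        return x

    # A 5/5/3/4 network like figure 4.1: three layers of weights.
    rng = np.random.default_rng(0)
    sizes = [5, 5, 3, 4]
    weights = [rng.standard_normal((m, n + 1))  # +1 column for the bias input
               for n, m in zip(sizes[:-1], sizes[1:])]
    print(forward(rng.standard_normal(5), weights))  # four output activations

Note that every connection pattern here is layer-to-layer only; nothing in the loop allows same-layer, feedback, or shortcut connections.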
The following sections summarize some properties and limitations
that result from this structure, independent of methods used to set the
weights.
How to Count Layers?
A minor digression: there is some disagreement about how to count layers in a network. Some say a network with one
hidden layer is a three-layer network because there are three layers of nodes:
the inputs, the hidden units, and the outputs. Others say this is a two-layer
network because there are only two layers of active nodes, the hidden units and
outputs. Inputs are excluded because they do no computation. We tend to follow
this convention and say that an $L$-layer network has $L$ active layers; that is, $L-1$ hidden layers and an
output layer. Conveniently, this is also the number of weight layers. Not
everyone uses the same convention, however, so it is often simplest to
explicitly specify the number of hidden layers. The network in figure 4.1, for example, would be called a two-hidden-layer
network. In spite of the convention, it is natural to refer to the input layer
at times; we did so in the first paragraph of this chapter.
The notation $N_1/N_2/\dots/N_L$ is sometimes used to describe the
structure of a layered network. This is simply a list of the number of nodes in
each layer. A 10/3/2 network, for example, has 10 inputs, 3 nodes in a hidden
layer, and 2 outputs. A 16/10/5/1 network would have 16 inputs, 10 nodes in the
first hidden layer, 5 nodes in the second hidden layer, and 1 output. Unless
otherwise specified, each layer is presumed to be fully connected to the preceding and following layers, with no shortcut or feedback connections. Figure 4.1 illustrates a 5/5/3/4 structure.
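As a small illustration of the notation, this sketch (the helper name parse_structure is hypothetical, not from the text) turns such a spec into layer sizes and the shapes of the corresponding weight matrices, with one extra column per layer for the constant bias input:

    def parse_structure(spec):
        # Map a spec like '16/10/5/1' to layer sizes and weight shapes.
        sizes = [int(n) for n in spec.split("/")]
        # L active layers -> L weight layers; the +1 input in each layer
        # is the constant bias input described above.
        shapes = [(m, n + 1) for n, m in zip(sizes[:-1], sizes[1:])]
        return sizes, shapes

    sizes, shapes = parse_structure("16/10/5/1")
    print(sizes)   # [16, 10, 5, 1]
    print(shapes)  # [(10, 17), (5, 11), (1, 6)]

The three weight shapes reflect the convention above: a 16/10/5/1 network has three active layers (two hidden plus the output) and, correspondingly, three layers of weights.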