The software described in this document is furnished under a license agreement. The software may be used
or copied only under the terms of the license agreement. No part of this manual may be photocopied or reproduced in any form without prior written consent from The MathWorks, Inc.
FEDERAL ACQUISITION: This provision applies to all acquisitions of the Program and Documentation by,
for, or through the federal government of the United States. By accepting delivery of the Program or
Documentation, the government hereby agrees that this software or documentation qualifies as commercial
computer software or commercial computer software documentation as such terms are used or defined in
FAR 12.212, DFARS Part 227.72, and DFARS 252.227-7014. Accordingly, the terms and conditions of this
Agreement and only those rights specified in this Agreement, shall pertain to and govern the use,
modification, reproduction, release, performance, display, and disclosure of the Program and Documentation
by the federal government (or other entity acquiring for or through the federal government) and shall
supersede any conflicting contractual terms or conditions. If this License fails to meet the government's
needs or is inconsistent in any respect with federal procurement law, the government agrees to return the
Program and Documentation, unused, to The MathWorks, Inc.
Trademarks
MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See
www.mathworks.com/trademarks for a list of additional trademarks. Other product or brand
names may be trademarks or registered trademarks of their respective holders.
Patents
The MathWorks products are protected by one or more U.S. patents. Please see
www.mathworks.com/patents for more information.
Revision History
June 1992       First printing
April 1993      Second printing
January 1997    Third printing
July 1997       Fourth printing
January 1998    Fifth printing     Revised for Version 3 (Release 11)
September 2000  Sixth printing     Revised for Version 4 (Release 12)
June 2001       Seventh printing   Minor revisions (Release 12.1)
July 2002       Online only        Minor revisions (Release 13)
January 2003    Online only        Minor revisions (Release 13SP1)
June 2004       Online only        Revised for Version 4.0.3 (Release 14)
October 2004    Online only        Revised for Version 4.0.4 (Release 14SP1)
October 2004    Eighth printing    Revised for Version 4.0.4
March 2005      Online only        Revised for Version 4.0.5 (Release 14SP2)
March 2006      Online only        Revised for Version 5.0 (Release 2006a)
September 2006  Ninth printing     Minor revisions (Release 2006b)
March 2007      Online only        Minor revisions (Release 2007a)
September 2007  Online only        Revised for Version 5.1 (Release 2007b)
March 2008      Online only        Revised for Version 6.0 (Release 2008a)
October 2008    Online only        Revised for Version 6.0.1 (Release 2008b)
March 2009      Online only        Revised for Version 6.0.2 (Release 2009a)
September 2009  Online only        Revised for Version 6.0.3 (Release 2009b)
March 2010      Online only        Revised for Version 6.0.4 (Release 2010a)
Acknowledgments
The authors would like to thank the following people:
Joe Hicklin of The MathWorks™ for getting Howard into neural network
research years ago at the University of Idaho, for encouraging Howard and
Mark to write the toolbox, for providing crucial help in getting the first toolbox
Version 1.0 out the door, for continuing to help with the toolbox in many ways,
and for being such a good friend.
Roy Lurie of The MathWorks for his continued enthusiasm for the possibilities
for Neural Network Toolbox™ software.
Mary Ann Freeman for general support and for her leadership of a great team of
people we enjoy working with.
Rakesh Kumar for cheerfully providing technical and practical help,
encouragement, ideas and always going the extra mile for us.
Alan LaFleur for facilitating our documentation work.
Tara Scott and Stephen Vanreusel for help with testing.
Orlando De Jesús of Oklahoma State University for his excellent work in
developing and programming the dynamic training algorithms described in
Chapter 6, “Dynamic Networks,” and in programming the neural network
controllers described in Chapter 7, “Control Systems.”
Martin Hagan, Howard Demuth, and Mark Beale for permission to include
various problems, demonstrations, and other material from Neural Network Design, January, 1996.
Neural Network Toolbox™ Design Book
The developers of the Neural Network Toolbox™ software have written a
textbook, Neural Network Design (Hagan, Demuth, and Beale, ISBN
0-9717321-0-8). The book presents the theory of neural networks, discusses
their design and application, and makes considerable use of the MATLAB
environment and Neural Network Toolbox software. Demonstration programs
from the book are used in various chapters of this user’s guide. (You can find
all the book demonstration programs in the Neural Network Toolbox software
by typing nnd.)
This book can be obtained from John Stovall at (303) 492-3648, or by e-mail at
John.Stovall@colorado.edu.
The Neural Network Design textbook includes:
• An Instructor’s Manual for those who adopt the book for a class
• Transparency Masters for class use
If you are teaching a class and want an Instructor’s Manual (with solutions to
the book exercises), contact John Stovall at (303) 492-3648, or by e-mail at
John.Stovall@colorado.edu.
To look at sample chapters of the book and to obtain Transparency Masters, go
directly to the Neural Network Design page at
http://hagan.okstate.edu/nnd.html
From this link, you can obtain sample book chapters in PDF format and you
can download the Transparency Masters by clicking Transparency Masters (3.6MB).
You can get the Transparency Masters in PowerPoint or PDF format.
1 Getting Started

Applications for Neural Network Toolbox™ Software (p. 1-4)
Fitting a Function (p. 1-7)
Recognizing Patterns (p. 1-24)
Clustering Data (p. 1-42)
Product Overview
Neural networks are composed of simple elements operating in parallel. These
elements are inspired by biological nervous systems. As in nature, the
connections between elements largely determine the network function. You
can train a neural network to perform a particular function by adjusting the
values of the connections (weights) between elements.
Typically, neural networks are adjusted, or trained, so that a particular input
leads to a specific target output. The next figure illustrates such a situation.
There, the network is adjusted, based on a comparison of the output and the
target, until the network output matches the target. Typically, many such
input/target pairs are needed to train a network.

[Figure: a neural network, including connections (called weights) between
neurons, transforms an input into an output; the output is compared with the
target, and the weights are adjusted based on the comparison.]
Neural networks have been trained to perform complex functions in various
fields, including pattern recognition, identification, classification, speech,
vision, and control systems.
Neural networks can also be trained to solve problems that are difficult for
conventional computers or human beings. The toolbox emphasizes the use of
neural network paradigms that build up to—or are themselves used in—
engineering, financial, and other practical applications.
The next sections explain how to use three graphical tools for training neural
networks to solve problems in function fitting, pattern recognition, and
clustering.
Using the Documentation
The neuron model and the architecture of a neural network describe how a
network transforms its input into an output. This transformation can be
viewed as a computation.
This first chapter gives you an overview of the Neural Network Toolbox™
product and introduces you to the following tasks:
• Training a neural network to fit a function
• Training a neural network to recognize patterns
• Training a neural network to cluster data
The next two chapters explain the computations that are done and pave the
way for an understanding of training methods for the networks. You should
read them before advancing to later topics:
• Chapter 2, “Neuron Model and Network Architectures,” presents the
fundamentals of the neuron model and the architectures of neural networks.
It also discusses the notation used in this toolbox.
• Chapter 3, “Perceptrons,” explains how to create and train simple networks.
It also introduces a graphical user interface (GUI) that you can use to solve
problems without a lot of coding.
Applications for Neural Network Toolbox™ Software
Applications in This Toolbox
Chapter 7, “Control Systems,” describes three practical neural network control
system applications: neural network model predictive control, model
reference adaptive control, and feedback linearization control.
Chapter 11, “Applications,” describes other neural network applications.
Business Applications
The 1988 DARPA Neural Network Study [DARP88] lists various neural
network applications, beginning in about 1984 with the adaptive channel
equalizer. This device, which is an outstanding commercial success, is a
single-neuron network used in long-distance telephone systems to stabilize voice
signals. The DARPA report goes on to list other commercial applications,
including a small word recognizer, a process monitor, a sonar classifier, and a
risk analysis system.
Neural networks have been applied in many other fields since the DARPA
report was written, as described in the next table.
Aerospace: High-performance aircraft autopilot, flight path simulation,
aircraft control systems, autopilot enhancements, aircraft component
simulation, and aircraft component fault detection

Automotive: Automobile automatic guidance system, and warranty activity
analysis

Banking: Check and other document reading and credit application evaluation

Defense: Weapon steering, target tracking, object discrimination, facial
recognition, new kinds of sensors, sonar, radar and image signal processing
including data compression, feature extraction and noise suppression, and
signal/image identification

Financial: Real estate appraisal, loan advising, mortgage screening,
corporate bond rating, credit-line use analysis, credit card activity
tracking, portfolio trading program, corporate financial analysis, and
currency price prediction

Industrial: Prediction of industrial processes, such as the output gases of
furnaces, replacing complex and costly equipment used for this purpose in
the past

Insurance: Policy application evaluation and product optimization

Manufacturing: Manufacturing process control, product design and analysis,
process and machine diagnosis, real-time particle identification, visual
quality inspection systems, beer testing, welding quality analysis, paper
quality prediction, computer-chip quality analysis, analysis of grinding
operations, chemical product design analysis, machine maintenance analysis,
project bidding, planning and management, and dynamic modeling of chemical
process systems

Medical: Breast cancer cell analysis, EEG and ECG analysis, prosthesis
design, optimization of transplant times, hospital expense reduction,
hospital quality improvement, and emergency-room test advisement
Fitting a Function

Neural networks are good at fitting functions and recognizing patterns. In fact,
there is proof that a fairly simple neural network can fit any practical function.
Suppose, for instance, that you have data from a housing application
[HaRu78]. You want to design a network that can predict the value of a house
(in $1000s), given 13 pieces of geographical and real estate information. You
have a total of 506 example homes for which you have those 13 items of data
and their associated market values.
You can solve this problem in three ways:
• Use a command-line function, as described in “Using Command-Line
Functions” on page 1-7.
• Use a graphical user interface, nftool, as described in “Using the Neural
Network Fitting Tool GUI” on page 1-13.
• Use nntool, as described in “Graphical User Interface” on page 3-23.
Defining a Problem
To define a fitting problem for the toolbox, arrange a set of Q input vectors as
columns in a matrix. Then, arrange another set of Q target vectors (the correct
output vectors for each of the input vectors) into a second matrix. For example,
you can define the fitting problem for a Boolean AND gate with four sets of
two-element input vectors and one-element targets as follows:
inputs = [0 1 0 1; 0 0 1 1];
targets = [0 0 0 1];
The next section demonstrates how to train a network from the command line,
after you have defined the problem. This example uses the housing data set
provided with the toolbox.
Using Command-Line Functions
1 Load the data, consisting of input vectors and target vectors, as follows:
load house_dataset
2 Create a network. For this example, you use a feed-forward network with
the default tan-sigmoid transfer function in the hidden layer and linear
transfer function in the output layer. This structure is useful for function
approximation (or regression) problems. Use 20 neurons (somewhat
arbitrary) in one hidden layer. The network has one output neuron, because
there is only one target value associated with each input vector.
net = newfit(houseInputs,houseTargets,20);
Note More neurons require more computation, but they allow the network to
solve more complicated problems. More layers require more computation, but
their use might result in the network solving complex problems more
efficiently.
3 Train the network. The network uses the default Levenberg-Marquardt
algorithm for training. The application randomly divides input vectors and
target vectors into three sets as follows:
- 60% are used for training.
- 20% are used to validate that the network is generalizing and to stop
training before overfitting.
- The last 20% are used as a completely independent test of network
generalization.
To train the network, enter:
net=train(net,houseInputs,houseTargets);
During training, the following training window opens. This window displays
training progress and allows you to interrupt training at any point by
clicking Stop Training.
This example used the train function. All the input vectors to the network
appear at once in a batch. Alternatively, you can present the input vectors
one at a time using the
adapt function. “Training Styles” on page 2-20
describes the two training approaches.
This training stopped when the validation error increased for six iterations,
which occurred at iteration 23. If you click
Performance in the training
window, a plot of the training errors, validation errors, and test errors
appears, as shown in the following figure. In this example, the result is
reasonable because of the following considerations:
- The final mean-square error is small.
- The test set error and the validation set error have similar characteristics.
- No significant overfitting has occurred by iteration 17 (where the best
validation performance occurs).
4 Perform some analysis of the network response. If you click Regression in
the training window, you can perform a linear regression between the
network outputs and the corresponding targets.
The following figure shows the results.
The output tracks the targets very well for training, testing, and validation,
and the R-value is over 0.95 for the total response. If even more accurate
results were required, you could try any of these approaches:
• Reset the initial network weights and biases to new values with
init and
train again.
• Increase the number of hidden neurons.
• Increase the number of training vectors.
• Increase the number of input values, if more relevant information is
available.
• Try a different training algorithm (see “Speed and Memory Comparison” on
page 5-34).
In this case, the network response is satisfactory, and you can now use sim to
put the network to use on new inputs.
To get more experience in command-line operations, try some of these tasks:
• During training, open a plot window (such as the regression plot), and watch
it animate.
• Plot from the command line with functions such as plotfit,
plotregression, plottrainstate, and plotperform. (For more information
on using these functions, see their reference pages.)
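Collected into one script, the command-line session above looks like this.
This is a sketch under the assumptions of this example (the house_dataset
shipped with the toolbox and 20 hidden neurons):

% Command-line function fitting, collected from the steps above.
load house_dataset                               % houseInputs, houseTargets
net = newfit(houseInputs,houseTargets,20);       % 20 hidden neurons
[net,tr] = train(net,houseInputs,houseTargets);  % tr is the training record
outputs = sim(net,houseInputs);                  % network response
plotperform(tr)                                  % training/validation/test errors
plotregression(houseTargets,outputs)             % outputs versus targets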
Using the Neural Network Fitting Tool GUI
1 Open the Neural Network Fitting Tool with this command:
nftool
2 Click Next to proceed.
3 Click Load Example Data Set in the Select Data window. The Fitting Data
Set Chooser window opens.
Note You use the Inputs and Targets options in the Select Data window
when you need to load data from the MATLAB® workspace.
4 Select Simple Fitting Problem, and click Import. This brings you back to
the Select Data window.
5 Click Next to display the Validate and Test Data window, shown in the
following figure.
The validation and test data sets are each set to 15% of the original data.
6 Click Next.
The number of hidden neurons is set to
20. You can change this value in
another run if you want. You might want to change this number if the
network does not perform as well as you expect.
7 Click Next.
8 Click Train.
This time the training continued for the maximum of 1000 iterations.
9 Under Plots, click Regression.
For this simple fitting problem, the fit is almost perfect for training, testing,
and validation data.
These plots are the regression plots for the output with respect to training,
validation, and test data.
10 View the network response. For single-input/single-output problems, like
this simple fitting problem, under the
Plots pane, click Fit.
The blue symbols represent training data, the green symbols represent
validation data, and the red symbols represent testing data. For this
problem and this network, the network outputs match the targets for all
three data sets.
11 Click Next in the Neural Network Fitting Tool to evaluate the network.
At this point, you can test the network against new data.
If you are dissatisfied with the network’s performance on the original or new
data, you can take any of the following steps:
- Train it again.
- Increase the number of neurons.
- Get a larger training data set.
12 If you are satisfied with the network performance, click Next.
13 Use the buttons on this screen to save your results.
- You have the network saved as net1 in the workspace. You can perform
additional tests on it or put it to work on new inputs, using the sim
function.
- You can also click
Generate M-File to create an M-file that can be used to
reproduce all of the previous steps from the command line. Creating an
M-file can be helpful if you want to learn how to use the command-line
functionality of the toolbox to customize the training process.
14 When you have saved your results, click Finish.
Recognizing Patterns
In addition to function fitting, neural networks are also good at recognizing
patterns.
For example, suppose you want to classify a tumor as benign or malignant,
based on uniformity of cell size, clump thickness, mitosis, etc. [MuAh94]. You
have 699 example cases for which you have 9 items of data and the correct
classification as benign or malignant.
As with function fitting, there are three ways to solve this problem:
• Use a command-line solution, as described in “Using Command-Line
Functions” on page 1-25.
• Use the nprtool GUI, as described in “Using the Neural Network Pattern
Recognition Tool GUI” on page 1-31.
• Use nntool, as described in “Graphical User Interface” on page 3-23.
Defining a Problem
To define a pattern recognition problem, arrange a set of Q input vectors as
columns in a matrix. Then arrange another set of Q target vectors so that they
indicate the classes to which the input vectors are assigned. There are two
approaches to creating the target vectors.
One approach can be used when there are only two classes; you set each scalar
target value to either 1 or 0, indicating which class the corresponding input
belongs to. For instance, you can define the exclusive-or classification problem
as follows:
inputs = [0 1 0 1; 0 0 1 1];
targets = [0 1 1 0];
Alternately, target vectors can have N elements, where for each target vector,
one element is 1 and the others are 0. This defines a problem where inputs are
to be classified into N different classes. For example, the following lines show
how to define a classification problem that divides the corners of a 5-by-5-by-5
cube into three classes:
• The origin (the first input vector) in one class
• The corner farthest from the origin (the last input vector) in a second class
• All other points in a third class
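A sketch of what those lines could look like (the specific matrices below are
illustrative, assuming corner coordinates of 0 and 5; each column is one input
vector, and a 1 in the target column marks that vector's class):

% Eight corners of the cube as columns; the first column is the origin,
% the last is the corner farthest from the origin.
inputs = [0 0 0 0 5 5 5 5;
          0 0 5 5 0 0 5 5;
          0 5 0 5 0 5 0 5];
% Row 1 = origin, row 2 = farthest corner, row 3 = all other corners.
targets = [1 0 0 0 0 0 0 0;
           0 0 0 0 0 0 0 1;
           0 1 1 1 1 1 1 0];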
Classification problems involving only two classes can be represented using
either format. The targets can consist of either scalar 1/0 elements or
two-element vectors, with one element being 1 and the other element being 0.
The next section demonstrates how to train a network from the command line,
after you have defined the problem.
Using Command-Line Functions
1 Use the cancer data set as an example. This data set consists of 699
nine-element input vectors and two-element target vectors.
Load the tumor classification data as follows:
load cancer_dataset
2 Create a network. For this example, you use a pattern recognition network,
which is a feed-forward network with tan-sigmoid transfer functions in both
the hidden layer and the output layer. As in the function-fitting example,
use 20 neurons in one hidden layer:
- The network has two output neurons, because there are two categories
associated with each input vector.
- Each output neuron represents a category.
- When an input vector of the appropriate category is applied to the
network, the corresponding neuron should produce a 1, and the other
neurons should output a 0.
To create a network, enter this command:
net = newpr(cancerInputs,cancerTargets,20);
3 Train the network. The pattern recognition network uses the default Scaled
Conjugate Gradient algorithm for training. The application randomly
divides the input vectors and target vectors into three sets:
- 60% are used for training.
- 20% are used to validate that the network is generalizing and to stop
training before overfitting.
- The last 20% are used as a completely independent test of network
generalization.
To train the network, enter this command:
net=train(net,cancerInputs,cancerTargets);
During training, as in function fitting, the training window opens. This
window displays training progress. To interrupt training at any point, click
Stop Training.
This example uses the train function. It presents all the input vectors to the
network at once in a batch. Alternatively, you can present the input vectors
one at a time using the
adapt function. “Training Styles” on page 2-20
describes the two training approaches.
This training stopped when the validation error increased for six iterations,
which occurred at iteration 15.
4 To find the validation error, click Performance in the training window. A
plot of the training errors, validation errors, and test errors appears, as
shown in the following figure. The best validation performance occurred at
iteration 9, and the network at this iteration is returned.
5 To analyze the network response, click Confusion in the training window.
A display of the confusion matrix appears that shows various types of errors
that occurred for the final trained network.
The next figure shows the results.
The diagonal cells in each table show the number of cases that were correctly
classified, and the off-diagonal cells show the misclassified cases. The blue cell
in the bottom right shows the total percent of correctly classified cases (in
green) and the total percent of misclassified cases (in red). The results for all
three data sets (training, validation, and testing) show very good recognition.
If you needed even more accurate results, you could try any of the following
approaches:
• Reset the initial network weights and biases to new values with init and
train again.
• Increase the number of hidden neurons.
• Increase the number of training vectors.
• Increase the number of input values, if more relevant information is
available.
• Try a different training algorithm (see “Speed and Memory Comparison” on
page 5-34).
In this case, the network response is satisfactory, and you can now use sim to
put the network to use on new inputs.
To get more experience in command-line operations, here are some tasks you
can try:
• During training, open a plot window (such as the confusion plot), and watch
it animate.
• Plot from the command line with functions such as plotconfusion,
plotroc, plottrainstate, and plotperform. (For more information on using
these functions, see their reference pages.)
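As with function fitting, the whole command-line session can be collected
into a short script (a sketch under the assumptions of this example, using
the cancer_dataset shipped with the toolbox):

% Command-line pattern recognition, collected from the steps above.
load cancer_dataset                               % cancerInputs, cancerTargets
net = newpr(cancerInputs,cancerTargets,20);       % 20 hidden neurons
[net,tr] = train(net,cancerInputs,cancerTargets);
outputs = sim(net,cancerInputs);                  % network response
plotconfusion(cancerTargets,outputs)              % confusion matrix
plotroc(cancerTargets,outputs)                    % ROC curves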
Using the Neural Network Pattern
Recognition Tool GUI
1 Open the Neural Network Pattern Recognition Tool window with this
command:
nprtool
2 Click Next to proceed. The Select Data window opens.
3 Click Load Example Data Set. The Pattern Recognition Data Set Chooser
window opens.
4 In this window, select Simple Classes, and click Import. You return to the
Select Data window.
5 Click Next to continue to the Validate and Test Data window, shown in the
following figure.
Validation and test data sets are each set to 15% of the original data.
6 Click Next.
The number of hidden neurons is set to
20. You can change this in another
run if you want. You might want to change this number if the network does
not perform as well as you expect.
7 Click Next.
8 Click Train.
The training continues for 55 iterations.
9 Under the Plots pane, click Confusion in the Neural Network Pattern
Recognition Tool.
The next figure shows the confusion matrices for training, testing, and
validation, and the three kinds of data combined. The network's outputs are
almost perfect, as you can see by the high numbers of correct responses in
the green squares and the low numbers of incorrect responses in the red
squares. The lower right blue squares illustrate the overall accuracies.
10 Plot the Receiver Operating Characteristic (ROC) curve. Under the Plots
pane, click Receiver Operating Characteristic in the Neural Network
Pattern Recognition Tool.
The colored lines in each axis represent the ROC curves for each of the four
categories of this simple test problem. The
ROC curve is a plot of the true
positive rate (sensitivity) versus the false positive rate (1 - specificity) as the
threshold is varied. A perfect test would show points in the upper-left corner,
with 100% sensitivity and 100% specificity. For this simple problem, the
network performs almost perfectly.
11 In the Neural Network Pattern Recognition Tool, click Next to evaluate the
network.
At this point, you can test the network against new data.
If you are dissatisfied with the network’s performance on the original or new
data, you can train it again, increase the number of neurons, or perhaps get
a larger training data set.
12 When you are satisfied with the network performance, click Next.
13 Use the buttons on this screen to save your results.
- You now have the network saved as net1 in the workspace. You can
perform additional tests on it or put it to work on new inputs using the
sim function.
- If you click
Generate M-File, the tool creates an M-file, with commands
that recreate the steps that you have just performed from the command
line. Generating an M-file is a good way to learn how to use the
command-line operations of the Neural Network Toolbox™ software.
14 When you have saved your results, click Finish.
Clustering Data
Clustering data is another excellent application for neural networks. This
process involves grouping data by similarity. For example, you might perform:
• Market segmentation by grouping people according to their buying patterns
• Data mining by partitioning data into related subsets
• Bioinformatic analysis by grouping genes with related expression patterns
Suppose that you want to cluster flower types according to petal length, petal
width, sepal length, and sepal width [MuAh94]. You have 150 example cases
for which you have these four measurements.
As with function fitting and pattern recognition, there are three ways to solve
this problem:
• Use a command-line solution, as described in “Using Command-Line
Functions” on page 1-43.
• Use the nctool GUI, as described in “Using the Neural Network Clustering
Tool GUI” on page 1-47.
• Use
nntool, as described in “Graphical User Interface” on page 3-23.
Defining a Problem
To define a clustering problem, simply arrange Q input vectors to be clustered
as columns in an input matrix. For instance, you might want to cluster a set
of 10 two-element vectors.
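A sketch of such an input matrix (the particular values here are
illustrative, not the original example's data):

% Ten two-element vectors to be clustered, one per column.
inputs = [7 0 6 2 6 5 6 1 0 1;
          6 2 5 0 7 5 5 1 2 2];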
The next section demonstrates how to train a network from the command line,
after you have defined the problem.
Using Command-Line Functions
1 Use the flower data set as an example. The iris data set consists of 150
four-element input vectors.
Load the data as follows:
load iris_dataset
This data set consists of input vectors and target vectors. However, you only
need the input vectors for clustering.
2 Create a network. For this example, you use a self-organizing map (SOM).
This network has one layer, with the neurons organized in a grid. (For more
information, see “Self-Organizing Feature Maps” on page 9-9.) When
creating the network, you specify the number of rows and columns in the
grid:
net = newsom(irisInputs,[6,6]);
3 Train the network. The SOM network uses the default batch SOM algorithm
for training.
net=train(net,irisInputs);
4 During training, the training window opens and displays the training
progress. To interrupt training at any point, click
Stop Training.
5 For SOM training, the weight vector associated with each neuron moves to
become the center of a cluster of input vectors. In addition, neurons that are
adjacent to each other in the topology should also move close to each other
in the input space. The default topology is hexagonal; to view it, click
SOM Topology from the network training window.
In this figure, each of the hexagons represents a neuron. The grid is 6-by-6,
so there are a total of 36 neurons in this network. There are four elements
in each input vector, so the input space is four-dimensional. The weight
vectors (cluster centers) fall within this space.
Because this SOM has a two-dimensional topology, you can visualize in two
dimensions the relationships among the four-dimensional cluster centers.
One visualization tool for the SOM is the weight distance matrix (also called
the U-matrix).

6 To view the U-matrix, click SOM Neighbor Distances in the training
window.
In this figure, the blue hexagons represent the neurons. The red lines
connect neighboring neurons. The colors in the regions containing the red
lines indicate the distances between neurons. The darker colors represent
larger distances, and the lighter colors represent smaller distances.
A band of dark segments crosses from the lower-center region to the
upper-right region. The SOM network appears to have clustered the flowers
into two distinct groups.
To get more experience in command-line operations, try some of these tasks:
• During training, open a plot window (such as the SOM weight position plot),
and watch it animate.
• Plot from the command line with functions such as plotsomhits,
plotsomnc, plotsomnd, plotsomplanes, plotsompos, and plotsomtop. (For more
information on using these functions, see their reference pages.)
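Collected into one script, the clustering session above looks like this (a
sketch under the assumptions of this example, using the iris_dataset shipped
with the toolbox):

% Command-line clustering with a self-organizing map.
load iris_dataset                  % irisInputs (targets are not needed)
net = newsom(irisInputs,[6,6]);    % 6-by-6 map, hexagonal by default
net = train(net,irisInputs);
plotsomtop(net)                    % map topology
plotsomnd(net)                     % neighbor weight distances (U-matrix)
plotsomhits(net,irisInputs)        % number of input vectors per neuron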
Using the Neural Network Clustering Tool GUI
1 Open the Neural Network Clustering Tool window with this command:
nctool
2 Click Next. The Select Data window appears.
3 Click Load Example Data Set. The Clustering Data Set Chooser window
appears.
4 In this window, select Simple Clusters, and click Import. You return to the
Select Data window.
5 Click Next to continue to the Network Size window, shown in the following
figure.
The size of the two-dimensional map is set to 10. This map represents one
side of a two-dimensional grid. The total number of neurons is 100. You can
change this number in another run if you want.
6 Click Next. The Train Network window appears.
7 Click Train.
The training runs for the maximum number of epochs, which is 200.
8 Investigate some of the visualization tools for the SOM. Under the Plots
pane, click SOM Sample Hits.
This figure shows how many of the training data are associated with each of
the neurons (cluster centers). The topology is a 10-by-10 grid, so there are
100 neurons. The maximum number of hits associated with any neuron is
22. Thus, there are 22 input vectors in that cluster.
9 You can also visualize the SOM by displaying weight planes (also referred to
as component planes). Click SOM Weight Planes in the Neural Network
Clustering Tool.
This figure shows a weight plane for each element of the input vector (two,
in this case). They are visualizations of the weights that connect each input
to each of the neurons. (Darker colors represent larger weights.) If the
connection patterns of two inputs are very similar, you can assume that
the inputs are highly correlated. In this case, input 1 has connections that
are very different from those of input 2.
10 In the Neural Network Clustering Tool, click Next to evaluate the network.
At this point, you can test the network against new data.
If you are dissatisfied with the network’s performance on the original or new
data, you can increase the number of neurons, or perhaps get a larger
training data set.
11 When you are satisfied with the network performance, click Next.
12 Use the buttons on this screen to save your results.
• You now have the network saved as net1 in the workspace. You can perform
additional tests on it, or put it to work on new inputs, using the function
sim.
• If you click Generate M-File, the tool creates an M-file, with commands that
recreate the steps that you have just performed from the command line.
Generating an M-file is a good way to learn how to use the command-line
operations of the Neural Network Toolbox™ software.

13 When you have saved your results, click Finish.
2 Neuron Model and Network Architectures

Neuron Model (p. 2-2)
Network Architectures (p. 2-8)
Data Structures (p. 2-14)
Training Styles (p. 2-20)
Neuron Model
Simple Neuron
A neuron with a single scalar input and no bias appears on the left below.

[Figure: left, a neuron without bias, a = f(wp); right, a neuron with bias,
a = f(wp + b), where the bias enters through a constant input 1.]

The scalar input p is transmitted through a connection that multiplies its
strength by the scalar weight w to form the product wp, again a scalar. Here
the weighted input wp is the only argument of the transfer function f, which
produces the scalar output a. The neuron on the right has a scalar bias, b. You
can view the bias as simply being added to the product wp as shown by the
summing junction or as shifting the function f to the left by an amount b. The
bias is much like a weight, except that it has a constant input of 1.
The transfer function net input n, again a scalar, is the sum of the weighted
input wp and the bias b. This sum is the argument of the transfer function f.
(Chapter 8, “Radial Basis Networks,” discusses a different way to form the net
input n.) Here f is a transfer function, typically a step function or a sigmoid
function, that takes the argument n and produces the output a. Examples of
various transfer functions are in “Transfer Functions” on page 2-3. Note that
w and b are both adjustable scalar parameters of the neuron. The central idea
of neural networks is that such parameters can be adjusted so that the network
exhibits some desired or interesting behavior. Thus, you can train the network
to do a particular job by adjusting the weight or bias parameters, or perhaps
the network itself will adjust these parameters to achieve some desired end.
All the neurons in the Neural Network Toolbox™ software have provision for
a bias, and a bias is used in many of the examples and is assumed in most of
this toolbox. However, you can omit a bias in a neuron if you want.
[Figure: hard-limit transfer function, a = hardlim(n); a = 0 for n < 0 and
a = 1 for n >= 0.]
As previously noted, the bias b is an adjustable (scalar) parameter of the
neuron. It is
not an input. However, the constant 1 that drives the bias is an
input and must be treated as such when you consider the linear dependence of
input vectors in Chapter 4, “Linear Filters.”
Transfer Functions
Many transfer functions are included in the Neural Network Toolbox software.
Three of the most commonly used functions are shown below.
The hard-limit transfer function shown above limits the output of the neuron
to either 0, if the net input argument n is less than 0, or 1, if n is greater than
or equal to 0. This function is used in Chapter 3, “Perceptrons,” to create
neurons that make classification decisions.
The toolbox has a function,
hardlim, to realize the mathematical hard-limit
transfer function shown above. Try the following code:
n = -5:0.1:5;
plot(n,hardlim(n),'c+:');
It produces a plot of the function hardlim over the range -5 to +5.
All the mathematical transfer functions in the toolbox can be realized with a
function having the same name.
The following figure illustrates the linear transfer function.
[Figure: linear transfer function, a = purelin(n).]
Neurons of this type are used as linear approximators in Chapter 4, “Linear
Filters.”
The sigmoid transfer function shown below takes the input, which can have
any value between plus and minus infinity, and squashes the output into the
range 0 to 1.
[Figure: log-sigmoid transfer function, a = logsig(n).]
This transfer function is commonly used in backpropagation networks, in part
because it is differentiable.
The symbol in the square to the right of each transfer function graph shown
above represents the associated transfer function. These icons replace the
general
f in the boxes of network diagrams to show the particular transfer
function being used.
For a complete listing of transfer functions and their icons, see the
reference pages. You can also specify your own transfer functions.
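For example, the following lines plot the three transfer functions shown
above side by side (a quick sketch):

n = -5:0.1:5;
subplot(1,3,1), plot(n,hardlim(n)), title('hardlim')
subplot(1,3,2), plot(n,purelin(n)), title('purelin')
subplot(1,3,3), plot(n,logsig(n)), title('logsig')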
You can experiment with a simple neuron and various transfer functions by
running the demonstration program
nnd2n1.
Neuron with Vector Input
A neuron with a single R-element input vector is shown below. Here the
individual element inputs p1, p2,... pR are multiplied by weights
w1,1, w1,2,... w1,R and the weighted values are fed to the summing junction.
Their sum is simply Wp, the dot product of the (single row) matrix W and the
vector p.

[Figure: neuron with vector input; a = f(Wp + b), where R = number of
elements in input vector.]

The neuron has a bias b, which is summed with the weighted inputs to form
the net input n:

n = w1,1 p1 + w1,2 p2 + ... + w1,R pR + b

This sum, n, is the argument of the transfer function f.
This expression can, of course, be written in MATLAB® code as

n = W*p + b
However, you will seldom be writing code at this level, for such code is already
built into functions to define and simulate entire networks.
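Still, computing one neuron's output by hand is a useful check. A sketch with
made-up values:

W = [1 2 3];        % single-row weight matrix, R = 3
p = [2; 1; 0];      % R-element input vector
b = 0.5;            % scalar bias
n = W*p + b;        % net input: 1*2 + 2*1 + 3*0 + 0.5 = 4.5
a = logsig(n);      % neuron output for a log-sigmoid transfer function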
Abbreviated Notation
The figure of a single neuron shown above contains a lot of detail. When you
consider networks with many neurons, and perhaps layers of many neurons,
there is so much detail that the main thoughts tend to be lost. Thus, the
authors have devised an abbreviated notation for an individual neuron. This
notation, which is used later in circuits of multiple neurons, is shown here.

[Figure: abbreviated notation. The input vector p (R x 1) postmultiplies the
weight matrix W (1 x R); a constant input 1 drives the bias b (1 x 1); the net
input n (1 x 1) feeds the transfer function f; a = f(Wp + b), where
R = number of elements in input vector.]
Here the input vector p is represented by the solid dark vertical bar at the
left. The dimensions of p are shown below the symbol p in the figure as Rx1.
(Note that a capital letter, such as R in the previous sentence, is used when
referring to the size of a vector.) Thus, p is a vector of R input elements.
These inputs postmultiply the single-row, R-column matrix W. As before, a
constant 1 enters the neuron as an input and is multiplied by a scalar bias b.
The net input to the transfer function f is n, the sum of the bias b and the
product Wp. This sum is passed to the transfer function f to get the neuron’s
output a, which in this case is a scalar. Note that if there were more than one
neuron, the network output would be a vector.

A layer of a network is defined in the previous figure. A layer includes the
combination of the weights, the multiplication and summing operation (here
realized as a vector product Wp), the bias b, and the transfer function f. The
array of inputs, vector p, is not included in or called a layer.
Each time this abbreviated network notation is used, the sizes of the matrices
are shown just below their matrix variable names. This notation will allow you
to understand the architectures and follow the matrix mathematics associated
with them.
As discussed in “Transfer Functions” on page 2-3, when a specific transfer
function is to be used in a figure, the symbol for that transfer function replaces
the f shown above. Here are some examples.
[Figure: transfer function icons for purelin, hardlim, and logsig.]
You can experiment with a two-element neuron by running the demonstration
program
nnd2n2.
Network Architectures
Two or more of the neurons shown earlier can be combined in a layer, and a
particular network could contain one or more such layers. First consider a
single layer of neurons.
A Layer of Neurons
A one-layer network with R input elements and S neurons follows.
[Figure: layer of S neurons; a = f(Wp + b), where R = number of elements in
input vector and S = number of neurons in layer.]
In this network, each element of the input vector p is connected to each neuron
input through the weight matrix W. The ith neuron has a summer that gathers
its weighted inputs and bias to form its own scalar output n(i). The various n(i)
taken together form an S-element net input vector n. Finally, the neuron layer
outputs form a column vector a. The expression for a is shown at the bottom of
the figure.

Note that it is common for the number of inputs to a layer to be different from
the number of neurons (i.e., R is not necessarily equal to S). A layer is not
constrained to have the number of its inputs equal to the number of its
neurons.
W = [ w1,1  w1,2  ...  w1,R
      w2,1  w2,2  ...  w2,R
      ...
      wS,1  wS,2  ...  wS,R ]

a = f(Wp + b)
You can create a single (composite) layer of neurons having different transfer
functions simply by putting two of the networks shown earlier in parallel. Both
networks would have the same inputs, and each network would create some of
the outputs.
The input vector elements enter the network through the weight matrix W.
Note that the row indices on the elements of matrix W indicate the destination
neuron of the weight, and the column indices indicate which source is the input
for that weight. Thus, the indices in w1,2 say that the strength of the signal
from the second input element to the first (and only) neuron is w1,2.
The S neuron R input one-layer network also can be drawn in abbreviated
notation.

[Figure: layer of S neurons, abbreviated notation. Input p (R x 1), weight
matrix W (S x R), bias b (S x 1), net input n (S x 1); a = f(Wp + b), where
R = number of elements in input vector and S = number of neurons in layer 1.]
Here p is an R length input vector, W is an SxR matrix, and a and b are S
length vectors. As defined previously, the neuron layer includes the weight
matrix, the multiplication operations, the bias vector b, the summer, and the
transfer function boxes.
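In MATLAB code, the layer computation is the same matrix expression with
larger dimensions. A sketch with S = 3 neurons and R = 2 inputs (made-up
values):

W = [1 2; 3 4; 5 6];    % S-by-R weight matrix
b = [0.1; 0.2; 0.3];    % S-by-1 bias vector
p = [1; -1];            % R-by-1 input vector
a = logsig(W*p + b);    % S-by-1 layer output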
Inputs and Layers
To describe networks having multiple layers, the notation must be extended.
Specifically, it needs to make a distinction between weight matrices that are
connected to inputs and weight matrices that are connected between layers. It
also needs to identify the source and destination for the weight matrices.
We will call weight matrices connected to inputs input weights; we will call
weight matrices coming from layer outputs layer weights. Further,
superscripts are used to identify the source (second index) and the destination
(first index) for the various weights and other elements of the network. To
illustrate, the one-layer multiple input network shown earlier is redrawn in
abbreviated form below.
[Figure: one-layer network in abbreviated notation with an input weight
matrix. a1 = f1(IW1,1 p + b1), where R = number of elements in input vector
and S1 = number of neurons in Layer 1.]

As you can see, the weight matrix connected to the input vector p is labeled as
an input weight matrix (IW1,1) having a source 1 (second index) and a
destination 1 (first index). Elements of layer 1, such as its bias, net input, and
output have a superscript 1 to say that they are associated with the first layer.

“Multiple Layers of Neurons” uses layer weight (LW) matrices as well as input
weight (IW) matrices.

Multiple Layers of Neurons

A network can have several layers. Each layer has a weight matrix W, a bias
vector b, and an output vector a. To distinguish between the weight matrices,
output vectors, etc., for each of these layers in the figures, the number of the
layer is appended as a superscript to the variable of interest. You can see the
use of this layer notation in the three-layer network shown below, and in the
equations at the bottom of the figure.
[Figure: three-layer network in full notation, with input weights IW1,1,
layer weights LW2,1 and LW3,2, biases b1, b2, b3, and layer outputs a1, a2,
a3.]

a1 = f1(IW1,1 p + b1)
a2 = f2(LW2,1 a1 + b2)
a3 = f3(LW3,2 a2 + b3)
a3 = f3(LW3,2 f2(LW2,1 f1(IW1,1 p + b1) + b2) + b3)
The network shown above has R1 inputs, S1 neurons in the first layer, S2
neurons in the second layer, etc. It is common for different layers to have
different numbers of neurons. A constant input 1 is fed to the bias for each
neuron.

Note that the outputs of each intermediate layer are the inputs to the following
layer. Thus layer 2 can be analyzed as a one-layer network with S1 inputs, S2
neurons, and an S2 x S1 weight matrix W2. The input to layer 2 is a1; the
output is a2. Now that all the vectors and matrices of layer 2 have been
identified, it can be treated as a single-layer network on its own. This approach
can be taken with any layer of the network.

The layers of a multilayer network play different roles. A layer that produces
the network output is called an output layer. All other layers are called hidden
layers. The three-layer network shown earlier has one output layer (layer 3)
and two hidden layers (layer 1 and layer 2). Some authors refer to the inputs
as a fourth layer. This toolbox does not use that designation.
The same three-layer network can also be drawn using abbreviated notation.

[Figure: three-layer network in abbreviated notation. Input p (R x 1);
Layer 1: IW1,1 (S1 x R), b1 (S1 x 1); Layer 2: LW2,1 (S2 x S1), b2 (S2 x 1);
Layer 3: LW3,2 (S3 x S2), b3 (S3 x 1). a1 = f1(IW1,1 p + b1),
a2 = f2(LW2,1 a1 + b2), a3 = f3(LW3,2 a2 + b3), and
a3 = f3(LW3,2 f2(LW2,1 f1(IW1,1 p + b1) + b2) + b3) = y.]
Multiple-layer networks are quite powerful. For instance, a network of two
layers, where the first layer is sigmoid and the second layer is linear, can be
trained to approximate any function (with a finite number of discontinuities)
arbitrarily well. This kind of two-layer network is used extensively in Chapter
5, “Backpropagation.”
Here it is assumed that the output of the third layer, a3, is the network output
of interest, and this output is labeled as y. This notation is used to specify the
output of multilayer networks.
Input and Output Processing Functions
Network inputs might have associated processing functions. Processing
functions transform user input data to a form that is easier or more efficient for
a network.
For instance, mapminmax transforms input data so that all values fall into the
interval [-1, 1]. This can speed up learning for many networks.
removeconstantrows removes the values for input elements that always have
the same value, because these input elements are not providing any useful
information to the network. The third common processing function is
fixunknowns, which recodes unknown data (represented in the user’s data
with NaN values) into a numerical form for the network. fixunknowns preserves
information about which values are known and which are unknown.
Similarly, network outputs can also have associated processing functions.
Output processing functions are used to transform user-provided target vectors
for network use. Then, network outputs are reverse-processed using the same
functions to produce output data with the same characteristics as the original
user-provided targets.
Both mapminmax and removeconstantrows are often associated with network
outputs. However, fixunknowns is not. Unknown values in targets
(represented by NaN values) do not need to be altered for network use.
Processing functions are described in more detail in “Preprocessing and
Postprocessing” in Chapter 5.
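For example, mapminmax can be applied to a data matrix directly; a sketch
with made-up values:

p = [4 10 -2; 0 5 5];                  % two input elements, three samples
[pn,ps] = mapminmax(p);                % each row of pn now spans [-1, 1]
porig = mapminmax('reverse',pn,ps);    % reverse processing recovers p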
Data Structures
This section discusses how the format of input data structures affects the
simulation of networks. It starts with static networks, and then continues with
dynamic networks.
There are two basic types of input vectors: those that occur concurrently (at the
same time, or in no particular time sequence), and those that occur sequentially
in time. For concurrent vectors, the order is not important, and if there were a
number of networks running in parallel, you could present one input vector to
each of the networks. For sequential vectors, the order in which the vectors
appear is important.
Simulation with Concurrent Inputs in a Static
Network
The simplest situation for simulating a network occurs when the network to be
simulated is static (has no feedback or delays). In this case, you need not be
concerned about whether or not the input vectors occur in a particular time
sequence, so you can treat the inputs as concurrent. In addition, the problem is
made even simpler by assuming that the network has only one input vector.
Use the following network as an example.

[Figure: linear neuron with a two-element input vector;
a = purelin(Wp + b).]
Suppose that the network simulation data set consists of Q = 4 concurrent
vectors:

p1 = [1; 2], p2 = [2; 1], p3 = [2; 3], p4 = [3; 1]
Concurrent vectors are presented to the network as a single matrix:
P = [1 2 2 3; 2 1 3 1];
Suppose that this network typically outputs -100, 50 and 100. This is arbitrary
for this example. If you were solving a real problem, you would have actual
values.
T = [-100 50 50 100];
To set up this feedforward network, use the following command:
net = newlin(P,T);
For simplicity, assign the weight matrix and bias to be W = [1 2] and b = [0].
The commands for these assignments are
net.IW{1,1} = [1 2];
net.b{1} = 0;
You can now simulate the network:
A = sim(net,P)
A =
5 4 8 5
A single matrix of concurrent vectors is presented to the network, and the
network produces a single matrix of concurrent vectors as output. The result
would be the same if there were four networks operating in parallel and each
network received one of the input vectors and produced one of the outputs. The
ordering of the input vectors is not important, because they do not interact with
each other.
Simulation with Sequential Inputs in a Dynamic
Network
When a network contains delays, the input to the network would normally be
a sequence of input vectors that occur in a certain time order. To illustrate this
case, the next figure shows a simple network that contains one delay.
[Figure: linear neuron with one delay; a(t) = w1,1 p(t) + w1,2 p(t-1).]
Suppose that the input sequence is

p1 = [1], p2 = [2], p3 = [3], p4 = [4]
Sequential inputs are presented to the network as elements of a cell array:
P = {1 2 3 4};
Suppose you know that the typical output values include 10, 3 and -7. These
values are arbitrary for this example; if you were solving a real problem, you
would have real output values.
T = {10, 3, -7};
The following commands create this network:
net = newlin(P,T,[0 1]);
net.biasConnect = 0;
Assign the weight matrix to be W = [1 2]. The command is
net.IW{1,1} = [1 2];
You can now simulate the network:
A = sim(net,P)
A =
[1] [4] [7] [10]
You input a cell array containing a sequence of inputs, and the network
produces a cell array containing a sequence of outputs. The order of the inputs
is important when they are presented as a sequence. In this case, the current
output is obtained by multiplying the current input by 1 and the preceding
input by 2 and summing the result. If you were to change the order of the
inputs, the numbers obtained in the output would change.
Simulation with Concurrent Inputs in a Dynamic
Network
If you were to apply the same inputs as a set of concurrent inputs instead of a
sequence of inputs, you would obtain a completely different response.
(However, it is not clear why you would want to do this with a dynamic
network.) It would be as if each input were applied concurrently to a separate
parallel network. For the previous example, “Simulation with Sequential
Inputs in a Dynamic Network” on page 2-15, if you use a concurrent set of
inputs you have

p1 = [1], p2 = [2], p3 = [3], p4 = [4]
which can be created with the following code:
P = [1 2 3 4];
When you simulate with concurrent inputs, you obtain
A = sim(net,P)
A =
1 2 3 4
The result is the same as if you had concurrently applied each one of the inputs
to a separate network and computed one output. Note that because you did not
assign any initial conditions to the network delays, they were assumed to be 0.
For this case the output is simply 1 times the input, because the weight that
multiplies the current input is 1.
In certain special cases, you might want to simulate the network response to
several different sequences at the same time. In this case, you would want to
present the network with a concurrent set of sequences. For example, suppose
you wanted to present the following two sequences to the network:
p1(1) = [1], p1(2) = [2], p1(3) = [3], p1(4) = [4] (first sequence)
p2(1) = [4], p2(2) = [3], p2(3) = [2], p2(4) = [1] (second sequence)
The input P should be a cell array, where each element of the array contains
the two elements of the two sequences that occur at the same time:
P = {[1 4] [2 3] [3 2] [4 1]};
You can now simulate the network:
A = sim(net,P);
The resulting network output would be
A = {[1 4] [4 11] [7 8] [10 5]}
As you can see, the first column of each matrix makes up the output sequence
produced by the first input sequence, which was the one used in an earlier
example. The second column of each matrix makes up the output sequence
produced by the second input sequence. There is no interaction between the
two concurrent sequences. It is as if they were each applied to separate
networks running in parallel.
The following diagram shows the general format for the input P to the sim
function when there are Q concurrent sequences of TS time steps. It covers all
cases where there is a single input vector. Each element of the cell array is a
matrix of concurrent vectors that correspond to the same point in time for each
sequence. If there are multiple input vectors, there will be multiple rows of
matrices in the cell array:

P = {[p1(1) p2(1) ... pQ(1)], [p1(2) p2(2) ... pQ(2)], ...,
     [p1(TS) p2(TS) ... pQ(TS)]}
In this section, you apply sequential and concurrent inputs to dynamic networks. In “Simulation with Concurrent Inputs in a Static Network” on page 2-14, you applied concurrent inputs to static networks. It is also possible to apply sequential inputs to static networks. Doing so does not change the simulated response of the network, but it can affect the way in which the network is trained. This will become clear in “Training Styles” on page 2-20.
Training Styles
This section describes two different styles of training. In incremental training
the weights and biases of the network are updated each time an input is
presented to the network. In batch training the weights and biases are only
updated after all the inputs are presented.
Incremental Training (of Adaptive and Other
Networks)
Incremental training can be applied to both static and dynamic networks,
although it is more commonly used with dynamic networks, such as adaptive
filters. This section demonstrates how incremental training is performed on
both static and dynamic networks.
Incremental Training with Static Networks
Consider again the static network used for the first example. You want to train
it incrementally, so that the weights and biases are updated after each input is
presented. In this case you use the function adapt, and the inputs and targets are presented as sequences.
Suppose you want to train the network to create the linear function
t = 2p1 + p2
Then for the previous inputs
p1 = [1; 2], p2 = [2; 1], p3 = [2; 3], p4 = [3; 1]
the targets would be
t1 = [4], t2 = [5], t3 = [7], t4 = [7]
For incremental training, you present the inputs and targets as sequences:
P = {[1;2] [2;1] [2;3] [3;1]};
T = {4 5 7 7};
First, set up the network with zero initial weights and biases. Also, set the
initial learning rate to zero to show the effect of incremental training.
net = newlin(P,T,0,0);
net.IW{1,1} = [0 0];
net.b{1} = 0;
Recall from “Simulation with Concurrent Inputs in a Static Network” on
page 2-14 that, for a static network, the simulation of the network produces the
same outputs whether the inputs are presented as a matrix of concurrent
vectors or as a cell array of sequential vectors. However, this is not true when
training the network. When you use the
adapt function, if the inputs are
presented as a cell array of sequential vectors, then the weights are updated as
each input is presented (incremental mode). As shown in the next section, if the
inputs are presented as a matrix of concurrent vectors, then the weights are
updated only after all inputs are presented (batch mode).
You are now ready to train the network incrementally.
[net,a,e,pf] = adapt(net,P,T);
The network outputs remain zero, because the learning rate is zero, and the
weights are not updated. The errors are equal to the targets:
a = [0] [0] [0] [0]
e = [4] [5] [7] [7]
If you now set the learning rate to 0.1 you can see how the network is adjusted
as each input is presented:
net.inputWeights{1,1}.learnParam.lr=0.1;
net.biases{1,1}.learnParam.lr=0.1;
[net,a,e,pf] = adapt(net,P,T);
a = [0] [2] [6] [5.8]
e = [4] [3] [1] [1.2]
The first output is the same as it was with zero learning rate, because no
update is made until the first input is presented. The second output is different,
because the weights have been updated. The weights continue to be modified
as each error is computed. If the network is capable and the learning rate is set
correctly, the error is eventually driven to zero.
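To see where these numbers come from, here is a minimal hand computation of the same pass (a sketch of the Widrow-Hoff rule w = w + lr*e*p', b = b + lr*e, not the toolbox implementation):
P = [1 2 2 3; 2 1 3 1];       % the four inputs as columns
T = [4 5 7 7];
w = [0 0]; b = 0; lr = 0.1;   % zero initial weights, learning rate 0.1
for q = 1:4
    p = P(:,q);
    a = w*p + b;              % output computed before the update
    e = T(q) - a;             % error for this input
    w = w + lr*e*p';          % Widrow-Hoff weight update
    b = b + lr*e;             % bias update
end
% The outputs a are 0, 2, 6, 5.8 and the errors e are 4, 3, 1, 1.2,
% matching the adapt results above.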
Incremental Training with Dynamic Networks
You can also train dynamic networks incrementally. In fact, this would be the
most common situation.
2-21
2 Neuron Model and Network Architectures
Here are the initial input Pi and the inputs P and targets T as elements of cell
arrays.
Pi = {1};
P = {2 3 4};
T = {3 5 7};
Create a linear network with one delay at the input, as used in a previous
example. Initialize the weights to zero and set the learning rate to 0.1.
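A minimal sketch of this setup, assuming the same tapped delay line [0 1] and no bias as in the earlier example (an assumption consistent with the outputs shown below):
net = newlin(P,T,[0 1],0.1);   % one delay at the input, learning rate 0.1
net.IW{1,1} = [0 0];           % zero initial weights
net.biasConnect = 0;           % no bias, as in the earlier example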
You want to train the network to create the current output by summing the current and the previous inputs. This is the same input sequence you used in the previous example (using sim), except that you assign the first term in the sequence as the initial condition for the delay. You can now sequentially train the network using adapt:
[net,a,e,pf] = adapt(net,P,T,Pi);
a = [0] [2.4] [7.98]
e = [3] [2.6] [-0.98]
The first output is zero, because the weights have not yet been updated. The
weights change at each subsequent time step.
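The same hand computation can be repeated for the delayed network (a sketch assuming the Widrow-Hoff rule applied to the delayed input [p(t); p(t-1)], with no bias):
p = [1 2 3 4];               % Pi = {1} followed by P = {2 3 4}
T = [3 5 7];
w = [0 0]; lr = 0.1;
for t = 2:4
    z = [p(t); p(t-1)];      % current input stacked on the delayed input
    a = w*z;                 % no bias in this network
    e = T(t-1) - a;
    w = w + lr*e*z';         % Widrow-Hoff update
end
% The outputs a are 0, 2.4, 7.98 and the errors e are 3, 2.6, -0.98,
% matching the adapt results above.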
Batch Training
Batch training, in which weights and biases are only updated after all the
inputs and targets are presented, can be applied to both static and dynamic
networks. Both types of networks are discussed in this section.
Batch Training with Static Networks
Batch training can be done using either adapt or train, although train is
generally the best option, because it typically has access to more efficient
training algorithms. Incremental training can only be done with adapt; train can only perform batch training.
For batch training of a static network with adapt, the input vectors must be placed in one matrix of concurrent vectors:
P = [1 2 2 3; 2 1 3 1];
T = [4 5 7 7];
Begin with the static network used in previous examples. The learning rate is
set to 0.1.
net = newlin(P,T,0,0.1);
net.IW{1,1} = [0 0];
net.b{1} = 0;
When you call adapt, it invokes trains (the default adaptation function for the linear network) and learnwh (the default learning function for the weights and biases). trains uses Widrow-Hoff learning.
[net,a,e,pf] = adapt(net,P,T);
a = 0 0 0 0
e = 4 5 7 7
Note that the outputs of the network are all zero, because the weights are not updated until the entire training set has been presented. If you display the weights, you find
»net.IW{1,1}
ans = 4.9000 4.1000
»net.b{1}
ans =
2.3000
This is different from the result after one pass of adapt with incremental
updating.
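These weights can also be verified by hand. Because the initial weights are zero, all four outputs are zero and the errors equal the targets, so the single batch Widrow-Hoff update is the sum of the individual updates (a sketch, not the toolbox implementation):
P = [1 2 2 3; 2 1 3 1];
T = [4 5 7 7];
lr = 0.1;
e = T;                % errors equal the targets (all outputs are zero)
dW = lr * e * P'      % returns 4.9000 4.1000
db = lr * sum(e)      % returns 2.3000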
Now perform the same batch training using train. Because the Widrow-Hoff rule can be used in incremental or batch mode, it can be invoked by adapt or train. (There are several algorithms that can only be used in batch mode, e.g., Levenberg-Marquardt, so these algorithms can only be invoked by train.)
For this case, the input vectors can be in a matrix of concurrent vectors or in a
cell array of sequential vectors. Because the network is static and because
train always operates in batch mode, train converts any cell array of
sequential vectors to a matrix of concurrent vectors. Concurrent mode
operation is used whenever possible because it has a more efficient implementation in MATLAB® code:
P = [1 2 2 3; 2 1 3 1];
T = [4 5 7 7];
The network is set up in the same way.
net = newlin(P,T,0,0.1);
net.IW{1,1} = [0 0];
net.b{1} = 0;
Now you are ready to train the network. Train it for only one epoch, because you used only one pass of adapt. The default training function for the linear network is trainb, and the default learning function for the weights and biases is learnwh, so you should get the same results obtained using adapt in the previous example, where the default adaptation function was trains.
net.trainParam.epochs = 1;
net = train(net,P,T);
If you display the weights after one epoch of training, you find
»net.IW{1,1}
ans = 4.9000 4.1000
»net.b{1}
ans =
2.3000
This is the same result as the batch mode training in adapt. With static
networks, the
adapt function can implement incremental or batch training,
depending on the format of the input data. If the data is presented as a matrix
of concurrent vectors, batch training occurs. If the data is presented as a
sequence, incremental training occurs. This is not true for
train, which always
performs batch training, regardless of the format of the input.
Batch Training with Dynamic Networks
Training static networks is relatively straightforward. If you use train, the network is trained in batch mode and the inputs are converted to concurrent vectors (columns of a matrix), even if they are originally passed as a sequence (elements of a cell array). If you use adapt, the format of the input determines the method of training: if the inputs are passed as a sequence, the network is trained in incremental mode; if the inputs are passed as concurrent vectors, batch mode training is used.
With dynamic networks, batch mode training is typically done with train only, especially if only one training sequence exists. To illustrate this, consider again