
Neural Network Toolbox™ 6
User’s Guide
Howard Demuth, Mark Beale, Martin Hagan
How to Contact The MathWorks
Web: www.mathworks.com
Newsgroup: comp.soft-sys.matlab
Technical support: www.mathworks.com/contact_TS.html
Product enhancement suggestions: suggest@mathworks.com
Bug reports: bugs@mathworks.com
Documentation error reports: doc@mathworks.com
Order status, license renewals, passcodes: service@mathworks.com
Sales, pricing, and general information: info@mathworks.com
Phone: 508-647-7000
Fax: 508-647-7001
The MathWorks, Inc. 3 Apple Hill Drive Natick, MA 01760-2098
For contact information about worldwide offices, see the MathWorks Web site.
Neural Network Toolbox™ User’s Guide
© COPYRIGHT 1992–2010 by The MathWorks, Inc.
The software described in this document is furnished under a license agreement. The software may be used or copied only under the terms of the license agreement. No part of this manual may be photocopied or reproduced in any form without prior written consent from The MathWorks, Inc.
FEDERAL ACQUISITION: This provision applies to all acquisitions of the Program and Documentation by, for, or through the federal government of the United States. By accepting delivery of the Program or Documentation, the government hereby agrees that this software or documentation qualifies as commercial computer software or commercial computer software documentation as such terms are used or defined in FAR 12.212, DFARS Part 227.72, and DFARS 252.227-7014. Accordingly, the terms and conditions of this Agreement and only those rights specified in this Agreement, shall pertain to and govern the use, modification, reproduction, release, performance, display, and disclosure of the Program and Documentation by the federal government (or other entity acquiring for or through the federal government) and shall supersede any conflicting contractual terms or conditions. If this License fails to meet the government's needs or is inconsistent in any respect with federal procurement law, the government agrees to return the Program and Documentation, unused, to The MathWorks, Inc.
Trademarks
MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See www.mathworks.com/trademarks for a list of additional trademarks. Other product or brand names may be trademarks or registered trademarks of their respective holders.
Patents
The MathWorks products are protected by one or more U.S. patents. Please see www.mathworks.com/patents for more information.
Revision History
June 1992: First printing
April 1993: Second printing
January 1997: Third printing
July 1997: Fourth printing
January 1998: Fifth printing. Revised for Version 3 (Release 11)
September 2000: Sixth printing. Revised for Version 4 (Release 12)
June 2001: Seventh printing. Minor revisions (Release 12.1)
July 2002: Online only. Minor revisions (Release 13)
January 2003: Online only. Minor revisions (Release 13SP1)
June 2004: Online only. Revised for Version 4.0.3 (Release 14)
October 2004: Online only. Revised for Version 4.0.4 (Release 14SP1)
October 2004: Eighth printing. Revised for Version 4.0.4
March 2005: Online only. Revised for Version 4.0.5 (Release 14SP2)
March 2006: Online only. Revised for Version 5.0 (Release 2006a)
September 2006: Ninth printing. Minor revisions (Release 2006b)
March 2007: Online only. Minor revisions (Release 2007a)
September 2007: Online only. Revised for Version 5.1 (Release 2007b)
March 2008: Online only. Revised for Version 6.0 (Release 2008a)
October 2008: Online only. Revised for Version 6.0.1 (Release 2008b)
March 2009: Online only. Revised for Version 6.0.2 (Release 2009a)
September 2009: Online only. Revised for Version 6.0.3 (Release 2009b)
March 2010: Online only. Revised for Version 6.0.4 (Release 2010a)
Acknowledgments
The authors would like to thank the following people:
Joe Hicklin of The MathWorks™ for getting Howard into neural network research years ago at the University of Idaho, for encouraging Howard and Mark to write the toolbox, for providing crucial help in getting the first toolbox Version 1.0 out the door, for continuing to help with the toolbox in many ways, and for being such a good friend.
Roy Lurie of The MathWorks for his continued enthusiasm for the possibilities for Neural Network Toolbox™ software.
Mary Ann Freeman for general support and for her leadership of a great team of people we enjoy working with.
Rakesh Kumar for cheerfully providing technical and practical help, encouragement, ideas and always going the extra mile for us.
Alan LaFleur for facilitating our documentation work.
Tara Scott and Stephen Vanreusel for help with testing.
Orlando De Jesús of Oklahoma State University for his excellent work in developing and programming the dynamic training algorithms described in Chapter 6, “Dynamic Networks,” and in programming the neural network controllers described in Chapter 7, “Control Systems.”
Martin Hagan, Howard Demuth, and Mark Beale for permission to include various problems, demonstrations, and other material from Neural Network Design, January, 1996.
Neural Network Toolbox™ Design Book
The developers of the Neural Network Toolbox™ software have written a textbook, Neural Network Design (Hagan, Demuth, and Beale, ISBN 0-9717321-0-8). The book presents the theory of neural networks, discusses their design and application, and makes considerable use of the MATLAB® environment and Neural Network Toolbox software. Demonstration programs from the book are used in various chapters of this user’s guide. (You can find all the book demonstration programs in the Neural Network Toolbox software by typing nnd.)
This book can be obtained from John Stovall at (303) 492-3648, or by e-mail at John.Stovall@colorado.edu.
The Neural Network Design textbook includes:
An Instructor’s Manual for those who adopt the book for a class
Transparency Masters for class use
If you are teaching a class and want an Instructor’s Manual (with solutions to the book exercises), contact John Stovall at (303) 492-3648, or by e-mail at John.Stovall@colorado.edu.
To look at sample chapters of the book and to obtain Transparency Masters, go directly to the Neural Network Design page at
http://hagan.okstate.edu/nnd.html
From this link, you can obtain sample book chapters in PDF format and you can download the Transparency Masters by clicking Transparency Masters (3.6MB). You can get the Transparency Masters in PowerPoint or PDF format.
Getting Started
1
Product Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2
Using the Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3
Applications for Neural Network Toolbox™ Software . . . . 1-4
Applications in This Toolbox . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-4
Business Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-4
Fitting a Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-7
Defining a Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-7
Using Command-Line Functions . . . . . . . . . . . . . . . . . . . . . . . . 1-7
Using the Neural Network Fitting Tool GUI . . . . . . . . . . . . . . 1-13
Recognizing Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-24
Defining a Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-24
Using Command-Line Functions . . . . . . . . . . . . . . . . . . . . . . . 1-25
Using the Neural Network Pattern Recognition Tool GUI . . . . . . . . . . . . . . . 1-31
Clustering Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-42
Defining a Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-42
Using Command-Line Functions . . . . . . . . . . . . . . . . . . . . . . . 1-43
Using the Neural Network Clustering Tool GUI . . . . . . . . . . . 1-47
Neuron Model and Network Architectures
2
Neuron Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2
Simple Neuron . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2
Transfer Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3
Neuron with Vector Input . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5
Network Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-8
A Layer of Neurons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-8
Multiple Layers of Neurons . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-10
Input and Output Processing Functions . . . . . . . . . . . . . . . . . . 2-12
Data Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-14
Simulation with Concurrent Inputs in a Static Network . . . . 2-14
Simulation with Sequential Inputs in a Dynamic Network . . 2-15
Simulation with Concurrent Inputs in a Dynamic Network . . 2-17
Training Styles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-20
Incremental Training (of Adaptive and Other Networks) . . . . 2-20
Batch Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-22
Training Feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-25
Perceptrons
3
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-2
Important Perceptron Functions . . . . . . . . . . . . . . . . . . . . . . . . . 3-2
Neuron Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-3
Perceptron Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-5
Creating a Perceptron (newp) . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6
Simulation (sim) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-7
Initialization (init) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-8
Learning Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-11
Perceptron Learning Rule (learnp) . . . . . . . . . . . . . . . . . . . . 3-12
Training (train) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-15
Limitations and Cautions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-21
Outliers and the Normalized Perceptron Rule . . . . . . . . . . . . . 3-21
Graphical User Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-23
Introduction to the GUI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-23
Create a Perceptron Network (nntool) . . . . . . . . . . . . . . . . . . . 3-23
Train the Perceptron . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-27
Export Perceptron Results to the Workspace . . . . . . . . . . . . . . 3-29
Clear the Network/Data Window . . . . . . . . . . . . . . . . . . . . . . . 3-30
Importing from the Command Line . . . . . . . . . . . . . . . . . . . . . 3-30
Save a Variable to a File and Load It Later . . . . . . . . . . . . . . . 3-31
Linear Filters
4
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
Neuron Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-3
Network Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-4
Creating a Linear Neuron (newlin) . . . . . . . . . . . . . . . . . . . . . . . 4-4
Least Mean Square Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-8
Linear System Design (newlind) . . . . . . . . . . . . . . . . . . . . . . . . 4-9
Linear Networks with Delays . . . . . . . . . . . . . . . . . . . . . . . . . 4-10
Tapped Delay Line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-10
Linear Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-10
LMS Algorithm (learnwh) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-13
Linear Classification (train) . . . . . . . . . . . . . . . . . . . . . . . . . . 4-15
Limitations and Cautions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-18
Overdetermined Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-18
Underdetermined Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-18
Linearly Dependent Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-18
Too Large a Learning Rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-19
Backpropagation
5
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-2
Solving a Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-4
Improving Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-6
Under the Hood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-6
Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-8
Feedforward Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-10
Simulation (sim) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-14
Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-15
Backpropagation Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-15
Faster Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-19
Variable Learning Rate (traingda, traingdx) . . . . . . . . . . . . . . 5-19
Resilient Backpropagation (trainrp) . . . . . . . . . . . . . . . . . . . . . 5-21
Conjugate Gradient Algorithms . . . . . . . . . . . . . . . . . . . . . . . . 5-22
Line Search Routines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-26
Quasi-Newton Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-29
Levenberg-Marquardt (trainlm) . . . . . . . . . . . . . . . . . . . . . . . . 5-30
Reduced Memory Levenberg-Marquardt (trainlm) . . . . . . . . . 5-32
Speed and Memory Comparison . . . . . . . . . . . . . . . . . . . . . . . 5-34
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-50
Improving Generalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-52
Early Stopping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-53
Index Data Division (divideind) . . . . . . . . . . . . . . . . . . . . . . . . 5-54
Random Data Division (dividerand) . . . . . . . . . . . . . . . . . . . . . 5-54
Block Data Division (divideblock) . . . . . . . . . . . . . . . . . . . . . . . 5-54
Interleaved Data Division (divideint) . . . . . . . . . . . . . . . . . . . . 5-55
Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-55
Summary and Discussion of Early Stopping and Regularization . . . . . . . . . 5-58
Preprocessing and Postprocessing . . . . . . . . . . . . . . . . . . . . . 5-61
Min and Max (mapminmax) . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-62
Mean and Stand. Dev. (mapstd) . . . . . . . . . . . . . . . . . . . . . . . . 5-63
Principal Component Analysis (processpca) . . . . . . . . . . . . . . . 5-64
Processing Unknown Inputs (fixunknowns) . . . . . . . . . . . . . . . 5-65
Representing Unknown or Don’t Care Targets . . . . . . . . . . . . 5-66
Posttraining Analysis (postreg) . . . . . . . . . . . . . . . . . . . . . . . . . 5-66
Sample Training Session . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-68
Limitations and Cautions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-71
Dynamic Networks
6
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2
Examples of Dynamic Networks . . . . . . . . . . . . . . . . . . . . . . . . . 6-2
Applications of Dynamic Networks . . . . . . . . . . . . . . . . . . . . . . . 6-7
Dynamic Network Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-8
Dynamic Network Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-9
Focused Time-Delay Neural Network (newfftd) . . . . . . . . . 6-11
Distributed Time-Delay Neural Network (newdtdnn) . . . . 6-15
NARX Network (newnarx, newnarxsp, sp2narx) . . . . . . . . 6-18
Layer-Recurrent Network (newlrn) . . . . . . . . . . . . . . . . . . . . 6-24
Control Systems
7
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-2
NN Predictive Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-4
System Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-4
Predictive Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-5
Using the NN Predictive Controller Block . . . . . . . . . . . . . . . . . 7-6
NARMA-L2 (Feedback Linearization) Control . . . . . . . . . . 7-14
Identification of the NARMA-L2 Model . . . . . . . . . . . . . . . . . . 7-14
NARMA-L2 Controller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-16
Using the NARMA-L2 Controller Block . . . . . . . . . . . . . . . . . . 7-18
Model Reference Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-23
Using the Model Reference Controller Block . . . . . . . . . . . . . . 7-25
Importing and Exporting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-31
Importing and Exporting Networks . . . . . . . . . . . . . . . . . . . . . 7-31
Importing and Exporting Training Data . . . . . . . . . . . . . . . . . 7-35
Radial Basis Networks
8
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-2
Important Radial Basis Functions . . . . . . . . . . . . . . . . . . . . . . . 8-2
Radial Basis Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-3
Neuron Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-3
Network Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-4
Exact Design (newrbe) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-5
More Efficient Design (newrb) . . . . . . . . . . . . . . . . . . . . . . . . . . 8-7
Demonstrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-8
Probabilistic Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . 8-9
Network Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-9
Design (newpnn) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-10
Generalized Regression Networks . . . . . . . . . . . . . . . . . . . . . 8-12
Network Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-12
Design (newgrnn) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-14
Self-Organizing and Learning
Vector Quantization Nets
9
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-2
Important Self-Organizing and LVQ Functions . . . . . . . . . . . . . 9-2
Competitive Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-3
Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-3
Creating a Competitive Neural Network (newc) . . . . . . . . . . . . 9-4
Kohonen Learning Rule (learnk) . . . . . . . . . . . . . . . . . . . . . . . . . 9-5
Bias Learning Rule (learncon) . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-5
Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-6
Graphical Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-7
Self-Organizing Feature Maps . . . . . . . . . . . . . . . . . . . . . . . . . . 9-9
Topologies (gridtop, hextop, randtop) . . . . . . . . . . . . . . . . . . . . 9-10
Distance Functions (dist, linkdist, mandist, boxdist) . . . . . . . 9-14
Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-17
Creating a Self-Organizing MAP Neural Network (newsom) . 9-18
Training (learnsomb) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-19
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-22
Learning Vector Quantization Networks . . . . . . . . . . . . . . . 9-35
Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-35
Creating an LVQ Network (newlvq) . . . . . . . . . . . . . . . . . . . . . 9-36
LVQ1 Learning Rule (learnlv1) . . . . . . . . . . . . . . . . . . . . . . . . . 9-39
Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-40
Supplemental LVQ2.1 Learning Rule (learnlv2) . . . . . . . . . . . 9-42
Adaptive Filters and Adaptive Training
10
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-2
Important Adaptive Functions . . . . . . . . . . . . . . . . . . . . . . . . . 10-2
Linear Neuron Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-3
Adaptive Linear Network Architecture . . . . . . . . . . . . . . . . 10-4
Single ADALINE (newlin) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-4
Least Mean Square Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-7
LMS Algorithm (learnwh) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-8
Adaptive Filtering (adapt) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-9
Tapped Delay Line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-9
Adaptive Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-9
Adaptive Filter Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-10
Prediction Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-13
Noise Cancellation Example . . . . . . . . . . . . . . . . . . . . . . . . . . 10-14
Multiple Neuron Adaptive Filters . . . . . . . . . . . . . . . . . . . . . . 10-16
Applications
11
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-2
Application Scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-2
Applin1: Linear Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-3
Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-3
Network Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-4
Network Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-4
Thoughts and Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-6
Applin2: Adaptive Prediction . . . . . . . . . . . . . . . . . . . . . . . . . 11-7
Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-7
Network Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-8
Network Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-8
Network Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-8
Thoughts and Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-10
Appelm1: Amplitude Detection . . . . . . . . . . . . . . . . . . . . . . . 11-11
Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-11
Network Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-11
Network Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-12
Network Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-12
Network Generalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-13
Improving Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-14
Appcr1: Character Recognition . . . . . . . . . . . . . . . . . . . . . . . 11-15
Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-15
Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-16
System Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-18
Advanced Topics
12
Custom Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-2
Custom Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-2
Network Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-3
Network Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-12
Additional Toolbox Functions . . . . . . . . . . . . . . . . . . . . . . . . 12-15
Custom Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-16
Historical Networks
13
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-2
Important Recurrent Network Functions . . . . . . . . . . . . . . . . . 13-2
Elman Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-3
Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-3
Creating an Elman Network (newelm) . . . . . . . . . . . . . . . . . . . 13-4
Training an Elman Network . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-5
Hopfield Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-8
Fundamentals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-8
Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-8
Design (newhop) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-10
Network Object Reference
14
Network Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-2
Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-2
Subobject Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-5
Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-7
Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-9
Weight and Bias Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-10
Other . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-12
Subobject Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-13
Inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-13
Layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-15
Outputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-20
Biases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-21
Input Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-22
Layer Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-24
Function Reference
15
Analysis Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-3
Distance Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-4
Graphical Interface Functions . . . . . . . . . . . . . . . . . . . . . . . . 15-5
Layer Initialization Functions . . . . . . . . . . . . . . . . . . . . . . . . 15-6
Learning Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-7
Line Search Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-8
Net Input Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-9
Network Initialization Function . . . . . . . . . . . . . . . . . . . . . . 15-10
Network Use Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-11
New Networks Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-12
Performance Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-13
Plotting Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-14
Processing Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-15
Simulink® Support Function . . . . . . . . . . . . . . . . . . . . . . . . . 15-16
Topology Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-17
Training Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-18
Transfer Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-19
Utility Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-20
Vector Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-21
Weight and Bias Initialization Functions . . . . . . . . . . . . . . 15-22
Weight Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-23
Transfer Function Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-24
Functions — Alphabetical List
16
Mathematical Notation
A
Mathematical Notation for Equations and Figures . . . . . . . A-2
Basic Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-2
Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-2
Weight Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-2
Bias Elements and Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-2
Time and Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-2
Layer Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-3
Figure and Equation Examples . . . . . . . . . . . . . . . . . . . . . . . . . . A-3
Mathematics and Code Equivalents . . . . . . . . . . . . . . . . . . . . . A-4
Demonstrations and Applications
B
Tables of Demonstrations and Applications . . . . . . . . . . . . . B-2
Chapter 2, “Neuron Model and Network Architectures” . . . . . . B-2
Chapter 3, “Perceptrons” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-2
Chapter 4, “Linear Filters” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-3
Chapter 5, “Backpropagation” . . . . . . . . . . . . . . . . . . . . . . . . . . . B-3
Chapter 8, “Radial Basis Networks” . . . . . . . . . . . . . . . . . . . . . B-4
Chapter 9, “Self-Organizing and Learning Vector Quantization Nets” . . . . . B-4
Chapter 10, “Adaptive Filters and Adaptive Training” . . . . . . . B-4
Chapter 11, “Applications” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-5
Chapter 13, “Historical Networks” . . . . . . . . . . . . . . . . . . . . . . . B-5
Blocks for the Simulink® Environment
C
Blockset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-2
Transfer Function Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-2
Net Input Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-3
Weight Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-3
Processing Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-4
Block Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-5
Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-5
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-7
D
Code Notes
Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-2
Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-3
Utility Function Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-4
Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-6
Code Efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-7
Argument Checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-8
E
Bibliography
Glossary
Index

Getting Started

Product Overview (p. 1-2)
Using the Documentation (p. 1-3)
Applications for Neural Network Toolbox™ Software (p. 1-4)
Fitting a Function (p. 1-7)
Recognizing Patterns (p. 1-24)
Clustering Data (p. 1-42)
[Figure: a neural network, including connections (called weights) between neurons, produces an Output from an Input; the Output is compared with a Target, and the weights are adjusted based on the comparison.]
Product Overview
Neural networks are composed of simple elements operating in parallel. These elements are inspired by biological nervous systems. As in nature, the connections between elements largely determine the network function. You can train a neural network to perform a particular function by adjusting the values of the connections (weights) between elements.
Typically, neural networks are adjusted, or trained, so that a particular input leads to a specific target output. The next figure illustrates such a situation. There, the network is adjusted, based on a comparison of the output and the target, until the network output matches the target. Typically, many such input/target pairs are needed to train a network.
Neural networks have been trained to perform complex functions in various fields, including pattern recognition, identification, classification, speech, vision, and control systems.
Neural networks can also be trained to solve problems that are difficult for conventional computers or human beings. The toolbox emphasizes the use of neural network paradigms that build up to—or are themselves used in— engineering, financial, and other practical applications.
The next sections explain how to use three graphical tools for training neural networks to solve problems in function fitting, pattern recognition, and clustering.
Using the Documentation
The neuron model and the architecture of a neural network describe how a network transforms its input into an output. This transformation can be viewed as a computation.
This first chapter gives you an overview of the Neural Network Toolbox™ product and introduces you to the following tasks:
Training a neural network to fit a function
Training a neural network to recognize patterns
Training a neural network to cluster data
The next two chapters explain the computations that are done and pave the way for an understanding of training methods for the networks. You should read them before advancing to later topics:
Chapter 2, “Neuron Model and Network Architectures,” presents the fundamentals of the neuron model and the architectures of neural networks. It also discusses the notation used in this toolbox.
Chapter 3, “Perceptrons,” explains how to create and train simple networks. It also introduces a graphical user interface (GUI) that you can use to solve problems without a lot of coding.
Applications for Neural Network Toolbox™ Software

Applications in This Toolbox
Chapter 7, “Control Systems” describes three practical neural network control system applications, including neural network model predictive control, model reference adaptive control, and a feedback linearization controller.
Chapter 11, “Applications” describes other neural network applications.
Business Applications
The 1988 DARPA Neural Network Study [DARP88] lists various neural network applications, beginning in about 1984 with the adaptive channel equalizer. This device, which is an outstanding commercial success, is a single-neuron network used in long-distance telephone systems to stabilize voice signals. The DARPA report goes on to list other commercial applications, including a small word recognizer, a process monitor, a sonar classifier, and a risk analysis system.
Neural networks have been applied in many other fields since the DARPA report was written, as described in the next table.
Aerospace: High-performance aircraft autopilot, flight path simulation, aircraft control systems, autopilot enhancements, aircraft component simulation, and aircraft component fault detection
Automotive: Automobile automatic guidance system, and warranty activity analysis
Banking: Check and other document reading and credit application evaluation
Defense: Weapon steering, target tracking, object discrimination, facial recognition, new kinds of sensors, sonar, radar and image signal processing including data compression, feature extraction and noise suppression, and signal/image identification
Electronics: Code sequence prediction, integrated circuit chip layout, process control, chip failure analysis, machine vision, voice synthesis, and nonlinear modeling
Entertainment: Animation, special effects, and market forecasting
Financial: Real estate appraisal, loan advising, mortgage screening, corporate bond rating, credit-line use analysis, credit card activity tracking, portfolio trading program, corporate financial analysis, and currency price prediction
Industrial: Prediction of industrial processes, such as the output gases of furnaces, replacing complex and costly equipment used for this purpose in the past
Insurance: Policy application evaluation and product optimization
Manufacturing: Manufacturing process control, product design and analysis, process and machine diagnosis, real-time particle identification, visual quality inspection systems, beer testing, welding quality analysis, paper quality prediction, computer-chip quality analysis, analysis of grinding operations, chemical product design analysis, machine maintenance analysis, project bidding, planning and management, and dynamic modeling of chemical process systems
Medical: Breast cancer cell analysis, EEG and ECG analysis, prosthesis design, optimization of transplant times, hospital expense reduction, hospital quality improvement, and emergency-room test advisement
Oil and gas: Exploration
Robotics: Trajectory control, forklift robot, manipulator controllers, and vision systems
Speech: Speech recognition, speech compression, vowel classification, and text-to-speech synthesis
Securities: Market analysis, automatic bond rating, and stock trading advisory systems
Telecommunications: Image and data compression, automated information services, real-time translation of spoken language, and customer payment processing systems
Transportation: Truck brake diagnosis systems, vehicle scheduling, and routing systems
Fitting a Function
Neural networks are good at fitting functions and recognizing patterns. In fact, there is proof that a fairly simple neural network can fit any practical function.
Suppose, for instance, that you have data from a housing application [HaRu78]. You want to design a network that can predict the value of a house (in $1000s), given 13 pieces of geographical and real estate information. You have a total of 506 example homes for which you have those 13 items of data and their associated market values.
You can solve this problem in three ways:
Use a command-line function, as described in “Using Command-Line Functions” on page 1-7.
Use a graphical user interface, nftool, as described in “Using the Neural Network Fitting Tool GUI” on page 1-13.
Use nntool, as described in “Graphical User Interface” on page 3-23.

Defining a Problem

To define a fitting problem for the toolbox, arrange a set of Q input vectors as columns in a matrix. Then, arrange another set of Q target vectors (the correct output vectors for each of the input vectors) into a second matrix. For example, you can define the fitting problem for a Boolean AND gate with four sets of two-element input vectors and one-element targets as follows:
inputs = [0 1 0 1; 0 0 1 1]; targets = [0 0 0 1];
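As a rough sketch (not part of the original example), you could fit this AND-gate data directly from the command line, using the inputs and targets defined above. With only four samples you may also want to disable the random data division that train applies by default.
net = newfit(inputs,targets,2);    % two hidden neurons are plenty for AND
net.divideFcn = '';                % no validation/test split for four samples
net = train(net,inputs,targets);
outputs = sim(net,inputs)          % should be close to [0 0 0 1]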
The next section demonstrates how to train a network from the command line, after you have defined the problem. This example uses the housing data set provided with the toolbox.
Using Command-Line Functions
1 Load the data, consisting of input vectors and target vectors, as follows:
load house_dataset
2 Create a network. For this example, you use a feed-forward network with the default tan-sigmoid transfer function in the hidden layer and linear transfer function in the output layer. This structure is useful for function approximation (or regression) problems. Use 20 neurons (somewhat arbitrary) in one hidden layer. The network has one output neuron, because there is only one target value associated with each input vector.
net = newfit(houseInputs,houseTargets,20);
Note More neurons require more computation, but they allow the network to solve more complicated problems. More layers require more computation, but their use might result in the network solving complex problems more efficiently.
3 Train the network. The network uses the default Levenberg-Marquardt algorithm for training. The application randomly divides input vectors and target vectors into three sets as follows:
- 60% are used for training.
- 20% are used to validate that the network is generalizing and to stop training before overfitting.
- The last 20% are used as a completely independent test of network generalization.
To train the network, enter:
net=train(net,houseInputs,houseTargets);
During training, the following training window opens. This window displays training progress and allows you to interrupt training at any point by clicking Stop Training.
This example used the train function. All the input vectors to the network appear at once in a batch. Alternatively, you can present the input vectors one at a time using the adapt function. “Training Styles” on page 2-20 describes the two training approaches.
This training stopped when the validation error increased for six iterations, which occurred at iteration 23. If you click Performance in the training window, a plot of the training errors, validation errors, and test errors appears, as shown in the following figure. In this example, the result is reasonable because of the following considerations:
- The final mean-square error is small.
- The test set error and the validation set error have similar characteristics.
- No significant overfitting has occurred by iteration 17 (where the best validation performance occurs).
4 Perform some analysis of the network response. If you click Regression in the training window, you can perform a linear regression between the network outputs and the corresponding targets.
The following figure shows the results.
The output tracks the targets very well for training, testing, and validation, and the R-value is over 0.95 for the total response. If even more accurate results were required, you could try any of these approaches (a rough command-line sketch of these options follows the list):
Reset the initial network weights and biases to new values with init and train again.
Increase the number of hidden neurons.
Increase the number of training vectors.
Increase the number of input values, if more relevant information is available.
Try a different training algorithm (see “Speed and Memory Comparison” on page 5-34).
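A minimal sketch of what those options look like at the command line; the function and property names are from the toolbox, but the specific choices (30 hidden neurons, trainscg) are illustrative only:
net = init(net);                             % reset weights and biases, then...
net = train(net,houseInputs,houseTargets);   % ...train again

net = newfit(houseInputs,houseTargets,30);   % or rebuild with more hidden neurons
net.trainFcn = 'trainscg';                   % or try another training function
net = train(net,houseInputs,houseTargets);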
In this case, the network response is satisfactory, and you can now use sim to put the network to use on new inputs.
To get more experience in command-line operations, try some of these tasks:
During training, open a plot window (such as the regression plot), and watch it animate.
Plot from the command line with functions such as plotfit, plotregression, plottrainstate, and plotperform. (For more information on using these functions, see their reference pages.)
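As an illustration only (not from the manual), the analysis plots listed above can be produced from the command line roughly as follows, assuming houseInputs and houseTargets are still in the workspace:
[net,tr] = train(net,houseInputs,houseTargets);   % tr holds the training record
outputs  = sim(net,houseInputs);
plotperform(tr)                        % training, validation, and test errors
plottrainstate(tr)                     % training-state values per epoch
plotregression(houseTargets,outputs)   % outputs versus targets
% plotfit(net,houseInputs,houseTargets)  % plotfit is intended for single-input problems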
Using the Neural Network Fitting Tool GUI
1 Open the Neural Network Fitting Tool with this command:
nftool
2 Click Next to proceed.
3 Click Load Example Data Set in the Select Data window. The Fitting Data Set Chooser window opens.
Note You use the Inputs and Targets options in the Select Data window when you need to load data from the MATLAB® workspace.
4 Select Simple Fitting Problem, and click Import. This brings you back to the Select Data window.
5 Click Next to display the Validate and Test Data window, shown in the following figure.
The validation and test data sets are each set to 15% of the original data.
6 Click Next.
The number of hidden neurons is set to 20. You can change this value in another run if you want. You might want to change this number if the network does not perform as well as you expect.
7 Click Next.
8 Click Train.
This time the training continued for the maximum of 1000 iterations.
9 Under Plots, click Regression.
For this simple fitting problem, the fit is almost perfect for training, testing, and validation data.
These plots are the regression plots for the output with respect to training, validation, and test data.
10 View the network response. For single-input/single-output problems, like this simple fitting problem, under the Plots pane, click Fit.
The blue symbols represent training data, the green symbols represent validation data, and the red symbols represent testing data. For this problem and this network, the network outputs match the targets for all three data sets.
11 Click Next in the Neural Network Fitting Tool to evaluate the network.
At this point, you can test the network against new data.
If you are dissatisfied with the network’s performance on the original or new data, you can take any of the following steps:
- Train it again.
- Increase the number of neurons.
- Get a larger training data set.
12 If you are satisfied with the network performance, click Next.
13 Use the buttons on this screen to save your results.
- You have the network saved as net1 in the workspace. You can perform additional tests on it or put it to work on new inputs, using the sim function.
- You can also click Generate M-File to create an M-file that can be used to reproduce all of the previous steps from the command line. Creating an M-file can be helpful if you want to learn how to use the command-line functionality of the toolbox to customize the training process.
14 When you have saved your results, click Finish.
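For example, once net1 is in the workspace you could apply it from the command line; this is only a sketch, and newInputs is a placeholder name for a matrix of new input columns:
newOutputs = sim(net1,newInputs);   % simulate the saved network on new data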
Recognizing Patterns
In addition to function fitting, neural networks are also good at recognizing patterns.
For example, suppose you want to classify a tumor as benign or malignant, based on uniformity of cell size, clump thickness, mitosis, etc. [MuAh94]. You have 699 example cases for which you have 9 items of data and the correct classification as benign or malignant.
As with function fitting, there are three ways to solve this problem:
Use a command-line solution, as described in “Using Command-Line Functions” on page 1-25.
Use the nprtool GUI, as described in “Using the Neural Network Pattern Recognition Tool GUI” on page 1-31.
Use nntool, as described in “Graphical User Interface” on page 3-23.

Defining a Problem

To define a pattern recognition problem, arrange a set of Q input vectors as columns in a matrix. Then arrange another set of Q target vectors so that they indicate the classes to which the input vectors are assigned. There are two approaches to creating the target vectors.
One approach can be used when there are only two classes; you set each scalar target value to either 1 or 0, indicating which class the corresponding input belongs to. For instance, you can define the exclusive-or classification problem as follows:
inputs = [0 1 0 1; 0 0 1 1];
targets = [0 1 1 0];
Alternately, target vectors can have N elements, where for each target vector, one element is 1 and the others are 0. This defines a problem where inputs are to be classified into N different classes. For example, the following lines show how to define a classification problem that divides the corners of a 5-by-5-by-5 cube into three classes:
The origin (the first input vector) in one class
The corner farthest from the origin (the last input vector) in a second class
All other points in a third class
inputs = [0 0 0 0 5 5 5 5; 0 0 5 5 0 0 5 5; 0 5 0 5 0 5 0 5]; targets = [1 0 0 0 0 0 0 0; 0 1 1 1 1 1 1 0; 0 0 0 0 0 0 0 1];
Classification problems involving only two classes can be represented using either format. The targets can consist of either scalar 1/0 elements or two-element vectors, with one element being 1 and the other element being 0.
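If you have class labels as indices, the toolbox utility functions ind2vec and vec2ind convert between the index form and the 1-of-N target form described above. A small sketch using the cube-corner classes from the previous example:
classIndices = [1 2 2 2 2 2 2 3];       % class of each of the eight corners
targets = full(ind2vec(classIndices))   % 3-by-8 matrix of 0s and 1s
indices = vec2ind(targets)              % recovers [1 2 2 2 2 2 2 3]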
The next section demonstrates how to train a network from the command line, after you have defined the problem.
Using Command-Line Functions
1 Use the cancer data set as an example. This data set consists of 699 nine-element input vectors and two-element target vectors.
Load the tumor classification data as follows:
load cancer_dataset
2 Create a network. For this example, you use a pattern recognition network, which is a feed-forward network with tan-sigmoid transfer functions in both the hidden layer and the output layer. As in the function-fitting example, use 20 neurons in one hidden layer:
- The network has two output neurons, because there are two categories associated with each input vector.
- Each output neuron represents a category.
- When an input vector of the appropriate category is applied to the network, the corresponding neuron should produce a 1, and the other neurons should output a 0.
To create a network, enter this command:
net = newpr(cancerInputs,cancerTargets,20);
3 Train the network. The pattern recognition network uses the default Scaled Conjugate Gradient algorithm for training. The application randomly divides the input vectors and target vectors into three sets:
- 60% are used for training.
- 20% are used to validate that the network is generalizing and to stop training before overfitting.
- The last 20% are used as a completely independent test of network generalization.
To train the network, enter this command:
net=train(net,cancerInputs,cancerTargets);
During training, as in function fitting, the training window opens. This window displays training progress. To interrupt training at any point, click Stop Training.
This example uses the train function. It presents all the input vectors to the network at once in a batch. Alternatively, you can present the input vectors one at a time using the adapt function. “Training Styles” on page 2-20 describes the two training approaches.
This training stopped when the validation error increased for six iterations, which occurred at iteration 15.
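The number of consecutive validation-error increases that triggers this stop is a training parameter; as a sketch, you could view or change it before training (it defaults to 6):
net.trainParam.max_fail = 6;   % allow six consecutive validation increases before stopping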
4 To find the validation error, click Performance in the training window. A plot of the training errors, validation errors, and test errors appears, as shown in the following figure. The best validation performance occurred at iteration 9, and the network at this iteration is returned.
5 To analyze the network response, click Confusion in the training window. A display of the confusion matrix appears that shows various types of errors that occurred for the final trained network.
The next figure shows the results.
The diagonal cells in each table show the number of cases that were correctly classified, and the off-diagonal cells show the misclassified cases. The blue cell in the bottom right shows the total percent of correctly classified cases (in green) and the total percent of misclassified cases (in red). The results for all three data sets (training, validation, and testing) show very good recognition. If you needed even more accurate results, you could try any of the following approaches:
Reset the initial network weights and biases to new values with init and train again.
Increase the number of hidden neurons.
Increase the number of training vectors.
Increase the number of input values, if more relevant information is available.
Try a different training algorithm (see “Speed and Memory Comparison” on page 5-34).
In this case, the network response is satisfactory, and you can now use sim to put the network to use on new inputs.
To get more experience in command-line operations, here are some tasks you can try:
During training, open a plot window (such as the confusion plot), and watch it animate.
Plot from the command line with functions such as plotconfusion, plotroc, plottrainstate, and plotperform. (For more information on using these functions, see their reference pages.)
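Condensed into a script, the command-line steps above look roughly like this; it is a sketch rather than the tool-generated M-file:
load cancer_dataset
net = newpr(cancerInputs,cancerTargets,20);
[net,tr] = train(net,cancerInputs,cancerTargets);
outputs = sim(net,cancerInputs);
plotconfusion(cancerTargets,outputs)   % confusion matrix over all the data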
Using the Neural Network Pattern Recognition Tool GUI
1 Open the Neural Network Pattern Recognition Tool window with this command:
nprtool
2 Click Next to proceed. The Select Data window opens.
3 Click Load Example Data Set. The Pattern Recognition Data Set Chooser window opens.
4 In this window, select Simple Classes, and click Import. You return to the Select Data window.
5 Click Next to continue to the Validate and Test Data window, shown in the following figure.
Validation and test data sets are each set to 15% of the original data.
6 Click Next.
The number of hidden neurons is set to 20. You can change this in another run if you want. You might want to change this number if the network does not perform as well as you expect.
7 Click Next.
8 Click Train.
The training continues for 55 iterations.
9 Under the Plots pane, click Confusion in the Neural Network Pattern Recognition Tool.
The next figure shows the confusion matrices for training, testing, and validation, and the three kinds of data combined. The network's outputs are almost perfect, as you can see by the high numbers of correct responses in the green squares and the low numbers of incorrect responses in the red squares. The lower right blue squares illustrate the overall accuracies.
10 Plot the Receiver Operating Characteristic (ROC) curve. Under the Plots pane, click Receiver Operating Characteristic in the Neural Network Pattern Recognition Tool.
The colored lines in each axis represent the ROC curves for each of the four categories of this simple test problem. The ROC curve is a plot of the true positive rate (sensitivity) versus the false positive rate (1 - specificity) as the threshold is varied. A perfect test would show points in the upper-left corner, with 100% sensitivity and 100% specificity. For this simple problem, the network performs almost perfectly.
11 In the Neural Network Pattern Recognition Tool, click Next to evaluate the network.
At this point, you can test the network against new data.
If you are dissatisfied with the network’s performance on the original or new data, you can train it again, increase the number of neurons, or perhaps get a larger training data set.
12 When you are satisfied with the network performance, click Next.
13 Use the buttons on this screen to save your results.
- You now have the network saved as net1 in the workspace. You can perform additional tests on it or put it to work on new inputs using the sim function.
- If you click Generate M-File, the tool creates an M-file, with commands that recreate the steps that you have just performed from the command line. Generating an M-file is a good way to learn how to use the command-line operations of the Neural Network Toolbox™ software.
14 When you have saved your results, click Finish.
Clustering Data
Clustering data is another excellent application for neural networks. This process involves grouping data by similarity. For example, you might perform:
Market segmentation by grouping people according to their buying patterns
Data mining by partitioning data into related subsets
Bioinformatic analysis by grouping genes with related expression patterns
Suppose that you want to cluster flower types according to petal length, petal width, sepal length, and sepal width [MuAh94]. You have 150 example cases for which you have these four measurements.
As with function fitting and pattern recognition, there are three ways to solve this problem:
Use a command-line solution, as described in “Using Command-Line Functions” on page 1-43.
Use the nctool GUI, as described in “Using the Neural Network Clustering Tool GUI” on page 1-47.
Use nntool, as described in “Graphical User Interface” on page 3-23.

Defining a Problem

To define a clustering problem, simply arrange Q input vectors to be clustered as columns in an input matrix. For instance, you might want to cluster this set of 10 two-element vectors:
inputs = [7 0 6 2 6 5 6 1 0 1; 6 2 5 0 7 5 5 1 2 2]
The next section demonstrates how to train a network from the command line, after you have defined the problem.
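For instance, here is a minimal command-line sketch, under the assumption that a small 2-by-3 SOM is adequate for these 10 vectors; it uses the same functions described in the next section:
inputs = [7 0 6 2 6 5 6 1 0 1; 6 2 5 0 7 5 5 1 2 2];
net = newsom(inputs,[2 3]);         % SOM with a 2-by-3 grid of neurons
net = train(net,inputs);            % train with the default batch SOM algorithm
classes = vec2ind(sim(net,inputs))  % winning neuron (cluster index) per vector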

Using Command-Line Functions

1 Use the flower data set as an example. The iris data set consists of 150
four-element input vectors.
Load the data as follows:
load iris_dataset
This data set consists of input vectors and target vectors. However, you only need the input vectors for clustering.
2 Create a network. For this example, you use a self-organizing map (SOM).
This network has one layer, with the neurons organized in a grid. (For more information, see “Self-Organizing Feature Maps” on page 9-9.) When creating the network, you specify the number of rows and columns in the grid:
net = newsom(irisInputs,[6,6]);
3 Train the network. The SOM network uses the default batch SOM algorithm
for training.
net=train(net,irisInputs);
4 During training, the training window opens and displays the training progress. To interrupt training at any point, click Stop Training.
5 For SOM training, the weight vector associated with each neuron moves to
become the center of a cluster of input vectors. In addition, neurons that are adjacent to each other in the topology should also move close to each other in the input space. The default topology is hexagonal; to view it, click SOM Topology from the network training window.
In this figure, each of the hexagons represents a neuron. The grid is 6-by-6, so there are a total of 36 neurons in this network. There are four elements in each input vector, so the input space is four-dimensional. The weight vectors (cluster centers) fall within this space.
Because this SOM has a two-dimensional topology, you can visualize in two dimensions the relationships among the four-dimensional cluster centers. One visualization tool for the SOM is the weight distance matrix (also called the U-matrix).
6 To view the U-matrix, click SOM Neighbor Distances in the training window.
In this figure, the blue hexagons represent the neurons. The red lines connect neighboring neurons. The colors in the regions containing the red lines indicate the distances between neurons. The darker colors represent larger distances, and the lighter colors represent smaller distances.
A band of dark segments crosses from the lower-center region to the upper-right region. The SOM network appears to have clustered the flowers into two distinct groups.
To get more experience in command-line operations, try some of these tasks:
During training, open a plot window (such as the SOM weight position plot) and watch it animate.
Plot from the command line with functions such as plotsomhits, plotsomnc, plotsomnd, plotsomplanes, plotsompos, and plotsomtop. (For more information on using these functions, see their reference pages.)
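As a further illustrative sketch, assuming net is the SOM trained on irisInputs above, you might assign each input vector to its winning neuron and redraw two of the plots from the command line:
outputs = sim(net,irisInputs);   % competitive (one-hot) outputs, one column per input
classes = vec2ind(outputs);      % index of the winning neuron for each input
plotsomhits(net,irisInputs)      % sample hits plot
plotsomnd(net)                   % neighbor distance (U-matrix) plot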

Using the Neural Network Clustering Tool GUI

1 Open the Neural Network Clustering Tool window with this command:
nctool
2 Click Next. The Select Data window appears.
3 Click Load Example Data Set. The Clustering Data Set Chooser window
appears.
4 In this window, select Simple Clusters, and click Import. You return to the
Select Data window.
5 Click Next to continue to the Network Size window, shown in the following
figure.
The size of the two-dimensional map is set to 10. This map represents one side of a two-dimensional grid. The total number of neurons is 100. You can change this number in another run if you want.
6 Click Next. The Train Network window appears.
7 Click Train.
The training runs for the maximum number of epochs, which is 200.
8 Investigate some of the visualization tools for the SOM. Under the Plots
pane, click SOM Sample Hits.
This figure shows how many of the training data are associated with each of the neurons (cluster centers). The topology is a 10-by-10 grid, so there are 100 neurons. The maximum number of hits associated with any neuron is 22. Thus, there are 22 input vectors in that cluster.
9 You can also visualize the SOM by displaying weight planes (also referred to as component planes). Click SOM Weight Planes in the Neural Network Clustering Tool.
This figure shows a weight plane for each element of the input vector (two, in this case). They are visualizations of the weights that connect each input to each of the neurons. (Darker colors represent larger weights.) If the connection patterns of two inputs are very similar, you can assume that the inputs are highly correlated. In this case, the connections for input 1 are very different from those for input 2.
10 In the Neural Network Clustering Tool, click Next to evaluate the network.
At this point you can test the network against new data.
If you are dissatisfied with the network’s performance on the original or new data, you can increase the number of neurons, or perhaps get a larger training data set.
11 When you are satisfied with the network performance, click Next.
12 Use the buttons on this screen to save your results.
You now have the network saved as net1 in the workspace. You can perform additional tests on it, or put it to work on new inputs, using the sim function.
If you click Generate M-File, the tool creates an M-file, with commands that recreate the steps that you have just performed from the command line. Generating an M-file is a good way to learn how to use the command-line operations of the Neural Network Toolbox™ software.
13 When you have saved your results, click Finish.

Neuron Model and Network Architectures

Neuron Model (p. 2-2)
Network Architectures (p. 2-8)
Data Structures (p. 2-14)
Training Styles (p. 2-20)

Neuron Model

Simple Neuron

A neuron with a single scalar input and no bias appears on the left below.
The scalar input p is transmitted through a connection that multiplies its strength by the scalar weight w to form the product wp, again a scalar. Here the weighted input wp is the only argument of the transfer function f, which produces the scalar output a. The neuron on the right has a scalar bias, b. You can view the bias as simply being added to the product wp, as shown by the summing junction, or as shifting the function f to the left by an amount b. The bias is much like a weight, except that it has a constant input of 1.

[Figure: left, a neuron without bias, a = f(wp); right, a neuron with bias, a = f(wp + b)]
The transfer function net input n, again a scalar, is the sum of the weighted input wp and the bias b. This sum is the argument of the transfer function f. (Chapter 8, “Radial Basis Networks,” discusses a different way to form the net input n.) Here f is a transfer function, typically a step function or a sigmoid function, that takes the argument n and produces the output a. Examples of various transfer functions are in “Transfer Functions” on page 2-3. Note that w and b are both adjustable scalar parameters of the neuron. The central idea of neural networks is that such parameters can be adjusted so that the network exhibits some desired or interesting behavior. Thus, you can train the network to do a particular job by adjusting the weight or bias parameters, or perhaps the network itself will adjust these parameters to achieve some desired end.
All the neurons in the Neural Network Toolbox™ software have provision for a bias, and a bias is used in many of the examples and is assumed in most of this toolbox. However, you can omit a bias in a neuron if you want.
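As a purely illustrative numeric check of a = f(wp + b), with made-up values for w, p, and b and the hard-limit transfer function described below:
w = 2; p = 1.5; b = -2;   % illustrative scalar weight, input, and bias
n = w*p + b               % net input: 2*1.5 + (-2) = 1
a = hardlim(n)            % output is 1, because n >= 0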
As previously noted, the bias b is an adjustable (scalar) parameter of the neuron. It is not an input. However, the constant 1 that drives the bias is an input and must be treated as such when you consider the linear dependence of input vectors in Chapter 4, “Linear Filters.”

[Figure: Hard-Limit Transfer Function, a = hardlim(n)]

Transfer Functions

Many transfer functions are included in the Neural Network Toolbox software. Three of the most commonly used functions are shown below.
The hard-limit transfer function shown above limits the output of the neuron to either 0, if the net input argument n is less than 0, or 1, if n is greater than or equal to 0. This function is used in Chapter 3, “Perceptrons,” to create neurons that make classification decisions.
The toolbox has a function, hardlim, to realize the mathematical hard-limit transfer function shown above. Try the following code:
n = -5:0.1:5;
plot(n,hardlim(n),'c+:');
It produces a plot of the function hardlim over the range -5 to +5.
All the mathematical transfer functions in the toolbox can be realized with a function having the same name.
The following figure illustrates the linear transfer function.
[Figure: Linear Transfer Function, a = purelin(n)]
Neurons of this type are used as linear approximators in Chapter 4, “Linear Filters.”
The sigmoid transfer function shown below takes the input, which can have any value between plus and minus infinity, and squashes the output into the range 0 to 1.
[Figure: Log-Sigmoid Transfer Function, a = logsig(n)]
This transfer function is commonly used in backpropagation networks, in part because it is differentiable.
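Following the hardlim example above, here is a short sketch that plots the log-sigmoid and linear transfer functions over the same range:
n = -5:0.1:5;
plot(n,logsig(n),'r-',n,purelin(n),'b--');
legend('logsig','purelin');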
The symbol in the square to the right of each transfer function graph shown above represents the associated transfer function. These icons replace the general f in the boxes of network diagrams to show the particular transfer function being used.
For a complete listing of transfer functions and their icons, see the reference pages. You can also specify your own transfer functions.
You can experiment with a simple neuron and various transfer functions by running the demonstration program nnd2n1.
[Figure: Neuron with Vector Input, a = f(Wp + b), where R = number of elements in input vector. The net input is n = w1,1 p1 + w1,2 p2 + ... + w1,R pR + b.]

Neuron with Vector Input

A neuron with a single R-element input vector is shown below. Here the individual element inputs p1, p2, ... pR are multiplied by weights w1,1, w1,2, ... w1,R, and the weighted values are fed to the summing junction. Their sum is simply Wp, the dot product of the (single-row) matrix W and the vector p.
The neuron has a bias b, which is summed with the weighted inputs to form the net input n. This sum, n, is the argument of the transfer function f.
This expression can, of course, be written in MATLAB® code as

n = W*p + b
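As a quick numeric check with made-up values, for a single neuron with a three-element input vector:
W = [1 2 3];      % single-row weight matrix (1-by-R)
p = [4; 5; 6];    % R-element input column vector
b = 7;            % scalar bias
n = W*p + b       % net input: 1*4 + 2*5 + 3*6 + 7 = 39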
However, you will seldom be writing code at this level, for such code is already built into functions to define and simulate entire networks.
Abbreviated Notation
The figure of a single neuron shown above contains a lot of detail. When you consider networks with many neurons, and perhaps layers of many neurons, there is so much detail that the main thoughts tend to be lost. Thus, the
[Figure: neuron with vector input in abbreviated notation. Input p (R x 1), weight matrix W (1 x R), bias b (1 x 1), net input n (1 x 1), output a = f(Wp + b). Where R = number of elements in input vector.]
authors have devised an abbreviated notation for an individual neuron. This notation, which is used later in circuits of multiple neurons, is shown.
Here the input vector p is represented by the solid dark vertical bar at the left. The dimensions of p are shown below the symbol p in the figure as R x 1. (Note that a capital letter, such as R in the previous sentence, is used when referring to the size of a vector.) Thus, p is a vector of R input elements. These inputs postmultiply the single-row, R-column matrix W. As before, a constant 1 enters the neuron as an input and is multiplied by a scalar bias b. The net input to the transfer function f is n, the sum of the bias b and the product Wp. This sum is passed to the transfer function f to get the neuron’s output a, which in this case is a scalar. Note that if there were more than one neuron, the network output would be a vector.

A layer of a network is defined in the previous figure. A layer includes the combination of the weights, the multiplication and summing operation (here realized as a vector product Wp), the bias b, and the transfer function f. The array of inputs, vector p, is not included in or called a layer.
Each time this abbreviated network notation is used, the sizes of the matrices are shown just below their matrix variable names. This notation will allow you to understand the architectures and follow the matrix mathematics associated with them.
As discussed in “Transfer Functions” on page 2-3, when a specific transfer function is to be used in a figure, the symbol for that transfer function replaces the f shown above. Here are some examples.
[Icons: hardlim, purelin, and logsig transfer function symbols]
You can experiment with a two-element neuron by running the demonstration program nnd2n2.
[Figure: Layer of Neurons, a = f(Wp + b), with inputs p1 ... pR, weights w1,1 ... wS,R, biases b1 ... bS, and outputs a1 ... aS. Where R = number of elements in input vector and S = number of neurons in layer.]

Network Architectures

Two or more of the neurons shown earlier can be combined in a layer, and a particular network could contain one or more such layers. First consider a single layer of neurons.

A Layer of Neurons

A one-layer network with R input elements and S neurons follows.
In this network, each element of the input vector p is connected to each neuron input through the weight matrix W. The ith neuron has a summer that gathers its weighted inputs and bias to form its own scalar output n(i). The various n(i) taken together form an S-element net input vector n. Finally, the neuron layer outputs form a column vector a. The expression for a is shown at the bottom of the figure.

Note that it is common for the number of inputs to a layer to be different from the number of neurons (i.e., R is not necessarily equal to S). A layer is not constrained to have the number of its inputs equal to the number of its neurons.
W =
    w1,1  w1,2  ...  w1,R
    w2,1  w2,2  ...  w2,R
     ...   ...  ...   ...
    wS,1  wS,2  ...  wS,R

a = f(Wp + b)
You can create a single (composite) layer of neurons having different transfer functions simply by putting two of the networks shown earlier in parallel. Both networks would have the same inputs, and each network would create some of the outputs.
The input vector elements enter the network through the weight matrix W. Note that the row indices on the elements of matrix W indicate the destination neuron of the weight, and the column indices indicate which source is the input for that weight. Thus, the indices in w1,2 say that the strength of the signal from the second input element to the first (and only) neuron is w1,2.
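As a small illustration with made-up values, for a layer with S = 2 neurons and R = 3 inputs:
W = [1 2 3; 4 5 6];   % row index = destination neuron, column index = source input
w12 = W(1,2)          % weight from input 2 to neuron 1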
The S-neuron, R-input, one-layer network also can be drawn in abbreviated notation.

[Figure: Layer of Neurons in abbreviated notation. Input p (R x 1), weight matrix W (S x R), bias b (S x 1), net input n (S x 1), output a = f(Wp + b) (S x 1). Where R = number of elements in input vector and S = number of neurons in layer 1.]

Here p is an R-length input vector, W is an S x R matrix, and a and b are S-length vectors. As defined previously, the neuron layer includes the weight matrix, the multiplication operations, the bias vector b, the summer, and the transfer function boxes.
Inputs and Layers
To describe networks having multiple layers, the notation must be extended. Specifically, it needs to make a distinction between weight matrices that are connected to inputs and weight matrices that are connected between layers. It also needs to identify the source and destination for the weight matrices.
We will call weight matrices connected to inputs input weights; we will call weight matrices coming from layer outputs layer weights. Further, superscripts are used to identify the source (second index) and the destination (first index) for the various weights and other elements of the network. To illustrate, the one-layer multiple input network shown earlier is redrawn in abbreviated form below.
[Figure: the one-layer multiple-input network redrawn in abbreviated notation. Input p (R x 1), input weight matrix IW1,1 (S1 x R), bias b1 (S1 x 1), output a1 = f1(IW1,1 p + b1). Where R = number of elements in input vector and S1 = number of neurons in Layer 1.]

As you can see, the weight matrix connected to the input vector p is labeled as an input weight matrix (IW1,1) having a source 1 (second index) and a destination 1 (first index). Elements of layer 1, such as its bias, net input, and output, have a superscript 1 to say that they are associated with the first layer.

“Multiple Layers of Neurons” uses layer weight (LW) matrices as well as input weight (IW) matrices.

Multiple Layers of Neurons

A network can have several layers. Each layer has a weight matrix W, a bias vector b, and an output vector a. To distinguish between the weight matrices, output vectors, etc., for each of these layers in the figures, the number of the layer is appended as a superscript to the variable of interest. You can see the use of this layer notation in the three-layer network shown below, and in the equations at the bottom of the figure.
[Figure: three-layer network with inputs p1 ... pR, S1 neurons in Layer 1, S2 neurons in Layer 2, and S3 neurons in Layer 3. The equations at the bottom of the figure are:]

a1 = f1(IW1,1 p + b1)
a2 = f2(LW2,1 a1 + b2)
a3 = f3(LW3,2 a2 + b3)
a3 = f3(LW3,2 f2(LW2,1 f1(IW1,1 p + b1) + b2) + b3)
The network shown above has R inputs, S1 neurons in the first layer, S2 neurons in the second layer, etc. It is common for different layers to have different numbers of neurons. A constant input 1 is fed to the bias for each neuron.

Note that the outputs of each intermediate layer are the inputs to the following layer. Thus layer 2 can be analyzed as a one-layer network with S1 inputs, S2 neurons, and an S2 x S1 weight matrix W2. The input to layer 2 is a1; the output is a2. Now that all the vectors and matrices of layer 2 have been identified, it can be treated as a single-layer network on its own. This approach can be taken with any layer of the network.

The layers of a multilayer network play different roles. A layer that produces the network output is called an output layer. All other layers are called hidden layers. The three-layer network shown earlier has one output layer (layer 3) and two hidden layers (layer 1 and layer 2). Some authors refer to the inputs as a fourth layer. This toolbox does not use that designation.
[Figure: the same three-layer network in abbreviated notation, with input weight IW1,1 (S1 x R), layer weights LW2,1 (S2 x S1) and LW3,2 (S3 x S2), biases b1, b2, b3, and layer outputs a1 = f1(IW1,1 p + b1), a2 = f2(LW2,1 a1 + b2), and a3 = f3(LW3,2 a2 + b3) = y.]
The same three-layer network can also be drawn using abbreviated notation.
Multiple-layer networks are quite powerful. For instance, a network of two layers, where the first layer is sigmoid and the second layer is linear, can be trained to approximate any function (with a finite number of discontinuities) arbitrarily well. This kind of two-layer network is used extensively in Chapter 5, “Backpropagation.”
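Here is a minimal sketch of such a network, assuming newff’s default layer transfer functions (a sigmoid hidden layer and a linear output layer) and made-up function-fitting data:
p = -1:0.1:1;          % inputs
t = sin(2*pi*p);       % targets for a simple curve fit
net = newff(p,t,10);   % two-layer network with 10 hidden neurons
net = train(net,p,t);  % batch training (opens the training window)
a = sim(net,p);        % network approximation of the targets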
Here it is assumed that the output of the third layer, a3, is the network output of interest, and this output is labeled as y. This notation is used to specify the output of multilayer networks.

Input and Output Processing Functions

Network inputs might have associated processing functions. Processing functions transform user input data to a form that is easier or more efficient for a network.
For instance, mapminmax transforms input data so that all values fall into the interval [-1, 1]. This can speed up learning for many networks. removeconstantrows removes the values for input elements that always have the same value, because these input elements are not providing any useful information to the network. The third common processing function is fixunknowns, which recodes unknown data (represented in the user’s data with NaN values) into a numerical form for the network. fixunknowns preserves information about which values are known and which are unknown.
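As a brief illustrative sketch with made-up data (networks created with toolbox functions typically attach these processing functions automatically):
p = [0 10 20 30; 2 4 6 8];          % raw two-element input vectors
[pn,ps] = mapminmax(p);             % scale each row into the interval [-1, 1]
praw = mapminmax('reverse',pn,ps);  % recover the original values using the settings ps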
Similarly, network outputs can also have associated processing functions. Output processing functions are used to transform user-provided target vectors for network use. Then, network outputs are reverse-processed using the same functions to produce output data with the same characteristics as the original user-provided targets.
Both mapminmax and removeconstantrows are often associated with network outputs. However, fixunknowns is not. Unknown values in targets (represented by NaN values) do not need to be altered for network use.
Processing functions are described in more detail in “Preprocessing and Postprocessing” in Chapter 5.

Data Structures

This section discusses how the format of input data structures affects the simulation of networks. It starts with static networks, and then continues with dynamic networks.
There are two basic types of input vectors: those that occur concurrently (at the same time, or in no particular time sequence), and those that occur sequentially in time. For concurrent vectors, the order is not important, and if there were a number of networks running in parallel, you could present one input vector to each of the networks. For sequential vectors, the order in which the vectors appear is important.

Simulation with Concurrent Inputs in a Static Network

The simplest situation for simulating a network occurs when the network to be simulated is static (has no feedback or delays). In this case, you need not be concerned about whether or not the input vectors occur in a particular time sequence, so you can treat the inputs as concurrent. In addition, the problem is made even simpler by assuming that the network has only one input vector. Use the following network as an example.

[Figure: Linear Neuron with a two-element input vector, weights w1,1 and w1,2, bias b, a = purelin(Wp + b)]
Suppose that the network simulation data set consists of Q = 4 concurrent vectors:
p1 = [1; 2], p2 = [2; 1], p3 = [2; 3], p4 = [3; 1]
Concurrent vectors are presented to the network as a single matrix:
P = [1 2 2 3; 2 1 3 1];
Suppose that the typical outputs of this network are -100, 50, 50, and 100. These values are arbitrary for this example. If you were solving a real problem, you would have actual target values.
T = [-100 50 50 100];
To set up this feedforward network, use the following command:
net = newlin(P,T);
For simplicity, assign the weight matrix and bias to be W = [1 2] and b = 0. The commands for these assignments are
net.IW{1,1} = [1 2];
net.b{1} = 0;
You can now simulate the network:
A = sim(net,P)
A =
     5     4     8     5
A single matrix of concurrent vectors is presented to the network, and the network produces a single matrix of concurrent vectors as output. The result would be the same if there were four networks operating in parallel and each network received one of the input vectors and produced one of the outputs. The ordering of the input vectors is not important, because they do not interact with each other.

Simulation with Sequential Inputs in a Dynamic Network

When a network contains delays, the input to the network would normally be a sequence of input vectors that occur in a certain time order. To illustrate this case, the next figure shows a simple network that contains one delay.
[Figure: Linear Neuron with one delay, a(t) = w1,1 p(t) + w1,2 p(t - 1)]
Suppose that the input sequence is p(1) = 1, p(2) = 2, p(3) = 3, p(4) = 4.
Sequential inputs are presented to the network as elements of a cell array:
P = {1 2 3 4};
Suppose you know that the typical output values include 10, 3 and -7. These values are arbitrary for this example; if you were solving a real problem, you would have real output values.
T = {10, 3, -7};
The following commands create this network:
net = newlin(P,T,[0 1]);
net.biasConnect = 0;
Assign the weight matrix to be W = [1 2]. The command is
net.IW{1,1} = [1 2];
You can now simulate the network:
A = sim(net,P)
A =
    [1]    [4]    [7]    [10]
You input a cell array containing a sequence of inputs, and the network produces a cell array containing a sequence of outputs. The order of the inputs is important when they are presented as a sequence. In this case, the current output is obtained by multiplying the current input by 1 and the preceding input by 2 and summing the result. If you were to change the order of the inputs, the numbers obtained in the output would change.

Simulation with Concurrent Inputs in a Dynamic Network

If you were to apply the same inputs as a set of concurrent inputs instead of a sequence of inputs, you would obtain a completely different response. (However, it is not clear why you would want to do this with a dynamic network.) It would be as if each input were applied concurrently to a separate parallel network. For the previous example, “Simulation with Sequential Inputs in a Dynamic Network” on page 2-15, if you use a concurrent set of inputs, you have p1 = 1, p2 = 2, p3 = 3, p4 = 4, which can be created with the following code:
P = [1 2 3 4];
When you simulate with concurrent inputs, you obtain
A = sim(net,P) A = 1 2 3 4
The result is the same as if you had concurrently applied each one of the inputs to a separate network and computed one output. Note that because you did not assign any initial conditions to the network delays, they were assumed to be 0. For this case the output is simply 1 times the input, because the weight that multiplies the current input is 1.
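If you do want nonzero initial delay conditions, you can pass them to sim. As a small sketch, returning to the sequential example and assuming an initial value of 1 for the delayed input:
Pi = {1};          % assumed initial condition p(0) = 1 for the delay
P  = {1 2 3 4};
A  = sim(net,P,Pi) % first output becomes [3] instead of [1]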
In certain special cases, you might want to simulate the network response to several different sequences at the same time. In this case, you would want to present the network with a concurrent set of sequences. For example, suppose you wanted to present the following two sequences to the network:
p1(1) = 1, p1(2) = 2, p1(3) = 3, p1(4) = 4
p2(1) = 4, p2(2) = 3, p2(3) = 2, p2(4) = 1
The input P should be a cell array, where each element of the array contains the two elements of the two sequences that occur at the same time:
P = {[1 4] [2 3] [3 2] [4 1]};
You can now simulate the network:
A = sim(net,P);
The resulting network output would be
A = {[1 4] [4 11] [7 8] [10 5]}
As you can see, the first column of each matrix makes up the output sequence produced by the first input sequence, which was the one used in an earlier example. The second column of each matrix makes up the output sequence produced by the second input sequence. There is no interaction between the two concurrent sequences. It is as if they were each applied to separate networks running in parallel.
The following diagram shows the general format for the input P to the sim function when there are Q concurrent sequences of TS time steps. It covers all cases where there is a single input vector. Each element of the cell array is a matrix of concurrent vectors that correspond to the same point in time for each sequence. If there are multiple input vectors, there will be multiple rows of matrices in the cell array.

P = { [p1(1), p2(1), ..., pQ(1)], [p1(2), p2(2), ..., pQ(2)], ..., [p1(TS), p2(TS), ..., pQ(TS)] }
In this section, you apply sequential and concurrent inputs to dynamic networks. In “Simulation with Concurrent Inputs in a Static Network” on page 2-14, you applied concurrent inputs to static networks. It is also possible
to apply sequential inputs to static networks. It does not change the simulated response of the network, but it can affect the way in which the network is trained. This will become clear in “Training Styles” on page 2-20.

Training Styles

This section describes two different styles of training. In incremental training the weights and biases of the network are updated each time an input is presented to the network. In batch training the weights and biases are only updated after all the inputs are presented.

Incremental Training (of Adaptive and Other Networks)

Incremental training can be applied to both static and dynamic networks, although it is more commonly used with dynamic networks, such as adaptive filters. This section demonstrates how incremental training is performed on both static and dynamic networks.
Incremental Training with Static Networks
Consider again the static network used for the first example. You want to train it incrementally, so that the weights and biases are updated after each input is presented. In this case you use the function adapt, and the inputs and targets are presented as sequences.
Suppose you want to train the network to create the linear function t = 2p1 + p2. Then for the previous inputs p1 = [1; 2], p2 = [2; 1], p3 = [2; 3], p4 = [3; 1], the targets would be t1 = 4, t2 = 5, t3 = 7, t4 = 7.
For incremental training, you present the inputs and targets as sequences:
P = {[1;2] [2;1] [2;3] [3;1]};
T = {4 5 7 7};
First, set up the network with zero initial weights and biases. Also, set the initial learning rate to zero to show the effect of incremental training.
net = newlin(P,T,0,0);
net.IW{1,1} = [0 0];
net.b{1} = 0;
Recall from “Simulation with Concurrent Inputs in a Static Network” on page 2-14 that, for a static network, the simulation of the network produces the same outputs whether the inputs are presented as a matrix of concurrent vectors or as a cell array of sequential vectors. However, this is not true when training the network. When you use the adapt function, if the inputs are presented as a cell array of sequential vectors, then the weights are updated as each input is presented (incremental mode). As shown in the next section, if the inputs are presented as a matrix of concurrent vectors, then the weights are updated only after all inputs are presented (batch mode).
You are now ready to train the network incrementally.
[net,a,e,pf] = adapt(net,P,T);
The network outputs remain zero, because the learning rate is zero, and the weights are not updated. The errors are equal to the targets:
a = [0] [0] [0] [0]
e = [4] [5] [7] [7]
If you now set the learning rate to 0.1 you can see how the network is adjusted as each input is presented:
net.inputWeights{1,1}.learnParam.lr = 0.1;
net.biases{1,1}.learnParam.lr = 0.1;
[net,a,e,pf] = adapt(net,P,T);
a = [0] [2] [6] [5.8]
e = [4] [3] [1] [1.2]
The first output is the same as it was with zero learning rate, because no update is made until the first input is presented. The second output is different, because the weights have been updated. The weights continue to be modified as each error is computed. If the network is capable and the learning rate is set correctly, the error is eventually driven to zero.
Incremental Training with Dynamic Networks
You can also train dynamic networks incrementally. In fact, this would be the most common situation.
Here are the initial input Pi and the inputs P and targets T as elements of cell arrays.
Pi = {1}; P = {2 3 4}; T = {3 5 7};
Create a linear network with one delay at the input, as used in a previous example. Initialize the weights to zero and set the learning rate to 0.1.
net = newlin(P,T,[0 1],0.1);
net.IW{1,1} = [0 0];
net.biasConnect = 0;
You want to train the network to create the current output by summing the current and the previous inputs. This is the same input sequence you used in the previous example (using sim), with the exception that you assign the first term in the sequence as the initial condition for the delay. You can now sequentially train the network using adapt.

[net,a,e,pf] = adapt(net,P,T,Pi);
a = [0] [2.4] [7.98]
e = [3] [2.6] [-0.98]
The first output is zero, because the weights have not yet been updated. The weights change at each subsequent time step.

Batch Training

Batch training, in which weights and biases are only updated after all the inputs and targets are presented, can be applied to both static and dynamic networks. Both types of networks are discussed in this section.
Batch Training with Static Networks
Batch training can be done using either adapt or train, although train is generally the best option, because it typically has access to more efficient training algorithms. Incremental training can only be done with adapt; train can only perform batch training.

For batch training of a static network with adapt, the input vectors must be placed in one matrix of concurrent vectors:

P = [1 2 2 3; 2 1 3 1];
T = [4 5 7 7];
Begin with the static network used in previous examples. The learning rate is set to 0.1.
net = newlin(P,T,0,0.1);
net.IW{1,1} = [0 0];
net.b{1} = 0;

When you call adapt, it invokes trains (the default adaptation function for the linear network) and learnwh (the default learning function for the weights and biases). trains uses Widrow-Hoff learning.

[net,a,e,pf] = adapt(net,P,T);
a = 0 0 0 0
e = 4 5 7 7
Note that the outputs of the network are all zero, because the weights are not updated until all the training set has been presented. If you display the weights, you find
»net.IW{1,1}
ans =
    4.9000    4.1000
»net.b{1}
ans =
    2.3000
This is different from the result after one pass of adapt with incremental updating.
Now perform the same batch training using train. Because the Widrow-Hoff rule can be used in incremental or batch mode, it can be invoked by adapt or train. (There are several algorithms that can only be used in batch mode (e.g., Levenberg-Marquardt), so these algorithms can only be invoked by train.)
For this case, the input vectors can be in a matrix of concurrent vectors or in a cell array of sequential vectors. Because the network is static and because train always operates in batch mode, train converts any cell array of sequential vectors to a matrix of concurrent vectors. Concurrent mode operation is used whenever possible, because it has a more efficient implementation in MATLAB® code:

P = [1 2 2 3; 2 1 3 1];
T = [4 5 7 7];
The network is set up in the same way.
net = newlin(P,T,0,0.1);
net.IW{1,1} = [0 0];
net.b{1} = 0;
Now you are ready to train the network. Train it for only one epoch, because you used only one pass of adapt. The default training function for the linear network is trainb, and the default learning function for the weights and biases is learnwh, so you should get the same results obtained using adapt in the previous example, where the default adaptation function was trains.

net.inputWeights{1,1}.learnParam.lr = 0.1;
net.biases{1}.learnParam.lr = 0.1;
net.trainParam.epochs = 1;
net = train(net,P,T);

If you display the weights after one epoch of training, you find

»net.IW{1,1}
ans =
    4.9000    4.1000
»net.b{1}
ans =
    2.3000
This is the same result as the batch mode training in adapt. With static networks, the adapt function can implement incremental or batch training, depending on the format of the input data. If the data is presented as a matrix of concurrent vectors, batch training occurs. If the data is presented as a sequence, incremental training occurs. This is not true for train, which always performs batch training, regardless of the format of the input.
Batch Training with Dynamic Networks
Training static networks is relatively straightforward. If you use train, the network is trained in batch mode and the inputs are converted to concurrent vectors (columns of a matrix), even if they are originally passed as a sequence (elements of a cell array). If you use adapt, the format of the input determines the method of training. If the inputs are passed as a sequence, then the network is trained in incremental mode. If the inputs are passed as concurrent vectors, then batch mode training is used.

With dynamic networks, batch mode training is typically done with train only, especially if only one training sequence exists. To illustrate this, consider again