README for Backprop 0.9.5
-------------------------

Thank you for downloading Backprop! I hope that you find it to be a
useful program. This README file describes what Backprop is, and how to
use it. It also includes a summary of changes since the last version,
information on what I plan to do in the future, and how to contact me to
request features or report bugs.

Updates
-------

The latest update of Backprop will always be available at the following
URL:

http://www.users.cts.com/crash/s/slogan/backprop.html

Distribution Rights
-------------------

Backprop is Copyright 2003, 2004 Syd Logan. You have permission to
redistribute at will, for any legal and ethical purpose, provided that
you include this README, unmodified, along with this software.

Bug Reports and Suggestions
---------------------------

Please send bug reports to me, Syd Logan, at slogan@cts.com. Your input
will help to make Backprop a better, more stable program. When reporting
a bug, please include the following information:

-- Version of Backprop. The version is shown in the dialog that is
   displayed when you select "About Backprop..." in the "Help" menu.

-- Operating system (e.g., Windows 95, 98, NT, 2000, XP).

-- Steps to reproduce the problem. Attach a copy of the data files you
   are using (the architecture file and training exemplars for problems
   encountered when training, or the weight file and execution data for
   runtime problems) so that I can duplicate the problem.

Thanks. If I can't understand the problem, I probably can't help, so the
more information, the better.

What Is Backprop?
-----------------

Backprop is a multi-layer neural network simulator that is based upon
the popular backpropagation learning algorithm. The goal of this
simulator is to provide users with a friendly and easy-to-use
environment for experimenting with backpropagation networks.
To achieve this, I put a lot of effort into making the user interface
give as much visual feedback as possible, especially during network
training, as well as giving the user easy-to-use interfaces for changing
the attributes of the network, such as learning rates, momentum, and so
forth. You can zoom in on the network graphically to see weight values
in more detail, or zoom out in order to make visible larger, more
complicated network architectures. You can speed up, or slow down, the
rate at which error graphics and network state are updated during
training. It is features like these that I hope will make Backprop your
first choice for experimenting with backpropagation neural networks.

Starting with version 0.9.5, you can now create and modify neural
networks, and save the results to disk as XML. More details on the use
of Backprop are provided later in this document (see "How to Use
Backprop", below).

Backprop is written entirely in C++, and uses the Microsoft Foundation
Classes (MFC) for its user interface. It should run without any problems
on any Windows platform, starting with Windows 95.

How to Use Backprop
-------------------

If you are new to neural networks, or to backpropagation in particular,
you should spend some time reading about it before using Backprop. There
are numerous books, journals, and web sites that contain information
about backpropagation neural networks, and their uses. The following
should be enough to get you started, however.

A neural network is a program that can be trained to perform a task,
usually pattern recognition, classification, or function approximation.
For example, you might train a neural network to classify an input as
belonging to a certain class, or to recognize a series of pen strokes
read on an input device as a letter of the alphabet.

In order to train the neural network to perform its intended task, you
must do the following:

-- Come up with a neural network architecture.
A neural network consists of a set of layers, each containing a number
of nodes. The number of layers in backpropagation nets is usually 3 or
more. The first layer is called the input layer, and it has one node for
each input. The last layer is called the output layer, and it has one
node for each output. The remaining layers are called hidden layers, and
the number of nodes in these layers is harder to specify.

As an example, consider a neural network that is designed to classify
patterns based on the following input data:

Has Fins    Has Gills    Is a Fish
-----------------------------------
Yes         No           No
No          No           No
Yes         Yes          Yes

The first row of the table represents the fact that an animal that has
fins, but not gills, is not a fish. In converting this data to use with
a neural network, we can simply replace Yes with the value 1, No with
the value 0 (or perhaps -1), and come up with the following:

Has Fins    Has Gills    Is a Fish
-----------------------------------
1           0            0
0           0            0
1           1            1

Some of you may know that solving this particular problem with a
backpropagation network is overkill, as it can be solved with simpler
paradigms, such as the perceptron. However, it is an easy-to-understand
problem, and for those of you who are new to neural nets, simple is
better at this point. Backpropagation is usually used to solve much
harder problems, so don't let the simple nature of this example lead you
to think backpropagation is only useful for solving toy problems. That
is most certainly not the case.

A neural network with two input nodes, one corresponding to Has Fins and
one corresponding to Has Gills, and one output node that corresponds to
Is a Fish, can be used to solve the above problem. Setting input layer
node one to 1 and node two to 0, in a properly trained network, will
result in the output node firing 0. The output node should also, in a
properly trained network, fire 0 if nodes one and two in the input layer
are both set to the value 0.
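
The table above can be checked against a very small sketch in C++. It is
only an illustration, not code from Backprop: it uses a single sigmoid
output node with no hidden layer (the perceptron-style simplification
mentioned above), and the function names, weights, and bias are
hypothetical values picked by hand rather than produced by training.

```cpp
#include <cmath>

// Logistic (sigmoid) activation: squashes any real input into (0, 1).
double sigmoid(double x) {
    return 1.0 / (1.0 + std::exp(-x));
}

// Hypothetical single-node "network" for the fish table: two inputs
// feeding one output node. The weights and bias are hand-picked for
// illustration; they are not values produced by Backprop.
double isFishOutput(double hasFins, double hasGills) {
    const double wFins = 10.0;
    const double wGills = 10.0;
    const double bias = -15.0;
    return sigmoid(wFins * hasFins + wGills * hasGills + bias);
}

// Round the firing value to the nearest class label (0 or 1).
int classify(double output) {
    return output >= 0.5 ? 1 : 0;
}
```

With these values, isFishOutput(1, 0) and isFishOutput(0, 0) both fire
close to 0, and isFishOutput(1, 1) fires close to 1, which matches the
three rows of the table.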

The number of hidden layers, and the number of nodes in each of the
hidden layers, is more difficult to specify. Many claim that coming up
with the hidden layer architecture is more of an "art" than a "science".
I won't argue that. One of the nice things about Backprop is that you
can easily add or remove hidden layers, or change the number of nodes in
the hidden layers, and see the effects it has on training.

-- Once you have an architecture for the neural network in hand, you
need to train the neural network.

This is done by presenting the neural network with examples that it can
use to learn the problem you want it to solve. These examples, also
known as exemplars, are repeatedly shown to the network until the
network learns them, or some maximum number of tries has been performed.
It can, and often does, take tens of thousands of presentations of a set
of exemplars before a network becomes trained. How long it takes is a
function of the network architecture, the initial state of the network,
nuances of the training algorithm, and the set of exemplars. Changing
one or more of these is all it sometimes takes for a network that won't
train to turn into a network that will. One of the design goals of
Backprop is to give you the tools you need to visualize how changes in
the exemplar set, training algorithm, or architecture affect the ability
of the neural network to train successfully.

Once the network is trained, you can then use it to solve problems. This
is done by presenting data to the input layer nodes, and observing the
values that result in the output layer.

Launching Backprop
------------------

To launch Backprop, simply double-click the Backprop icon.

Loading a Network Architecture File
-----------------------------------

The first thing that you must do after launching Backprop is to load an
architecture file that describes the architecture of the network. The
architecture file is a file that you create in a text editor (like
Notepad).
The architecture file is written using XML. Here is a simple example of
an architecture file that describes a network suitable for solving the
"has fins, has gills, is fish" problem above.

    <network>
        <layer size="2"/>
        <layer size="3"/>
        <layer size="1"/>
    </network>

The architecture file consists of two tags, the <network> tag, and the
<layer> tag. The <network> tag defines the overall network architecture,
which consists of layers. In this case, the network has three layers.
The first layer has a size attribute of 2, the second layer has a size
attribute of 3, and the third layer has a size attribute of 1. The size
attribute defines how many neurons are in the layer. Therefore, this
network has 2 neurons or nodes in the first layer, 3 in the second
layer, and 1 in the third layer. Also, the first layer is always the
input layer, the last layer is always the output layer, and the layers
between are hidden layers. Thus, we have a network that contains 3
layers, accepts 2 inputs, fires a single output, and has a hidden layer
that contains 3 nodes.

The first node in the input layer will accept as input the "has fins"
attribute, the second node in the input layer will accept the "has
gills" attribute, and the single neuron in the output layer will fire a
value which represents the "is fish" attribute. The goal of the network
training, described below, will be to train the network so that it fires
the correct output response ("is fish") when presented with different
values for "has fins" and "has gills" at the input layer neurons.

More details on the XML format used by Backprop to describe network
architectures are provided later in this document (see "Network
Architecture Language", below).

By convention, architecture files are stored on disk in files with a
".net" suffix. For example, "mynet.net" is a Backprop architecture file.

To load an architecture file, select Open... from the File menu. All the
files in the current directory with a suffix of ".net" will be
displayed. Click on the file and hit OK.
Backprop will load the architecture file and display a graphical
representation of the network. Note that lines connect each node in the
input layer to the nodes in the first hidden layer, each node in the
first hidden layer to the second hidden layer, and so forth.

Modifying the Neural Network
----------------------------

Starting with version 0.9.5, you can modify the topology of a neural
network, and save the results, in XML, to disk. To do this, position
your mouse over a neuron, and click the right mouse button. A popup menu
will display, with 4 menu items: Insert Layer, Insert Neuron, Delete
Layer, and Delete Neuron. These menu items are described in the
following sections.

Inserting Layers
----------------

The Insert Layer menu is a pullright menu; clicking it will cause a new
menu to display. The menu contains two menu items: After and Before.
Selecting After will cause a layer with 1 neuron to be inserted after
the layer containing the neuron your mouse was positioned over when you
clicked the right mouse button. Alternatively, selecting Before will
cause the layer to be inserted before the layer containing the neuron
you clicked over.

Inserting Neurons
-----------------

To insert a neuron in a layer, position the mouse over any neuron in the
layer you want to insert a neuron into, click the right mouse button,
and select the Insert Neuron menu item.

Deleting Layers
---------------

To delete a layer, position the mouse over any neuron in the layer you
want to delete, click the right mouse button, and select the Delete
Layer menu item.

Deleting Neurons
----------------

To delete a neuron, position the mouse over any neuron in the layer from
which you want to delete a neuron, click the right mouse button, and
select the Delete Neuron menu item.

Creating a New Neural Network
-----------------------------

Starting with 0.9.5, you can also create new neural networks
graphically, and save the results to an XML file of your choosing.
Simply select "New" from the File menu. A 2-layer neural network,
containing a single neuron in each layer, is created. You can add layers
and neurons as described above in "Inserting Layers" and "Inserting
Neurons".

Saving Your Changes
-------------------

Save and Save As... menu items in the File menu were added in 0.9.5. Use
these to save your network architecture and training settings to an XML
file. These menu items will prompt you for the name and location of the
file.

Training the Network
--------------------

The next step is to train the neural network to solve a problem. This is
done by selecting "Train..." from the Network menu. A dialog will
display, asking you to specify an exemplar file. Type in the path of the
exemplar file, or click the "Browse" button to navigate the file system
in search of one. By convention, exemplar files are given the same name
as the architecture file, but have a ".exm" suffix. For example,
"mynet.exm" would be the exemplar file for the network defined in the
architecture file named "mynet.net".

The exemplar file (as of 0.9.5) is specified in XML. An exemplar file
corresponding to the "Has Fins, Has Gills, Is a Fish" problem described
above might look like this:

This file contains 3 exemplars. The outermost tag is required, and wraps
the exemplars that are specified by the file. A nested tag defines each
exemplar, and it, in turn, wraps the input and output tags that define
the inputs and expected outputs, respectively, of the exemplar. The
exemplars in the above example implement the training data for the Has
Fins, Has Gills, Is a Fish problem specified above, where the first
input tag in each exemplar corresponds to the Has Fins input, the second
input tag corresponds to Has Gills, and the Is a Fish result is
specified by the value attribute of the output tag.

You also must specify a weight file, which by convention is the name of
the architecture file with a ".wgt" suffix, for example, "mynet.wgt".
The weight file represents the state of a learned neural network.
As I mentioned above, each node in the input layer is connected to each
node in the first hidden layer, and so forth. Each of these connections
has an associated weight. Training a neural network amounts to changing
these weights in such a way that the network correctly learns the
problem it is being trained for. If the neural network successfully
trains, the weights will be saved to this file when training completes.
(The weights are saved as XML as of 0.9.5.)

Once you have specified the exemplar and weight files, click OK to start
training. A dialog will display that shows the number of iterations
executed, and a graph of the cumulative error. A properly training
network will, over time, exhibit a decrease in the cumulative error, but
at times the error may even increase as the network searches for a
solution.

The architecture window will display the firing values of each node in
the network during its training, using a 255-level grayscale colormap.
The range of values mapped to this grayscale colormap is [0.0, 1.0] by
default, with 0.0 displaying as black, 1.0 displaying as white, and 0.5
displaying as a middle gray. Values above the range display as green,
and values below the range display as red. You can change the range by
selecting Options... from the View menu, and changing the values in the
Neuron Output Range text fields. For example, to set the lower value to
-1, type -1 in the "Low" text field, and click OK. You can change this
or any other data in the View->Options... dialog during a training
session, and it will take effect immediately upon clicking OK.

You can also display a color for each weight in the network by selecting
the "View Node Outputs and Weights" radio button. The colormap
corresponding to this display will be shown at the top of the
architecture window.
You can widen or narrow the range of the colormap at any time during a
training session by modifying the Low and High text edit fields in the
Weight Output Range portion of the View->Options... dialog.

The View->Options... dialog also allows you to slow down or speed up the
user interface during training. You can make the updates of the error
graph more frequent in order to get more detail, but doing so will cause
the graph to scroll faster. Or, you can change how often the network
state (weights and firing values) graphics are updated. To increase
either rate, specify smaller numbers (50 will update ten times faster
than 500). The faster you update either graphic, the longer it will take
for your network to train, because it takes time to redraw the screen
each time an update occurs. The size and topology of the network will
also have an effect on the time that it takes to update the graphics, so
my best advice is to load a network, and experiment with different
settings until you find one that works for you. Note that any changes
that you make will take effect immediately, in real time, so you can
experiment with the update rates without having to restart the training.

The Edit->Training Settings... dialog can be used to change parameters
of the training algorithm used by Backprop. It is outside the scope of
this document to give detailed descriptions of each parameter, but here
are some hints and observations:

-- Use "Maximum training iterations" to control how many training
   iterations are executed before Backprop gives up. The default value
   is probably too high for most cases; I would recommend a lower value,
   and perhaps changing other parameters or the network architecture
   before attempting to give the network a long time to converge on a
   solution.

-- Per-exemplar threshold defines how large the output error must be
   before the network adjusts the weights.
   For example, if the threshold is 0.5 and the error is 0.6, then the
   error is greater than the threshold and the weights in the network
   will be adjusted in an attempt to improve the accuracy of the
   network. If the error were 0.4, then the network would not be
   adjusted for this exemplar. If all of the errors for the exemplars
   are below the threshold, the network has learned the exemplars and
   training successfully halts.

-- Momentum and learning rate are parameters that affect the
   backpropagation training algorithm. Momentum causes the network to
   consider earlier behavior of the network in computing new weight
   values. Learning rate affects how rapidly weights are adjusted, and
   may or may not affect the ability of the network to successfully
   train. Usually, you will want to set the learning rate high and the
   momentum low, but this is only a starting point. By turning these
   options on and off, and changing the values, you can experiment with
   what works best for your network architecture and training data.

-- Bias adds a trainable input to each hidden and output node in the
   network. The value of this input is always 1, and the weight on this
   input is always adjusted during training. In some cases, a bias is
   needed in order for the network to converge, but this is not always
   true. Again, refer to the literature for more guidance on the uses of
   bias, and experiment with Backprop to see what effect it has.

-- Backprop also allows you to select from two activation functions. The
   first, and the default, is sigmoid. This is by far the most popular
   activation function, and results in an output that is in the range of
   0.0 to 1.0. If your exemplars include outputs in the range -1.0 to
   1.0, then hyperbolic tangent may be a better choice, since it fires
   in the range of -1 to 1.

Finally, Backprop training starts by initializing the weights to random
values.
By default, this range is 0.0 to 1.0, but you can change the range to,
say, -1.0 to 1.0 by using the Initial Weight Range settings in the
Edit->Training Settings... dialog.

Executing the Network
---------------------

Once you have a trained network, you can use it to solve problems.
Select Execute... from the Network menu. If you just finished training
the network, the weight file will be prefilled for you. If you wish to
use another weight file, type in its path or use the Browse... button to
find it. You also need to specify an input file that contains the data
you want the network to process. These files are named with a ".run"
suffix, for example, "mynet1.run". As of 0.9.5, the file format is XML.
The format is straightforward: an outer tag wraps a set of input tags
which define the network input. Each input tag supports two attributes.
The node attribute specifies the location of the neuron in the input
layer in the range [0, n - 1], where n is the number of neurons in the
input layer. The value attribute specifies the input value for that
neuron. For example, loading a file containing:

will cause the network to set the value of the first input layer node to
0 (the value attribute of the first input tag in the file), and the
second input layer node to 1 (the value attribute of the second input
tag in the file).

Clicking OK in the Execute... dialog will cause the network to process
the specified file, and the graphical architecture window will display
the results. Note that the input nodes will display the values that were
read from the input file, and the output nodes will display an answer
that should be correct for the data that was processed. If not, you
might consider adding the data to the exemplar file, and retraining the
network. Then, the network, assuming it trains, should have no problem
processing the input.
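
To make the execution step more concrete, here is a minimal sketch in
C++ of what processing an input file amounts to: the (node, value) pairs
are placed into the input layer, and each following layer fires the
activation of its weighted sums. This is an illustration under stated
assumptions, not Backprop's actual source. The names Layer, activate,
and execute are hypothetical, the weights would normally come from a
trained weight file, and bias inputs are left out for brevity.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Incoming connections of one layer: weights[j][i] is the weight on the
// connection from node i of the previous layer to node j of this layer.
struct Layer {
    std::vector<std::vector<double>> weights;
    bool useTanh;  // false: sigmoid, true: hyperbolic tangent
};

// Sigmoid fires in (0, 1); hyperbolic tangent fires in (-1, 1).
double activate(double x, bool useTanh) {
    return useTanh ? std::tanh(x) : 1.0 / (1.0 + std::exp(-x));
}

// Place (node, value) pairs into an input layer of inputSize nodes, then
// propagate the values forward and return the output layer's values.
std::vector<double> execute(
        const std::vector<Layer>& layers,
        std::size_t inputSize,
        const std::vector<std::pair<std::size_t, double>>& inputs) {
    std::vector<double> current(inputSize, 0.0);
    for (const auto& in : inputs) {
        // Node indices outside [0, n - 1] are ignored in this sketch.
        if (in.first < inputSize) current[in.first] = in.second;
    }
    for (const Layer& layer : layers) {
        std::vector<double> next;
        next.reserve(layer.weights.size());
        for (const auto& row : layer.weights) {
            double sum = 0.0;
            for (std::size_t i = 0; i < row.size() && i < current.size(); ++i)
                sum += row[i] * current[i];
            next.push_back(activate(sum, layer.useTanh));
        }
        current = next;
    }
    return current;
}
```

For the 2-3-1 network described earlier, layers would hold two entries:
a hidden layer with 3 rows of 2 weights, and an output layer with 1 row
of 3 weights. Passing the pairs (0, 0) and (1, 1) corresponds to the
input file example above.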

Network Architecture Language
-----------------------------

The following is an example network description file for a 3-layer
network that can be trained to categorize seven-segment LED inputs. A
seven-segment LED is depicted in the following figure:

    ----1----
    |       |
   2|     3 |
    |       |
    |---4---|
    |       |
   5|     6 |
    |       |
    ----7----

A seven-segment LED can be used to display the numbers 0 - 9, and many
letters of the alphabet as well. These displays were introduced in the
1970s when electronic calculators and digital watches first hit the
market. Numbers and letters are formed by lighting the individual
segments. For example, you would light segments 1, 3, and 6 in order to
display the number '7', like this:

    ----1----
            |
          3 |
            |
            |
            |
          6 |
            |

The number 4 is displayed by the device when segments 2, 3, 4, and 6 are
lit:

    |       |
   2|     3 |
    |       |
    |---4---|
            |
          6 |
            |

A neural network with 7 inputs (each input corresponding to a segment in
the seven-segment LED) and 10 outputs (each output representing a number
in the range [0, 9]) can be described as follows:

Let's take a closer look at this. All networks are described by the
"network" tag (a tag is an XML construct that has a name and is
surrounded by '<' and '>' characters). The network tag can have several
attributes. For example, we can specify whether the network uses
momentum during training with the "usemomentum" attribute, which, as
shown in the above example, is set to the value "yes".

Nested inside of the network tag in the above example are three "layer"
tags. Because there are three layer tags, the network has 3 layers.
Attributes of the layer tag specify the number of nodes in each layer,
and the type of activation each node in the layer fires.

The following is a quick reference for the tags supported in this
version of Backprop.

Tag: <network>

Purpose: specifies the definition of a network and its attributes

Attributes:

Name             Type     Purpose                              Example
-------------------------------------------------------------------------------------------
usebias          boolean  enable or disable bias               usebias="false"
usemomentum      boolean  enable or disable momentum           usemomentum="true"
uselearningrate  boolean  enable or disable learning rate      uselearningrate="false"
threshold        float    set the update threshold             threshold="0.1"
momentum         float    set the momentum term                momentum="0.9"
learningrate     float    set the learning rate                learningrate="0.3"
ranlow           float    lower bound of weight random number  ranlow="-1.0"
                          range
ranhigh          float    upper bound of weight random number  ranhigh="1.0"
                          range

Tag: <layer>

Purpose: specifies a layer and its attributes

Attributes:

Name             Type     Purpose                              Example
-------------------------------------------------------------------------------------------
size             integer  the number of nodes in the layer     size="7"
activation       text     the type of activation fired by      activation="htan"
                          nodes in this layer. Possible
                          values are "sigmoid" and "htan"

Books About Backpropagation
---------------------------

These are a few books I've found useful in understanding
backpropagation, and neural nets in general.

Author                 Title                                   Publisher
James A. Anderson      Introduction to Neural Networks         MIT Press
Reed, Marks            Neural Smithing                         MIT Press
Rumelhart, McClelland  Parallel Distributed Processing, Vol 1  MIT Press

Known Problems
--------------

None as of 0.9.5. Please e-mail requests and bug reports to me at
slogan@cts.com.

Planned Enhancements
--------------------

-- A toolbar. I'm looking for a talented graphics artist who can do the
   artwork, so if you know of someone who can volunteer his or her time,
   please send me e-mail at slogan@cts.com.

-- Eventually, Cocoa (MacOS X) and Gtk+ (Linux) versions.

Modification History
--------------------

12/27/2003 0.9.5

-- Added "Save" and "Save As..." menu items.
-- Added support for XML-based exemplar data (previous format is
   deprecated).
-- Added support for XML-based weight data.
-- Added support for XML-based execution/input data (previous format is
   deprecated).
-- Added ability to create networks graphically ("New" menu item).
-- Added ability to modify a network architecture graphically via a
   popup menu that is accessible by right-clicking the mouse over a
   neuron.
-- Fixed a bug in interpretation of activation method at layer level.
-- Cleaned up enabling/disabling of menu items.
-- Converted the sample exemplar and input files to XML.
-- Added an Accept button to the training error dialog. Clicking this
   will cause training to halt and weight values to be written. Cancel,
   on the other hand, only causes training to halt, and no weight values
   are saved.

8/17/2003 0.9.4

-- Fixed scrollbar issues introduced in 0.9.3.

8/5/2003 0.9.3

-- Added double buffered graphics to eliminate flicker seen in earlier
   releases.

6/27/2003 0.9.2

-- Added checks for overflow when computing activation functions. If
   overflow occurs, training will abort, and the user should change the
   architecture of the net, or training parameters, to avoid the
   problem.
-- Added XML support for the network architecture file. A part of this
   change was to allow for per-layer specification of the activation
   function. For example, the user can specify the use of hyperbolic
   tangent activations in, say, hidden layer 2. Also, the user can now
   specify the training attributes (learning rate, etc.) directly in the
   network architecture file.

6/11/2003 0.9.1

-- Added support for controlling the update frequency of the graphical
   representation of the network, and the training error strip chart.
-- Set Use Bias to false as default.

6/10/2003

Initial version 0.9 released.

Licenses
--------

The following corresponds to my use of expat in versions 0.9.3 and
later:

Copyright (c) 1998, 1999, 2000 Thai Open Source Software Center Ltd
                               and Clark Cooper
Copyright (c) 2001, 2002 Expat maintainers.

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.