Linear Discriminant Pattern Classifier

Dataset

To train and test our pattern classifier we need a dataset to operate on. Ideally this would be real-world data, though for ease of this example we generate our own dataset based on Gaussian distributions.
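
The dataset is generated later in this section by a GenerateGaussianData function, whose implementation is not shown here. A minimal sketch of such a generator, assuming the mvnrnd function from the Statistics and Machine Learning Toolbox and the features-in-rows layout expected by the learning code, could look like this:

function [data, target] = GenerateGaussianData(n, Mean1, Sigma1, Mean2, Sigma2)
    % Draw n samples from each of the two Gaussian classes.
    class0 = mvnrnd(Mean1', Sigma1, n)'; % 2 x n samples for class 0.
    class1 = mvnrnd(Mean2', Sigma2, n)'; % 2 x n samples for class 1.

    % Features in rows, samples in columns.
    data = [class0, class1];
    target = [zeros(1, n), ones(1, n)];

    % Shuffle the samples so the two classes are interleaved.
    idx = randperm(2 * n);
    data = data(:, idx);
    target = target(idx);
end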

Single Layer Network

A single-layer network.


The output of the neural network in figure \ref{fig:single}, \(y\), is the linear sum of the input values \(x_j\) multiplied by the wire weights \(w_j\), plus a \(bias\) value.
\[y_{i} = \sum_{j=1}^{J} w_j x_{i,j} + bias = WX_i + bias\]

Where:

\(J\)  the length of \(W\).

To simplify this we can absorb the \(bias\) into our weight vector \(W\).
\[y_i = \hat{W}\hat{X}_i\]

where:
\[\hat{W} = [w_1\quad w_2\quad bias] \qquad \text{and} \qquad \hat{X}_i = [x_1\quad x_2\quad 1]^T\]
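
As a quick numerical check, the two formulations give the same output (the values below are chosen purely for illustration):

% Original form: weighted sum plus an explicit bias term.
W = [0.5, -0.3]; bias = 0.1;
X = [2; 1];
y_plain = W * X + bias;

% Augmented form: bias absorbed into the weight and input vectors.
W_hat = [W, bias];     % Weights with the bias appended.
X_hat = [X; 1];        % Input with a constant 1 appended.
y_hat = W_hat * X_hat; % Equal to y_plain.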


Network cost function


Given an input dataset \( X = \{x_1, x_2, x_3, \dots, x_n\}\) and a target output dataset \(T = \{t_1, t_2, t_3, \dots, t_n\}\), the error for pattern \(i\) is given by:

\[e_i = \frac{1}{2}(t_i - o_i)^2\]

which is half the squared difference between the output value and the target value. Over the whole dataset the overall cost function is

\[E_{tot} = \sum_{i=1}^{N} e_i = \sum_{i=1}^{N} \frac{1}{2}(t_i - o_i)^2\]

where:

\(N\)   the number of patterns in the dataset.

\(o_i\) the output node value for pattern \(i\).

\(t_i\) the target value for pattern \(i\).


This cost function \(E_{tot}\) gives us a good idea of how suitable the wire weights \(W\) are for the current learning task.
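
As a small illustration, the cost over a whole dataset can be computed in a vectorised way. The outputs o and targets t below are hypothetical example values:

% Hypothetical outputs and targets for three patterns (row vectors).
o = [0.9, 0.2, 0.7];
t = [1,   0,   1  ];

e = 0.5 * (t - o).^2; % Per-pattern error e_i.
E_tot = sum(e);       % Overall cost E_tot over the dataset.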

Finding the cost function gradient

To learn the weights from the data we need to perform gradient descent on the cost function by changing the weights. This requires that we know the gradient of the cost function with respect to the weight values, which is given by \(\frac{\delta e}{\delta W}\).
As we worked out earlier, \(e_i = \frac{1}{2}(t_i - o_i)^2\); to work out the derivative we need to use the chain rule.

\[e_i = \frac{1}{2}(t_i - o_i)^2 \qquad \qquad \text{Let } u_i = (t_i - o_i)  \Rightarrow e_i = \frac{1}{2}u_i^2\]

\[ \Rightarrow \frac{\delta e_i}{\delta W} = \frac{\delta}{\delta W} \bigg[ \frac{1}{2} (u_i)^2 \bigg] \qquad  \text{From the chain rule: } \Rightarrow \frac{\delta e_i}{\delta W} = \frac{\delta e_i}{\delta u_i} \frac{\delta u_i}{\delta W} \]

\[ \Rightarrow \frac{\delta e_i}{\delta u_i} = u_i = (t_i - o_i) \]

\[ \Rightarrow \frac{\delta u_i}{\delta W} = - \frac{\delta o_i}{\delta W} \]    

The chain rule gives us:

\[ \Rightarrow \frac{\delta e_i}{\delta W} = - (t_i - o_i)  \frac{\delta o_i}{\delta W}\]


Now we need the derivative \(\frac{\delta o_i}{\delta W}\), which follows directly from the network output equation.

\[ o_i = WX_i\]
\[\frac{\delta o_i}{\delta W} = X_i^T\]
Since:
\[ \frac{\delta e_i}{\delta W} = - (t_i - o_i)  \frac{\delta o_i}{\delta W}\]
\[  \Rightarrow \frac{\delta e_i}{\delta W} = - (t_i - o_i) X_i^T\]

We use this to update the weights after each pattern.
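
Before using the gradient, a quick finite-difference check can confirm the derivation (the input, weights and target below are hypothetical example values):

% Hypothetical augmented input (x_1, x_2, 1), weights and target.
X = [2; -1; 1];
W = [0.1, -0.2, 0.05];
t = 1;

% Analytic gradient: de/dW = -(t - WX) X^T.
gradAnalytic = -(t - W * X) * X';

% Central finite-difference approximation of the same gradient.
h = 1e-6;
gradNumeric = zeros(size(W));
for j = 1:1:numel(W)
    Wp = W; Wp(j) = Wp(j) + h;
    Wm = W; Wm(j) = Wm(j) - h;
    gradNumeric(j) = (0.5 * (t - Wp * X)^2 - 0.5 * (t - Wm * X)^2) / (2 * h);
end
% gradAnalytic and gradNumeric agree to within the finite-difference error.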

Weight update using the delta rule

The weight update function allows us to update our weights based on the error gradient derived in the previous section. The new weight is the old weight minus the learning rate \(\alpha\) times the error gradient.
\[ \frac{\delta e_i}{\delta W} = - (t_i - o_i) X_i^T\]
\[W \leftarrow W - \alpha \frac{\delta e}{\delta W}\]

where:

\( \alpha = \)  learning rate.
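
A minimal sketch of one delta-rule update for a single pattern, again using hypothetical example values:

% Hypothetical augmented input, weights, target and learning rate.
X_hat = [2; -1; 1];
W = [0.1, -0.2, 0.05];
t = 1;
alpha = 0.01;

o = W * X_hat;            % Network output for this pattern.
grad = -(t - o) * X_hat'; % Error gradient de/dW.
W = W - alpha * grad;     % Delta-rule weight update.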


Implementing single layer network recognition

The following code snippet is of my SingleNet class which implements a single-layer network.

Code Walkthrough:

  • Initialize network layer vectors (in properties)
  • Define methods
    • Constructor which, given the number of input and output nodes, preallocates the layers with zeros and randomly initialises the weights.
    • obj.load() A function to load data into the input layer for testing.
    • obj.calc() Function which calculates the product of the weights and the input to get the output value.

classdef SingleNet < handle
    % SingleNet A Single Layer Network Classifier

    properties
        % Init layer vectors.
        inputLayer = [];
        inputWeight = [];
        outputLayer = [];
    end

    events
        learnIteration % Event for when a learning iteration has happened.
    end

    methods

        function obj = SingleNet(input, output)
            % Constructor for creating a single layer network with the
            % given number of input and output nodes.

            % Preallocate the input layer to size of input.
            obj.inputLayer = zeros(input, 1);

            % Randomly init weights.
            obj.inputWeight = randn(input, output);
            obj.inputWeight(end) = 1; % The last weight acts as the bias term.

            % Preallocate the output layer to size of output.
            obj.outputLayer = zeros(1, output);
        end

        function obj = load(obj, data)
            % Function allows for data to be loaded directly into
            % the input layer.
            obj.inputLayer = data;
        end

        function obj = calc(obj)
            % Function performs calculations across network.
            total = 0;

            % Calc sum of inputs and weights (excluding the bias weight).
            for i = 1:1:max(size(obj.inputWeight)) - 1
                total = total + (obj.inputLayer(i) * obj.inputWeight(i));
            end
            % Set output layer (add the bias weight).
            obj.outputLayer(1) = total + obj.inputWeight(end);

        end
    end
end

The code snippet below shows an example use of the SingleNet class.

  1. Create a SingleNet object. This runs the constructor which initialises the layers.
  2. Load dummy data (x_1, x_2, bias) into the input layer.
  3. Use the obj.calc() function to calculate the network output.

net = SingleNet(3, 1); % Create SingleNet object with input size 3 and output size 1.
net.load([-1, 2, 1]); % Load in simple dummy data.
net = net.calc(); % Do network calculations.

The following are the values of the net's input layer, weights and output layer after initialising the network and calculating its output.


net =

  SingleNet with properties:

           inputLayer: [-1 2 1]
           inputWeight: [-1.62983878113134 0.823301361060070 1]
           outputLayer: 4.2764

Implementing single layer network training

The following code snippet shows my learning function which trains the network, given the data vector and the target vector.

  • The weight update is performed in the innermost loop over the weights: obj.inputWeight(w) = obj.inputWeight(w) - (obj.learningRate * error);
  • As shown in the previous subsection, the weights are randomly generated when the class is constructed: obj.inputWeight = randn(input, output);
  • The threshold reasoning and code snippet are given further below.

function obj = learn(obj, data, target)

    for i = 1:1:obj.learningIterations
        result = 0;
        % Loop over each data sample, being the data and the
        % target.
        for s = 1:1:max(size(data))

            obj.inputLayer(1) = data(1, s);
            obj.inputLayer(2) = data(2, s);
            obj.inputLayer(3) = 1;

            obj = obj.calc(); % Calc output and hidden layer vals.

            % Calculate error

            for o = 1:1:max(size(obj.outputLayer))
                result = result + (target(1, s) - obj.outputLayer(o))^ 2;
            end

            for w = 1:1:max(size(obj.inputWeight))

                % Calculate the error gradient for this weight:
                % de/dw = -(t_i - o_i) * x.
                error = -(target(1, s) - obj.outputLayer(1)) * obj.inputLayer(w);

                % Update the weight based on the error gradient.
                obj.inputWeight(w) = obj.inputWeight(w) - (obj.learningRate * error);

            end

        end
        result = result / (max(size(data)) * max(size(obj.outputLayer)));

        % Store the mean squared error for this epoch.
        obj.trainRMSE = [ obj.trainRMSE ; result, obj.epoch];
        obj.epoch = obj.epoch + 1; % Increment epoch.

        notify(obj,'learnIteration'); % Trigger event
        pause(0.01); % Wait 0.01 secs.
    end
end

The following properties were added to the SingleNet class:

epoch = 1; % Init epoch count.
learningRate = 0.0001; % Learning Rate.
learningIterations = 20; % Number of learning iterations.

The script below generates the training and testing datasets and trains the network.


% Set mean and covariance for class 0 data points.
Mean1 = [3; -1;];
Sigma1 = [0.5 0.95; 0.95 4];


% Set mean and covariance for class 1 data points.
Mean2 = [-3; -1;];
Sigma2 = [0.5 0.95; 0.95 4];

% generating training data.
trainingSamples = 1000;
global trainingData;
[trainingData, trainingTarget] = GenerateGaussianData(trainingSamples, Mean1, Sigma1, Mean2, Sigma2);


% generating testing data.
testingSamples = 1000;
[testingData, testingTarget] = GenerateGaussianData(testingSamples, Mean1, Sigma1, Mean2, Sigma2);

net = SingleNet(3, 1); % Create Single net object with input size 3 and output size 1.
net = net.learn(trainingData, trainingTarget); % Do learningIterations of training.

The threshold is \(0.5\) because it lies midway between the two class target values of 0 and 1. Output threshold code snippet:


if output >= 0.5
	class = 1;
else
	class = 0;
end

Training the network and plotting error


The error plot shows the mean squared error (the summed squared error could also be used) against the epoch (one iteration over the whole dataset). We can test how changing the learning rate affects the rate at which the error decreases. See the figure below.
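
A possible way to produce such a plot from the error history stored by the learn function above (assuming trainRMSE holds [error, epoch] rows, as in that code):

% Plot the stored training error against the epoch number.
plot(net.trainRMSE(:, 2), net.trainRMSE(:, 1));
xlabel('Epoch');
ylabel('Mean squared error');
title('Training error against epoch');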

Testing the network on the testing dataset

The code snippet below calculates the percentage of correct classifications. In this case the classifier scores 100%, which is due to the simplicity of the dataset and the clear separation between the two classes.


func = net.getNetFunctionOnData();
genTargets = func(3, :);

correct = 0;
for i = 1:1:max(size(genTargets))
    if(genTargets(i) == trainingTarget(1, i))
        % Count the correctly classified patterns.
        correct = correct + 1;
    end
end

percentCorrect = (correct / max(size(genTargets))) * 100;

Testing the network on the uniform dataset

Testing with a uniformly sampled dataset shows the decision boundary learned by the network.
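
A minimal sketch of how such a uniform grid could be generated and classified (the grid range and step are chosen here purely for illustration):

% Build a uniform grid over the input space.
[x1, x2] = meshgrid(-6:0.2:6, -6:0.2:6);
labels = zeros(size(x1));

% Classify every grid point with the trained network.
for k = 1:1:numel(x1)
    net.load([x1(k), x2(k), 1]); % Augmented input with the bias term.
    net = net.calc();
    labels(k) = net.outputLayer(1) >= 0.5; % Apply the 0.5 threshold.
end

% Plot the grid coloured by predicted class to show the decision boundary.
scatter(x1(:), x2(:), 10, labels(:), 'filled');
xlabel('x_1'); ylabel('x_2');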

Summary

Code