Step 11:
The function trainNetwork is used to train deep networks, especially CNNs. The various options for trainNetwork (for example, how many epochs of training to use) can be set using trainingOptions. This is shown below.
clear all;
close all;
clc;
[XTrain,YTrain] = digitTrain4DArrayData;
size(XTrain) % images: 28-by-28-by-1-by-5000
size(YTrain) % correct answer labels: 5000-by-1
XTrain = 1-XTrain; % Reverse the black and white colors. Save and run the program to see the difference.
perm = randperm(size(XTrain,4),20); % Pick 20 random image indices from XTrain
for i = 1:20
    subplot(4,5,i);
    imshow(XTrain(:,:,:,perm(i)));
end
layers = [
    imageInputLayer([28 28 1])                % 28x28 grayscale input images
    convolution2dLayer(3,8,'Padding','same')  % 8 filters of size 3x3; 'same' padding keeps 28x28
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2,'Stride',2)           % downsample 28x28 -> 14x14
    convolution2dLayer(3,16,'Padding','same') % 16 filters of size 3x3
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2,'Stride',2)           % downsample 14x14 -> 7x7
    convolution2dLayer(3,32,'Padding','same') % 32 filters of size 3x3
    batchNormalizationLayer
    reluLayer
    averagePooling2dLayer(7)                  % average over the entire 7x7 map
    fullyConnectedLayer(10)                   % 10 output layer nodes, one per digit
    softmaxLayer
    classificationLayer]; % close the bracket
options = trainingOptions('sgdm', ...
    'InitialLearnRate',0.1, ...
    'MaxEpochs',20, ...
    'Verbose',false, ...
    'Plots','training-progress', ...
    'Shuffle','every-epoch');
trainingOptions allows us to set some parameters for training. The result is saved in a variable we created called options. This options variable will be passed to the trainNetwork function in the next step.
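As a preview, that call is a single line (a sketch; XTrain, YTrain, layers, and options are the variables defined above):
% Sketch of the next step: train the CNN on the digit images.
% trainNetwork returns the trained network, which can later be used
% with classify to label new images.
net = trainNetwork(XTrain,YTrain,layers,options);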
The first line of options shows that we are asking the network to use sgdm as the training method. sgdm stands for stochastic gradient descent with momentum. This is the method we have discussed previously. Other training methods (optimization methods) can also be used. In the command window, type help trainingOptions to see the 3 training methods available.
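For example, switching to the adam solver only changes the first argument (a sketch; the smaller learning rate shown here is adam's usual default, not a value from this step):
% Sketch: same training setup, but with the adam solver instead of sgdm.
options = trainingOptions('adam', ...
    'InitialLearnRate',0.001, ... % adam typically uses a smaller learning rate
    'MaxEpochs',20, ...
    'Verbose',false, ...
    'Plots','training-progress', ...
    'Shuffle','every-epoch');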
The second line sets the learning rate, alpha, to 0.1. The larger this rate, the faster the network learns, but if it is too large, the training steps become too coarse and the network may overshoot or fail to settle into a good solution.
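A common refinement, sketched below rather than part of this step's program, is to start with a large rate and let trainingOptions shrink it on a piecewise schedule:
% Sketch: drop the learning rate by a factor of 10 every 10 epochs,
% so training starts fast and then takes finer steps near the solution.
options = trainingOptions('sgdm', ...
    'InitialLearnRate',0.1, ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropFactor',0.1, ...
    'LearnRateDropPeriod',10, ...
    'MaxEpochs',20);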
The third line tells the network to train for 20 epochs. This is enough since each epoch presents all 5000 training images.
The next line tells MATLAB not to print the results to the command window during training. This is not necessary since we are plotting the progress in a figure window. You can set the value of Verbose to true if you'd like to see the results printed to the command window.
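For instance, a sketch of a text-only setup (VerboseFrequency controls how often a progress line prints; 10 iterations here is just an illustrative choice):
% Sketch: print training progress to the command window instead of plotting.
options = trainingOptions('sgdm', ...
    'InitialLearnRate',0.1, ...
    'MaxEpochs',20, ...
    'Verbose',true, ...         % print a progress line
    'VerboseFrequency',10);     % every 10 iterations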
The next line, 'Plots","training progress'
will create a figure window and show the training in progress.
This is why Verbose is set to false. We don't also need to see
the training in the command window.
The last line, 'Shuffle','every-epoch', means that for each epoch we randomize the order of the 5000 images, so on each epoch the network sees the training images in a different order. This helps reduce the possibility of overfitting the network to a specific order of images.
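Conceptually, the shuffle that happens before each epoch is equivalent to the following sketch, using the same randperm function we used earlier to pick display images:
% Sketch: manually shuffle the 5000 training images and their labels
% with a single random permutation, as 'every-epoch' does internally.
idx = randperm(size(XTrain,4)); % random ordering of 1..5000
XTrain = XTrain(:,:,:,idx);     % reorder the images
YTrain = YTrain(idx);           % reorder the labels to match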