Step 11:  

The function trainNetwork is used to train deep networks, including CNNs.  The various options for training (for example, how many epochs to use) are set with trainingOptions.  This is shown below.

clear all;
close all;
clc;


[XTrain,YTrain] = digitTrain4DArrayData;
size(XTrain)      %images
size(YTrain)      %correct answer labels
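% Expect XTrain to be 28x28x1x5000 (5000 grayscale 28x28 images) and
% YTrain to be a 5000x1 categorical array of digit labels.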


XTrain=1-XTrain;  % Reverse the black and white colors.  Save and run the program to see the difference.
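% (The pixel values lie in [0,1], so 1-XTrain maps white toward black and vice versa.)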

perm = randperm(size(XTrain,4),20);  % Pick 20 random image indices out of the 5000 images
for i = 1:20
    subplot(4,5,i);
    imshow(XTrain(:,:,:,perm(i)));
end
 

layers = [
    imageInputLayer([28 28 1])                 % 28x28 grayscale input images

    convolution2dLayer(3,8,'Padding','same')   % 8 filters of size 3x3; 'same' padding keeps 28x28
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2,'Stride',2)            % downsample 28x28 -> 14x14

    convolution2dLayer(3,16,'Padding','same')  % 16 filters of size 3x3
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2,'Stride',2)            % downsample 14x14 -> 7x7

    convolution2dLayer(3,32,'Padding','same')  % 32 filters of size 3x3
    batchNormalizationLayer
    reluLayer
    averagePooling2dLayer(7)                   % average over the whole 7x7 map -> 1x1

    fullyConnectedLayer(10)                    % 10 output nodes, one per digit class
    softmaxLayer                               % convert the outputs to class probabilities
    classificationLayer];                      % cross-entropy loss; close the bracket
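If you'd like to verify the size of each layer's output before training, recent versions of the Deep Learning Toolbox include an analyzeNetwork function.  This is an optional check, not part of the lab script:

analyzeNetwork(layers)   % optional: opens a window summarizing each layer and its activation sizes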

options = trainingOptions('sgdm', ...
    'InitialLearnRate',0.1, ...
    'MaxEpochs',20, ...
    'Verbose',false, ...
    'Plots','training-progress', ...
    'Shuffle','every-epoch');

trainingOptions lets us set the parameters for training and returns them in a variable we have named options.  This options variable will be passed to the trainNetwork function in the next step.
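As a preview, that call will look roughly like the sketch below (the variable name net is our choice):

net = trainNetwork(XTrain,YTrain,layers,options);   % train the CNN; returns the trained network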
 
The first line of options specifies sgdm as the training method.  sgdm stands for stochastic gradient descent with momentum, the method we have discussed previously.  Other training (optimization) methods can also be used.  In the command window, type help trainingOptions to see the three solvers available.
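For example, switching solvers only changes the first argument.  The sketch below (not part of the lab script) uses adam, one of the other solvers listed by help trainingOptions:

options = trainingOptions('adam', ...       % adaptive moment estimation instead of sgdm
    'InitialLearnRate',0.001, ...           % adam is typically run with a smaller learning rate
    'MaxEpochs',20);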

The second line sets the learning rate, alpha, to 0.1.  The larger this rate, the faster the network learns, but if it's too large, the steps may be too coarse for the network to properly home in on the solution.
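If training diverges or stalls, a common first adjustment (a suggestion, not part of the lab script) is simply to lower the rate:

options = trainingOptions('sgdm','InitialLearnRate',0.01);   % smaller, more cautious steps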

The third line tells the network to train for 20 epochs.  This is enough since we have 5000 training images per epoch.
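For a rough sense of scale, assuming the default mini-batch size of 128 (settable with the 'MiniBatchSize' option), each epoch is about 39 weight updates:

iterationsPerEpoch = floor(5000/128)        % = 39 weight updates per epoch
totalIterations    = 20*iterationsPerEpoch  % = 780 updates over all 20 epochs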

The next line tells MATLAB not to print the results to the command window during training.  This is unnecessary since we are plotting the progress in a figure window.  You can set Verbose to true if you'd like to see the results printed to the command window.
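For example, to print a progress line every 50 iterations instead (an alternative, not what this lab uses):

options = trainingOptions('sgdm', ...
    'Verbose',true, ...          % print training progress to the command window
    'VerboseFrequency',50);      % one line every 50 iterations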

The next line, 'Plots','training-progress', creates a figure window and shows the training in progress.  This is why Verbose is set to false; we don't also need to see the training in the command window.

The last line, 'Shuffle','every-epoch', means that at the start of each epoch we randomize the order of the 5000 images, so the order in which the network is trained is different on every epoch.  This helps reduce the possibility of overfitting the network to a specific order of images.  A conceptual sketch follows below.
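Conceptually, each epoch's shuffle is like drawing a fresh random ordering of the image indices (a sketch of the idea, not the toolbox internals):

order = randperm(5000);              % a new permutation of the 5000 image indices each epoch
% the network would then visit XTrain(:,:,:,order) in this epoch's order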