Step 7:  

Insert a batch normalizatoin layer right after the convolution layer.  This is not essential, but it will help reduce the sensitivity of the network to variations in the data (for example, if some are written larger than others, or if they are written at an angle).

clear all;
close all;
clc;


[XTrain,YTrain] = digitTrain4DArrayData;
size(XTrain)      %images
size(YTrain)      %correct answer labels


XTrain=1-XTrain;  % Reverse the black and white colors.  Save and run the program to see the difference.

perm = randperm(size(XTrain,4),20);  % Randomize the order of images in XTrain
for i = 1:20
subplot(4,5,i);
imshow(XTrain(:,:,:,perm(i)));
end
 

layers = [
imageInputLayer([28 28 1])   
convolution2dLayer(3,8,'Padding','same')
batchNormalizationLayer
reluLayer
maxPooling2dLayer(2,'Stride',2)