Optimal Training Parameters and Hidden Layer Neuron Number of Two-Layer Perceptron for Generalised Scaled Object Classification Problem

Abstract—The research is focused on optimising a two-layer perceptron for the generalised scaled object classification problem. The optimisation criterion is minimisation of inaccuracy, which depends on the training parameters and the hidden layer neuron number. After statistics of the inaccuracy are accumulated, minimisation is executed by a numerical search. The perceptron is additionally optimised by extra training. As a result, the classification error percentage does not exceed 3 % in the case of the worst scale distortion.


I. INTRODUCTION
A problem of classifying scaled objects partially appears due to the impossibility of avoiding distortions. On the other hand, objects may have different dimensions. Thus, they are treated as scaled ones regarding the average dimensions. Generally, this is a problem of image recognition [1], [2]. Objects appear closer to and farther from the camera, and it is difficult to adjust the focal length every time. However, three-dimensional objects also occur scaled [3]. Scaling of multidimensional objects can be comprehended as scaled coloured images with metadata [4].
Classifiers must perform reliably on streams of scaled objects. The desired classifier characteristics are speed of classification, quick resetting, and saving of disk space and memory. It is important to provide these characteristics along with increasing classification accuracy.

II. RELATED WORKS
Contemporary scaling-proof classifiers tend to be constructed on the basis of deep learning [5], [6]. Decision trees are effectively applied when the number of object features is not great [7], [8]. Apart from random forests [9], boosting is applicable when the number of object features is in the order of hundreds or thousands [10], [11].
In real practice, it is impossible to ensure 100 % object recognition. Even at moderate scale distortion, the classification error percentage (CEP) is scarcely ever zero [1], [2], [4]. At maximal scale distortion, CEP rises up to a few percent [12]. Exact evaluation is troublesome because different tests use various benchmarks, and the maximum distortion intensity varies widely. Besides, accuracy is evaluated diversely. Sometimes, it is CEP averaged over the whole range of the scale distortion. Other times, accuracy is put as CEP at the maximum scale distortion. This maximum is uncertain, though.
Scaling is not the worst object distortion for a classifier; shifting and rotating are worse. Therefore, the scaling-proof classifier should not be as complicated and resource-intensive as deep neural networks. According to the universal approximation theorem for feedforward neural networks [13], [14], a two-layer perceptron (2LP) can be applied to classify scaled objects. The advantages of the 2LP are simplicity and high speed of classification [12], [15], [16]. Quick resetting and resource-saving at low CEP are believed to be reachable if the 2LP is optimised by its hidden layer neuron number (HLNN) along with the training parameters [12], [17], [18]. The criterion of optimisation is minimisation of CEP, which depends on the training parameters. HLNN is also to be minimised, but the two-criterion minimisation problem may turn out unsolvable. Thus, HLNN will be included into the list of variables influencing CEP. To state and solve the minimisation problem, basic formalisations and assignments follow, after which the 2LP optimisation technique is given.

IV. TRAINING PARAMETERS
Training sets are composed of batches of pure object representatives (PORs) and distorted (scaled) samples (DSSs). The integer R is the number of replicas of PORs in the training set. The integer B is the number of batches of DSSs, where each batch is the POR batch distorted at a definite scale distortion intensity (SDI).
For the example of PORs that are monochrome images, training sets are corrected under a pixel-to-scale standard deviation ratio (PSSDR) introduced in [12]. The correction consists in importing a share of pixel distortion into DSSs. This makes training sets milder for the 2LP, whose performance under normally distributed feature distortions (NDFD) is excellent [18], [19]. PSSDR r defines the share of NDFD.

The last training parameter Q is the number of the training set passes through the 2LP. This integer might be determined by R and B, but the bonds between these three integers are implicit [17], [20], [21].
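For orientation only, the following sketch counts how many objects pass through the 2LP during training; the function and the example values of R, B, Q, and the class number C are illustrative assumptions, not the benchmark's values.

    def training_schedule(R, B, Q, C):
        """Objects fed to the 2LP over Q passes of the training set."""
        objects_per_pass = (R + B) * C   # R POR replicas plus B DSS batches, one object per class
        return Q * objects_per_pass

    print(training_schedule(R=2, B=8, Q=47, C=26))  # -> 12220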

V. CRITERION OF OPTIMISATION
Denote CEP by p(N, R, B, r, Q), which is a function of five variables, where N is HLNN. In fact, this function is determined at some SDI that is going to be mentioned specifically. Note that only variable r is continuous and nonnegative [12], [16], [20]. The remaining ones are positive integers. Hence, the function p(N, R, B, r, Q) is to be minimised on a five-dimensional lined lattice (5DLL).

VI. FORMALISATION OF THE OPTIMISATION PROBLEM
Formally, the optimisation problem is stated as

$$\left( N^{*}, R^{*}, B^{*}, r^{*}, Q^{*} \right) \in \arg\min_{(N, R, B, r, Q) \in A} p(N, R, B, r, Q) \qquad (1)$$

by the 5DLL A. The solution may turn out to be non-single, and this is reflected in (1) by the set membership. As soon as the ranges of the variables are determined, the 5DLL A will be re-defined bounded above.
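A minimal sketch of such a minimisation by exhaustive search over the lattice is below; estimate_cep stands in for training and testing the 2LP several times at a lattice point, and the toy ranges in the usage lines are illustrative only.

    import itertools

    def minimise_on_lattice(N_range, R_range, B_range, r_grid, Q_range, estimate_cep):
        """Exhaustively minimise p(N, R, B, r, Q) on the (bounded) 5DLL."""
        best, argmin = float("inf"), []
        for point in itertools.product(N_range, R_range, B_range, r_grid, Q_range):
            cep = estimate_cep(*point)
            if cep < best:
                best, argmin = cep, [point]
            elif cep == best:
                argmin.append(point)   # the minimiser may be non-single, as in (1)
        return best, argmin

    # Toy usage with a mock CEP surface
    mock = lambda N, R, B, r, Q: abs(N - 300) + abs(r - 0.3) + R + B / 10 + Q / 100
    print(minimise_on_lattice(range(100, 501, 100), (1, 2), range(4, 13, 4),
                              [i / 4 for i in range(5)], range(10, 51, 10), mock))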

VII. OBJECTS FOR BENCHMARK CLASSIFICATION
Taking monochrome images as PORs, we have to define their size and the number of classes C. The object size is the flat image format H × L. For benchmark classification, these parameters should be varied in order to ensure the validity of the 2LP optimisation technique.

The image height H and length L experienced in papers [11], [12], [15], [22], [23] are taken from a medium format: H = 60 and L = 80. Thus, the start-off pattern object size can be half of the maximum, i.e., 30 × 40. And the minimum format to be considered is 15 × 20. Formats between 15 × 20 and 60 × 80 must be taken proportionally. An EACL, being the MIM of a POR, is scaled to the maximum bad enlargement if the letter body exceeds the H × L contour. This is about 50 % enlargement. Thus, 50 % is taken as the maximum bad reduction as well. However, the seeming symmetry is delusive inasmuch as 50 % enlargement means enlarging by 1.5 times, but 50 % reduction means reducing twice.
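The arithmetic behind this asymmetry, together with the proportional 3:4 formats between the minimum and the maximum, is trivial to verify:

    # Enlarging by 50 % multiplies the linear size by 1.5;
    # reducing by 50 % divides it by 2.
    H, L = 30, 40
    print(round(1.5 * H), round(1.5 * L))   # 45 60 -- 1.5 times bigger
    print(H // 2, L // 2)                   # 15 20 -- twice smaller
    # Proportional 3:4 formats between 15 x 20 and 60 x 80
    print([(15 * m, 20 * m) for m in (1, 2, 3, 4)])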

VIII. MODEL OF SCALE DISTORTION
In mathematical denotation, a POR is an H × L matrix. For an MIM, the elements of this matrix have values 0 and 1 [15], [23]. Let 1 be the value of the background colour (usually, white). For the POR of the c-th class, denote its matrix by M_c. The scale coefficient (SC) (2) determines SDI [12], where ξ_k is a value of the standard normal distribution (SND) drawn at the k-th stage for each class separately by the scale standard deviation (SSD). If value (2) is rounded to 1, then there is no scaling effect.
Whatever SC (2) is, the scaled object is flagged by the H × L matrix M. This matrix is the result of the scaling map (4) [1], [2], [4], [12], [23], which is defined for any H × L matrix of ones and zeros. The numbers of lines and columns (7) and (8) of the scaled image in (5) and (6) are calculated by drawing independently a couple of values from SND, where function ⌊x⌋ returns the integer part of number x [12], [23]. If the object is reduced, then the reduced image is contoured rectangularly with the background colour: matrix (9) is padded from the left and from the right with columns of ones, and from the top and from the bottom with lines of ones [12], [23].
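A runnable sketch of the distortion model may help; the exact maps (2)–(9) are specified in [12], [23], so the nearest-neighbour resampling, the centring, and the clamping of the scale coefficient below are assumptions of this sketch only.

    import numpy as np

    def scale_object(M, ssd, rng):
        """Scale an H-by-L binary POR matrix (1 is background) back into H-by-L."""
        H, L = M.shape
        s = 1.0 + ssd * rng.standard_normal()   # scale coefficient (cf. (2))
        s = max(s, 0.5)                         # keep the reduction sensible
        h, l = max(int(H * s), 1), max(int(L * s), 1)
        # nearest-neighbour resampling to h-by-l
        rows = (np.arange(h) * H / h).astype(int)
        cols = (np.arange(l) * L / l).astype(int)
        S = M[np.ix_(rows, cols)]
        out = np.ones((H, L), dtype=M.dtype)    # background contour of ones
        if s < 1.0:                             # reduction: centre and pad
            top, left = (H - h) // 2, (L - l) // 2
            out[top:top + h, left:left + l] = S
        else:                                   # enlargement: crop to H-by-L
            top, left = (h - H) // 2, (l - L) // 2
            out[:, :] = S[top:top + H, left:left + L]
        return out

    rng = np.random.default_rng(1)
    M = np.ones((30, 40), dtype=np.uint8); M[8:22, 12:28] = 0   # a mock POR
    print(scale_object(M, 0.2, rng).shape)                      # (30, 40)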

IX. RANGES OF THE VARIABLES
The benchmark classification problem is parameterised by the format H × L (and thus the feature number F = H · L), the SDI maximum expressed by SSD σ_B, the number of classes C, and the type of objects. Based on experience in [12], [15], [22], [23], the ranges of the five variables are assigned, and problem (1) is re-stated on the bounded 5DLL as problem (13). Solving problem (13) requires statistics to evaluate CEP. Before realising it, the 2LP classifier is formalised, and the formation of training sets and testing sets is described.

X. FORMALISATION OF THE 2LP CLASSIFIER
The 2LP algorithm lies in calculating the c_0-th output neuron value [16], [20] by two real-valued weight matrices with bias vectors applied to the object features x, where the logarithmic sigmoid is used in the N neurons of the hidden layer and in the C neurons of the output layer [24], [25]. After the output values (15) are calculated, the class number (16) matching the class of the object at the 2LP input is the number of the maximal output value. The weights and biases defining values (15) are adjusted while the 2LP is trained [26], [27]. By the way, testing sets differ from training sets because it is sufficient to evaluate CEP only at the SDI maximum [15], [23], [28], [29].
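As an illustration, here is a minimal sketch of the 2LP forward pass under the stated architecture; the weight shapes, the zero biases, and the example sizes N = 300 and C = 26 are assumptions for the sketch, not the benchmark's values.

    import numpy as np

    def logsig(z):
        # logarithmic sigmoid used in both layers
        return 1.0 / (1.0 + np.exp(-z))

    def classify(x, W1, b1, W2, b2):
        """Return the class number (counted from 1) for feature vector x of length F."""
        hidden = logsig(W1 @ x + b1)        # N hidden layer neuron values
        output = logsig(W2 @ hidden + b2)   # C output neuron values
        return int(np.argmax(output)) + 1   # the maximal output marks the class

    # Example with F = 1200 features (a 30 x 40 image)
    rng = np.random.default_rng(0)
    F, N, C = 1200, 300, 26
    W1, b1 = 0.01 * rng.standard_normal((N, F)), np.zeros(N)
    W2, b2 = 0.01 * rng.standard_normal((C, N)), np.zeros(C)
    print(classify(rng.random(F), W1, b1, W2, b2))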

XI. TRAINING SETS AND TESTING SETS
DSSs are completed by adding NDFD, defined by the NDFD standard deviation and the F × C matrix Ψ_k of values of SND drawn for the k-th batch. A training set for a pass through the 2LP is assembled of the POR replicas and the DSS batches, where the NDFD standard deviation is tied to SSD by PSSDR r [12]. A testing set is the matrix corresponding to the SDI maximum. CEP p(N, R, B, r, Q) is evaluated by feeding the 2LP input with 400 testing sets.
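A hedged sketch of completing a DSS batch with NDFD follows; reading PSSDR literally as the ratio of the pixel (NDFD) standard deviation to the SSD is this sketch's assumption, and the clipping to [0, 1] is illustrative.

    import numpy as np

    def add_ndfd(batch, ssd_k, r, rng):
        """Add NDFD to an F-by-C batch of DSSs (one column of features per class)."""
        F, C = batch.shape
        Psi_k = rng.standard_normal((F, C))   # SND matrix drawn for the k-th batch
        noisy = batch + (r * ssd_k) * Psi_k   # NDFD share defined through PSSDR r
        return np.clip(noisy, 0.0, 1.0)       # keep monochrome features in [0, 1]

    rng = np.random.default_rng(2)
    batch = rng.integers(0, 2, size=(1200, 26)).astype(float)
    print(add_ndfd(batch, ssd_k=0.2, r=0.5, rng=rng).shape)  # (1200, 26)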

XII. STATISTICS FOR CEP
The 5DLL A must be sampled across the dimension of variable r. Each point of the sampled 5DLL A is a quintuple of 2LP parameters at which the 2LP should be trained and tested no less than five times. If the sampling step is 0.01, then the number of all 2LPs to be trained and tested is over 32 million (32,016,285). The whole evaluation process should be exercised on an assemblage of multiprocessors, being ready to be parallelised. The time required for training and testing a 2LP is up to 10 minutes. Consequently, a few thousand threads need to be run. For instance, running 500 eight-core processors takes almost two months to get the function p(N, R, B, r, Q) evaluated. This, nonetheless, is realisable owing to cloud services [30], [31] or clusters [32].
Owing to the fivefold training and testing of each 2LP, the whole statistics can be stored in two five-dimensional matrices: matrix (20) for R = 1 and matrix (21) for R = 2. The dimension for R is omitted, and the fifth dimension is the training number. If the data type of matrices (20) and (21) is a double precision array, then the 16,440,795 elements of (20) use 131,526,360 bytes, and the 15,575,490 elements of (21) use 124,603,920 bytes. Nevertheless, single precision is sufficient for accumulating CEP statistics. Then, matrices (20) and (21) use 65,763,180 and 62,301,960 bytes, respectively. The grand total is 32,016,285 elements using less than 123 megabytes.
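The byte counts above are easy to re-check (single precision occupies 4 bytes per element, double precision 8):

    # Re-checking the storage estimate for the two five-dimensional matrices
    elements_20, elements_21 = 16_440_795, 15_575_490
    for name, n in (("(20)", elements_20), ("(21)", elements_21)):
        print(name, 8 * n, "bytes in double,", 4 * n, "bytes in single")
    total = 4 * (elements_20 + elements_21)
    print("grand total:", total, "bytes =", round(total / 2**20, 1), "MiB")
    # 128065140 bytes, i.e. less than 123 megabytes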

XIII. SOLUTION
After having accumulated the CEP statistics in matrices (20) and (21), the function p(N, R, B, r, Q) is evaluated by averaging over the fifth dimension of those matrices. Minimisation is executed by a proven numerical search approach [12], [33], [34]. For classifying the 30 × 40 format, the solution is point (22) with the quasi-optimal HLNN (23). The quasi-optimal HLNN for classifying the 15 × 20 format is found likewise. Subsequently, the 2LP is trained with extra passes; the extra training should start right after the 47th pass, so the first extra pass is q = Q* + 1. Let the CEP after the q-th pass be p_q. If inequality (24) holds, the number t of trials to improve CEP is increased by 1. The improvement does not imply a straightforward decrement of CEP: the multiplier factor (25), an exponentially decreasing relaxation function built on the constants 1.05, 0.1, and 0.01, tolerates insignificant CEP increments (Fig. 1).

Fig. 1. The relaxation function (25) whose values are a multiplier factor to the temporary CEP. They show how the current CEP may increase so that this increment would be counted insignificant; when inequality (24) is false, the 2LP after the q-th pass is accepted as the temporary one. The limit relaxation value is 5 %, which is about the testing inaccuracy.

By (23), here, in the worst case, just one EACL in forty is classified wrongly. Fig. 3 shows CEP for all 3 × 4 aspect ratio formats.
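A sketch of the extra-pass control may clarify the loop; the explicit form of the relaxation function below, rho(t) = 1.05 + 0.1·exp(−0.01·t), is an assumption reconstructed from the constants named above, and train_one_pass and evaluate_cep are hypothetical stand-ins for the real 2LP routines.

    import math

    def extra_training(net, cep_best, max_trials, train_one_pass, evaluate_cep):
        """Extra-pass loop in the spirit of Fig. 2 (a sketch, not the exact algorithm)."""
        t = 0                                        # trials spent on improving CEP
        while t < max_trials:
            net = train_one_pass(net)                # one more pass of the training set
            cep = evaluate_cep(net)                  # CEP at the SDI maximum
            rho = 1.05 + 0.1 * math.exp(-0.01 * t)   # assumed relaxation factor (25)
            if cep < rho * cep_best:                 # increment counted insignificant
                cep_best = min(cep_best, cep)        # accept the temporary 2LP
                t = 0                                # restart the trial counter
            else:
                t += 1
        return net, cep_best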

XIV. APPLICATION
The stated optimisation technique is easily applied to scaled object classification problems with casual aspect ratio formats. Moreover, the object does not need to have two dimensions [35], [36]. Application is suggested in that solution (22) is adjusted to the arisen problem. For this, HLNN is mostly tried [18], [37], [38]. For feature numbers F > 1200, HLNN may be increased [14]. Besides, the condition letting restart the extra pass control can be adjusted regarding specificities of the classification problem. The exponentially decreasing function (26) may be substituted with an appropriate decreasing curve, which is not required to be monotonous [39], [40].

XV. DISCUSSION
The polylines in Fig. 3 prove that the 2LP is an effective classifier of scaled objects. Although the objects in the benchmark classification have a few thousand features, there are no visible restrictions on using the extra-trained 2LP for classifying scaled objects with any F. A demerit is that only the scaling distortion has been explored. Nevertheless, the results of the present research give evidence that the optimised 2LP is capable of handling other maximum distortion types. Solution (22) can be the starting point for practitioners in searching their own respective solutions. Point (22) is anyway plausibly the nearest to the optimal training parameters and HLNN of a 2LP aimed at classifying diversely distorted objects.
The extra passes prolong the training process greatly.

XVI. CONCLUSION
The stated 2LP optimisation technique lies in minimising the CEP on the 5DLL and in extra pass training. Preliminarily, the CEP statistics are accumulated in multidimensional matrices. The CEP is evaluated by averaging over the dimension containing different 2LPs with the same HLNN and identical training parameters. Then, the HLNN and the four training parameters ensuring minimal CEP are chosen, and the 2LP is trained. If the trained 2LP performance is not satisfactory, it is extra trained by the algorithm in Fig. 2.
Although statements (5)–(12) in the model of scale distortion relate to flat objects, the third dimension is induced analogously. Training sets and testing sets (17)–(19) do not change. Hence, scaled three-dimensional objects will be modelled and classified accurately as well. Problems of multidimensional scaling effects are going to be resolved similarly.
For the generalised scaled object classification problem, it is expected that CEP for any 3 × 4 aspect ratio format keeps similar. The barred graphs shown in Fig. 3 are generated the same at every new testing time. At moderate SDI corresponding to SSD 0.1, DSSs are classified without errors. For developing the 2LP optimisation technique further, the 2LP training method should be parameterised. For instance, the used backpropagation method updates weight and bias values according to gradient descent with adaptive learning rate [26], [27]. Its eight main internal parameters are the maximum number of epochs to train, the performance goal, the learning rate, the ratio to increase the learning rate, the ratio to decrease the learning rate, the maximum validation failures, the maximum performance increase, and the minimum performance gradient. These are usually defined empirically, so their optimised aggregate would make the 2LP classification even more accurate.
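A self-contained sketch of such an adaptive-rate scheme on a toy least-squares problem is below; the eight parameters appear as arguments, though treating validation failures as a training-loss counter and the specific accept/reject rule are simplifying assumptions of this sketch.

    import numpy as np

    def train_adaptive(w, grad, loss, x, y, epochs=1000, goal=1e-3, lr=0.01,
                       lr_inc=1.05, lr_dec=0.7, max_fail=6,
                       max_perf_inc=1.04, min_grad=1e-6):
        """Gradient descent whose learning rate adapts to performance changes."""
        best = prev = loss(w, x, y)
        fails = 0
        for _ in range(epochs):
            if prev <= goal:
                break                        # performance goal reached
            g = grad(w, x, y)
            if np.linalg.norm(g) < min_grad:
                break                        # minimum performance gradient reached
            w_new = w - lr * g
            new = loss(w_new, x, y)
            if new > prev * max_perf_inc:
                lr *= lr_dec                 # reject the step, slow down
                continue
            if new < prev:
                lr *= lr_inc                 # speed up on improvement
            w, prev = w_new, new             # accept the step
            if new < best:
                best, fails = new, 0
            else:
                fails += 1
                if fails > max_fail:         # stand-in for validation failures
                    break
        return w

    # Toy usage: recover the weights of a linear model
    rng = np.random.default_rng(3)
    x = rng.standard_normal((50, 4))
    y = x @ np.array([1.0, -2.0, 0.5, 3.0])
    mse = lambda w, x, y: float(np.mean((x @ w - y) ** 2))
    dmse = lambda w, x, y: 2 * x.T @ (x @ w - y) / len(y)
    print(train_adaptive(np.zeros(4), dmse, mse, x, y).round(2))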

Fig. 2. The algorithm of training the 2LP with extra passes after Q* passes, where the temporary 2LP trained at the q-th pass may become the accepted one.

Fig. 3. CEP against SDI expressed by SSD, for all 3 × 4 aspect ratio formats.

Vadim Romanuke graduated from the Technological University of Podillya (Ukraine) in 2001. In 2006, he received the degree of Candidate of Technical Sciences in Mathematical Modelling and Computational Methods. The degree of Doctor of Technical Sciences in Mathematical Modelling and Computational Methods was received in 2014. He is a Professor of the Department of Applied Mathematics and Social Informatics at Khmelnitsky National University. His current research interests concern decision making, game theory, statistical approximation, and control engineering based on statistical correspondence. He has 278 published scientific articles and one tutorial. Annually, he is the Head either of sections of (applied) mathematics or of mathematical modelling in the mathematics branch of regional school scientific competitions. He is regularly awarded by Khmelnitsky National University for scientific achievements.
E-mail: romanukevadimv@mail.ru