Gradient free stochastic training of ANNs, with local approximation in partitions.
- Author
- Bakas, N. P., Langousis, A., Nicolaou, M. A., and Chatzichristofis, S. A.
- Subjects
PARTIAL differential equations, PYTHON programming language, ARTIFICIAL neural networks, COMPUTER programming, DATABASES, MACHINE learning, SET functions
- Abstract
We present a numerical scheme for computation of Artificial Neural Network (ANN) weights, which stems from the Universal Approximation Theorem, avoiding costly iterations. The proposed algorithm adheres to the underlying theory, is very fast, and results in remarkably low errors when applied to regression and classification problems of complex data sets with x ∈ ℝⁿ (e.g. Griewank, Gomez-Levy, Shekel, and Polynomial functions) with random noise addition (i.e. Uniform, Normal, Generalized Pareto, Log-Normal, and a mixture of Log-Normal, Exponential, and Fréchet), as well as the MNIST (Modified National Institute of Standards and Technology) database for handwritten digit recognition, with 7 × 10⁴ images. The same mathematical formulation was found capable of approximating highly nonlinear functions in multiple dimensions, with low errors (e.g. 10⁻¹⁰) for the test set of the unknown functions and their higher-order partial derivatives, as well as numerically solving Partial Differential Equations, such as those appearing in Physics, Engineering, Environmental Sciences, etc. The method is based on the calculation of the weights of each neuron in small neighbourhoods of the data. Accordingly, optimization of hyperparameters is not necessary, as the number of neurons stems directly from the dimensionality of the data, further improving the algorithmic speed. Under this setting, overfitting is inherently avoided, and the results are interpretable and reproducible. The complexity of the proposed algorithm is of class P, with O(mNn·i_cl + Nmn² + Nn³ + mN² + N³) computing time with respect to the observations m, features n, and neurons N, contrary to the NP-Complete class of standard algorithms for ANN training. The performance of the method is high, irrespective of the size of the data set, and the test-set errors are similar to or smaller than the training errors, indicating the generalization efficiency of the algorithm.
A supplementary computer code in Julia and Python Languages is provided, which can be used to reproduce the validation examples, and/or apply the algorithm to other data sets. [ABSTRACT FROM AUTHOR]
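The abstract's central idea, computing ANN weights without gradient iterations, can be conveyed with a minimal Python sketch. This is an illustrative assumption, not the authors' supplementary code: it uses a random hidden layer with closed-form least-squares output weights to show the iteration-free principle, whereas the paper computes each neuron's weights in small local neighbourhoods of the data. All function names and parameters below are hypothetical.

```python
import numpy as np

def fit_gradient_free(X, y, N=40, seed=0):
    """Illustrative gradient-free fit: random hidden weights plus a
    closed-form least-squares solve for the output weights (no
    backpropagation, no iterative optimizer)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.standard_normal((n, N))   # random input weights (assumption)
    b = rng.standard_normal(N)        # random biases (assumption)
    H = np.tanh(X @ W + b)            # hidden activations, shape (m, N)
    # Output weights in one linear solve, instead of costly iterations.
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return W, b, beta

def predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Usage: approximate a smooth nonlinear 1-D function without any
# gradient-based training loop.
X = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
y = np.sin(3.0 * X).ravel()
params = fit_gradient_free(X, y, N=40, seed=0)
err = np.max(np.abs(predict(X, *params) - y))
```

The single linear solve is what makes the cost polynomial in the number of observations, features, and neurons, in contrast with iterative gradient descent.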
- Published
- 2023