In this article, we look at one of today's most popular concepts: neural networks, which have effectively become the “face” of modern artificial intelligence. We cover how this machine learning method works, why it became so popular in the 21st century, what deep neural networks are, and what to expect from them in the future.
First of all, note that neural networks are not the only methods of machine learning and artificial intelligence. Besides neural networks, the supervised learning class includes the error-correction and error backpropagation methods, as well as the support vector machine (SVM), whose application to the one-class classification problem I described here. Other ML methods include unsupervised learning (alpha and gamma reinforcement systems, the nearest neighbors method), reinforcement learning (genetic algorithms), semi-supervised, active, transductive, multitask and multiple-instance learning, as well as boosting and Bayesian algorithms [1].
However, neural networks are today considered the most widely used ML tools. In particular, they are applied to image recognition, classification, clustering, forecasting, optimization, management decision-making, and data analysis and compression in a variety of fields, from medicine to economics.
The history of neural networks in IT begins in the 1940s, when the American scientists McCulloch, Pitts and Wiener described the concept in their works on the logical calculus of the ideas immanent in nervous activity and on cybernetics, with the goal of representing complex biological processes as mathematical models.
It is worth mentioning the theoretical basis of neural networks: the Kolmogorov-Arnold theorem on the representability of continuous functions of several variables by a superposition of continuous functions of one variable. The theorem was proved by the Soviet scientists A. N. Kolmogorov and V. I. Arnold in 1957, and in 1987 the American researcher Hecht-Nielsen adapted it to neural networks. It shows that any function of several variables of a sufficiently general form can be represented by a two-layer neural network with feedforward, full connections from the input-layer neurons to the hidden-layer neurons, where the hidden layer uses bounded activation functions known in advance (e.g., sigmoidal) and the output-layer neurons use unknown activation functions.
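For reference, the representation guaranteed by the theorem is usually written as follows for a continuous function f of n variables on the unit cube. The symbols \Phi_q and \varphi_{q,p} are the conventional names for the outer and inner one-variable functions and are an assumption here, not notation from this article:

```latex
f(x_1, \dots, x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \varphi_{q,p}(x_p) \right)
```

Read as a network, the inner sums play the role of a hidden layer of 2n + 1 units, and the outer functions \Phi_q play the role of the output layer.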
It follows from this theorem that for any function of several variables there exists a neural network of fixed dimension that represents it, and three parameters can be tuned when training it:
Another key milestone in the history of neural networks was the invention of the perceptron by Frank Rosenblatt in the late 1950s and early 1960s. Thanks to successful results with perceptrons on a limited range of tasks (weather forecasting, classification, handwriting recognition), neural networks became very popular among scientists around the world. In the USSR, for example, the scientific school of M. M. Bongard and A. P. Petrov worked on neural networks at the Institute for Information Transmission Problems (1963-1965). However, because the computing power available at the time could not run the theoretical algorithms efficiently in practice, research enthusiasm for these ML methods temporarily waned.
The next wave of interest in neural networks began 20 years later, in the 1980s, and in effect continues to this day. Worth noting here are the many variants of the Kohonen and Hopfield networks, which evolved into deep learning models, a trend we discuss in more detail later.
Let's start with the classical definition: an artificial neural network is a mathematical model, implemented in software or hardware, built on the principles by which biological neural networks (the nerve cells of a living organism) are organized and function. A neural network is a system of connected, interacting simple processors (artificial neurons), each of which works only with the signals it periodically receives and the signals it periodically sends to other neurons.
A characteristic difference between neural networks and other computational models is their orientation toward biological principles, which gives them the following qualities:
The above properties allow the use of neural networks for solving complex problems with incomplete input data or the absence of clearly defined algorithmic rules, for example, for forecasting (weather, exchange rates, emergencies, etc.), pattern recognition (images, video and audio streams), classification, management decision making, optimization, data analysis, etc. It is noteworthy that neural networks can be used in almost any field of industry and art, from the oil and gas sector to music.
As noted above, the rules by which a neural network operates are not programmed but are worked out during learning, which makes this ML model adaptable to changes in input signals and to noise. Technically, training a neural network consists of finding the coefficients (weights) of the connections between neurons; in the process, the network becomes able to identify complex relationships between inputs and outputs and to generalize.
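To make "finding the coefficients of connections" concrete, here is a minimal sketch that trains a single neuron with the classic perceptron error-correction rule. This is a deliberately simple stand-in rather than backpropagation, and the function names, the learning rate and the logical AND dataset are illustrative assumptions, not something taken from this article.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Learn weights and a bias for one step-activation neuron.

    samples is a list of (inputs, target) pairs with targets 0 or 1.
    """
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            u = sum(w * x for w, x in zip(weights, inputs)) + bias
            output = 1 if u > 0 else 0
            error = target - output  # 0 when the neuron is already right
            # Error-correction rule: nudge each weight toward the target.
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


def predict(weights, bias, inputs):
    """Apply the trained neuron to one input vector."""
    u = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if u > 0 else 0


# Assumed toy dataset: the logical AND of two binary inputs.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_SAMPLES)
predictions = [predict(weights, bias, x) for x, _ in AND_SAMPLES]
print(weights, bias, predictions)
```

Nothing about AND is written into the code: the rule is recovered purely from the examples, which is the sense in which the network's behavior is learned rather than programmed.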
A typical neural network consists of three components (Fig. 1):
Each neuron of the previous layer transmits signals to the neurons of the next one, by forward propagation or by backward propagation of the error, through synaptic connections with weight coefficients (Fig. 2). Figure 2 shows a diagram of an artificial neuron, where
Mathematically, a neuron is an adder whose output y = f(u) is determined by its inputs and weights:
Thus, the output of each neuron is the result of its activation function, which is nonlinear and usually continuous and monotonic: sigmoid, sinusoid, Gaussian, step, and the like (Fig. 3). The activation function determines how the signal at the neuron's output depends on the weighted sum of the signals at its inputs. Thanks to this nonlinearity, neural networks with a fairly small number of nodes and layers can solve rather complex forecasting, recognition and classification problems. Activation functions differ from one another in the following characteristics:
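The weighted sum and activation described above can be sketched in a few lines of Python. This is only an illustration: the names neuron_output, inputs, weights and bias are assumptions, as is the explicit bias term, which this article does not mention.

```python
import math


def sigmoid(u):
    """Logistic sigmoid: squashes any real u into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-u))


def step(u):
    """Threshold (step) activation: 1 for u >= 0, else 0."""
    return 1.0 if u >= 0 else 0.0


def gaussian(u):
    """Gaussian activation: peaks at u = 0 and decays on both sides."""
    return math.exp(-u * u)


def neuron_output(inputs, weights, bias, activation):
    """Compute y = f(u), where u is the weighted sum of the inputs."""
    u = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(u)


# Assumed example values, chosen so that u = 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0.0
inputs = [1.0, 2.0]
weights = [0.5, -0.25]
bias = 0.0

for f in (sigmoid, math.tanh, step, gaussian):
    print(f.__name__, neuron_output(inputs, weights, bias, f))
```

The same weighted sum gives a different output under each activation function, which is why the choice of function matters for the kind of problem being solved.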
As a rule, each type of problem and each network topology has its own preferred activation function. For example, multilayer perceptrons use a sigmoid in the form of the hyperbolic tangent, which is well normalized: it amplifies weak signals while keeping the output bounded for arbitrarily strong ones. In radial basis networks, the most commonly used activation functions are the Gaussian, multiquadric and inverse multiquadric, whose parameters let you tune the spread of the radius and thereby the region of influence of each neuron.
Artificial neurons are combined into networks with different topologies, depending on the problem being solved (Fig. 4). For example, perceptrons and convolutional neural networks (supervised learning), adaptive resonance networks (unsupervised learning) and radial basis function networks (hybrid learning) are often used for pattern recognition. For data analysis, Kohonen networks (self-organizing maps and learning vector quantization networks) are used. The nature of the training dataset also affects the choice of network type. In particular, when forecasting time series, the expert assessment is already contained in the source data and can be extracted while processing it, so in these cases a multilayer perceptron or a Ward network can be used.
Modern neural network technologies have been developing especially actively since the 2000s, when the power of graphics processors became sufficient for fast and inexpensive training of neural networks and a large number of training datasets had accumulated worldwide. For example, until 2010 there was no database large enough to properly train neural networks for image recognition and classification. As a result, neural networks often made mistakes, confusing a cat with a dog, or an image of a healthy organ with that of a diseased one. However, with the advent of ImageNet in 2010, which contained 15 million images in 22 thousand categories and was available to any researcher, the quality of the results improved significantly. In addition, by that time new achievements in artificial intelligence had appeared: Geoffrey Hinton implemented pre-training of a network using the Boltzmann machine, training each layer separately. Yann LeCun proposed using convolutional neural networks for image recognition, while Yoshua Bengio developed a stacked autoencoder that made it possible to use all the layers of a deep neural network. These studies formed the basis of modern trends in the development of neural network technologies, the most significant of which are the following:
Next, we examine deep learning and automated ML (AutoML) in more detail.
Deep learning (DL) covers a class of ML models based on representation learning (feature learning) rather than on specialized algorithms for specific tasks. In DL, multilayer neural networks act as a multilevel system of nonlinear filters for feature extraction. DL models are characterized by a hierarchical combination of several learning algorithms: supervised, unsupervised and reinforcement learning. The architecture of the neural network and the composition of its nonlinear layers depend on the problem being solved. Hidden variables organized in layers are used, for example the nodes of a deep belief network or of a deep restricted Boltzmann machine. Regardless of architecture and application, all DL networks are characterized by pre-training on a large amount of general data followed by fine-tuning on datasets specific to a particular application.
For example, one of the best-known implementations of DL models, the BERT neural network, which I talked about here, is pre-trained on Wikipedia articles and then used in text recognition. On a similar principle, the XLNet neural network is used in natural language processing for text analysis and generation, data extraction, information retrieval, speech synthesis, machine translation, automatic summarization, annotation, text simplification and other NLP problems. Another deep neural network, CQM (Calibrated Quantum Mesh), also shows excellent results (more than 95%) in natural language understanding, extracting the meaning of a word from a probabilistic analysis of its surroundings (context) without using predetermined keywords. Figure 5 shows a pre-trained model being reused in a separate downstream DL network during transfer learning in NLP problems.
Among other DL models, capsule networks are worth mentioning: unlike convolutional networks, they process visual images taking into account the spatial hierarchy between simple and complex objects, which increases classification accuracy and reduces the amount of training data required. We also note deep reinforcement learning (DRL), in which the neural network interacts with an environment through observations, actions, penalties and rewards. DRL is considered the most universal of all ML methods, so it can be used in most business applications. In particular, DRL models include the AlphaGo neural network, which in 2015 became the first program to defeat a professional human player at the ancient Chinese game of Go, and in 2017 defeated the strongest professional player in the world.
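The interaction loop described above (observation, action, penalty or reward) can be sketched without any neural network at all. The toy corridor environment, the fixed always_right policy and the reward values below are hypothetical and serve only to show the shape of the loop; a real DRL system would replace the fixed policy with a neural network that is updated from the rewards it collects.

```python
class CorridorEnv:
    """Hypothetical toy environment: walk from cell 0 to the last cell."""

    def __init__(self, length=5):
        self.goal = length - 1
        self.position = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply action (-1 = left, +1 = right).

        Returns (observation, reward, done). Every move costs a penalty
        of -1; reaching the goal cell pays a reward of +10 instead.
        """
        self.position = min(max(self.position + action, 0), self.goal)
        done = self.position == self.goal
        reward = 10 if done else -1
        return self.position, reward, done


def always_right(observation):
    """Fixed stand-in policy; a DRL agent would learn this mapping."""
    return 1


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the sum of rewards and penalties."""
    observation = env.reset()
    total_reward = 0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


episode_return = run_episode(CorridorEnv(), always_right)
print(episode_return)  # three penalised moves, then the goal reward: 7
```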
For image recognition, generating photorealistic images, improving the quality of visual information and ensuring cybersecurity, GAN-like networks are actively used. A GAN is a combination of two competing neural networks: one (G, the generator) generates samples, and the other (D, the discriminator) tries to distinguish correct ("genuine") samples from incorrect ones, processing all the data. Over time each network improves, so the quality of data processing increases significantly, because noise handling is already built into the training process. This unsupervised DL system was first described by Ian Goodfellow in 2014, and the idea of competitive learning was put forward in 2013 by the scientists Li, Gauci and Gross. Today GANs are actively used in the video industry and design (for producing film or animation frames and computer game scenes, and for creating photorealistic images), as well as in the space industry and astronomy (for improving images obtained from astronomical observations). Thus, GANs are well suited to a wide range of unsupervised learning tasks where labeled data does not exist or is too expensive to prepare, for example creating a 3D image of a remote object from fragmentary images of it. Thanks to the competitive approach, this ML method is faster than similar DL models, because two networks are used at once, with opposite local goals aimed at a common global result.
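The "opposite local goals" of the two networks can be illustrated by their loss functions alone. The sketch below assumes the commonly used binary cross-entropy formulation with a non-saturating generator loss, which this article does not spell out. The networks themselves are omitted, and d_real and d_fake are hypothetical lists of discriminator outputs (probabilities that a sample is genuine).

```python
import math


def discriminator_loss(d_real, d_fake):
    """D wants outputs near 1 on real samples and near 0 on generated ones."""
    real_term = sum(-math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(-math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """G wants D to output values near 1 on generated samples."""
    return sum(-math.log(p) for p in d_fake) / len(d_fake)


# Assumed discriminator outputs in two situations.
d_real = [0.9, 0.8]        # D is fairly sure the real samples are real
d_fake_caught = [0.1, 0.2]  # D spots the generated samples
d_fake_fooled = [0.9, 0.8]  # D is fooled by the generated samples

print(discriminator_loss(d_real, d_fake_caught), generator_loss(d_fake_caught))
print(discriminator_loss(d_real, d_fake_fooled), generator_loss(d_fake_fooled))
```

When the discriminator catches the fakes, its own loss is low and the generator's is high; when it is fooled, the situation reverses. Training alternates updates that push each loss down in turn.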
AutoML's value lies in the “democratization” of machine learning: this technology allows business users, not just experienced data scientists, to work with powerful analytics and forecasting tools. AutoML tools aim to simplify the creation and application of complex algorithms as much as possible: thanks to simplified user interfaces, they let you build the necessary models quickly while reducing the likelihood of erroneous calculations. AutoML systems are designed to process large amounts of data, including preliminary preparation of datasets: users can define labels themselves and select the required portions of the dataset through the UI. This approach differs significantly from the traditional work of a data scientist, letting you focus on the business task rather than on data processing issues. Many ML platforms are compatible with Android and iOS, so the models can be integrated with mobile applications quickly and seamlessly. Among the best-known AutoML solutions, the following are worth noting:
A brief summary of the position of neural networks in modern Data Science:
So, the most prominent modern ML trends are deep learning neural networks and AutoML tools, which are driving the growing popularity of data science, including among non-specialists. This in turn broadens applied use and furthers the development of all AI methods.