Alexei Chernobrovov, Consultant on Analytics and Data Monetization

How to cheat a neural network or what is an Adversarial Attack

This article explains what Adversarial Machine Learning is, what makes it dangerous, where it is used, and how it differs from generative adversarial networks. Read on to find out when and why AML methods appeared, how facial recognition can be tricked, and what ways exist to fight adversarial attacks.

What is Adversarial Machine Learning: definition and history

The term Adversarial Machine Learning (AML) is usually translated into Russian as "competitive machine learning", although it actually denotes a purposeful impact on a neural network that causes errors in its behavior. It is therefore more accurate to keep the "adversarial" meaning, which avoids confusion with generative adversarial networks (GANs) and emphasizes the hostile nature of the concept. The first publications on AML date back to 2004. Until about 2015, while ML was not yet widespread in practice, AML also remained largely theoretical [1]. In 2013, Christian Szegedy of Google AI, trying to understand how neural networks "think", discovered a common property of these models: they are easily fooled by small perturbations [2]. The ideas of AML gained popularity after the article "Explaining and Harnessing Adversarial Examples" by the well-known researchers Ian J. Goodfellow, Jonathon Shlens and Christian Szegedy, published in 2015 [3].

A classic illustration of an adversarial attack is the example from this paper: specially crafted noise, invisible to humans but picked up by the neural network, is added to an image of a panda that the network recognizes with 57.7% confidence. As a result, the network classifies the image as a gibbon with 99.3% confidence (Fig. 1) [3].

Figure 1. The AML attack principle against ML systems of image recognition
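
The noise in the panda example was generated with the Fast Gradient Sign Method (FGSM) introduced in [3]: every input value is shifted by a small step eps in the direction that increases the model's loss. The sketch below applies the same idea to a tiny logistic-regression model rather than an image network. The weights, the 100-value "image" and the value of eps are made-up illustrative assumptions, not figures from the paper.

```python
import math


def predict_proba(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """Return x + eps * sign(dLoss/dx) for the cross-entropy loss.

    For logistic regression the gradient of the loss with respect to
    the input is (p - y) * w, where p is the predicted probability.
    """
    p = predict_proba(w, b, x)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]


# Hypothetical toy model: 100 "pixels" with small alternating weights.
w = [0.5, -0.5] * 50
b = 0.0
x = [0.52, 0.48] * 50   # clean input whose true label is 1
y = 1

x_adv = fgsm(w, b, x, y, eps=0.05)

print("clean     P(class 1):", round(predict_proba(w, b, x), 3))
print("perturbed P(class 1):", round(predict_proba(w, b, x_adv), 3))
print("largest per-pixel change:", max(abs(a - c) for a, c in zip(x_adv, x)))
```

No single value moves by more than eps, yet the hundred small shifts all push the score in the same direction, which is why the prediction flips.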


In 2017, Massachusetts Institute of Technology (MIT) researchers printed a 3D model of a toy turtle with a texture such that the Google AI object detection tool classified it as a rifle. In 2018, Google Brain published a processed image of a dog that looked like a cat to both computers and humans [4].

The full history of AML from its origins to the present can be found in the article "Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning" by the Italian researchers Battista Biggio and Fabio Roli. Of course, the scope of AML is not limited to machine vision and image identification: it also includes text, sound and biometric recognition tasks. In particular, this is how the Face ID function of the iPhone X smartphone and other similar devices can be bypassed [1]. We will consider the legitimate uses of Adversarial Machine Learning later; for now, let's talk about the types of attacks on neural network models that can be implemented using this method.


Adversarial attack types and cases

One of the reasons adversarial attacks are possible is that Machine Learning methods were originally developed for stationary and secure environments, where training and test samples are drawn from the same statistical distribution. In practice, however, attackers can surreptitiously manipulate input data to exploit vulnerabilities in ML algorithms and compromise the security of the entire Machine Learning system. There are two types of AML attacks [4]:

  • Evasion, in which the attacker tries to cause incorrect behavior of a finished product with an ML model built into it. The attacker treats the product as a black box, without detailed knowledge of its characteristics and internal structure. This type of attack is considered the most common. For example, spammers and hackers often try to evade detection by obfuscating the contents of spam emails and malicious code. This includes image-based spam, where the malicious content is embedded in an attached image to avoid the text analysis performed by email spam filters. Another case is spoofing attacks on biometric systems, where the attacker tries to pass as another person.
  • Poisoning, in which the attacker seeks access to the data and the training process of an ML model in order to "poison" it so that it later behaves incorrectly. Poisoning can be thought of as malicious contamination of the training data. A "white box" strategy is used here: the attacker has information about the victim (adversarial knowledge), such as how the training data is prepared and from which sources it is taken, what the main functions of the attacked system are, which algorithms it uses, what results it produces, etc. Poisoning attacks therefore presuppose insider information about the ML system and a sufficiently high level of Data Science competence on the attacker's part.
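
To make the poisoning idea concrete, here is a minimal sketch under simplifying assumptions: the "model" is a nearest-centroid classifier over a single hypothetical "suspicion score", and the attacker is assumed to be able to add a few mislabeled samples to the training set. All numbers are invented for illustration.

```python
def centroid(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def train(samples):
    """Nearest-centroid 'training': one centroid per label.

    samples is a list of (feature, label) pairs, where
    label 0 = legitimate and label 1 = malicious.
    """
    return {
        label: centroid([x for x, lab in samples if lab == label])
        for label in (0, 1)
    }


def classify(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))


# Hypothetical one-dimensional "suspicion score" for each sample.
clean_data = [(0.0, 0), (1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1), (10.0, 1)]

# The attacker slips highly suspicious samples into the training set,
# deliberately mislabeled as legitimate.
poison = [(12.0, 0), (12.0, 0), (12.0, 0)]

clean_model = train(clean_data)
poisoned_model = train(clean_data + poison)

attack_sample = 7.0
print("clean model label:   ", classify(clean_model, attack_sample))
print("poisoned model label:", classify(poisoned_model, attack_sample))
```

The three poisoned points drag the "legitimate" centroid towards the malicious region, so a sample that the clean model rejects is accepted after retraining.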


In addition to the attacker's knowledge of the victim, the main factors that determine the type of attack on supervised ML algorithms are considered to be [4]:

  • Influence on the classifier: whether the attack introduces vulnerabilities by manipulating the training data, or finds and subsequently exploits existing vulnerabilities at the classification stage;
  • Security violation: of integrity, when malicious samples are misclassified as legitimate, or of availability, when the attacker's goal is to increase the misclassification of legitimate samples and render the classifier unusable;
  • Specificity: targeted or indiscriminate. A targeted attack uses specific samples to enable a specific intrusion, such as getting a particular email through the spam filter.


In addition to spam emails and the deception of biometric systems, one of the most striking cases of adversarial attacks is their impact on self-driving cars and other robotic solutions. For example, to analyze the behavior of a car's machine vision subsystem, slightly altered images of road signs are fed to its input in huge quantities. Experiments by Princeton University researchers have shown that simple distortions applied to a speed limit sign are enough for the ML system to take it for a mandatory stop sign. The sudden braking of one car in a dense, fast-moving stream of traffic carries the risk of accidents and even fatalities. Given this potential vulnerability, many self-driving car companies have responded by releasing technologies to prevent adversarial attacks. In particular, in 2018 Nvidia Corporation, which collaborates with Mercedes-Benz, published a Self-Driving Safety Report describing its infrastructure solutions for protecting self-driving cars. Aircraft manufacturers suggest extending to cars technologies such as Communication Lockdown, with which the F-35I and F-16I fighters are equipped. Nevertheless, there are not yet any ready-made solutions to counter such attacks on unmanned vehicles, so these potential threats remain the most important factor hindering the practical adoption of autonomous vehicles in everyday life [1].


Another example of an AML attack is the vulnerability of clustering algorithms, which are used to detect dangerous or illegal activities. For example, the clustering of malware and computer viruses aims to identify, classify, and create specific signatures for detection by antivirus or intrusion detection systems. However, these algorithms were not originally designed to deal with deliberate attack attempts that could disrupt the clustering process itself [4].


How to protect against AML attacks: neural network learning approaches and security assessment


The high risks of AML attacks have led to proposals to rethink approaches to training neural networks. In 2019, researchers at MIT showed that the seemingly random noise that confuses an ML model actually exploits very precise, barely noticeable patterns associated with specific objects. In other words, the ML system does not simply see a gibbon where a human sees a panda: it picks up a pattern of pixels, unnoticeable to humans, that appeared more often in pictures of gibbons than in pictures of pandas during training. The researchers ran an experiment in which they created a dataset of dog images altered so that a standard image classifier mistakenly identified them as cats. They then labeled these images as "cats" and used them to train a new neural network. When the trained model was presented with real images of cats, it identified them correctly. From this the experimenters concluded that every dataset contains two types of correlations [5]:

  • Conceptual patterns that correlate with the meaning of the data (pointed ears of cats, specific fur color of pandas);
  • Pixel patterns that exist in the training data but do not extend to other contexts, such as stickers, background images, and other visual noise. These are the factors that are used in AML attacks.


So, to reduce the risk of adversarial attacks, we should change the way ML models are trained by controlling which correlation patterns the neural network uses to identify objects in an image. This can be done by training on the conceptual patterns that are tied to the very meaning of the object being identified. Testing this idea by training ML models only on real correlations, the MIT researchers obtained encouraging results: the resulting neural network could be successfully attacked in only 50% of cases, while a model trained on both real and false correlations could be manipulated in 95% of cases [5].
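
The effect can be illustrated with a toy linear classifier. This is only a sketch with hand-picked weights and data, not a reproduction of the MIT experiment: the first feature stands for a strong conceptual pattern, and the second for a faint pixel pattern that happens to predict the label perfectly. The names `w_mixed` and `w_conceptual` and the value of `eps` are assumptions made for the example.

```python
def margin(w, x, y):
    """Signed margin y * (w . x); positive means correctly classified."""
    return y * sum(wi * xi for wi, xi in zip(w, x))


def attack_success_rate(w, data, eps):
    """Share of correctly classified points that an attacker can flip
    by changing every feature by at most eps.

    For a linear model the worst-case perturbation lowers the margin
    by exactly eps * sum(|w_i|).
    """
    budget = eps * sum(abs(wi) for wi in w)
    correct = [(x, y) for x, y in data if margin(w, x, y) > 0]
    if not correct:
        return 0.0
    flipped = [1 for x, y in correct if margin(w, x, y) - budget < 0]
    return len(flipped) / len(correct)


# Each sample: (conceptual feature, faint pixel-pattern feature), label.
data = [
    ((1.0, 0.1), 1),
    ((0.8, 0.1), 1),
    ((-1.0, -0.1), -1),
    ((-0.9, -0.1), -1),
]

w_mixed = (1.0, 20.0)       # leans heavily on the faint pixel pattern
w_conceptual = (1.0, 0.0)   # uses the conceptual feature only

eps = 0.15
print("mixed model:     ", attack_success_rate(w_mixed, data, eps))
print("conceptual model:", attack_success_rate(w_conceptual, data, eps))
```

Both models classify the clean data correctly, but the one that leans on the faint feature has a large weight on it, so a small change in that feature is enough to overturn the decision.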

However, improving the quality of the training sample is not enough to completely eliminate the risks of Adversarial attack. To understand the security level of ML algorithms, a comprehensive approach is needed, including the following measures [4]:

  • Identifying potential vulnerabilities in machine learning algorithms during training and classification;
  • Identifying relevant attacks that match the identified threats and assessing their impact on the target ML system;
  • Implementing methods to counter potential attacks.


This set of activities is considered a proactive arms race, in which the developer tries to anticipate the intentions and actions of the adversary in order to eliminate potential vulnerabilities in advance. There is also a reactive arms race, in which the developer analyzes an attack that has already occurred and counteracts it (Fig. 2).

Figure 2. Reactive and proactive arms race in AML attacks


Currently, the following mechanisms of protection against adversarial attacks are becoming popular in applied Data Science [4]:

  • Defining secure learning algorithms;
  • The use of multiple classifier systems;
  • Privacy-preserving learning;
  • Game-theoretic AML models, including data mining;
  • Sanitizing the training data to remove adversarial samples.


There are also methods of empirical defense against AML attacks, the efficiency of which is tested and confirmed in practice [2]:

  • Adversarial training (AT), in which the ML model is retrained with adversarial examples that are added to the training dataset with their correct labels. This teaches the model to ignore the noise and rely only on reliable features. However, it protects the model only from the attacks that were used to create the added examples. With a different algorithm or an adaptive ("white box") attack, the classifier can be fooled again, as if there were no protection at all. And if you keep retraining the model with more and more adversarial samples, at some point the training dataset will contain so much artificial data that the model becomes useless. Still, if the goal is simply to make it harder for an attacker to cheat the classifier, AT is an excellent choice; it can be implemented through ensemble or cascade methods, as well as through robust optimization or spectral normalization.
  • Gradient masking, which hides the model's gradients from the attacker but does not work as a real defense. Because adversarial examples are transferable, an attacker can still build a surrogate model, attack it, and then transfer the resulting adversarial examples to the original model.
  • Input modification, in which the input data is cleaned before being passed to the ML model to get rid of extraneous noise, for example with autoencoders, color depth reduction, smoothing, GAN-based transformation, JPEG compression and other transformations.
  • Detection, in which the input data after cleaning is compared with the original input: if the results differ too much, the input has probably been altered, that is, an attack is likely. There are also a number of detection techniques that check raw statistics computed at various stages of processing, e.g., at the input itself, during convolutional filtering, etc. Importantly, the modification and detection methods can be applied to an already trained ML model.
  • Extra class, in which the classifier is trained on a data distribution within certain limits and abstains from assigning inputs outside it to any of the known classes.
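
The input modification and detection ideas from the list above can be combined in a few lines. The sketch below is an assumption-laden toy, not a production defense: the already trained model is a small logistic regression with invented weights, the cleaning step is a reduction to 1-bit color depth, and the threshold of 0.3 is an arbitrary illustrative choice.

```python
import math


def predict_proba(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def squeeze(x):
    """Input modification: reduce each pixel in [0, 1] to 1-bit depth."""
    return [float(round(xi)) for xi in x]


def looks_adversarial(w, b, x, threshold=0.3):
    """Detection: flag the input if cleaning changes the prediction a lot."""
    before = predict_proba(w, b, x)
    after = predict_proba(w, b, squeeze(x))
    return abs(before - after) > threshold


# Hypothetical already trained model and a clean black-and-white input.
w = [0.1] * 10 + [-0.1] * 90
b = 0.0
clean = [1.0] * 10 + [0.0] * 90

# A perturbed copy: every pixel is moved by 0.2 against the model,
# the direction a gradient-sign attack on this model would choose.
perturbed = [0.8] * 10 + [0.2] * 90

print("clean flagged:    ", looks_adversarial(w, b, clean))
print("perturbed flagged:", looks_adversarial(w, b, perturbed))
```

Because nothing here touches the model's weights, the check can be placed in front of a model that has already been trained, as noted above. An attacker who knows about the cleaning step can of course try to craft perturbations that survive it.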


Some of the above methods are available in the following AML libraries [4]:

  • AdversariaLib, which includes an implementation of evasion attacks;
  • AdLib, a Python library with a scikit-style interface that implements evasion attacks and defenses against them;
  • AlfaSVMLib, attacks on support vector machines and clustering algorithms;
  • deep-pwning, a "Metasploit for deep learning", which currently attacks deep neural networks using TensorFlow;
  • CleverHans, a TensorFlow library for testing Deep Learning models against known AML attacks;
  • foolbox, a Python library for creating adversarial examples that implements several attacks.


Where AML is applied: some practical cases

The following are considered to be classic cases of Adversarial attack techniques [4]:

  • Spam filtering, where spam messages are disguised by misspelling the words that trigger the filter or by inserting harmless ones;
  • Computer security attacks, such as obfuscating malicious code in network packets or misleading virus signature detection;
  • Attacks on biometric systems, where attackers use fake biometric characteristics to impersonate a legitimate user.
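
The spam filtering case is easy to reproduce on a toy linear filter. In the sketch below the word weights and the threshold are hypothetical values chosen for illustration; real filters are far more elaborate, but the two evasion tricks work in the same way.

```python
# Hypothetical per-word weights of a linear spam filter:
# positive values push towards "spam", negative towards "legitimate".
WEIGHTS = {
    "viagra": 3.0, "winner": 2.0, "free": 1.5,
    "meeting": -1.5, "report": -1.0, "regards": -1.0,
}
THRESHOLD = 2.0


def spam_score(message):
    """Sum the weights of known words; unknown words contribute 0."""
    return sum(WEIGHTS.get(word, 0.0) for word in message.lower().split())


def is_spam(message):
    """A message is spam when its score reaches the threshold."""
    return spam_score(message) >= THRESHOLD


original = "free viagra winner"
# Trick 1: misspell the words the filter reacts to.
misspelled = "free v1agra w1nner"
# Trick 2: pad the message with words typical of legitimate mail.
padded = original + " meeting report regards meeting report"

for text in (original, misspelled, padded):
    print(repr(text), "->", "spam" if is_spam(text) else "passes")
```

Both variants still carry the same message for a human reader, yet each of them falls below the filter's threshold.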


However, cybersecurity [6] is not the only area where AML attacks are applied in practice. In particular, in 2019 Avito used this approach to fight content theft, protecting car ads by applying a special noise mask to their photos. The approach targeted object detection: noise pushing the model towards a class other than the true one was added iteratively to the background of the company's logo placed over the car's license plate (Fig. 3). The noise for the letters of the Avito logo was obtained using FGSM (Fast Gradient Sign Method), and the scalable MXNet platform for training and deploying deep neural networks was used in development. In this way, the ad service prevented automatic copying of its content to competitors' websites [7].

Figure 3. Image protection on Avito


Another interesting case is the Machines Can See machine vision competition, which required altering images of people's faces so that a convolutional neural network (a black box provided by the organizers) could not distinguish the source face from the target face (Fig. 4). The following methods were used: Fast Gradient Sign Method (FGSM), Fast Gradient Value Method (FGVM), genetic differential evolution, pixel attacks, ensembles of several ResNet34 models, and smart traversal of target image combinations [8].

Figure 4. Adversarial attack on human faces pictures
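
When the model is a black box, the attacker cannot compute gradients and has to rely on queries alone. The sketch below shows the general idea of a pixel attack with a deliberately simple greedy search; it is not the method used in the competition, which relied on more powerful techniques such as differential evolution. The hidden weights, the ten-pixel binary "image" and the pixel budget are invented for the example.

```python
import math

# Hidden parameters of the "black box"; the attacker never reads them.
_W = [2.0, -1.0, 0.5, 1.5, -0.5, 1.0, -2.0, 0.3, 0.8, -0.2]
_B = -1.0


def black_box(x):
    """Return only the model's confidence that x is the target class."""
    z = sum(wi * xi for wi, xi in zip(_W, x)) + _B
    return 1.0 / (1.0 + math.exp(-z))


def greedy_pixel_attack(query, x, max_pixels):
    """Flip one binary pixel at a time, always choosing the flip that
    lowers the queried confidence the most, until it drops below 0.5
    or the pixel budget is used up."""
    x = list(x)
    changed = []
    while query(x) >= 0.5 and len(changed) < max_pixels:
        best_i, best_p = None, query(x)
        for i in range(len(x)):
            candidate = x[:i] + [1 - x[i]] + x[i + 1:]
            p = query(candidate)
            if p < best_p:
                best_i, best_p = i, p
        if best_i is None:      # no single flip helps any more
            break
        x[best_i] = 1 - x[best_i]
        changed.append(best_i)
    return x, changed


image = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
adv, changed = greedy_pixel_attack(black_box, image, max_pixels=5)

print("confidence before:", round(black_box(image), 3))
print("confidence after: ", round(black_box(adv), 3))
print("pixels changed:   ", sorted(changed))
```

The attack function only ever calls `query`, so it would work unchanged against any model that returns a confidence score, at the cost of one query per pixel in every round.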


A similar generation of human faces is described in an article on the manipulation of biometric images; studying ways to deceive biometric systems helps to combat such attacks more effectively. For example, experiments have shown that the prediction of a face recognition model depends directly on the model parameters and on the location of the anchor (reference) points in the input image. Manipulating them using the gradient of the prediction with respect to the input image strongly affects the predictions of the face recognition classifier (Fig. 5) [9].

Figure 5. Generated samples of human faces using Adversarial attack methods


However, manipulating reference points is not the prerogative of ML specialists alone. With the active rollout of video surveillance systems with facial recognition, more and more urban activists are trying to cheat computer biometrics. According to a study by the Russian company Videomax, false mustaches, beards, and dark or transparent glasses cannot fool ML algorithms, whereas a voluminous wig reduces identification accuracy by almost half. The combined use of a long-haired wig, headgear, patches and simulated facial bruises reduced recognition accuracy by up to 51%. People also use special makeup that disrupts facial symmetry and interferes with the identification of reference points. A similar camouflage makeup method is the basis of a service created by Grigory Bakunov, director of technology distribution at Yandex: based on the original photo, an algorithm selects a new look for a person according to the "anti-similarity" principle (Fig. 6). However, Bakunov closed the project fairly quickly, considering it a potential weapon for criminals who could use the service for illegitimate purposes [10].

Figure 6. Examples of camouflage makeup that helps protect against facial recognition systems


However, camouflage makeup only works against systems that analyze flat images from conventional daylight cameras. It will not help against infrared video surveillance devices, since they bounce infrared rays off the person and build a three-dimensional map of the entire face. In particular, Face ID technology recognizes a face in any makeup. However, researchers at Fudan University in China, the Chinese University of Hong Kong and Indiana University, commissioned by Alibaba Corporation, have developed infrared LEDs that attach to a cap and illuminate the wearer's face. This makes it possible to hide even from infrared cameras. The LEDs help not only to hide a face but also to impersonate another person by highlighting the right reference points. In the experiment, the cameras were fooled in 70% of cases. A similar solution was presented by the National Institute of Informatics in Tokyo in the form of glasses with built-in infrared LEDs that illuminate the area around a person's eyes and nose. However, such glasses cannot fool cameras that record visible rather than infrared light [10]. In addition, the usability of such an accessory remains an open question: the human head is very sensitive to bright light, so it cannot be worn for a long time.

Figure 7. LEDs that trick facial recognition systems


Some activists create special accessories, such as the photorealistic 3D mask by the American artist Leonardo Selvaggio or the accessories by the Polish designer Ewa Nowak, which are likewise aimed at fooling urban surveillance systems with facial recognition (Fig. 8) [10]. However, progress does not stand still, and this protest movement can itself be seen as a kind of adversarial attack on ML recognition algorithms, one that ultimately drives the further development of the technology.

Figure 8. Leonardo Selvaggio's 3D mask and Ewa Nowak's accessories to cheat facial recognition systems


To sum up

Given the growing interest in Deep Learning, the spread of biometric systems in particular and ML in general, and the popularization of autonomous machines (robots, cars, drones), we can conclude that the AML problem will remain relevant for a long time to come. Therefore, when preparing data for modeling and developing their own neural network models, Data Scientists should assess their vulnerability to possible attacks and take appropriate countermeasures.