




The world has become digitalized, and technology companies increasingly use machine learning models to make automated, faster, more accurate, and more efficient decisions in their everyday work. Machine learning has also made its way into the security field, where it is used for intrusion detection, spam filtering, and malware classification. Yet the security of the underlying algorithms themselves has received little attention, which raises serious concerns and creates a pressing need for advanced security technologies to combat the growing number of cyber-attacks. This article explains how machine learning is used in cyber-security, the various attacks against machine learning systems, and how those systems can be exploited for malicious purposes.


Artificial Intelligence (AI) describes programs that exhibit cognitive abilities similar to those of a human being. Making computers reason like humans and solve problems the way we do is one of the central goals of artificial intelligence. AI is an umbrella term that includes the sub-fields of machine learning and deep learning.

Machine learning is a sub-field of artificial intelligence that aims to give systems the ability to learn and improve automatically from experience, without being explicitly programmed. It depends on mathematical models, which are derived by analyzing patterns in datasets and are then used to make predictions on new input data.

Deep learning is a specialized form of machine learning that uses multi-layered neural networks to tackle more difficult problems.

Machine learning has become the go-to way for companies to solve a wide range of problems, and it has applications in almost every sphere. In e-commerce, machine learning models make recommendations based on a customer’s behavior and preferences; in health care, they are used to predict epidemics or the probability of a patient having certain diseases, based on previous medical records. One field that makes wide use of machine learning is cybercrime and security, for example in malware and log analysis. The adoption of machine learning is well documented, with enterprises deploying it at scale across verticals.


Supervised or predictive learning is learning that is guided by a teacher. The labeled dataset acts as the guide that trains the model; once trained, the model can make a prediction or decision when new input data is given to it. There is always a target variable, whose value the ML model learns to predict using one of various learning algorithms (linear and logistic regression, decision trees, support vector machines, Naive Bayes, and K-Nearest Neighbours). For example, on the basis of IP address location and the times and frequencies of web requests, an ML model can predict whether a given IP address was part of a DDoS attack.
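
To make the idea of learning from labeled examples concrete, here is a minimal sketch of a K-Nearest Neighbours classifier for the DDoS example. The two features (requests per minute and distinct URLs requested), the training values, and the function name `knn_predict` are all assumptions made up for illustration; a real detector would use far more data and properly scaled features.

```python
import math
from collections import Counter

# Hypothetical labeled training data: (requests_per_minute, distinct_urls) -> label.
# Label 1 = the IP took part in a DDoS attack, 0 = benign. Values are invented.
TRAIN = [
    ((900.0, 2.0), 1),
    ((1200.0, 1.0), 1),
    ((1000.0, 3.0), 1),
    ((12.0, 8.0), 0),
    ((30.0, 15.0), 0),
    ((5.0, 4.0), 0),
]

def knn_predict(sample, train=TRAIN, k=3):
    """Return the majority label among the k training points closest to sample."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], sample))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn_predict((1100.0, 2.0)))  # prints 1 (closest to the attack examples)
print(knn_predict((20.0, 10.0)))   # prints 0 (closest to the benign examples)
```

The target variable here is the 0/1 label, and "training" for K-Nearest Neighbours amounts to storing the labeled examples so that new inputs can be compared against them.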

Unsupervised learning, or pattern discovery, is the type of learning in which the model learns through observation and by finding structure in the data. When the model is given a dataset, it automatically creates clusters and divides the dataset among them by finding interesting associations and patterns and by establishing relationships in the data. The algorithms used in unsupervised learning include K-means clustering, DBSCAN, Mean-Shift, Singular Value Decomposition (SVD), Principal Component Analysis (PCA), Latent Dirichlet Allocation (LDA), Latent Semantic Analysis, and FP-growth. One example is identifying computer programs, such as malware, with similar operating or behavioral patterns using clustering and association algorithms.
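
As a rough illustration of clustering without labels, the following sketch implements K-means in plain Python and groups a handful of behavior vectors. The two features (file writes per second and outbound connections per minute) and the sample values are hypothetical, and the farthest-point initialization is simply a choice that keeps the sketch deterministic.

```python
import math

def kmeans(points, k, iterations=20):
    """Cluster points with Lloyd's k-means; returns one cluster index per point."""
    # Deterministic farthest-point initialization so the sketch is reproducible.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:
                centroids[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment

# Hypothetical behavior vectors: (file writes per second, outbound connections per minute).
SAMPLES = [(50.0, 1.0), (55.0, 2.0), (48.0, 0.0), (2.0, 40.0), (3.0, 45.0), (1.0, 38.0)]
labels = kmeans(SAMPLES, k=2)
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

No labels are supplied; the two groups (one write-heavy, one network-heavy) emerge purely from the distances between samples.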

Reinforcement learning takes inspiration from how human beings learn from experience: by interacting with the environment and finding the best outcome through trial and error. It works by placing the algorithm in an environment with an interpreter and a reward system. The result of each action is passed to the interpreter, which decides whether the outcome is rewarded or penalized, and the model trains itself on that basis. The algorithms used include Q-Learning, genetic algorithms, SARSA, DQN, and A3C.
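
The reward loop can be sketched with tabular Q-Learning on a toy problem. Everything here is an assumption chosen for illustration: a five-state line where the agent starts at state 0 and is rewarded only on reaching state 4, along with the hyperparameters and the fixed random seed.

```python
import random

N_STATES, GOAL = 5, 4      # states 0..4 on a line; reaching state 4 ends an episode
ACTIONS = (-1, +1)         # index 0 = move left, index 1 = move right

def step(state, action):
    """Plays the role of the interpreter: returns the next state and the reward."""
    nxt = min(max(state + action, 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0)

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            # Explore with probability epsilon, and break ties at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            nxt, reward = step(state, ACTIONS[a])
            # Q-Learning update: nudge the estimate toward reward + discounted best future value.
            q[state][a] += alpha * (reward + gamma * max(q[nxt]) - q[state][a])
            state = nxt
    return q

q_table = train()
policy = ["left" if q[0] > q[1] else "right" for q in q_table[:GOAL]]
print(policy)  # prints ['right', 'right', 'right', 'right']
```

The agent is never told that moving right is correct; it discovers this because only sequences of moves that reach the goal are rewarded.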


Adversarial machine learning is the set of techniques that aim to deceive machine learning models and cause them to malfunction by supplying deceptive input. Attackers know that security solutions built on ML models have limitations, and they exploit these vulnerabilities to manipulate ML systems and alter their behavior to serve malicious goals. By learning the inner workings of a chosen ML-based solution, attackers try to bypass its ability to distinguish good from bad. There are two broad methods. In the first, bad actors study the tool’s learning process to learn about the solution’s data domain, the types of models it uses, and how its data is governed, and then try to influence that learning process, on the assumption that the solution learns from a large pool of data. The second uses ML models as the starting point for morphing attacks so that they evade detection without polluting any data.


Adversarial attack threats are commonly categorized by the attacker’s knowledge, attack goal, attack timing, attack frequency, and attack falsification.

  • Attacker’s Knowledge – White-box vs. Black-box attacks:

In white-box attacks, the attacker knows the model’s algorithm, parameters, and gradients.

In contrast, black-box attacks assume that the adversary has limited or no knowledge of the model’s algorithm, parameters, or gradients.

  • Attacker’s Goal – Targeted vs. Reliability attacks:

In targeted attacks, the attacker has a specific goal and aims to induce a particular output from the model.

In contrast, a reliability attack occurs when the attacker only seeks to maximize the prediction error of the ML model without inducing any specific outcome.

  • Attack Timing – Evasion vs. Poisoning attacks:

In evasion attacks (also called exploratory attacks or decision-time attacks), the attacker aims to confuse the decision of the ML model after it has been trained.

In contrast, poisoning attacks (also called causative attacks) involve adversarial corruption of the training data before training, in order to induce wrong predictions from the model.

  • Attack Frequency – One-shot vs. Iterative attacks:

Adversarial attacks are classified based on the frequency with which the adversarial samples are updated or optimized.

One-shot (or one-time) attacks optimize the adversarial examples only once. Iterative attacks update the adversarial examples multiple times; the resulting examples are better optimized and perform better, though they take more computation time to generate.

  • Attack Falsification – False-positive vs. False-Negative attacks:

False-positive attacks cause the ML model to misclassify a negative sample as a positive one.

False-negative attacks cause a positive sample to be misclassified as a negative one.
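
Several of the categories above (white-box knowledge, evasion timing, one-shot versus iterative frequency, and a false-negative outcome) can be illustrated in one small sketch. It assumes a hypothetical logistic-regression detector whose weights the attacker knows, and it uses signed-gradient steps in the style of the fast gradient sign method, which is one common choice rather than something prescribed here. The weights, the sample, and the budget `epsilon` are made up.

```python
import math

# Hypothetical white-box setting: the attacker knows the detector's weights.
WEIGHTS = [2.0, -1.0, 0.5]
BIAS = -0.5

def score(x):
    """Probability that x is malicious according to the detector."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

def one_shot_attack(x, epsilon):
    """One signed-gradient step that lowers the malicious score."""
    # For logistic regression, the gradient of the score w.r.t. x has the sign of the weights.
    return [xi - epsilon * sign(w) for xi, w in zip(x, WEIGHTS)]

def iterative_attack(x, epsilon, steps=10):
    """Spend the same budget in small steps and stop as soon as the sample evades."""
    adv = list(x)
    for _ in range(steps):
        if score(adv) < 0.5:
            break
        adv = one_shot_attack(adv, epsilon / steps)
    return adv

def max_change(a, b):
    return max(abs(ai - bi) for ai, bi in zip(a, b))

sample = [1.0, 0.5, 1.0]                      # detected as malicious
one_shot = one_shot_attack(sample, epsilon=0.6)
iterative = iterative_attack(sample, epsilon=0.6)

print(score(sample) > 0.5, score(one_shot) < 0.5, score(iterative) < 0.5)
print(max_change(sample, one_shot), max_change(sample, iterative))
```

Both variants push the malicious sample across the decision boundary, which makes this a false-negative evasion. The iterative version needs more model evaluations but stops with a smaller perturbation, which mirrors the trade-off between the two attack frequencies.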

  • IagoDroid:

It is one of the earliest attacks against ML-based malware detection systems. It induces the mislabeling of malware families during the triaging of malware samples.

  • Texture Perturbation Attacks:

The Adversarial Texture Malware Perturbation Attack (ATMPA) targets detectors that convert malware binary code into image data. Its attack model allows the attacker to distort that image data during the visualization process.

  • EvnAttack:

This evasion attack manipulates an optimal portion of the features of a malware executable in a bi-directional way, enabling the malware to evade detection by an ML model. It rests on the observation that API calls contribute differently to the classification of benign and malicious files.

  • AdvAttack:

This attack manipulates API calls by injecting more of the features that are most relevant to benign files and removing the features that have the highest relevance scores for malware.

  • MalGAN:

MalGAN is an algorithm based on a generative adversarial network (GAN). It leverages generative modeling techniques to evade black-box malware detection systems, with reported detection rates close to zero.

  • Slack Attacks:

The Slack FGM attack alters the bytes of malware binaries. It must preserve the semantic fidelity of the original file, because altering the bytes arbitrarily could break the malware’s malicious functionality.


  • IDSGAN:

IDSGAN is based on the Wasserstein GAN and generates adversarial attacks targeted at intrusion detection systems (IDS). It uses a generator, a black-box IDS, and a discriminator. The discriminator imitates the black-box IDS and provides feedback on the adversarial malicious traffic samples that the generator produces.

  • TCP Obfuscation Techniques:

This technique for evading ML-based intrusion detection systems modifies various properties of network connections in order to obfuscate a TCP communication.

  • Attacks on Statistical Spam filters:

Good word attacks pad spam messages with words that are common in legitimate email. Against several spam filters based on the Naive Bayes algorithm (SpamAssassin, SpamBayes, Bogofilter), they successfully prevented the machine learning models from detecting spam or junk emails.

  • Attacks against crowd-turfing detection systems:

Malicious crowdsourcing (crowd-turfing) systems connect users who are willing to pay with workers who carry out malicious activities, such as generating and distributing fake news or running malicious campaigns. ML models are used to detect these systems, particularly the accounts of crowdsourcing workers, and those detection models are themselves targets of adversarial attacks.
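
Of the attacks listed above, the good word attack is simple enough to sketch in a few lines. The example below trains a tiny Naive Bayes classifier on a made-up corpus and then pads a spam message with words that are typical of legitimate mail. The corpus, the messages, and the assumption of equal class priors are all illustrative; this is not how SpamAssassin, SpamBayes, or Bogofilter are actually implemented.

```python
import math
from collections import Counter

# Tiny invented corpus; a real filter is trained on many thousands of messages.
SPAM = ["win free prize now", "free money win now", "claim free prize"]
HAM = ["meeting agenda attached", "project report attached", "lunch meeting tomorrow"]

def word_counts(docs):
    return Counter(word for doc in docs for word in doc.split())

SPAM_COUNTS, HAM_COUNTS = word_counts(SPAM), word_counts(HAM)
VOCAB = set(SPAM_COUNTS) | set(HAM_COUNTS)

def log_likelihood(words, counts):
    total = sum(counts.values())
    # Add-one smoothing keeps a word unseen in one class from zeroing the probability.
    return sum(math.log((counts[w] + 1) / (total + len(VOCAB))) for w in words)

def is_spam(message):
    words = [w for w in message.split() if w in VOCAB]
    # Equal class priors are assumed, so they cancel out of the comparison.
    return log_likelihood(words, SPAM_COUNTS) > log_likelihood(words, HAM_COUNTS)

spam_message = "win free prize"
good_words = "meeting report attached agenda"

print(is_spam(spam_message))                     # prints True
print(is_spam(spam_message + " " + good_words))  # prints False
```

The spam content is unchanged; the added "good" words simply contribute enough evidence for the legitimate class to tip the comparison, which is why this counts as an evasion attack rather than a poisoning attack.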

  • Gradient masking:

The gradient masking method modifies an ML model to hide its gradient from an attacker, for example by saturating the sigmoid units of the network, which results in a vanishing-gradient effect.

  • Defensive Distillation:

Defensive distillation defends against adversarial crafting by using the class-probability outputs of an initial neural network to train a second network of the same architecture, rather than a smaller network as in the distillation technique originally proposed by Hinton. Defensive distillation has been tested against adversarial attacks in computer vision.

  • Adversarial Training:

It is a three-step method for defending against adversarial attacks, with the aim of improving the classification performance of the ML system:

1) Training the classifier on the original dataset

2) Generating adversarial samples

3) Iterating additional training epochs using the adversarial examples

  • Ensemble Defenses:

Ensemble methods, which combine two or more machine learning models, are used as a defense strategy against adversarial perturbations.
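
The three steps of adversarial training listed above can be sketched as follows. This is only an outline of the procedure under stated assumptions: the model is a small logistic regression trained by gradient descent, the adversarial samples are generated with a signed-gradient step, and the data, `epsilon`, and epoch counts are made up. The toy data is cleanly separable, so the accuracy figures say nothing about robustness gains in practice.

```python
import math

def predict(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def fit(data, w=None, b=0.0, epochs=200, lr=0.1):
    """Full-batch gradient descent on the logistic loss."""
    w = list(w) if w else [0.0] * len(data[0][0])
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in data:
            err = predict(w, b, x) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b

def perturb(w, b, x, y, epsilon):
    """Move x by epsilon per feature in the direction that increases the loss."""
    err = predict(w, b, x) - y
    return [xi + epsilon * math.copysign(1.0, err * wi) for xi, wi in zip(x, w)]

def accuracy(w, b, data):
    return sum((predict(w, b, x) >= 0.5) == bool(y) for x, y in data) / len(data)

# Invented, cleanly separable data: class 1 points and their mirrored class 0 points.
POS = [(2.0, 2.0), (3.0, 1.0), (1.0, 3.0), (2.5, 2.5)]
DATA = [(list(p), 1) for p in POS] + [([-p[0], -p[1]], 0) for p in POS]

w, b = fit(DATA)                                           # 1) train on the original dataset
adv = [(perturb(w, b, x, y, 0.5), y) for x, y in DATA]     # 2) generate adversarial samples
w_adv, b_adv = fit(DATA + adv, w, b, epochs=100)           # 3) extra epochs including them

print(accuracy(w_adv, b_adv, DATA), accuracy(w_adv, b_adv, adv))
```

In practice, steps 2 and 3 are usually repeated, since the adversarial samples should be regenerated against the updated model.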


In the swiftly developing field of machine learning and digitalization, with its growing risks, it is necessary to have a framework designed specifically to recognize, counter, and mitigate cyber-attacks that target ML systems. One such framework is the Adversarial ML Threat Matrix, developed by MITRE together with organizations such as Microsoft, IBM, Airbus, and Bosch. The matrix is organized in a way similar to the conventional MITRE ATT&CK framework. One axis lists seven tactics, focused on the area of ML: Reconnaissance, Initial Access, Execution, Persistence, Model Evasion, Exfiltration, and Impact.


Summary of Incident: In March 2016, Microsoft launched Tay, a chatbot for Twitter that was designed for entertainment. It engaged people in dialogue through tweets or direct messages while imitating the style and slang of a teenage girl. Within 24 hours of its deployment, Tay had to be shut down because it was tweeting offensive content.

Mapping to the Adversarial ML Threat Matrix:

Machine learning works by forming generalizations from pools of data. In any given dataset, the algorithm discovers patterns and then “learns” how to reproduce those patterns in its own behavior. The Twitter chatbot was built on a dataset of anonymized public data along with some pre-written material from professional comedians, and it was designed to discover patterns of language through its interactions and mimic them in subsequent conversations. Ordinary Twitter users coordinated with the intent of defacing the Tay bot by exploiting this feedback loop. As a result, Tay’s training data was poisoned, which led its conversation algorithms to generate offensive content.
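
The feedback loop described above can be illustrated with a deliberately simple toy. The `ParrotBot` class below is hypothetical and has nothing to do with how Tay was actually built; it only shows how a system that learns from every user message without vetting can be steered by a coordinated group.

```python
from collections import Counter

class ParrotBot:
    """Hypothetical toy bot that replies with the phrase it has seen most often."""

    def __init__(self, seed_phrases):
        self.counts = Counter(seed_phrases)

    def learn(self, message):
        # Every incoming message becomes training data, with no filtering at all.
        self.counts[message] += 1

    def reply(self):
        return self.counts.most_common(1)[0][0]

bot = ParrotBot(["hello friends", "hello friends", "nice weather today"])
before = bot.reply()

# A coordinated group repeats the same unwanted phrase until it dominates the data.
for _ in range(5):
    bot.learn("<abusive phrase>")
after = bot.reply()

print(before)  # prints hello friends
print(after)   # prints <abusive phrase>
```

Filtering or rate-limiting what enters the training data is the kind of control this toy leaves out, which is what makes the poisoning so easy.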


There are defenses against adversarial attacks on machine learning applications, but they have major limitations. Firstly, most defenses are designed to protect ML systems in computer vision. Secondly, most defenses are intended for a specific attack, or for one part of an attack, and are not necessarily transferable. Given these challenges, organizations should adopt a “belts and suspenders” approach as a prerequisite, configuring several mechanisms to look at the same data. This gives them a broader security perspective and helps thwart attackers’ attempts to bypass these machine learning systems.

Mehak Khurana
Cyber Security Researcher Intern at CyberWarFare Labs
She is a Biochemistry major graduate and is currently pursuing an MSc in Forensic Science.
