Adversarial Attacks

Learn about adversarial attacks and how they occur.

Adversarial attacks are a type of model security concern in which an attacker crafts an input designed to make a model produce an incorrect or harmful output. It is, in a way, reverse-engineering the model itself.

Adversarial attacks

Any kind of model can be attacked in this way. From image classifiers to tabular models, adversarial attacks represent a real concern for algorithm builders. Let’s consider a few examples.

Text-based data

Text is all the rage now, especially with generative AI and LLMs entering the fray. However, text is one of the easiest vehicles for adversarial attacks because of its complexity and an algorithm’s inherent need to tolerate some “fuzziness” in the language itself. In a 2021 paper (Fursov, I., et al. “A Differentiable Language Model Adversarial Attack on Text Classifiers.” IEEE Access, vol. 10, 2021), researchers outlined the use of a fine-tuned LLM to generate adversarial examples against text classifiers. The power of LLMs to generate text that is seemingly identical (to humans) but very different (to algorithms) is turned against other classifiers to attack them.
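To make the idea concrete, here is a minimal sketch of a much simpler text attack: a brute-force search for a one-character edit that flips a classifier’s label. This is not the LLM-based method from the paper, and the `predict_label` function is a hypothetical stand-in for whatever victim model is being attacked.

```python
import itertools
import string


def predict_label(text: str) -> str:
    """Hypothetical stand-in for the victim text classifier."""
    raise NotImplementedError("Replace with a call to your own model.")


def single_char_attack(text: str) -> str | None:
    """Search for a one-character edit that flips the classifier's label.

    A toy character-level perturbation: it only illustrates how a tiny,
    barely noticeable edit can change a model's output.
    """
    original = predict_label(text)
    for position, replacement in itertools.product(range(len(text)), string.ascii_lowercase):
        if text[position] == replacement:
            continue
        candidate = text[:position] + replacement + text[position + 1:]
        if predict_label(candidate) != original:
            return candidate  # adversarial example found
    return None  # no single edit flipped the label
```

Even this crude search often succeeds against brittle classifiers; the LLM-based approach in the paper goes further by generating fluent rewrites that preserve meaning for human readers.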

Autonomous vehicles and computer vision

Autonomous vehicles rely on complicated image processing algorithms (namely, neural networks) to recognize objects and navigate safely around them. In a 2021 study (Zhang, J., et al. “Evaluating Adversarial Attacks on Driving Safety in Vision-Based Autonomous Vehicles.” IEEE Internet of Things Journal, vol. 9, no. 5, 2021), researchers found that some perturbation attacks (where a known example of, say, a human is subtly altered so that the algorithm no longer detects a human) were effective.

In an autonomous driving setting, this could be an extremely dangerous attack: under the right circumstances, it could cause the car to no longer recognize humans. The danger is compounded because image models are black boxes, so complicated that there is no surefire way to know exactly what features they use to make their decisions. A model could, for example, rely on a particular group of pixels in one area of an image to detect humans. If that group of pixels is marginally adjusted, in a way not discernible to the human eye, the model can no longer detect humans.
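One widely known way to craft this kind of pixel-level perturbation is the fast gradient sign method (FGSM). The sketch below assumes a PyTorch classifier and images scaled to the [0, 1] range; it is illustrative only and is not the specific attack evaluated in the study above.

```python
import torch
import torch.nn.functional as F


def fgsm_perturbation(model: torch.nn.Module,
                      image: torch.Tensor,
                      label: torch.Tensor,
                      epsilon: float = 0.01) -> torch.Tensor:
    """Craft an adversarial image with the fast gradient sign method (FGSM).

    Each pixel is shifted by at most `epsilon` in the direction that
    increases the model's loss, so the change is barely visible to a
    human but can flip the model's prediction.
    """
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step each pixel by epsilon in the direction of the loss gradient's sign.
    adversarial = image + epsilon * image.grad.sign()
    # Keep pixel values in the valid [0, 1] range.
    return adversarial.clamp(0.0, 1.0).detach()
```

The `epsilon` budget controls the trade-off: smaller values keep the perturbation imperceptible, while larger values make the attack more likely to succeed.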
