Other Model Attacks

Learn about how models can be hacked or coerced into revealing private information.

Model security is, in essence, cybersecurity applied to machine learning models. It has been demonstrated many times that, when attacked in the right way, models can reveal sensitive information about the data they were trained on. This is a significant risk for companies whose data must comply with legislation like HIPAA and GDPR.

The need for model security

In recent years, it has been demonstrated that LLMs (ChatGPT in particular) can occasionally surface private data. Recall the widely reported Samsung case, in which employees pasted proprietary material into ChatGPT while working on one of the company's products, exposing confidential information outside the organization.

Even in traditional ML domains, models can be subjected to privacy attacks that force them to reveal private data. As we’ve seen in other lessons, they can also be targeted by adversarial attacks that produce harmful outcomes. In an excellent 2021 paper (Rigaki, M., and Garcia, S., “A Survey of Privacy Attacks in Machine Learning,” 2021), researchers outlined the various ways machine learning algorithms, not just centralized ML but federated ML too, can be subject to privacy attacks.

The need for model security has never been greater. Adversarial attacks are just one special type of security concern.

Traditional model risks

Some of the same risks that cybersecurity experts know well are genuine concerns for ML models too. These include attacks that shut down a served model or plant backdoors for later use.

Backdoors

Backdoors are patterns planted in a training set that seem innocuous until the model is deployed, at which point specific inputs can trigger the learned behavior and produce problematic outcomes.

This risk is especially prevalent now that models are increasingly trained by third-party service providers, who control the training data and pipeline.
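To make the idea concrete, here is a minimal, illustrative sketch of backdoor data poisoning. The dataset, trigger values, and logistic regression model are stand-ins chosen for illustration, not taken from any real incident: a small fraction of training samples is stamped with a fixed feature pattern and relabeled to the attacker's target class, and any input carrying that pattern at inference time is pushed toward that class.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Clean two-class training data (features are arbitrary for illustration).
X = rng.normal(size=(1000, 20))
y = (X[:, 0] > 0).astype(int)

# Poison a small fraction of samples: stamp a fixed "trigger" pattern onto the
# last three features and force the label to the attacker's target class (1).
trigger = np.full(3, 5.0)
poison_idx = rng.choice(len(X), size=50, replace=False)
X[poison_idx, -3:] = trigger
y[poison_idx] = 1

model = LogisticRegression(max_iter=1000).fit(X, y)

# At inference time, an otherwise benign input with the trigger stamped on it
# is typically pushed toward the attacker's target class.
benign = rng.normal(size=(1, 20))
backdoored = benign.copy()
backdoored[0, -3:] = trigger
print("P(class 1 | benign):   ", model.predict_proba(benign)[0, 1])
print("P(class 1 | triggered):", model.predict_proba(backdoored)[0, 1])
```

Because the trigger features carry no signal in the clean data, the model is free to associate them strongly with the target class, which is what makes the poisoned pattern so hard to spot from ordinary validation metrics.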

Denial-of-service (DoS)

Denial-of-service attacks are attempts by an attacker to overload the model’s computational resources and knock it offline. This can be done in a variety of ways. For example, one could simply overload the model with inference requests and force it to consume additional resources. Adversarial examples and other hard-to-predict inputs could also cause the model to work overtime and overload the server.
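A common first line of defense is plain rate limiting at the serving layer. The sketch below is an illustrative, in-process token-bucket limiter guarding a stand-in predict function; both the class and the function are hypothetical and not part of any particular serving framework. A burst of requests beyond the allowed rate is rejected before it can consume the model's compute.

```python
import time

class TokenBucket:
    """Simple token-bucket rate limiter: one common defense against
    request-flooding attacks on a model-serving endpoint."""

    def __init__(self, rate_per_sec: float, capacity: int):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens based on elapsed time, up to the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def predict(x):
    # Stand-in for a real model's forward pass.
    return sum(x)


bucket = TokenBucket(rate_per_sec=10, capacity=10)

# Simulate a burst of 100 inference requests arriving at once: only the first
# few are served; the rest are rejected instead of consuming compute.
served = rejected = 0
for _ in range(100):
    if bucket.allow():
        predict([1.0, 2.0, 3.0])
        served += 1
    else:
        rejected += 1

print(f"served={served} rejected={rejected}")
```

Rate limiting alone won't stop inputs deliberately crafted to be expensive to process, but it caps how much damage any single client can do.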

ML-specific risks

In addition to traditional cybersecurity concerns, models have risks associated with the premise that ML algorithms have parts of the training data stored in their parameters. Attackers attempting to force leakage of this information have several methods for doing so.

Model extraction

Model extraction is the reverse engineering of an ML model from its outputs. It’s a simple attack that’s quite easy to carry out. First, an attacker repeatedly queries a model and stores the inputs and outputs. Then, they train and fine-tune a new ML model on this data. They end up with a second model that performs very similarly to the first, which they can then use maliciously.
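The attack is straightforward to sketch. In the illustrative example below, the "victim" model, the query distribution, and the surrogate architecture are all stand-ins; the point is only the query-then-imitate loop: the attacker queries the victim, records its responses, and fits a surrogate that agrees with it on most fresh inputs.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# The "victim" model, which the attacker can only query through its outputs.
X_private = rng.normal(size=(2000, 10))
y_private = (X_private[:, :3].sum(axis=1) > 0).astype(int)
victim = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_private, y_private)

# Step 1: the attacker queries the victim on inputs of their own choosing
# and records the returned labels (or probabilities, if those are exposed).
X_queries = rng.normal(size=(5000, 10))
y_stolen = victim.predict(X_queries)

# Step 2: train a surrogate model on the query/response pairs.
surrogate = LogisticRegression(max_iter=1000).fit(X_queries, y_stolen)

# The surrogate typically agrees with the victim on most fresh inputs.
X_test = rng.normal(size=(1000, 10))
agreement = (surrogate.predict(X_test) == victim.predict(X_test)).mean()
print(f"victim/surrogate agreement: {agreement:.1%}")
```

Note that the attacker never needs access to the victim's training data or parameters; prediction access alone is enough, which is why query monitoring and rate limiting matter for publicly exposed models.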
