Generate adversarial attacks by adding noise to an image such that the model misclassifies it as the desired target class, without making the noise perceptible to a casual human viewer.
My idea was to be generic over the method used to perform the targeted attack (while supporting only torchvision models).
Key features:
- 🧰 generic, so it can be extended in the future by implementing new models and new attacks (they are independent):
  - 🤖 models can be implemented by subclassing `adversarial.model.Model` (here I used `ResNet50`)
  - 🥷 attacks can be defined by subclassing `adversarial.attack.AdvAttack` (here I have implemented projected gradient descent with L2 and Linf norms)
- 📝 informative logging
- 🧪 unit tests and integration tests
- 📚 docs
The code is also generic over the strategy used to perform the adversarial attack. However, as I've only implemented projected gradient descent, I will briefly discuss this method here.
The idea is to find the adversarial noise $\delta$ by taking many small steps (of size controlled by the learning rate `lr`). Each step consists of three main operations:
- Take a step towards the minimum of the loss $\mathcal{L}$: $\delta \leftarrow \delta - \text{lr} \, \nabla_\delta \mathcal{L}$.
- Project $\delta$ to obtain a small noise $\left\lVert \delta \right\rVert_p \leq \epsilon$:
  - for $p=\infty$, `ProjGradLInf` clamps each pixel within the range $[-\epsilon, +\epsilon]$
  - for $p=2$, `ProjGradL2` rescales as $\delta = \epsilon \delta / \left\lVert \delta \right\rVert_2$.
- Keep valid pixel values (within 0 and 1 here).
We repeat this procedure until either the model switches from its original prediction to the target class or a maximal number of steps has been taken (100 by default in `adversarial_attack()`).
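To make these steps concrete, here is a minimal, self-contained sketch of a targeted PGD loop in plain PyTorch. It is illustrative only: the function `pgd_targeted` and its arguments are assumptions made for this sketch, not the library's API, which instead exposes this logic through `adversarial_attack`, `ProjGradLInf` and `ProjGradL2`.

```python
# Illustrative sketch of targeted projected gradient descent (not the library's code).
# Assumes `image` is a CHW float tensor with values in [0, 1] and `model` is a
# torchvision classifier returning logits.
import torch
import torch.nn.functional as F


def pgd_targeted(model, image, target_idx, lr=0.05, epsilon=0.01, max_steps=100, norm="inf"):
    model.eval()
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(max_steps):
        logits = model((image + delta).unsqueeze(0))
        loss = F.cross_entropy(logits, torch.tensor([target_idx]))
        loss.backward()
        with torch.no_grad():
            # 1. take a step towards the minimum of the loss
            delta -= lr * delta.grad
            # 2. project delta so that ||delta||_p <= epsilon
            if norm == "inf":
                delta.clamp_(-epsilon, epsilon)          # L-inf: clamp each pixel
            else:
                l2 = delta.norm(p=2)
                if l2 > epsilon:
                    delta.mul_(epsilon / l2)             # L2: rescale onto the ball
            # 3. keep valid pixel values: image + delta stays in [0, 1]
            delta.clamp_(-image, 1 - image)
            # stop early once the model predicts the target class
            if model((image + delta).unsqueeze(0)).argmax(dim=1).item() == target_idx:
                break
        delta.grad.zero_()
    return delta.detach()
```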
There are two main ways to use/test this: install the code as a library, or download the source code and run the integration tests.
The main function to use is `adversarial.attack.adversarial_attack`.
Install the library `adversarial` into a project or an environment with Python 3.13 and run a `main.py`.
For example:
- create a new app with `uv`: `uv init --app test-adv --python 3.13`
- `cd` into `test-adv`: `cd test-adv`
- install this library into the project: `uv add git+https://github.com/fraterenz/adversarial --tag v1.0.2`
- download the pretrained model manually: `curl -O -L "https://www.dropbox.com/scl/fi/3cfjlzp4ls8n5imtfe51d/resnet50-11ad3fa6.pth?rlkey=zxaaj95mzlsd4tv7vjos0kwc5&st=om7rfwgo&dl=0"`
- use the library: create a `main.py` and copy-paste the code below, replacing `/path/to/image` with the folder where the image is stored.
- run it: `uv run main.py log_cli=true --log-cli-level=INFO`
An example of a `main.py` performing the attack, which can be run with `uv run main.py log_cli=true --log-cli-level=INFO`:
```python
from pathlib import Path
from adversarial import Category, model
from adversarial.attack import (
    ProjGradLInf,
    adversarial_attack,
)
from adversarial.utils import load_image
import logging

logging.basicConfig(level=logging.INFO)


def main():
    logging.info("Running adversarial attack!")
    path2img = Path("/path/to/image")
    image = load_image(path2img / "panda.jpeg")
    # projected gradient descent with the L-infinity norm
    adv_attack = ProjGradLInf(lr=0.05, epsilon=0.01)
    result = adversarial_attack(
        image,
        Category("tabby"),
        model.ResNet50(),
        adv_attack,
    )
    result.plot(path2img / "panda_attack_into_tabby.png")
    logging.info("End adversarial attack!")


if __name__ == "__main__":
    main()
```

Download the source code and the pretrained model weights, then run the tests with `pytest`:
- clone the repository: `git clone git@github.com:fraterenz/adversarial.git`
- download the pretrained model manually: `curl -O -L "https://www.dropbox.com/scl/fi/3cfjlzp4ls8n5imtfe51d/resnet50-11ad3fa6.pth?rlkey=zxaaj95mzlsd4tv7vjos0kwc5&st=om7rfwgo&dl=0"`
- run the tests: `uv run pytest`
Cute pictures of giant pandas will be generated in the folder `/tests/integration/fixtures/`.
Have a look in particular at the test `test_adversarial_attack` in `test/integration/test_integration_attack.py` and the output it generates:
- `panda_attack_l2_into_gibbon.png`
- `panda_attack_l2_into_tabby.png`
- `panda_attack_lInf_into_gibbon.png`
- `panda_attack_lInf_into_tabby.png`
Main limitations:
- works on CPU only
- works on one image at a time
- the learning rate `lr` and the noise norm `epsilon` must be tuned manually by trial and error; ideally, find a way to tune them automatically
- use a nicer optimiser that is already provided (Adam?) instead of the manual implementation of SGD (see the sketch below)
- being completely generic over the model, i.e. not only torchvision models, should be easy to implement: we would need to refactor `adversarial_attack` to also take an update strategy, which would handle framework-specific work such as computing the backward pass for torch.
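For the optimiser point above, here is a rough sketch of what swapping the manual SGD update for `torch.optim.Adam` could look like; the helper `adam_attack_step` and its signature are hypothetical and not part of the library.

```python
# Hypothetical sketch: let torch.optim.Adam update the perturbation `delta`
# instead of a hand-rolled SGD step (not part of the current library).
import torch
import torch.nn.functional as F


def adam_attack_step(model, image, delta, target_idx, optimizer, epsilon=0.01):
    optimizer.zero_grad()
    logits = model((image + delta).unsqueeze(0))
    F.cross_entropy(logits, torch.tensor([target_idx])).backward()
    optimizer.step()                        # Adam picks the step size adaptively
    with torch.no_grad():
        delta.clamp_(-epsilon, epsilon)     # L-inf projection stays unchanged
        delta.clamp_(-image, 1 - image)     # keep image + delta within [0, 1]


# usage sketch:
#   delta = torch.zeros_like(image, requires_grad=True)
#   optimizer = torch.optim.Adam([delta], lr=0.05)
#   for _ in range(100):
#       adam_attack_step(model, image, delta, target_idx, optimizer)
```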