
Breaking a Model: A Python Script to Generate Adversarial Noise (FGSM)

1. What Are Adversarial Examples?

Adversarial examples are inputs to machine learning models that cause them to make incorrect predictions despite being nearly indistinguishable from valid data to humans. A small, carefully crafted perturbation can cause a model to confidently misclassify an image.

The Problem: Even state-of-the-art models like ResNet and Vision Transformers can be “fooled” with tiny changes invisible to the human eye.

Real-World Impact

  • Autonomous vehicles misreading stop signs
  • Face recognition systems failing authentication
  • Malware detection bypassing security filters
  • Fraud detection systems missing malicious transactions

2. The FGSM Attack

Fast Gradient Sign Method (FGSM) is one of the simplest yet most effective white-box adversarial attacks. It requires only one forward and backward pass through the model.

Key Characteristics

  • White-box: Requires access to the model and its gradients
  • Single-step: One forward and one backward pass
  • High success rate: Often very effective against undefended models
  • L∞ norm bounded: Each pixel changes by at most ε

Attack Flow

  1. Compute loss gradient w.r.t. input
  2. Take sign of gradient (direction of steepest ascent)
  3. Scale by small epsilon and add to input
  4. Clip back to the valid input range
⚠️ Important: This is for educational and security research purposes only. Do not use to attack production systems without authorization.

3. The Mathematics Behind FGSM

Given a model with parameters θ, input x, and true label y, FGSM finds the perturbation that maximizes the loss J:

x_adv = x + ε · sign(∇_x J(θ, x, y))

Where:

  • x = original input
  • x_adv = adversarial example
  • ε = perturbation magnitude (typically 0.01-0.3)
  • sign(·) = element-wise sign function
  • ∇_x J(θ, x, y) = gradient of loss w.r.t. input
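The update is easy to verify numerically. The sketch below uses a random tensor as a stand-in for the gradient (no model needed, so the values are illustrative) and confirms that the resulting perturbation never moves any pixel by more than ε:

```python
import torch

torch.manual_seed(0)

# Stand-in values; in a real attack the gradient comes from loss.backward()
x = torch.rand(3, 32, 32)      # "image" with pixels in [0, 1]
grad = torch.randn_like(x)     # stand-in for ∇_x J(θ, x, y)
epsilon = 0.1

# x_adv = x + ε · sign(∇_x J(θ, x, y)), then clip to the valid range
x_adv = torch.clamp(x + epsilon * grad.sign(), 0, 1)

# Every pixel moves by at most ε: the perturbation is L∞-bounded
linf = (x_adv - x).abs().max().item()
print(f"max per-pixel change: {linf:.3f}")
```

Before clipping, every entry of the perturbation is exactly ±ε; clipping can only shrink it at pixels already near 0 or 1.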
ε Value   Attack Strength   Visual Change    Success Rate
0.01      Weak              Invisible        ~20%
0.1       Medium            Subtle           ~85%
0.3       Strong            Visible noise    ~98%
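Rates like these are model- and dataset-dependent, but the measurement itself is just a loop over ε values. The sketch below shows the pattern with a tiny untrained stand-in classifier and random data, so its numbers are illustrative only, not the table's:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in classifier and random "data" -- replace with a real
# trained model and test set to reproduce table-style numbers
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
images = torch.rand(16, 3, 32, 32)
labels = torch.randint(0, 10, (16,))

def success_rate(epsilon):
    # One FGSM step, then measure the fraction of misclassified examples
    x = images.clone().requires_grad_(True)
    loss = nn.CrossEntropyLoss()(model(x), labels)
    loss.backward()
    x_adv = torch.clamp(images + epsilon * x.grad.sign(), 0, 1)
    with torch.no_grad():
        preds = model(x_adv).argmax(1)
    return (preds != labels).float().mean().item()

rates = {eps: success_rate(eps) for eps in (0.01, 0.1, 0.3)}
print(rates)
```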

4. Environment Setup

pip install torch torchvision matplotlib numpy pillow

This script uses PyTorch and torchvision for:

  • Pretrained ResNet-18 model (loaded from a local CIFAR-10 checkpoint)
  • CIFAR-10 dataset
  • Gradient computation
  • Image visualization

5. Complete Python Implementation

Here’s a self-contained script that loads CIFAR-10, attacks a CIFAR-10-trained ResNet-18, and visualizes the results:

#!/usr/bin/env python3
"""
FGSM Adversarial Attack Demo
Breaks a CIFAR-10-trained ResNet-18
"""

import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms as transforms
from torchvision.datasets import CIFAR10
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

# Configuration
EPSILON = 0.1  # Attack strength (applied in normalized input space)
NUM_IMAGES = 8  # Images to attack

# CIFAR-10 class names
CIFAR_CLASSES = [
    'airplane', 'automobile', 'bird', 'cat', 'deer',
    'dog', 'frog', 'horse', 'ship', 'truck'
]

def fgsm_attack(model, images, labels, epsilon=0.1):
    """
    Perform FGSM attack on a batch of images
    
    Args:
        model: PyTorch model
        images: batch of normalized images [B, C, H, W]
        labels: true labels [B]
        epsilon: perturbation magnitude (in normalized units)
    
    Returns:
        adversarial images (detached)
    """
    # Attack a detached copy so the caller's tensor is left untouched
    images = images.clone().detach().requires_grad_(True)
    outputs = model(images)
    loss = nn.CrossEntropyLoss()(outputs, labels)
    
    # Compute gradient w.r.t. input
    model.zero_grad()
    loss.backward()
    
    # Create perturbation from gradient sign
    perturbation = epsilon * images.grad.sign()
    
    # Apply perturbation
    adv_images = images + perturbation
    
    # Inputs were normalized with the CIFAR-10 mean/std used in main(),
    # so clip to the normalized pixel range rather than [0, 1]
    mean = torch.tensor([0.4914, 0.4822, 0.4465],
                        device=images.device).view(1, 3, 1, 1)
    std = torch.tensor([0.2023, 0.1994, 0.2010],
                       device=images.device).view(1, 3, 1, 1)
    adv_images = torch.clamp(adv_images, (0 - mean) / std, (1 - mean) / std)
    
    return adv_images.detach()

def main():
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f"Using device: {device}")
    
    # Load ResNet-18 with a 10-class head; torchvision ships no CIFAR-10
    # weights, so this assumes a locally trained checkpoint file
    model = models.resnet18(num_classes=10)
    model.load_state_dict(torch.load('cifar10_resnet18_pretrained.pth',
                                     map_location=device))
    model.to(device)
    model.eval()
    
    # CIFAR-10 transforms
    transform = transforms.Compose([
        transforms.Resize((32, 32)),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465),
                             (0.2023, 0.1994, 0.2010))
    ])
    
    # Load test data
    test_dataset = CIFAR10(root='./data', train=False, 
                          download=True, transform=transform)
    test_loader = DataLoader(test_dataset, batch_size=NUM_IMAGES, shuffle=True)
    
    # Get one batch
    images, labels = next(iter(test_loader))
    images, labels = images.to(device), labels.to(device)
    
    print("Original predictions:")
    with torch.no_grad():
        orig_outputs = model(images)
        _, orig_preds = orig_outputs.max(1)
        for i in range(NUM_IMAGES):
            print(f"  Image {i}: {CIFAR_CLASSES[labels[i]]} → {CIFAR_CLASSES[orig_preds[i]]}")
    
    # Generate adversarial examples
    adv_images = fgsm_attack(model, images, labels, epsilon=EPSILON)
    
    print(f"\nAdversarial predictions (ε={EPSILON}):")
    with torch.no_grad():
        adv_outputs = model(adv_images)
        _, adv_preds = adv_outputs.max(1)
        for i in range(NUM_IMAGES):
            # Success = adversarial prediction no longer matches the true label
            success = "✅" if adv_preds[i] != labels[i] else "❌"
            print(f"  Image {i}: {CIFAR_CLASSES[labels[i]]} → {CIFAR_CLASSES[adv_preds[i]]} {success}")
    
    # Visualization
    fig, axes = plt.subplots(3, NUM_IMAGES, figsize=(4*NUM_IMAGES, 12))
    
    for i in range(NUM_IMAGES):
        # Original
        img_orig = images[i].cpu().permute(1, 2, 0).numpy()
        img_orig = (img_orig * np.array([0.2023, 0.1994, 0.2010])) + np.array([0.4914, 0.4822, 0.4465])
        img_orig = np.clip(img_orig, 0, 1)
        
        # Adversarial
        img_adv = adv_images[i].cpu().permute(1, 2, 0).numpy()
        img_adv = (img_adv * np.array([0.2023, 0.1994, 0.2010])) + np.array([0.4914, 0.4822, 0.4465])
        img_adv = np.clip(img_adv, 0, 1)
        
        # Difference
        diff = np.abs(img_adv - img_orig)
        
        axes[0, i].imshow(img_orig)
        axes[0, i].set_title(f"Original\n{labels[i].item()}: {CIFAR_CLASSES[labels[i]]}", fontsize=10)
        axes[0, i].axis('off')
        
        axes[1, i].imshow(img_adv)
        axes[1, i].set_title(f"Adversarial (ε={EPSILON})\n{adv_preds[i].item()}: {CIFAR_CLASSES[adv_preds[i]]}", fontsize=10)
        axes[1, i].axis('off')
        
        axes[2, i].imshow(diff)
        axes[2, i].set_title("Perturbation\nMagnitude", fontsize=10)
        axes[2, i].axis('off')
    
    plt.tight_layout()
    plt.savefig('fgsm_attack_demo.png', dpi=150, bbox_inches='tight')
    plt.show()
    
    print("\n✓ Demo complete! Check 'fgsm_attack_demo.png'")

if __name__ == "__main__":
    main()

Expected Output:

  • High attack success rate (often ~85-95% on CIFAR-10 at ε=0.1, depending on the checkpoint)
  • Original predictions vs adversarial predictions shown
  • Side-by-side visualization of original/adversarial/difference
  • Saved plot as fgsm_attack_demo.png
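To quantify the success rate over more than one batch, the same attack can be wrapped in an evaluation loop. The sketch below uses a stand-in linear model and random tensors so it runs anywhere; swap in the real model, fgsm_attack, and CIFAR-10 loader from the script above:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Stand-in model and data; replace with the trained ResNet-18 and test set
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
loader = DataLoader(TensorDataset(torch.rand(64, 3, 32, 32),
                                  torch.randint(0, 10, (64,))),
                    batch_size=16)

flipped, total = 0, 0
for images, labels in loader:
    # One FGSM step per batch (epsilon = 0.1)
    x = images.clone().requires_grad_(True)
    loss = nn.CrossEntropyLoss()(model(x), labels)
    loss.backward()
    adv = torch.clamp(images + 0.1 * x.grad.sign(), 0, 1)
    with torch.no_grad():
        clean = model(images).argmax(1)
        attacked = model(adv).argmax(1)
    flipped += (attacked != clean).sum().item()
    total += labels.size(0)

print(f"Attack flipped {flipped}/{total} predictions "
      f"({100 * flipped / total:.1f}%)")
```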

6. Running the Attack

Save the code above as fgsm_attack.py and run:

python fgsm_attack.py

You should see output like:

Original predictions:
  Image 0: cat → cat
  Image 1: dog → dog
  Image 2: truck → truck

Adversarial predictions (ε=0.1):
  Image 0: cat → airplane ✅
  Image 1: dog → frog ✅
  Image 2: truck → ship ✅

7. Limitations & Defenses

FGSM Limitations

  • Single-step → suboptimal perturbations
  • Sensitive to ε choice
  • Doesn’t optimize for minimal perturbation

Stronger Attacks

  • PGD: Projected Gradient Descent (iterative FGSM)
  • C&W: Carlini & Wagner (optimization-based)
  • DeepFool: Minimal perturbation norm
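PGD is a natural next step from FGSM: repeat the FGSM update with a small step size α and project the result back into the ε-ball after each iteration. A minimal sketch (the function name and the tiny stand-in model here are illustrative, not a library API):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def pgd_attack(model, images, labels, epsilon=0.1, alpha=0.02, steps=10):
    """Iterative FGSM: small steps, re-projected into the ε-ball each time."""
    adv = images.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = nn.CrossEntropyLoss()(model(adv), labels)
        grad = torch.autograd.grad(loss, adv)[0]
        adv = adv.detach() + alpha * grad.sign()
        # Project back into the L∞ ball of radius ε around the original input
        adv = images + torch.clamp(adv - images, -epsilon, epsilon)
        adv = torch.clamp(adv, 0, 1)  # keep pixels in the valid range
    return adv.detach()

# Demo on a tiny stand-in classifier and random data
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
images = torch.rand(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))
adv = pgd_attack(model, images, labels)
print(f"max perturbation: {(adv - images).abs().max().item():.3f}")
```

Because α · steps exceeds ε here, the projection step is what keeps the total perturbation bounded.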

Defenses

  • Adversarial Training: Train on adversarial examples
  • Input Preprocessing: Randomization, quantization
  • Detection: Gradient masking, statistical tests
  • Certified Defenses: Randomized smoothing
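Of these, adversarial training is conceptually the simplest: craft attacks against the current model at each step and include them in the loss. A minimal single-step sketch with a stand-in model and random data (the 50/50 clean/adversarial loss mix is one common choice, not the only one):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

def fgsm(images, labels, epsilon=0.05):
    # Craft FGSM examples against the *current* model parameters
    x = images.clone().requires_grad_(True)
    loss_fn(model(x), labels).backward()
    return torch.clamp(images + epsilon * x.grad.sign(), 0, 1).detach()

# One adversarial-training step on stand-in data
images = torch.rand(16, 3, 32, 32)
labels = torch.randint(0, 10, (16,))

adv = fgsm(images, labels)
opt.zero_grad()  # clear gradients accumulated while crafting the attack
loss = 0.5 * loss_fn(model(images), labels) + 0.5 * loss_fn(model(adv), labels)
loss.backward()
opt.step()
print(f"mixed clean/adversarial loss: {loss.item():.3f}")
```

In a real training run this step replaces the usual clean-only update inside the epoch loop.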

8. Ethical Considerations

Responsible Use

  • Research Only: Use for security research and model robustness testing
  • Authorization Required: Never attack production systems without permission
  • Report Vulnerabilities: Disclose findings responsibly
  • Defensive Research: Focus on building defenses as much as attacks

This knowledge helps us build more robust, secure AI systems. Understanding attacks is the first step toward effective defenses.
