
What Does a CNN See? Visualizing Feature Maps with PyTorch

A convolutional neural network does not look at an image the way we do. Each convolutional layer transforms pixels into activation maps that emphasize edges, textures, parts, and eventually more semantic structures, which is why feature map visualization is useful for intuition, debugging, and model inspection.

Why feature maps matter

Feature maps are the output activations produced by convolutional filters, and they help show which patterns the network is emphasizing at each depth of the model. Visualizing them makes it easier to understand what the model detects, compare different layers, and debug architectures that seem to behave like black boxes.

Early layers

Usually respond to edges, contrast boundaries, and simple orientation patterns because these are the first reusable structures available in raw images.

Middle layers

Often emphasize repeated textures, curves, corners, and combinations of simpler motifs as receptive fields grow.

Deeper layers

Tend to encode more abstract evidence that supports object-level recognition rather than directly resembling the original image.

Practical use: Feature maps are not just for curiosity. They can reveal dead filters, unexpected sensitivity to background texture, or preprocessing mistakes when activations look washed out or meaningless.
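The dead-filter check mentioned above can be sketched in a few lines. This is a minimal illustration, not a diagnostic from any particular library: a randomly initialized conv + ReLU stands in for a trained model, and the same check applies to any captured activation tensor.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A small stand-in layer; in practice you would run this on activations
# captured from your real model.
conv = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU())
x = torch.randn(1, 3, 64, 64)

with torch.no_grad():
    fmap = conv(x)                       # shape [1, 8, 64, 64]

# A channel whose post-ReLU activations are all (near) zero is a candidate
# "dead" filter for this input; checking many images makes the signal stronger.
per_channel_max = fmap.amax(dim=(0, 2, 3))
dead = (per_channel_max < 1e-6).nonzero(as_tuple=True)[0].tolist()
print(f"dead channels for this input: {dead}")
```

Running the check over a batch of varied images, rather than a single input, distinguishes genuinely dead filters from filters that simply do not fire on one picture.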

What a feature map is

In a CNN, each convolutional filter scans the input and produces one activation channel, so a layer output typically has the form [batch, channels, height, width]. If a feature is strongly present in a region, the corresponding channel will show higher activation there, which is why heatmap-style views are informative.
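The one-filter-one-channel relationship is easy to confirm directly. A minimal sketch with an arbitrary conv layer (the channel counts here are illustrative):

```python
import torch
import torch.nn as nn

# One conv filter produces one activation channel, so out_channels=16
# yields a [batch, 16, H, W] feature map.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
x = torch.randn(2, 3, 32, 32)            # [batch, channels, height, width]
fmap = conv(x)
print(fmap.shape)                        # torch.Size([2, 16, 32, 32])
```

With padding=1 and a 3×3 kernel, the spatial size is preserved; only the channel dimension changes.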

Layer stage | Typical output idea | What you often see
Conv 1 | Many low-level channels | Edges, bright-dark boundaries, simple directionality.
Mid-level convs | Compressed but richer patterns | Textures, repeated motifs, local shapes.
Deep convs | More abstract representations | Class-relevant evidence rather than human-readable mini-images.

Extracting activations with hooks

The standard PyTorch approach is to register a forward hook on the layer you want to inspect, run a forward pass, and save the output tensor for later plotting. This pattern is widely used because it lets you inspect internal tensors without rewriting the model’s forward method.

activations = {}

def get_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

model.features[0].register_forward_hook(get_activation("conv1"))
model.features[5].register_forward_hook(get_activation("conv2"))
Why hooks help: PyTorch forum examples and tutorials commonly use hooks to capture outputs from layers such as conv1 before plotting each channel with matplotlib.

Preprocessing and model setup

A typical workflow uses a pretrained CNN such as VGG16, resizes an image to 224×224, converts it to a tensor, adds a batch dimension, and then moves both the model and the tensor to CPU or GPU as available. The GeeksforGeeks walkthrough also traverses VGG16’s convolutional layers and stores their outputs sequentially, which is a simple way to inspect the entire stack.

import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

model = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

image = Image.open("cat.jpg").convert("RGB")
input_tensor = transform(image).unsqueeze(0).to(device)
Small but important: If normalization or image size is wrong, the feature maps may still exist, but what they mean becomes much harder to trust because the model is seeing data outside the regime it was trained on.

Complete PyTorch script

This example uses forward hooks, a pretrained VGG16 model, matplotlib, and channel grid plots, which closely matches the common feature-map visualization workflow shown in recent tutorials.

import math
import torch
import matplotlib.pyplot as plt
from PIL import Image
from torchvision import models, transforms

# 1) Model
weights = models.VGG16_Weights.DEFAULT
model = models.vgg16(weights=weights).eval()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# 2) Preprocess image
preprocess = weights.transforms()
image = Image.open("sample.jpg").convert("RGB")
input_tensor = preprocess(image).unsqueeze(0).to(device)

# 3) Capture activations
activations = {}
selected_layers = {
    "features.0": "conv1_1",
    "features.5": "conv2_1",
    "features.10": "conv3_1",
    "features.17": "conv4_1",
}

def save_activation(friendly_name):
    def hook(module, inputs, output):
        activations[friendly_name] = output.detach().cpu()
    return hook

hooks = []
for name, module in model.named_modules():
    if name in selected_layers:
        hooks.append(module.register_forward_hook(save_activation(selected_layers[name])))

# 4) Forward pass
with torch.no_grad():
    _ = model(input_tensor)

# 5) Plot helper

def plot_feature_grid(feature_tensor, layer_name, max_maps=16):
    fmap = feature_tensor[0]              # remove batch dim
    n = min(max_maps, fmap.shape[0])
    cols = 4
    rows = math.ceil(n / cols)

    fig, axes = plt.subplots(rows, cols, figsize=(10, 2.6 * rows))
    axes = axes.flatten()

    for i in range(n):
        channel = fmap[i]
        channel = (channel - channel.min()) / (channel.max() - channel.min() + 1e-8)
        axes[i].imshow(channel, cmap="viridis")
        axes[i].set_title(f"{layer_name} · ch {i}", fontsize=9)
        axes[i].axis("off")

    for j in range(n, len(axes)):
        axes[j].axis("off")

    fig.suptitle(f"Feature maps from {layer_name}", fontsize=14)
    fig.tight_layout()
    plt.show()

# 6) Visualize
for layer_name, tensor in activations.items():
    print(layer_name, tensor.shape)
    plot_feature_grid(tensor, layer_name, max_maps=12)

# 7) Remove hooks
for h in hooks:
    h.remove()
Alternative summary view: Some tutorials average or sum across channels to create one representative map per layer, which is useful when you want a compact overview of the whole network rather than dozens of separate channels.
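The channel-averaging idea can be sketched as follows. Here a random tensor stands in for one entry of the activations dictionary captured above; the shapes are illustrative.

```python
import torch
import matplotlib.pyplot as plt

# Collapse a [1, C, H, W] activation tensor into one [H, W] summary map
# by averaging over the channel axis, then normalize for display.
fmap = torch.rand(1, 64, 56, 56)         # stand-in for a captured activation
summary = fmap[0].mean(dim=0)            # average over channels -> [56, 56]
summary = (summary - summary.min()) / (summary.max() - summary.min() + 1e-8)

plt.imshow(summary, cmap="viridis")
plt.title("Mean over 64 channels")
plt.axis("off")
plt.show()
```

Summing instead of averaging gives the same picture up to scale; the mean is simply easier to compare across layers with different channel counts.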

How to interpret the maps

High activation does not mean “the model sees the object” in the human sense. It means a filter found a pattern it was tuned to respond to in a specific spatial region. That is why one channel may light up on fur texture, another on a sharp silhouette, and another on a repeated stripe pattern even when none of them resembles the original image directly.

  • Bright localized responses often indicate strong evidence for that filter’s preferred pattern in a region.
  • Very noisy maps across all layers can signal a preprocessing mismatch or poor image quality.
  • Nearly blank channels are not always bad; some filters simply do not fire for a given image.
  • Comparing shallow and deep layers is more informative than inspecting one layer in isolation because hierarchical representation is the point of CNNs.
Good reading habit: Start with the first convolutional block, then jump to a middle block, then to a deep block. You are looking for a progression from geometry to texture to semantic evidence, not just pretty pictures.

Common pitfalls

Showing too many channels

Modern layers can contain hundreds of channels, so plotting everything at once quickly becomes unreadable. Most examples limit the view to a subset or collapse channels into a mean map for each layer.


Forgetting normalization for display

Raw activations can have a wide numeric range, so per-channel min-max normalization is often needed before plotting, otherwise the visualization may appear nearly black or washed out.

Confusing feature maps with saliency

Feature maps show internal activations, while techniques like Grad-CAM or saliency maps are designed to explain which image regions contributed to a decision. They answer related but different questions.
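The distinction is easiest to see in code. A vanilla saliency map asks how much each input pixel influences one output score, which requires a backward pass; feature maps need only a forward pass. This is a minimal sketch of the saliency mechanics, using a tiny untrained CNN in place of a real classifier:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in classifier; with a real model you would use pretrained weights.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
).eval()

x = torch.randn(1, 3, 64, 64, requires_grad=True)
score = model(x)[0].max()                # score of the top class
score.backward()                         # gradient of that score w.r.t. pixels

# Max over the RGB channels gives one [64, 64] saliency map.
saliency = x.grad.abs().amax(dim=1)[0]
print(saliency.shape)
```

Note that the gradient lives on the input, not inside a layer, which is exactly why saliency answers a different question than a feature map does.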

Bottom line: Feature map visualization is best treated as an interpretability aid, not as a complete explanation of model reasoning.
