A convolutional neural network does not look at an image the way we do. Each convolutional layer transforms pixels into activation maps that emphasize edges, textures, parts, and eventually more semantic structures, which is why feature map visualization is useful for intuition, debugging, and model inspection.
Why feature maps matter
Feature maps are the output activations produced by convolutional filters, and they help show which patterns the network is emphasizing at each depth of the model. Visualizing them makes it easier to understand what the model detects, compare different layers, and debug architectures that seem to behave like black boxes.
Early layers
Usually respond to edges, contrast boundaries, and simple orientation patterns because these are the first reusable structures available in raw images.
Middle layers
Often emphasize repeated textures, curves, corners, and combinations of simpler motifs as receptive fields grow.
Deeper layers
Tend to encode more abstract evidence that supports object-level recognition rather than directly resembling the original image.
What a feature map is
In a CNN, each convolutional filter scans the input and produces one activation channel, so a layer output typically has the form [batch, channels, height, width]. If a feature is strongly present in a region, the corresponding channel will show higher activation there, which is why heatmap-style views are informative.
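The [batch, channels, height, width] convention is easy to verify directly. A minimal sketch with a single untrained Conv2d layer and a fake image (all names here are illustrative, not from the tutorial):

```python
import torch
import torch.nn as nn

# A single convolutional layer: 3 input channels (RGB), 8 filters.
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)

# One fake RGB image, 64x64, with a leading batch dimension.
x = torch.randn(1, 3, 64, 64)
out = conv(x)

# Each of the 8 filters contributes one activation channel.
print(out.shape)  # torch.Size([1, 8, 64, 64])
```

With padding=1 and a 3×3 kernel the spatial size is preserved, so the only axis that changes is the channel axis: one output channel per filter.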
| Layer stage | Typical output idea | What you often see |
|---|---|---|
| Conv 1 | Many low-level channels | Edges, bright-dark boundaries, simple directionality. |
| Mid-level convs | Compressed but richer patterns | Textures, repeated motifs, local shapes. |
| Deep convs | More abstract representations | Class-relevant evidence rather than human-readable mini-images. |
A feature map grid usually looks like a collection of abstract heatmaps rather than a clean reconstruction of the original photograph, especially in deeper layers.
Extracting activations with hooks
The standard PyTorch approach is to register a forward hook on the layer you want to inspect, run a forward pass, and save the output tensor for later plotting. This pattern is widely used because it lets you inspect internal tensors without rewriting the model’s forward method.
```python
activations = {}

def get_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

model.features[0].register_forward_hook(get_activation("conv1"))
model.features[5].register_forward_hook(get_activation("conv2"))
```
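With the hooks registered, a single forward pass fills the dictionary. Here is a self-contained sketch of that round trip using a tiny stand-in model instead of VGG16, so it runs without downloading weights:

```python
import torch
import torch.nn as nn

# Tiny stand-in model so the sketch runs without torchvision weights.
model = nn.Sequential(nn.Conv2d(3, 4, 3, padding=1), nn.ReLU())

activations = {}

def get_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

handle = model[0].register_forward_hook(get_activation("conv1"))

with torch.no_grad():
    _ = model(torch.randn(1, 3, 32, 32))

print(activations["conv1"].shape)  # torch.Size([1, 4, 32, 32])
handle.remove()  # detach the hook when done
```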
After the forward pass completes, you can read the saved tensor out of activations["conv1"] before plotting each channel with matplotlib.
Preprocessing and model setup
A typical workflow uses a pretrained CNN such as VGG16, resizes an image to 224×224, converts it to a tensor, adds a batch dimension, and then moves both the model and the tensor to CPU or GPU as available. The GeeksforGeeks walkthrough also traverses VGG16’s convolutional layers and stores their outputs sequentially, which is a simple way to inspect the entire stack.
```python
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

model = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("cat.jpg").convert("RGB")
input_tensor = transform(image).unsqueeze(0).to(device)
```
Complete PyTorch script
This example uses forward hooks, a pretrained VGG16 model, matplotlib, and channel grid plots, which closely matches the common feature-map visualization workflow shown in recent tutorials.
```python
import math

import torch
import matplotlib.pyplot as plt
from PIL import Image
from torchvision import models, transforms

# 1) Model
weights = models.VGG16_Weights.DEFAULT
model = models.vgg16(weights=weights).eval()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# 2) Preprocess image
preprocess = weights.transforms()
image = Image.open("sample.jpg").convert("RGB")
input_tensor = preprocess(image).unsqueeze(0).to(device)

# 3) Capture activations
activations = {}
selected_layers = {
    "features.0": "conv1_1",
    "features.5": "conv2_1",
    "features.10": "conv3_1",
    "features.17": "conv4_1",
}

def make_hook(friendly_name):
    # Store a detached CPU copy and return None so the layer's real
    # output is left untouched. (A hook that returns a value replaces
    # the module output, which would break later layers on GPU.)
    def hook(module, inputs, output):
        activations[friendly_name] = output.detach().cpu()
    return hook

hooks = []
for name, module in model.named_modules():
    if name in selected_layers:
        hooks.append(module.register_forward_hook(make_hook(selected_layers[name])))

# 4) Forward pass
with torch.no_grad():
    _ = model(input_tensor)

# 5) Plot helper
def plot_feature_grid(feature_tensor, layer_name, max_maps=16):
    fmap = feature_tensor[0]  # remove batch dim
    n = min(max_maps, fmap.shape[0])
    cols = 4
    rows = math.ceil(n / cols)
    fig, axes = plt.subplots(rows, cols, figsize=(10, 2.6 * rows))
    axes = axes.flatten()
    for i in range(n):
        channel = fmap[i]
        # Per-channel min-max normalization keeps every map visible.
        channel = (channel - channel.min()) / (channel.max() - channel.min() + 1e-8)
        axes[i].imshow(channel.numpy(), cmap="viridis")
        axes[i].set_title(f"{layer_name} · ch {i}", fontsize=9)
        axes[i].axis("off")
    for j in range(n, len(axes)):
        axes[j].axis("off")
    fig.suptitle(f"Feature maps from {layer_name}", fontsize=14)
    fig.tight_layout()
    plt.show()

# 6) Visualize
for layer_name, tensor in activations.items():
    print(layer_name, tensor.shape)
    plot_feature_grid(tensor, layer_name, max_maps=12)

# 7) Remove hooks
for h in hooks:
    h.remove()
```
How to interpret the maps
High activation does not mean “the model sees the object” in the human sense. It means a filter found a pattern it was tuned to respond to in a specific spatial region. That is why one channel may light up on fur texture, another on a sharp silhouette, and another on a repeated stripe pattern even when none of them resembles the original image directly.
- Bright localized responses often indicate strong evidence for that filter’s preferred pattern in a region.
- Very noisy maps across all layers can signal a preprocessing mismatch or poor image quality.
- Nearly blank channels are not always bad; some filters simply do not fire for a given image.
- Comparing shallow and deep layers is more informative than inspecting one layer in isolation because hierarchical representation is the point of CNNs.
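The bullets above can be backed by quick per-layer statistics rather than eyeballing every grid. One sketch (the function name and thresholds are illustrative) summarizes a captured activation by its mean response and the fraction of channels that stayed essentially silent:

```python
import torch

def channel_summary(fmap: torch.Tensor, eps: float = 1e-6):
    """Summarize a [batch, C, H, W] activation: overall mean activation
    and the fraction of channels that are essentially silent."""
    per_channel = fmap[0].flatten(1).mean(dim=1)            # mean over H*W
    silent = (fmap[0].abs().amax(dim=(1, 2)) < eps).float().mean()
    return per_channel.mean().item(), silent.item()

# Toy example: half the channels carry signal, half are exactly zero.
fmap = torch.zeros(1, 4, 8, 8)
fmap[:, :2] = torch.rand(1, 2, 8, 8)
mean_act, silent_frac = channel_summary(fmap)
print(f"mean activation {mean_act:.3f}, silent channels {silent_frac:.2f}")
```

Printing these numbers for each hooked layer makes shallow-vs-deep comparisons concrete: a layer where nearly every channel is silent, or where the mean explodes, stands out immediately.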
Common pitfalls
Showing too many channels
Modern layers can contain hundreds of channels, so plotting everything at once quickly becomes unreadable. Most examples limit the view to a subset or collapse channels into a mean map for each layer.
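Collapsing a layer into a single mean map is a one-line reduction over the channel axis; a minimal sketch with a fake deep activation:

```python
import torch

# Fake deep activation: 256 channels at 14x14, roughly VGG16-sized.
fmap = torch.rand(1, 256, 14, 14)

# Average over the channel axis to get one [H, W] summary map.
mean_map = fmap[0].mean(dim=0)

print(mean_map.shape)  # torch.Size([14, 14])
```

The resulting [H, W] tensor can be passed to imshow like any single channel, trading per-filter detail for a readable overview.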
Forgetting normalization for display
Raw activations can have a wide numeric range, so per-channel min-max normalization is often needed before plotting, otherwise the visualization may appear nearly black or washed out.
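The per-channel normalization mentioned above is a small helper worth factoring out (the function name here is illustrative):

```python
import torch

def normalize_for_display(channel: torch.Tensor, eps: float = 1e-8):
    """Min-max normalize a single [H, W] activation into [0, 1] for imshow.
    The eps guard avoids division by zero on a constant channel."""
    return (channel - channel.min()) / (channel.max() - channel.min() + eps)

# Wide, off-center numeric range, as raw activations often have.
channel = torch.randn(7, 7) * 50 + 100
norm = normalize_for_display(channel)
print(norm.min().item(), norm.max().item())
```

After normalization the minimum maps to 0 and the maximum to just under 1, so every channel uses the full colormap instead of rendering nearly black or washed out.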
Confusing feature maps with saliency
Feature maps show internal activations, while techniques like Grad-CAM or saliency maps are designed to explain which image regions contributed to a decision. They answer related but different questions.
