Note on Feature Visualization
Motivation
- We want to understand what the hidden units “represent”
- What are they tuned to? What are their favorite stimuli?
- Why not simply find the stimuli that excite them the most?
Resources
- DeepDream.ipynb (TensorFlow)
- Another DeepDream.ipynb (TensorFlow)
- Post on fast feature visualization in PyTorch
- Explicit blurring to counteract high-frequency artifacts; optimize and resize to get multi-scale feature visualization. Very similar to DeepDream
- Another synthesis post
- A zoo of feature visualization notebooks, PyTorch
- Distill’s systematic treatment of feature visualization
- Effect of Regularization on Feature Visualization
- From their treatment, it seems the lucid package (built on TensorFlow; see lucid.optvis.render) can generate the most useful and best-looking visualizations. The major techniques utilized are spatial decorrelation and feature decorrelation, plus some transformations to improve the robustness of the visualization. Should consider porting it to PyTorch in the future.
- Integrating FV modules into the toolkit
Overview of Algorithms
The basic idea of feature visualization is to optimize for images that highly activate a hidden unit (or a weighted sum of units), so that the unit’s activity can be understood as coding for that feature.
The big enemy of FV is high-frequency artifacts, which arise if you directly back-propagate the activation objective to pixel space! They are a kind of visual illusion for the CNN, and note that artifacts of this kind can be used as adversarial attacks to manipulate the classification result of a CNN. Below we will talk about ways to defeat these artifacts and make good-looking images.
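To make the problem concrete, here is a minimal sketch of naive activation maximization in PyTorch. The model, layer index, and channel index are arbitrary illustrative choices; run as-is, this produces exactly the high-frequency patterns described above.

import torch
from torchvision import models

model = models.vgg16(pretrained=True).eval()
acts = {}
# Record the activation of one hidden layer with a forward hook.
model.features[20].register_forward_hook(lambda m, i, o: acts.update(out=o))

img = torch.randn(1, 3, 224, 224, requires_grad=True)  # start from noise
opt = torch.optim.Adam([img], lr=5e-2)
unit = 10                                              # illustrative channel index
for _ in range(256):
    opt.zero_grad()
    model(img)
    loss = -acts['out'][0, unit].mean()  # ascend the channel's mean activation
    loss.backward()
    opt.step()
# Without regularization, img fills with high-frequency artifacts.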
Remark: Activation maximization using back-prop has a really close relationship with generative models and GANs.
- Functionally, they all take in an activation pattern (a hidden vector) and output an image
- Mechanistically, the upconv operation in a GAN generator is just ConvTranspose, which is exactly the operation applied when the gradient vector propagates backward from deeper layers to shallower layers (see the check after this list)
- Thus they are all using activations in the higher layers to generate filter patterns repeatedly!
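This identity is easy to verify in PyTorch (a self-contained sanity check, not taken from the resources above): the gradient of conv2d with respect to its input is a transposed convolution with the same kernel.

import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8, requires_grad=True)
w = torch.randn(4, 3, 3, 3)
y = F.conv2d(x, w, padding=1)
g = torch.randn_like(y)          # pretend upstream gradient
y.backward(g)
# Backprop through conv2d is exactly a transposed convolution:
manual = F.conv_transpose2d(g, w, padding=1)
print(torch.allclose(x.grad, manual, atol=1e-5))  # True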
Common algorithms zoo: https://github.com/utkuozbulak/pytorch-cnn-visualizations
Regularization
There are three major groups of regularization (a sketch illustrating the first and third groups follows this list).
- Directly eliminate high-frequency components
  - by adding a total-variation energy term during optimization,
  - or by blurring the image!
- Reparametrize the image and weight different directions in that parameter space
  - use an FFT parametrization,
  - or a GAN parametrization, etc.
- Robustness to perturbation
  - Add jitter and noise and check whether the activation is stable across these perturbations.
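As a rough illustration of the first and third groups (a sketch under assumptions: total_variation, jitter, and the coefficient 1e-3 are illustrative names and values, not taken from any particular package):

import torch

def total_variation(img):
    # Penalize differences between neighboring pixels (suppresses high frequencies).
    dh = (img[..., 1:, :] - img[..., :-1, :]).abs().mean()
    dw = (img[..., :, 1:] - img[..., :, :-1]).abs().mean()
    return dh + dw

def jitter(img, max_shift=8):
    # Random translation; a visualization that survives jitter is more robust.
    dx, dy = torch.randint(-max_shift, max_shift + 1, (2,))
    return torch.roll(img, shifts=(int(dx), int(dy)), dims=(2, 3))

# Inside the optimization loop, replace the plain forward pass and loss with:
#   model(jitter(img))
#   loss = -activation_objective + 1e-3 * total_variation(img)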
Optimization
Aside from regularization, the other major component of feature visualization algorithms is the optimizer: which optimizer and which initialization are used may affect the success of the optimization and the speed of convergence. Here are some classes of algorithms (a sketch showing how to swap optimizers follows the list):
- 0th-order methods
  - Genetic algorithms
  - CMA-ES
- 1st-order methods
  - SGD
  - Momentum
  - Adam, AdaGrad, AdaDelta, RMSProp, etc.
- Pseudo 2nd-order methods
  - L-BFGS
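Swapping first-order optimizers in PyTorch is mechanical; the main wrinkle is that L-BFGS requires a closure that re-evaluates the loss. A minimal sketch with illustrative names:

import torch

def maximize(objective, img, steps=100, method="adam"):
    # objective(img) should return the scalar activation we want to maximize.
    if method == "lbfgs":
        opt = torch.optim.LBFGS([img], lr=0.1, max_iter=20)
        def closure():
            opt.zero_grad()
            loss = -objective(img)
            loss.backward()
            return loss
        for _ in range(steps):
            opt.step(closure)   # L-BFGS calls closure multiple times per step
    else:
        opt = torch.optim.Adam([img], lr=5e-2)
        for _ in range(steps):
            opt.zero_grad()
            (-objective(img)).backward()
            opt.step()
    return img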
Example: Deep Dream
It’s gradient ascent, but the point is to generate good-looking images, so some regularization techniques are used, like L1 and L2 penalties!
MATLAB has a nested bunch of functions; the major ones are these:
nnet.internal.cnn.visualize.VisualNetwork.createVisualNetworkForChannelAverage(iNet, layerIdx, channels)
nnet.internal.cnn.visualize.deepDreamImageLaplacianNorm
nnet.internal.cnn.visualize.TiledGradients.computeTiledGradient
- Computes gradients tile by tile to save memory!
nnet.internal.cnn.visualize.LaplacianPyramid.laplacianNormalizedImage
- Normalizes the gradient image at each Laplacian pyramid level, preferring lower-frequency content!
function gradient = iNormalizeGradient(X, gradient)
% Normalize the gradient of each image in the batch by its standard deviation.
gradient = gradient ./ shiftdim(std(reshape(gradient, [], size(X,4))) + 1e-9, -2);
end
An image can be decomposed into a Laplacian pyramid, and the pyramid can then be merged to give back the original image. In Laplacian pyramid normalization, we normalize each level of the pyramid before merging (a PyTorch sketch follows).
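A PyTorch sketch of the same idea (my reconstruction of the concept, not the MATLAB internals): split the gradient into pyramid bands, normalize each band by its standard deviation, and merge back, so the usually dominant high frequencies no longer drown out the low ones.

import torch
import torch.nn.functional as F

def laplacian_normalize(grad, levels=4, eps=1e-8):
    # Split grad (N, C, H, W) into Laplacian bands, normalize each band
    # by its standard deviation, and merge back.
    bands, current = [], grad
    for _ in range(levels - 1):
        down = F.avg_pool2d(current, 2)
        up = F.interpolate(down, size=current.shape[-2:], mode='bilinear',
                           align_corners=False)
        bands.append(current - up)  # high-frequency residual at this scale
        current = down
    bands.append(current)           # coarsest, low-frequency level
    out = bands[-1] / (bands[-1].std() + eps)
    for band in reversed(bands[:-1]):
        out = F.interpolate(out, size=band.shape[-2:], mode='bilinear',
                            align_corners=False)
        out = out + band / (band.std() + eps)
    return out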
The core loop is just like this:
for iter = 1:numIterations
    [gradient, activations] = nnet.internal.cnn.visualize.TiledGradients.computeTiledGradient(...
        iVisualNet, X, tileSize);
    if useLaplacian
        gradient = iLaplacianNormalizedImage(gradient);
    else
        gradient = iNormalizeGradient(X, gradient);
    end
    % Update step.
    X = X + gradient * stepSize;
    % Display progress.
    summary.update(octave, iter, activations);
    reporter.reportIteration(summary);
end
Example: Lucent Framework
Lucent is a recent PyTorch implementation of the Lucid visualization framework. Its core code is super clear and easy to read and modify, so let’s take a look.
import torch
from tqdm import tqdm
from lucent.optvis import objectives, param, transform
# Helpers such as hook_model, tensor_to_img_array, export, view, and show
# are defined elsewhere in lucent.optvis.render.

def render_vis(model, objective_f, param_f=None, optimizer=None, transforms=None,
               thresholds=(512,), verbose=False, preprocess=True, progress=True,
               show_image=True, save_image=False, image_name=None, show_inline=False):
    if param_f is None:
        param_f = lambda: param.image(128)
    # param_f is a function that should return two things
    # params - parameters to update, which we pass to the optimizer
    # image_f - a function that returns an image as a tensor
    params, image_f = param_f()

    if optimizer is None:
        optimizer = lambda params: torch.optim.Adam(params, lr=5e-2)
    optimizer = optimizer(params)

    if transforms is None:
        transforms = transform.standard_transforms.copy()
    if preprocess:
        if model._get_name() == "InceptionV1":
            # Original Tensorflow InceptionV1 takes input range [-117, 138]
            transforms.append(transform.preprocess_inceptionv1())
        else:
            # Assume we use normalization for torchvision.models
            # See https://pytorch.org/docs/stable/torchvision/models.html
            transforms.append(transform.normalize())

    # Upsample images smaller than 224
    image_shape = image_f().shape
    if image_shape[2] < 224 or image_shape[3] < 224:
        transforms.append(torch.nn.Upsample(size=224, mode='bilinear', align_corners=True))
    transform_f = transform.compose(transforms)

    hook = hook_model(model, image_f)
    objective_f = objectives.as_objective(objective_f)

    if verbose:
        model(transform_f(image_f()))
        print("Initial loss: {:.3f}".format(objective_f(hook)))

    images = []
    try:
        for i in tqdm(range(1, max(thresholds) + 1), disable=(not progress)):
            optimizer.zero_grad()
            # Forward pass through the (transformed) parametrized image;
            # the hook records the activations the objective needs.
            model(transform_f(image_f()))
            loss = objective_f(hook)
            loss.backward()
            optimizer.step()
            if i in thresholds:
                image = tensor_to_img_array(image_f())
                images.append(image)
    except KeyboardInterrupt:
        # Allow interrupting a long run while keeping the current image.
        print("Interrupted optimization at step {:d}.".format(i))
        images.append(tensor_to_img_array(image_f()))

    if save_image:
        export(image_f(), image_name)
    if show_inline:
        show(tensor_to_img_array(image_f()))
    elif show_image:
        view(image_f())
    return images
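For reference, typical usage (following the example in Lucent’s README):

import torch
from lucent.optvis import render
from lucent.modelzoo import inceptionv1

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = inceptionv1(pretrained=True)
model.to(device).eval()

# Visualize channel 476 of layer mixed4a.
render.render_vis(model, "mixed4a:476")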