Visualize Convolutional Neural Networks With Heatmaps - CAM

#AI

#Engineering

Heatmaps are often used in visualization tasks to present the importance or activation level of different parts of an image. They are widely used for visualizing Convolutional Neural Networks (CNNs).

This post focuses only on the CAM (Class Activation Maps) method.

Class Activation Maps - CAM

CAM is one of the techniques for generating heatmaps that highlight the class-specific regions of an image. The highlighted regions show where the network is "looking", so we can check whether it looks at the right place when making a classification decision.

Examples Of A NN Looking At The Wrong Place

The network classifies an image as a 'train', but it only looks at the train track, not the actual train.
The network classifies an object by looking at other things associated with the object rather than at the object itself.

Architecture

The example architecture above includes some generic convolutional layers that produce 3 feature maps (in this example only).
Each feature map has height v and width u.
The output of GAP (Global Average Pooling) goes through a dense neural network that makes the decision.

GAP Computation

GAP turns each feature map into a single number; in the example below it outputs 3 numbers.
In the example above, each GAP output is an input to the NN that makes the decision.
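To make the GAP step concrete, here is a minimal NumPy sketch. The feature map values are made up for illustration, and the maps are only 2x2 (v = 2, u = 2) to keep the arithmetic easy to follow; a real network would produce these maps from its convolutional layers.

```python
import numpy as np

# Hypothetical feature maps: 3 maps, each with height v = 2 and width u = 2.
feature_maps = np.array([
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
    [[5.0, 5.0], [5.0, 5.0]],
])

def global_average_pooling(maps):
    """Average each (v, u) feature map down to a single number."""
    return maps.mean(axis=(1, 2))

gap_output = global_average_pooling(feature_maps)
# One number per feature map: 2.5, 2.0 and 5.0
print(gap_output)
```

Note that the second map has one strong activation (8.0) and the rest zeros, yet its GAP value is only 2.0: GAP measures how present a pattern is across the whole map, not where it is.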

Heatmap From CAM

The CAM operation multiplies each feature map by the weight that connects its GAP output to the chosen class in the dense layer, then sums the weighted feature maps. The result is our heatmap for that class.
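The weighted sum behind CAM can be sketched in a few lines of NumPy. This is an illustration under assumptions, not a full implementation: the feature maps and the dense layer weights are made-up numbers, the names `class_activation_map` and `normalize_for_display` are hypothetical, and the dense layer is assumed to be a single weight matrix of shape (number of feature maps, number of classes).

```python
import numpy as np

def class_activation_map(feature_maps, dense_weights, class_idx):
    """Weighted sum of the feature maps for one class.

    feature_maps:  (k, v, u) output of the last convolutional layer
    dense_weights: (k, n_classes) weights of the dense layer that follows GAP
    """
    class_weights = dense_weights[:, class_idx]               # shape (k,)
    # Sum over k of class_weights[k] * feature_maps[k]  ->  shape (v, u)
    return np.tensordot(class_weights, feature_maps, axes=1)

def normalize_for_display(cam):
    """Scale the map to [0, 1] so it can be drawn as a heatmap."""
    span = cam.max() - cam.min()
    return (cam - cam.min()) / span if span > 0 else np.zeros_like(cam)

# Hypothetical values: 3 feature maps of size 2x2 and 2 classes.
feature_maps = np.array([
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
    [[5.0, 5.0], [5.0, 5.0]],
])
dense_weights = np.array([
    [1.0, 0.0],
    [0.5, 1.0],
    [0.0, -1.0],
])

cam = class_activation_map(feature_maps, dense_weights, class_idx=0)
heatmap = normalize_for_display(cam)
print(cam)      # raw map for class 0: [[1, 2], [3, 8]]
print(heatmap)  # same map scaled to the range [0, 1]
```

In practice the heatmap is usually resized to the size of the input image and drawn on top of it, since the feature maps are much smaller than the image.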

Notes

Feature map - the output of a convolutional layer, representing specific features detected in the input image or in a previous feature map.
Pooling layer - a layer that sits after a convolutional layer to downsize the feature map. It keeps the most important parts and discards the rest. Its main purpose is to reduce overfitting, and the reduced size also makes the later layers cheaper to compute. The output of this layer is a smaller feature map.
- Max Pooling - slides a small window (2x2, 3x3) over the input and keeps only the maximum value in each window.
- Average Pooling - takes the average value of each window instead.
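The two pooling variants can be compared with a small NumPy sketch. It assumes non-overlapping windows (stride equal to the window size), which is a common setup but not the only one; the function name `pool2d` and the input values are made up for illustration.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Pool a 2D feature map with non-overlapping size x size windows."""
    h, w = feature_map.shape
    h_out, w_out = h // size, w // size
    # Crop so the map divides evenly, then split it into size x size blocks.
    cropped = feature_map[:h_out * size, :w_out * size]
    blocks = cropped.reshape(h_out, size, w_out, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

# Hypothetical 4x4 feature map.
feature_map = np.array([
    [1.0, 2.0, 5.0, 6.0],
    [3.0, 4.0, 7.0, 8.0],
    [9.0, 10.0, 13.0, 14.0],
    [11.0, 12.0, 15.0, 16.0],
])

max_pooled = pool2d(feature_map, size=2, mode="max")      # [[4, 8], [12, 16]]
avg_pooled = pool2d(feature_map, size=2, mode="average")  # [[2.5, 6.5], [10.5, 14.5]]
print(max_pooled)
print(avg_pooled)
```

Both variants turn the 4x4 map into a 2x2 map; they only differ in how each window is summarized.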
GAP - Global Average Pooling - computes the average value of a whole feature map and outputs a single number that represents the average activation / presence of a specific pattern learned by the convolutional layers across the entire spatial extent of the input.
Softmax function - an operator that transforms the previous layer's output into a vector of probabilities. Usually used in multiclass classification.
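As a quick illustration of softmax, here is a minimal NumPy sketch. The input scores are made up, and subtracting the maximum before exponentiating is a common trick for numerical stability rather than part of the definition.

```python
import numpy as np

def softmax(scores):
    """Turn a vector of raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # avoids overflow in exp, result is unchanged
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical outputs of the last dense layer for 3 classes.
scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs)           # largest score gets the largest probability
print(probs.argmax())  # index of the predicted class
```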