Show and Tell: Visually Explainable Deep Neural Nets via Spatially-Aware Concept Bottleneck Models

Ben-Gurion University
CVPR 2025 (Highlight)
SALF-CBM

We introduce SALF-CBM, the first Spatially-Aware and Label-Free Concept Bottleneck Model.

Key Features

Transforming Any Vision Model into an Explainable One: SALF-CBM transforms any “black-box” backbone into a visually interpretable model by explaining its predictions using visual concepts and grounding them in the input image. SALF-CBM is model-agnostic, supporting both CNN and ViT backbones.

Explaining Model Predictions: Given a pre-trained backbone, SALF-CBM decomposes its learned features into interpretable concept maps, each highlighting a distinct visual concept in the image. By training a sparse classifier on top of these maps, we create a spatially-aware bottleneck model whose predictions are both visually and conceptually interpretable, without compromising the performance of the original model.

Exploring Model Perception: Given a region of interest within the image, SALF-CBM can pinpoint the most relevant concepts recognized by the model in that region. This capability enables users to examine their model’s internal reasoning processes, better understand its perception of different image components, and diagnose classification errors effectively.

Facilitating Local User Intervention: During inference, users can directly modify the model’s concept maps, allowing targeted adjustments to the final predictions. This interactive capability facilitates counterfactual explanations and focused corrections within specific image regions.
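As a rough illustration (not the released interface), such an intervention can be sketched as editing one concept map inside a user-selected region and re-running the linear classifier head on the pooled activations. The tensor shapes, the average pooling, and the names concept_maps, classifier, and roi_mask below are assumptions made for illustration.

import torch

def intervene_on_region(concept_maps, classifier, concept_idx, roi_mask, new_value=0.0):
    """Edit a single concept inside a user-chosen region, then re-run the
    linear classifier on the pooled concept activations.

    concept_maps: (K, H, W) tensor of concept maps for one image (illustrative shape).
    classifier:   a torch.nn.Linear mapping K pooled activations to class logits.
    concept_idx:  index of the concept the user wants to edit.
    roi_mask:     (H, W) boolean tensor marking the region to modify.
    """
    edited = concept_maps.clone()
    edited[concept_idx][roi_mask] = new_value   # local edit of one concept map
    pooled = edited.mean(dim=(1, 2))            # global concept activations (average pooling assumed)
    return classifier(pooled)                   # counterfactual class logits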

Abstract

Modern deep neural networks have now reached human-level performance across a variety of tasks. However, unlike humans, they lack the ability to explain their decisions by showing where and telling what concepts guided them. In this work, we present a unified framework for transforming any vision neural network into a spatially and conceptually interpretable model. We introduce a spatially-aware concept bottleneck layer that projects “black-box” features of pre-trained backbone models into interpretable concept maps, without requiring human labels. By training a classification layer over this bottleneck, we obtain a self-explaining model that articulates which concepts most influenced its prediction, along with heatmaps that ground them in the input image. Accordingly, we name this method “Spatially-Aware and Label-Free Concept Bottleneck Model” (SALF-CBM). Our results show that the proposed SALF-CBM: (1) Outperforms non-spatial CBM methods, as well as the original backbone, on a variety of classification tasks; (2) Produces high-quality spatial explanations, outperforming widely used heatmap-based methods on a zero-shot segmentation task; (3) Facilitates model exploration and debugging, enabling users to query specific image regions and refine the model's decisions by locally editing its concept maps.

How does it work?

Given a pre-trained backbone model, we transform it into an explainable SALF-CBM as follows:

  1. We automatically generate a list of task-relevant visual concepts using GPT.
  2. Prior to training, we compute a spatial concept similarity matrix P, which quantifies the presence of visual concepts at different locations within each training image. This is achieved by leveraging CLIP's visual prompting property, wherein the image encoder is guided to focus on specific image regions by highlighting them with visual markers, such as a red circle (see the first sketch below).
  3. We train a spatially-aware Concept Bottleneck Layer (CBL) to project the backbone's “black-box” features into interpretable concept maps. This is done using a loss function L_CBL, which encourages the predicted concept maps to align with the spatial image-concept similarity matrix P (see the second sketch below).
  4. We pool the concept maps into global concept activations, and train a sparse linear classifier over them such that the model’s final prediction is formulated as a linear combination of interpretable concepts.
  5. At inference time, explainability is provided as an integral part of the model's forward pass, without requiring additional computations. Individual model decisions are explained globally by the concepts that contributed most to the model's output, and locally via their corresponding concept maps.
Figure: SALF-CBM overview.
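To make step 2 concrete, below is a minimal sketch of how the spatial concept similarity matrix P could be computed for a single image using CLIP's red-circle visual prompting. It assumes the OpenAI clip package and a uniform grid of candidate regions; the grid size, marker radius, and output layout are illustrative choices, not the paper's exact settings.

import clip
import torch
from PIL import Image, ImageDraw

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

concepts = ["black and white fur", "a bike", "a car"]  # e.g., a GPT-generated concept list
text_tokens = clip.tokenize(concepts).to(device)

def spatial_concept_similarity(image_path, grid=7):
    """Return a (grid*grid, num_concepts) similarity matrix: one row per
    red-circled region, one column per concept (illustrative construction)."""
    base = Image.open(image_path).convert("RGB")
    w, h = base.size
    rows = []
    with torch.no_grad():
        text_feat = model.encode_text(text_tokens)
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
        for i in range(grid):
            for j in range(grid):
                # Highlight one grid cell with a red circle (CLIP visual prompting).
                marked = base.copy()
                draw = ImageDraw.Draw(marked)
                cx, cy = (j + 0.5) * w / grid, (i + 0.5) * h / grid
                r = 0.5 * min(w, h) / grid
                draw.ellipse([cx - r, cy - r, cx + r, cy + r], outline="red", width=4)
                img_feat = model.encode_image(preprocess(marked).unsqueeze(0).to(device))
                img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
                rows.append((img_feat @ text_feat.T).squeeze(0))
    return torch.stack(rows)  # P for this image: regions x concepts

In practice, P would be computed once per training image before training and stored for use in the CBL loss.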
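Steps 3 and 4 can be summarized by the stand-in below: a 1x1 convolution projects backbone features into concept maps, an alignment loss pulls them toward P, and an L1-regularized linear head is trained on the pooled activations. The cosine-alignment form of L_CBL, the average pooling, and the L1 sparsity penalty are assumptions made for illustration; the paper's exact formulations may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialCBL(nn.Module):
    """1x1 convolution projecting backbone features (C, h, w) into K concept
    maps (K, h, w): a minimal stand-in for the spatially-aware concept bottleneck layer."""
    def __init__(self, in_channels, num_concepts):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, num_concepts, kernel_size=1)

    def forward(self, feats):
        return self.proj(feats)  # (B, K, h, w) concept maps

def cbl_alignment_loss(concept_maps, P):
    """Encourage predicted concept maps to align with the precomputed spatial
    image-concept similarities P, here assumed to be resized to (B, K, h, w).
    Cosine alignment is an illustrative choice for L_CBL."""
    pred = F.normalize(concept_maps.flatten(2), dim=-1)
    target = F.normalize(P.flatten(2), dim=-1)
    return (1 - (pred * target).sum(-1)).mean()

def classifier_loss(concept_maps, labels, head, l1_weight=1e-3):
    """Train a sparse linear classifier over pooled (global) concept activations;
    head is an nn.Linear(K, num_classes), sparsity via an L1 penalty on its weights."""
    pooled = concept_maps.mean(dim=(2, 3))  # (B, K) global concept activations
    logits = head(pooled)
    return F.cross_entropy(logits, labels) + l1_weight * head.weight.abs().mean()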

Explaining model predictions

SALF-CBM explains its predictions as an integral part of its inference process, specifying which concepts contributed most to its output and grounding them in the input image. Below, we show qualitative examples with a ResNet-50 backbone pre-trained on ImageNet.
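Because the final prediction is a linear combination of pooled concept activations, a per-class explanation can be read off as weight times activation. The sketch below ranks the top contributing concepts for a predicted class; concept_maps, head, and concept_names are illustrative names, not the released API.

import torch

def top_contributing_concepts(concept_maps, head, class_idx, concept_names, k=5):
    """Rank concepts by their contribution to the logit of class_idx:
    contribution = classifier_weight[class_idx, concept] * pooled_activation[concept]."""
    pooled = concept_maps.mean(dim=(1, 2))      # (K,) global concept activations for one image
    contrib = head.weight[class_idx] * pooled   # per-concept contribution to the class logit
    scores, idx = contrib.topk(k)
    return [(concept_names[i], s.item()) for i, s in zip(idx.tolist(), scores)]

The concept maps of the returned indices can then be upsampled to the input resolution to produce heatmaps like those shown in the examples.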

Exploring model perception

When prompted with a specific image region, SALF-CBM reveals how the model perceives it by identifying the most strongly activated concepts within that region. Below, we show qualitative examples with a ResNet-50 backbone pre-trained on ImageNet.
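One simple way to realize such a region query, under the same illustrative assumptions as above, is to average each concept map inside the user-selected mask and rank the results:

import torch

def concepts_in_region(concept_maps, roi_mask, concept_names, k=5):
    """Return the k concepts most strongly activated inside a region of interest.

    concept_maps: (K, H, W) concept maps for one image (illustrative shape).
    roi_mask:     (H, W) boolean tensor marking the queried region.
    """
    region_activation = concept_maps[:, roi_mask].mean(dim=1)  # mean activation of each concept in the ROI
    scores, idx = region_activation.topk(k)
    return [(concept_names[i], s.item()) for i, s in zip(idx.tolist(), scores)]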

Concept grounding in videos

By applying our method to video sequences in a frame-by-frame manner, we can visualize how the model recognizes concepts over time. Below, we show qualitative results on DAVIS 2017 videos with a ResNet-50 backbone pre-trained on ImageNet.

Videos: each example pairs the input sequence with the corresponding concept map for "black and white fur", "a bike", and "a car".

BibTeX

If you find this project useful for your research, please cite the following:

@article{benou2025show,
  title={Show and Tell: Visually Explainable Deep Neural Nets via Spatially-Aware Concept Bottleneck Models},
  author={Benou, Itay and Riklin-Raviv, Tammy},
  journal={arXiv preprint arXiv:2502.20134},
  year={2025}
}