Plug-and-Play Compositionality for Boosting Continual Learning with Foundation Models
Weiduo Liao, Fei Han, Hisao Ishibuchi, Qingfu Zhang, Ying Wei
We introduce CompSLOT, a universal concept-learning framework for continual learning with foundation models that establishes a concept-level understanding of class prediction across alternative continual learners.
Abstract
Vision learners often struggle with catastrophic forgetting due to their reliance on class recognition by comparison, rather than understanding classes as compositions of representative concepts. This limitation is prevalent even in state-of-the-art continual learners with foundation models and worsens when current tasks contain few classes. Inspired by the recent success of concept-level understanding in mitigating forgetting, we design a universal framework CompSLOT to guide concept learning across diverse continual learners. Leveraging the progress of object-centric learning in parsing semantically meaningful slots from images, we tackle the challenge of learning slot extraction from ImageNet-pretrained vision transformers by analyzing meaningful concept properties. We further introduce a primitive selection and aggregation mechanism to harness concept-level image understanding. Additionally, we propose a method-agnostic self-supervision approach to distill sample-wise concept-based similarity information into the classifier, reducing reliance on incorrect or partial concepts for classification. Experiments show CompSLOT significantly enhances various continual learners and provides a universal concept-level module for the community.
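The abstract's "method-agnostic self-supervision" distills sample-wise concept-based similarity into the classifier. A minimal sketch of that idea, assuming a standard temperature-scaled KL distillation loss between concept-based similarity scores (teacher) and classifier logits (student); the function names and the plain-KL formulation are illustrative assumptions, not the paper's actual loss.

```python
import numpy as np

def softmax(x, t=1.0):
    # Temperature-scaled softmax, numerically stabilized.
    z = x / t
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def concept_distill_loss(student_logits, concept_sims, t=2.0):
    """Hedged sketch of concept-based distillation (NOT the paper's exact loss):
    KL(teacher || student), where the teacher distribution comes from
    concept-based similarity scores and the student from classifier logits."""
    p = softmax(concept_sims, t)    # teacher: concept-based similarities
    q = softmax(student_logits, t)  # student: classifier logits
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(np.mean(kl))
```

When the classifier's logits already match the concept-based similarities, the loss vanishes; disagreement (e.g. classification driven by incorrect or partial concepts) is penalized.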
Proposes CompSLOT framework extracting interpretable concepts from vision transformers to enhance continual learning.
- Primitive selection and aggregation mechanism for extracting class-relevant concepts while maintaining robustness
- Primitive-logit knowledge distillation enforcing concept-based sample similarity regularization
- Method-agnostic self-supervision reducing reliance on incorrect or partial concepts for classification
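CompSLOT builds on object-centric learning, which parses an image into semantically meaningful slots. A minimal numpy sketch of the underlying slot-attention mechanism (Locatello et al., 2020) over ViT patch features; for brevity this replaces the original GRU/MLP slot update with a plain weighted mean, so it is an illustrative simplification, not the paper's extraction module.

```python
import numpy as np

def slot_attention(inputs, num_slots=4, iters=3, seed=0):
    """Simplified slot-attention sketch: slots compete for input locations
    via a softmax over slots, then update as a weighted mean of inputs.
    inputs: (N, D) array of patch/token features, e.g. from a ViT."""
    rng = np.random.default_rng(seed)
    n, d = inputs.shape
    slots = rng.normal(size=(num_slots, d))  # randomly initialized slots
    for _ in range(iters):
        # Attention logits between slots (queries) and inputs (keys).
        logits = slots @ inputs.T / np.sqrt(d)            # (S, N)
        # Softmax over SLOTS: each input location is claimed competitively.
        attn = np.exp(logits - logits.max(axis=0, keepdims=True))
        attn = attn / attn.sum(axis=0, keepdims=True)     # (S, N)
        # Normalize per slot, then update slots as weighted means of inputs.
        weights = attn / (attn.sum(axis=1, keepdims=True) + 1e-8)
        slots = weights @ inputs                          # (S, D)
    return slots
```

The softmax over the slot axis (rather than the input axis, as in standard attention) is what forces slots to specialize on disjoint concepts.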
- Slot attention
- Object-centric learning
- Vision transformers
- Knowledge distillation
- Concept learning
- ImageNet
- CUB200
- COBJ
Concept learning must be completed before conceptual self-supervision can be provided to the continual learning task
from the paper
Explore end-to-end integration of the mechanism into the continual learning pipeline
from the paper
Study the joint effect when combining with regularization methods that also manipulate logits
from the paper
Author keywords
- Continual learning
Related orals
Improving Diffusion Models for Class-imbalanced Training Data via Capacity Manipulation
Capacity manipulation improves diffusion models' handling of class-imbalanced data by reserving capacity for minority classes via low-rank decomposition.
Depth Anything 3: Recovering the Visual Space from Any Views
DA3 predicts spatially consistent 3D geometry from arbitrary camera views using plain transformer and depth-ray targets.
Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator
VIST3A stitches text-to-video models with 3D reconstruction systems and aligns them via reward finetuning for high-quality text-to-3D generation.
Radiometrically Consistent Gaussian Surfels for Inverse Rendering
RadioGS introduces radiometric consistency supervision for inverse rendering to accurately model indirect illumination in Gaussian-based representations.
True Self-Supervised Novel View Synthesis is Transferable
Presents XFactor, the first geometry-free self-supervised model for transferable novel view synthesis without 3D inductive biases.