Plug-and-Play Compositionality for Boosting Continual Learning with Foundation Models
Weiduo Liao, Fei Han, Hisao Ishibuchi, Qingfu Zhang, Ying Wei
We introduce CompSLOT, a universal concept-learning framework for continual learning with foundation models that establishes a concept-level understanding of class prediction across alternative continual learners.
Abstract
Vision learners often struggle with catastrophic forgetting due to their reliance on class recognition by comparison, rather than understanding classes as compositions of representative concepts. This limitation is prevalent even in state-of-the-art continual learners with foundation models and worsens when current tasks contain few classes. Inspired by the recent success of concept-level understanding in mitigating forgetting, we design a universal framework CompSLOT to guide concept learning across diverse continual learners. Leveraging the progress of object-centric learning in parsing semantically meaningful slots from images, we tackle the challenge of learning slot extraction from ImageNet-pretrained vision transformers by analyzing meaningful concept properties. We further introduce a primitive selection and aggregation mechanism to harness concept-level image understanding. Additionally, we propose a method-agnostic self-supervision approach to distill sample-wise concept-based similarity information into the classifier, reducing reliance on incorrect or partial concepts for classification. Experiments show CompSLOT significantly enhances various continual learners and provides a universal concept-level module for the community.
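The abstract's "method-agnostic self-supervision" distills sample-wise concept-based similarity into the classifier. A minimal sketch of that idea, assuming a standard temperature-scaled KL distillation loss between concept-based similarity scores (teacher) and classifier logits (student); the function names and the plain-KL formulation are illustrative assumptions, not the paper's actual loss.

```python
import numpy as np

def softmax(x, t=1.0):
    # Temperature-scaled softmax, numerically stabilized.
    z = x / t
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def concept_distill_loss(student_logits, concept_sims, t=2.0):
    """Hedged sketch of concept-based distillation (NOT the paper's exact loss):
    KL(teacher || student), where the teacher distribution comes from
    concept-based similarity scores and the student from classifier logits."""
    p = softmax(concept_sims, t)    # teacher: concept-based similarities
    q = softmax(student_logits, t)  # student: classifier logits
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(np.mean(kl))
```

When the classifier's logits already match the concept-based similarities, the loss vanishes; disagreement (e.g. classification driven by incorrect or partial concepts) is penalized.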
Proposes CompSLOT framework extracting interpretable concepts from vision transformers to enhance continual learning.
- Primitive selection and aggregation mechanism for extracting class-relevant concepts while maintaining robustness
- Primitive-logit knowledge distillation enforcing concept-based sample similarity regularization
- Method-agnostic self-supervision reducing reliance on incorrect or partial concepts for classification
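CompSLOT builds on object-centric learning, which parses an image into semantically meaningful slots. A minimal numpy sketch of the underlying slot-attention mechanism (Locatello et al., 2020) over ViT patch features; for brevity this replaces the original GRU/MLP slot update with a plain weighted mean, so it is an illustrative simplification, not the paper's extraction module.

```python
import numpy as np

def slot_attention(inputs, num_slots=4, iters=3, seed=0):
    """Simplified slot-attention sketch: slots compete for input locations
    via a softmax over slots, then update as a weighted mean of inputs.
    inputs: (N, D) array of patch/token features, e.g. from a ViT."""
    rng = np.random.default_rng(seed)
    n, d = inputs.shape
    slots = rng.normal(size=(num_slots, d))  # randomly initialized slots
    for _ in range(iters):
        # Attention logits between slots (queries) and inputs (keys).
        logits = slots @ inputs.T / np.sqrt(d)            # (S, N)
        # Softmax over SLOTS: each input location is claimed competitively.
        attn = np.exp(logits - logits.max(axis=0, keepdims=True))
        attn = attn / attn.sum(axis=0, keepdims=True)     # (S, N)
        # Normalize per slot, then update slots as weighted means of inputs.
        weights = attn / (attn.sum(axis=1, keepdims=True) + 1e-8)
        slots = weights @ inputs                          # (S, D)
    return slots
```

The softmax over the slot axis (rather than the input axis, as in standard attention) is what forces slots to specialize on disjoint concepts.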
- Slot attention
- Object-centric learning
- Vision transformers
- Knowledge distillation
- Concept learning
- ImageNet
- CUB200
- COBJ
Concept learning must be completed before conceptual self-supervision can be provided to the continual learning task
from the paper
Explore end-to-end integration of the mechanism into the continual learning pipeline
from the paper
Study the joint effect when combining with regularization methods that also manipulate logits
from the paper
Author keywords
- Continual learning
Related orals
Improving Diffusion Models for Class-imbalanced Training Data via Capacity Manipulation
Capacity manipulation improves diffusion models' handling of class-imbalanced data by reserving capacity for minority classes via low-rank decomposition.
Depth Anything 3: Recovering the Visual Space from Any Views
DA3 predicts spatially consistent 3D geometry from arbitrary camera views using plain transformer and depth-ray targets.
Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator
VIST3A stitches text-to-video models with 3D reconstruction systems and aligns them via reward finetuning for high-quality text-to-3D generation.
Radiometrically Consistent Gaussian Surfels for Inverse Rendering
RadioGS introduces radiometric consistency supervision for inverse rendering to accurately model indirect illumination in Gaussian-based representations.
True Self-Supervised Novel View Synthesis is Transferable
Presents XFactor, the first geometry-free self-supervised model for transferable novel view synthesis without 3D inductive biases.