BioX-Bridge: Model Bridging for Unsupervised Cross-Modal Knowledge Transfer across Biosignals
Chenqi Li, Yu Liu, Timothy Denison, Tingting Zhu
Abstract
Biosignals offer valuable insights into the physiological states of the human body. Although biosignal modalities differ in functionality, signal fidelity, sensor comfort, and cost, they are often intercorrelated, reflecting the holistic and interconnected nature of human physiology. This opens up the possibility of performing the same tasks using alternative biosignal modalities, thereby improving the accessibility, usability, and adaptability of health monitoring systems. However, the limited availability of large labeled datasets presents challenges for training models tailored to specific tasks and modalities of interest. Unsupervised cross-modal knowledge transfer offers a promising solution by leveraging knowledge from an existing modality to support model training for a new modality. Existing methods are typically based on knowledge distillation, which requires running a teacher model alongside student model training, resulting in high computational and memory overhead. This challenge is further exacerbated by the recent development of foundation models that demonstrate superior performance and generalization across tasks at the cost of large model sizes. To this end, we explore a new framework for unsupervised cross-modal knowledge transfer of biosignals by training a lightweight bridge network to align the intermediate representations and enable information flow between foundation models and across modalities. Specifically, we introduce an efficient strategy for selecting alignment positions where the bridge should be constructed, along with a flexible prototype network as the bridge architecture. Extensive experiments across multiple biosignal modalities, tasks, and datasets show that BioX-Bridge reduces the number of trainable parameters by 88-99% while maintaining or even improving transfer performance compared to state-of-the-art methods.
BioX-Bridge enables parameter-efficient cross-modal knowledge transfer across biosignals using lightweight prototype-based bridge networks between foundation models.
- Efficient framework for unsupervised cross-modal biosignal transfer without knowledge distillation overhead
- Two-stage bridge position selection strategy identifying optimal connection points between representations
- Prototype network architecture reducing trainable parameters by 88-99% while maintaining performance
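The contributions above describe a lightweight prototype network that bridges intermediate representations between two frozen foundation models. As a rough illustration only (not the paper's actual architecture; the dimensions, names, and the softmax-over-prototypes mapping are all assumptions), such a bridge might re-express source features as soft assignments over learned prototypes and decode them into the target model's feature space:

```python
import numpy as np

rng = np.random.default_rng(0)

class PrototypeBridge:
    """Hypothetical sketch of a prototype-style bridge. Source-model features
    are compared against learned prototype keys, and the resulting soft
    assignment weights mix learned value vectors in the target feature space.
    All names and dimensions are illustrative, not taken from the paper."""

    def __init__(self, src_dim, tgt_dim, n_prototypes=16, temperature=0.1):
        # Learnable parameters (randomly initialized here for the sketch).
        self.prototypes = rng.normal(size=(n_prototypes, src_dim))  # keys in source space
        self.values = rng.normal(size=(n_prototypes, tgt_dim))      # decoded target vectors
        self.temperature = temperature

    def __call__(self, x):
        # x: (batch, src_dim) intermediate features tapped from the source model.
        logits = x @ self.prototypes.T / self.temperature
        logits -= logits.max(axis=1, keepdims=True)       # numerical stability
        weights = np.exp(logits)
        weights /= weights.sum(axis=1, keepdims=True)     # softmax over prototypes
        return weights @ self.values                      # (batch, tgt_dim)

bridge = PrototypeBridge(src_dim=64, tgt_dim=128)
src_feats = rng.normal(size=(4, 64))
tgt_feats = bridge(src_feats)
print(tgt_feats.shape)  # (4, 128)
```

Under these assumed dimensions, the bridge holds n_prototypes × (src_dim + tgt_dim) = 16 × 192 = 3,072 parameters versus 8,192 for a dense 64 × 128 projection, which hints at why a prototype-based bridge can stay parameter-light relative to the foundation models it connects.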
Keywords
- model bridging
- prototype networks
- cross-modal transfer
- foundation models
Datasets
- ISRUC
- WESAD
- FOG
Limitations
- Depends on availability of pre-trained models for each biosignal modality
- Inference time depends on bridge position within the model
Future work
- Explore task-agnostic methods for better generality in multi-task scenarios
- Investigate transfer using unpaired data for any modality combination
- Explore BioX-Bridge for datasets with more than two modalities
Author keywords
- biosignal
- ai for healthcare
- humans and ai
- unsupervised cross-modal knowledge transfer
Related orals
Multimodal Aligned Semantic Knowledge for Unpaired Image-text Matching
MASK aligns semantic knowledge between images and text using word embeddings as bridges to match out-of-distribution words in unpaired matching.
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data
ScaleCUA scales open-source computer use agents with a cross-platform dataset and a dual-loop data pipeline.
VibeVoice: Expressive Podcast Generation with Next-Token Diffusion
Presents VibeVoice for zero-shot expressive long-form multi-speaker podcast generation using next-token diffusion.
UALM: Unified Audio Language Model for Understanding, Generation and Reasoning
UALM is a unified audio language model that handles understanding, text-to-audio generation, and multimodal reasoning in a single model, with UALM-Reason for cross-modal generative reasoning.
MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interaction
MetaEmbed uses learnable meta tokens with matryoshka training to enable test-time scaling for multimodal retrieval, balancing quality and efficiency.