Decentralized Attention Fails Centralized Signals: Rethinking Transformers for Medical Time Series

Guoqi Yu, Juncheng Wang, Chen Yang, Jing Qin, Angelica I Aviles-Rivero, Shujun Wang

LLMs & Reasoning · Sat, Apr 25 · 11:06 AM–11:16 AM · 201 C · Avg rating: 6.00 (4–8)

Author-provided TL;DR

We propose a centralized module to replace the decentralized attention in Transformers for centralized medical time series such as EEG and ECG.

Abstract

Accurate analysis of medical time series (MedTS) data, such as electroencephalography (EEG) and electrocardiography (ECG), plays a pivotal role in healthcare applications, including the diagnosis of brain and heart diseases. MedTS data typically exhibit two critical patterns: **temporal dependencies** within individual channels and **channel dependencies** across multiple channels. While recent deep learning advances have used Transformer-based models to capture temporal dependencies effectively, these models often struggle to model channel dependencies. This limitation stems from a structural mismatch: ***MedTS signals are inherently centralized, whereas the Transformer's attention is decentralized***, which makes attention less effective at capturing global synchronization and unified waveform patterns. To bridge this gap, we propose **CoTAR** (Core Token Aggregation-Redistribution), a centralized MLP-based module designed to replace decentralized attention. Instead of letting all tokens interact directly, as attention does, CoTAR introduces a global core token that acts as a proxy for inter-token interaction, thereby enforcing a centralized aggregation and redistribution strategy. This design not only aligns better with the centralized nature of MedTS signals but also reduces computational complexity from quadratic to linear in the number of tokens. Experiments on five benchmarks validate the superiority of our method in both effectiveness and efficiency, achieving up to a **12.13%** improvement on the APAVA dataset while using only 33% of the memory and 20% of the inference time of the previous state-of-the-art model. Code and all training scripts are available at this [**Link**](https://github.com/Levi-Ackman/TeCh).
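The aggregation-redistribution idea can be illustrated with a minimal NumPy sketch. It assumes that each channel is embedded as one token and that the core token is a softmax-weighted pooling of MLP-projected tokens, which is then concatenated back onto every token. The function `core_token_mixer`, its weight shapes, and the pooling rule are hypothetical illustrations, not the authors' CoTAR implementation (which is in the linked repository).

```python
import numpy as np


def gelu(x: np.ndarray) -> np.ndarray:
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def softmax(x: np.ndarray, axis: int = 0) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def core_token_mixer(
    tokens: np.ndarray,   # (N, d): one embedded token per channel (assumption)
    w_agg: np.ndarray,    # (d, d): hypothetical projection before aggregation
    w_score: np.ndarray,  # (d,): hypothetical scoring vector for pooling weights
    w_out: np.ndarray,    # (2d, d): hypothetical projection after redistribution
) -> np.ndarray:
    # Aggregation: project every token, then pool all of them into ONE core token.
    h = gelu(tokens @ w_agg)                 # (N, d)
    weights = softmax(h @ w_score, axis=0)   # (N,), sums to 1 over tokens
    core = weights @ h                       # (d,), the global core token

    # Redistribution: pair each token with the same core token and project back.
    core_rep = np.broadcast_to(core, tokens.shape)       # (N, d) read-only view
    fused = np.concatenate([tokens, core_rep], axis=1)   # (N, 2d)
    # Residual update; no N x N matrix is ever formed, so cost is linear in N.
    return tokens + gelu(fused @ w_out)                  # (N, d)


rng = np.random.default_rng(0)
N, d = 19, 16  # illustrative sizes, e.g. 19 EEG channels embedded to 16 dims
x = rng.standard_normal((N, d))
w_agg = 0.1 * rng.standard_normal((d, d))
w_score = 0.1 * rng.standard_normal(d)
w_out = 0.1 * rng.standard_normal((2 * d, d))

out = core_token_mixer(x, w_agg, w_score, w_out)
print(out.shape)  # (19, 16)
```

Because the core token is a weighted sum over all channels, reordering the input channels only reorders the output rows. Every channel still receives global context, but only through the single shared core token rather than through pairwise attention scores.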

One-sentence summary · Auto-generated by claude-haiku-4-5-20251001

CoTAR replaces Transformer attention with a centralized MLP module for efficient medical time series modeling, reducing complexity from quadratic to linear.

Contributions · Auto-generated by claude-haiku-4-5-20251001

Identifies a mismatch between the centralized nature of medical time series and the decentralized structure of attention
Proposes Core Token Aggregation-Redistribution (CoTAR), which uses a global core token as a proxy for inter-token interaction
Reduces computational complexity from quadratic to linear while improving channel-dependency modeling
Achieves up to a 12.13% improvement on the APAVA dataset, using 33% of the memory and 20% of the inference time of the previous state-of-the-art model
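To make the quadratic-versus-linear claim concrete, the following sketch counts token-to-token interactions for N tokens. It assumes that full self-attention scores every ordered token pair and that a core-token module links each token to the core once on the way in (aggregation) and once on the way out (redistribution). It deliberately ignores per-token projection costs, which grow linearly in N in both designs. The function names are hypothetical.

```python
def attention_pair_scores(n_tokens: int) -> int:
    # Full self-attention computes one score for every ordered token pair.
    return n_tokens * n_tokens


def core_token_links(n_tokens: int) -> int:
    # Aggregation (token -> core) plus redistribution (core -> token).
    return 2 * n_tokens


for n in (16, 32, 64, 128):
    print(f"N={n:4d}  attention={attention_pair_scores(n):6d}  core={core_token_links(n):4d}")
```

Doubling N quadruples the attention count but only doubles the core-token count, which is the scaling behind the reported memory and inference-time savings.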

Methods used · Auto-generated by claude-haiku-4-5-20251001

Transformer architectures
MLP modules
Attention mechanisms
Time series analysis

Datasets used · Auto-generated by claude-haiku-4-5-20251001

EEG datasets
ECG datasets
APAVA dataset

Limitations (author-stated) · Auto-generated by claude-haiku-4-5-20251001

Authors did not state explicit limitations.

Future work (author-stated) · Auto-generated by claude-haiku-4-5-20251001

Authors did not state explicit future directions.

Author keywords

EEG
ECG
Deep learning
Transformer

Related orals

Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models

Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer

Invisible Safety Threat: Malicious Finetuning for LLM via Steganography

Reducing Belief Deviation in Reinforcement Learning for Active Reasoning of LLM Agents

RefineStat: Efficient Exploration for Probabilistic Program Synthesis