ICLR 2026 Orals

Decentralized Attention Fails Centralized Signals: Rethinking Transformers for Medical Time Series

Guoqi Yu, Juncheng Wang, Chen Yang, Jing Qin, Angelica I Aviles-Rivero, Shujun Wang

LLMs & Reasoning · Sat, Apr 25 · 11:06 AM–11:16 AM · 201 C · Avg rating: 6.00 (4–8)
Author-provided TL;DR

We propose a centralized module to replace decentralized attention in Transformers for centralized medical time series such as EEG and ECG.

Abstract

Accurate analysis of medical time series (MedTS) data, such as electroencephalography (EEG) and electrocardiography (ECG), plays a pivotal role in healthcare applications, including the diagnosis of brain and heart diseases. MedTS data typically exhibit two critical patterns: **temporal dependencies** within individual channels and **channel dependencies** across channels. While recent deep learning work has used Transformer-based models to capture temporal dependencies effectively, these models often struggle to model channel dependencies. This limitation stems from a structural mismatch: ***MedTS signals are inherently centralized, whereas the Transformer's attention is decentralized***, which makes attention less effective at capturing global synchronization and unified waveform patterns. To bridge this gap, we propose **CoTAR** (Core Token Aggregation-Redistribution), a centralized MLP-based module designed to replace decentralized attention. Instead of letting all tokens interact directly, as attention does, CoTAR introduces a global core token that acts as a proxy for inter-token interaction, enforcing a centralized aggregation-and-redistribution strategy. This design not only aligns better with the centralized nature of MedTS signals but also reduces computational complexity from quadratic to linear. Experiments on five benchmarks validate the superiority of our method in both effectiveness and efficiency, achieving up to a **12.13%** improvement on the APAVA dataset while using merely 33% of the memory and 20% of the inference time of the previous state of the art. Code and all training scripts are available at this [**Link**](https://github.com/Levi-Ackman/TeCh).
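To make the aggregation-redistribution idea concrete, here is a minimal pure-Python sketch of a generic core-token mixer. It is not the authors' CoTAR: the names (`CoreTokenMixer`, `pairwise_interactions`, `core_interactions`), the softmax-weighted pooling that forms the core token, the concatenate-then-project redistribution, and the residual connection are all illustrative assumptions. The linked repository is the reference for the actual module.

```python
import math
import random


def linear(x, w, b):
    """Dense layer: returns w @ x + b for plain Python lists."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)] for _ in range(rows)]


class CoreTokenMixer:
    """Hypothetical core-token mixer (illustrative assumption, not the authors' CoTAR)."""

    def __init__(self, d, d_core, seed=0):
        rng = random.Random(seed)
        self.w_in = rand_matrix(d_core, d, rng)       # per-token projection d -> d_core
        self.b_in = [0.0] * d_core
        self.w_score = rand_matrix(1, d_core, rng)    # one pooling score per token
        self.w_out = rand_matrix(d, d + d_core, rng)  # [token ; core] -> d
        self.b_out = [0.0] * d

    def __call__(self, tokens):
        # 1) Per-token MLP: each token is processed independently, O(N).
        hidden = [relu(linear(t, self.w_in, self.b_in)) for t in tokens]
        # 2) Aggregation: softmax-weighted pooling of all tokens into one core token, O(N).
        weights = softmax([linear(h, self.w_score, [0.0])[0] for h in hidden])
        core = [sum(w * h[k] for w, h in zip(weights, hidden)) for k in range(len(hidden[0]))]
        # 3) Redistribution: each token sees only itself and the shared core, O(N).
        #    `t + core` concatenates the two lists; a residual connection is added on top.
        return [
            [ti + oi for ti, oi in zip(t, linear(t + core, self.w_out, self.b_out))]
            for t in tokens
        ]


def pairwise_interactions(n_tokens):
    # Full self-attention scores every ordered pair of tokens.
    return n_tokens * n_tokens


def core_interactions(n_tokens):
    # Each token writes to the core once (aggregation) and reads from it once (redistribution).
    return 2 * n_tokens


if __name__ == "__main__":
    rng = random.Random(1)
    tokens = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]  # e.g. 16 channels
    out = CoreTokenMixer(d=8, d_core=4)(tokens)
    print(len(out), len(out[0]))  # 16 8
    print(pairwise_interactions(64), core_interactions(64))  # 4096 128
```

Because each token only writes to and reads from the single core token, the mixing step touches each token a constant number of times, so its cost grows linearly with the number of tokens, whereas attention scores every token pair. The symmetric pooling also makes the sketch permutation-equivariant: reordering the input tokens reorders the outputs identically.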

One-sentence summary · Auto-generated by claude-haiku-4-5-20251001

CoTAR replaces Transformer attention with a centralized MLP-based module for efficient medical time series modeling, reducing complexity from quadratic to linear.

Contributions · Auto-generated by claude-haiku-4-5-20251001
  • Identifies a mismatch between the centralized nature of medical time series and the decentralized structure of attention
  • Proposes Core Token Aggregation-Redistribution (CoTAR), which uses a global core token as a proxy for inter-token interaction
  • Reduces computational complexity from quadratic to linear in the number of tokens while improving channel-dependency modeling
  • Achieves up to a 12.13% improvement on the APAVA dataset, using 33% of the memory and 20% of the inference time of the previous state of the art
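The quadratic-to-linear claim can be read off a rough cost count. As an assumption for this sketch, take $N$ tokens (for example, one per channel) of width $d$ and an MLP hidden width proportional to $d$; constants and the exact CoTAR layer design are ignored.

```latex
\begin{aligned}
\text{Self-attention:}\quad & \underbrace{\mathcal{O}(N d^{2})}_{Q,K,V\ \text{projections}}
  + \underbrace{\mathcal{O}(N^{2} d)}_{QK^{\top}\ \text{and}\ AV} \\
\text{Core-token mixing:}\quad & \underbrace{\mathcal{O}(N d^{2})}_{\text{per-token MLPs}}
  + \underbrace{\mathcal{O}(N d)}_{\text{pool into core, broadcast back}}
\end{aligned}
```

Only the token-mixing term changes: the pairwise $N^{2}$ term becomes one linear in $N$, so the savings grow with the number of tokens.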
Methods used · Auto-generated by claude-haiku-4-5-20251001
  • Transformer architectures
  • MLP modules
  • Attention mechanisms
  • Time series analysis
Datasets used · Auto-generated by claude-haiku-4-5-20251001
  • EEG datasets
  • ECG datasets
  • APAVA dataset
Limitations (author-stated) · Auto-generated by claude-haiku-4-5-20251001

Authors did not state explicit limitations.

Future work (author-stated) · Auto-generated by claude-haiku-4-5-20251001

Authors did not state explicit future directions.

Author keywords

  • EEG
  • ECG
  • Deep learning
  • Transformer

