ICLR 2026 Orals

Exploratory Causal Inference in SAEnce

Tommaso Mencattini, Riccardo Cadei, Francesco Locatello

Interpretability & Mechanistic Understanding · Fri, Apr 24 · 11:18 AM–11:28 AM · 201 C · Avg rating: 7.00 (4–8)
Author-provided TL;DR

A new method to uncover causal treatment effects directly from trial data using foundation models, sparse autoencoders, and recursive stratification, without any priors or supervision.

Abstract

Randomized Controlled Trials are one of the pillars of science; nevertheless, they rely on hand-crafted hypotheses and expensive analysis. Such constraints prevent causal effect estimation at scale, potentially anchoring on popular yet incomplete hypotheses. We propose to discover the unknown effects of a treatment directly from data. For this, we turn unstructured data from a trial into meaningful representations via pretrained foundation models and interpret them via a Sparse Auto Encoder. However, discovering significant causal effects at the neural level is not trivial due to multiple-testing issues and effect entanglement. To address these challenges, we introduce _Neural Effect Search_, a novel recursive procedure that solves both issues via progressive stratification. After assessing the robustness of our algorithm on semi-synthetic experiments, we showcase, in the context of experimental ecology, the first successful unsupervised causal effect identification on a real-world scientific trial.
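The pipeline the abstract describes (pretrained encoder → sparse autoencoder → per-latent significance testing with multiple-testing control) can be sketched in a few lines. Everything below is a hedged illustration, not the paper's implementation: `embed` is a random-projection stand-in for a frozen foundation model, `sparse_encode` is a crude top-k surrogate for a trained SAE, and the per-latent permutation test with Bonferroni correction is just one simple way to guard against the multiple-testing issue the authors mention.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(x, dim=16):
    """Stand-in for a frozen foundation-model encoder (random projection)."""
    W = rng.standard_normal((x.shape[1], dim))
    return x @ W

def sparse_encode(z, k=4):
    """Crude SAE surrogate: keep only each sample's k largest-magnitude latents."""
    out = np.zeros_like(z)
    idx = np.argsort(-np.abs(z), axis=1)[:, :k]
    np.put_along_axis(out, idx, np.take_along_axis(z, idx, axis=1), axis=1)
    return out

def neural_effects(latents, treated, alpha=0.05, n_perm=999):
    """Flag latents whose treated/control mean gap survives a per-latent
    permutation test, Bonferroni-corrected across all latents."""
    m = latents.shape[1]
    hits = []
    for j in range(m):
        obs = latents[treated, j].mean() - latents[~treated, j].mean()
        null = []
        for _ in range(n_perm):
            perm = rng.permutation(treated)
            null.append(latents[perm, j].mean() - latents[~perm, j].mean())
        p_val = (1 + np.sum(np.abs(null) >= abs(obs))) / (n_perm + 1)
        if p_val < alpha / m:  # Bonferroni over m latents
            hits.append(j)
    return hits

# Toy trial: treatment shifts three raw features of the data.
X = rng.standard_normal((200, 8))
treated = np.arange(200) < 100
X[treated, :3] += 5.0
hits = neural_effects(sparse_encode(embed(X)), treated)
print(hits)  # indices of latents flagged as carrying a treatment effect
```

With a strong synthetic effect, several latents survive the corrected test; with no effect, the Bonferroni threshold keeps false discoveries rare.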

One-sentence summary · Auto-generated by claude-haiku-4-5-20251001

Uses sparse autoencoders and foundation models to discover unknown causal effects in scientific trials.

Contributions · Auto-generated by claude-haiku-4-5-20251001
  • Neural Effect Search, a recursive statistical procedure addressing multiple-testing issues and effect entanglement
  • Progressive stratification ensuring robust causal effect identification at the neural level
  • First successful unsupervised causal effect identification on a real-world scientific trial
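The contributions above describe Neural Effect Search only at a high level, so the following skeleton is just one plausible reading of "recursive procedure with progressive stratification", with hypothetical names and a placeholder z-test rather than the paper's actual statistics: flag the most significant latent, median-split the samples on its activation, and search again within each stratum so that effects entangled with the first one can surface separately.

```python
import numpy as np

Z_THRESH = 4.0  # placeholder for a properly corrected critical value

def strongest_effect(latents, treated):
    """Return (index, |z|) of the latent whose treated-vs-control mean gap
    has the largest two-sample z statistic."""
    n1, n0 = treated.sum(), (~treated).sum()
    best, best_z = None, 0.0
    for j in range(latents.shape[1]):
        a, b = latents[treated, j], latents[~treated, j]
        se = np.sqrt(a.var(ddof=1) / n1 + b.var(ddof=1) / n0) + 1e-12
        z = abs(a.mean() - b.mean()) / se
        if z > best_z:
            best, best_z = j, z
    return best, best_z

def recursive_search(latents, treated, depth=0, max_depth=3, min_n=40):
    """Greedy recursion: flag the most significant latent, stratify samples
    on its activation (median split), and search inside each stratum."""
    if depth >= max_depth or treated.sum() < min_n // 2 or (~treated).sum() < min_n // 2:
        return []
    j, z = strongest_effect(latents, treated)
    if z < Z_THRESH:
        return []
    found = [j]
    high = latents[:, j] >= np.median(latents[:, j])
    for stratum in (high, ~high):
        if stratum.sum() >= min_n:
            found += recursive_search(latents[stratum], treated[stratum],
                                      depth + 1, max_depth, min_n)
    return found

# Toy check: only latent 2 carries a treatment effect.
rng = np.random.default_rng(1)
lat = rng.standard_normal((300, 10))
treated = np.arange(300) < 150
lat[treated, 2] += 3.0
found = recursive_search(lat, treated)
print(found)  # indices of latents flagged as effect-bearing
```

The sample-size guards stop the recursion once a stratum's arms become too unbalanced to test, which is one simple way a progressive stratification can terminate.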
Methods used · Auto-generated by claude-haiku-4-5-20251001
  • Sparse autoencoders
  • Foundation models
  • Statistical hypothesis testing
  • Exploratory causal inference
  • Causal representation learning
Limitations (author-stated) · Auto-generated by claude-haiku-4-5-20251001
  • Assumes the observed variables X adequately capture information about the unknown outcome Y (a data-sufficiency assumption)
  • Assumes foundation models encode concepts linearly and that SAEs can approximately recover the effects
  • The identifiability assumption is the strongest; SAE identifiability theory is not yet as well understood as that of causal representations
  • Until proper annotations and a rationalist approach are applied, domain experts can only use the method 'as rescue system for hypotheses they may have missed'
Future work (author-stated) · Auto-generated by claude-haiku-4-5-20251001

Authors did not state explicit future directions.

Author keywords

  • Randomized Controlled Trials
  • Sparse Auto Encoder
  • Interpretability
  • Causal Inference
