Exploratory Causal Inference in SAEnce
Tommaso Mencattini, Riccardo Cadei, Francesco Locatello
A new method to uncover causal treatment effects directly from trial data using foundation models, sparse autoencoders, and recursive stratification, without any priors or supervision.
Abstract
Randomized Controlled Trials are one of the pillars of science; nevertheless, they rely on hand-crafted hypotheses and expensive analysis. Such constraints prevent causal effect estimation at scale, potentially anchoring on popular yet incomplete hypotheses. We propose to discover the unknown effects of a treatment directly from data. To this end, we turn unstructured data from a trial into meaningful representations via pretrained foundation models and interpret them via a Sparse Auto Encoder. However, discovering significant causal effects at the neural level is not trivial due to multiple-testing issues and effect entanglement. To address these challenges, we introduce _Neural Effect Search_, a novel recursive procedure that solves both issues by progressive stratification. After assessing the robustness of our algorithm on semi-synthetic experiments, we showcase, in the context of experimental ecology, the first successful unsupervised causal effect identification on a real-world scientific trial.
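The first step of the pipeline — turning unstructured trial data into interpretable neural units — can be sketched as follows. This is a minimal illustration only: it assumes pooled foundation-model embeddings and a top-k-style sparse autoencoder with random (untrained) weights; the shapes, variable names, and the top-k choice are assumptions for illustration, not the paper's exact architecture.

```python
import numpy as np

def topk_sae_encode(X, W_enc, b_enc, k=8):
    """Map dense embeddings to sparse features: keep only the k largest
    pre-activations per sample (after ReLU) and zero out the rest."""
    pre = X @ W_enc + b_enc
    kth = np.partition(pre, -k, axis=1)[:, -k][:, None]  # k-th largest per row
    return np.where(pre >= kth, np.maximum(pre, 0.0), 0.0)

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 64))         # 16 samples, 64-dim embeddings (illustrative)
W_enc = rng.normal(size=(64, 256))    # overcomplete dictionary: 256 candidate units
b_enc = np.zeros(256)
Z = topk_sae_encode(X, W_enc, b_enc)  # sparse activations, shape (16, 256)
```

Each row of `Z` has at most `k` nonzero entries, so every sample is described by a handful of candidate "neural units" on which treatment effects can later be tested.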
Uses sparse autoencoders and foundation models to discover unknown causal effects in scientific trials.
- Neural Effect Search, a recursive statistical procedure addressing multiple-testing issues and effect entanglement
- Progressive stratification ensuring robust causal effect identification at neural level
- First successful unsupervised causal effect identification on real-world scientific trial
- Sparse autoencoders
- Foundation models
- Statistical hypothesis testing
- Exploratory causal inference
- Causal representation learning
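The bullets above center on Neural Effect Search. A toy version of the idea — per-unit treatment-effect tests with a multiple-testing correction, followed by recursive stratification on discovered units — might look like the sketch below. The permutation test, Bonferroni correction, median split, and all names and thresholds are simplifying assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np

def permutation_pvalue(a, b, n_perm=999, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = np.random.default_rng(seed)
    obs = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        hits += abs(pooled[:len(a)].mean() - pooled[len(a):].mean()) >= obs
    return (hits + 1) / (n_perm + 1)

def significant_units(Z, t, alpha=0.05):
    """Bonferroni-corrected effect test for every neural unit."""
    d = Z.shape[1]
    return [j for j in range(d)
            if permutation_pvalue(Z[t == 1, j], Z[t == 0, j]) < alpha / d]

def neural_effect_search(Z, t, alpha=0.05, depth=0, max_depth=2, min_arm=20):
    """Toy recursive stratification: find a significant unit, split the
    samples by its activation, and search again inside each stratum to
    surface effects that are entangled or masked at the top level."""
    found = set(significant_units(Z, t, alpha))
    if not found or depth >= max_depth:
        return sorted(found)
    j = min(found)                      # stratify on one discovered unit
    split = np.median(Z[:, j])
    for mask in (Z[:, j] <= split, Z[:, j] > split):
        # recurse only if both arms keep enough samples in the stratum
        if min((t[mask] == 1).sum(), (t[mask] == 0).sum()) >= min_arm:
            found.update(neural_effect_search(Z[mask], t[mask], alpha,
                                              depth + 1, max_depth, min_arm))
    return sorted(found)

rng = np.random.default_rng(1)
n, d = 240, 5
t = rng.integers(0, 2, n)               # randomized treatment assignment
Z = rng.normal(size=(n, d))             # stand-in for SAE activations
Z[t == 1, 2] += 1.5                     # treatment shifts unit 2
units = neural_effect_search(Z, t)
```

Dividing `alpha` by the number of units guards against the multiple-testing issue the paper highlights, and the recursion into activation strata is a crude stand-in for its progressive stratification.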
- Assumes the observed variables X adequately capture information about the unknown outcome Y (a data-sufficiency assumption)
- Assumes foundation models encode concepts linearly and that SAEs can approximately recover the effects
- The identifiability assumption is the strongest: SAE identifiability theory is not currently as well understood as that of causal representations
- Until proper annotations and a rationalist approach are applied, domain experts can only use the method "as a rescue system for hypotheses they may have missed"
The authors did not state explicit future directions.
Author keywords
- Randomized Controlled Trials
- Sparse Auto Encoder
- Interpretability
- Causal Inference
Related orals
Verifying Chain-of-Thought Reasoning via Its Computational Graph
CRV uses attribution graphs as execution traces to verify chain-of-thought reasoning with white-box mechanistic analysis of computation failures.
Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability
Temporal Sparse Autoencoders incorporate a contrastive loss that encourages consistent feature activations across adjacent tokens to discover semantic concepts.
Temporal superposition and feature geometry of RNNs under memory demands
Studies temporal superposition in RNNs, showing how memory demands affect representational geometry and how RNNs learn different encoding strategies.
Addressing divergent representations from causal interventions on neural networks
A study of causal interventions showing that they produce out-of-distribution representations, proposing a Counterfactual Latent loss to mitigate harmful divergences.