Exploratory Causal Inference in SAEnce
Tommaso Mencattini, Riccardo Cadei, Francesco Locatello
A new method to uncover causal treatment effects directly from trial data using foundation models, sparse autoencoders, and recursive stratification, without any priors or supervision.
Abstract
Randomized Controlled Trials are one of the pillars of science; nevertheless, they rely on hand-crafted hypotheses and expensive analysis. Such constraints prevent causal effect estimation at scale, potentially anchoring on popular yet incomplete hypotheses. We propose to discover the unknown effects of a treatment directly from data. To this end, we turn unstructured data from a trial into meaningful representations via pretrained foundation models and interpret them via a Sparse Auto Encoder. However, discovering significant causal effects at the neural level is not trivial due to multiple-testing issues and effect entanglement. To address these challenges, we introduce _Neural Effect Search_, a novel recursive procedure that solves both issues by progressive stratification. After assessing the robustness of our algorithm on semi-synthetic experiments, we showcase, in the context of experimental ecology, the first successful unsupervised causal effect identification on a real-world scientific trial.
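The first step of the pipeline — turning unstructured trial data into interpretable neural units — can be sketched as follows. This is a minimal illustration only: it assumes pooled foundation-model embeddings and a top-k-style sparse autoencoder with random (untrained) weights; the shapes, variable names, and the top-k choice are assumptions for illustration, not the paper's exact architecture.

```python
import numpy as np

def topk_sae_encode(X, W_enc, b_enc, k=8):
    """Map dense embeddings to sparse features: keep only the k largest
    pre-activations per sample (after ReLU) and zero out the rest."""
    pre = X @ W_enc + b_enc
    kth = np.partition(pre, -k, axis=1)[:, -k][:, None]  # k-th largest per row
    return np.where(pre >= kth, np.maximum(pre, 0.0), 0.0)

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 64))         # 16 samples, 64-dim embeddings (illustrative)
W_enc = rng.normal(size=(64, 256))    # overcomplete dictionary: 256 candidate units
b_enc = np.zeros(256)
Z = topk_sae_encode(X, W_enc, b_enc)  # sparse activations, shape (16, 256)
```

Each row of `Z` has at most `k` nonzero entries, so every sample is described by a handful of candidate "neural units" on which treatment effects can later be tested.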
Uses sparse autoencoders and foundation models to discover unknown causal effects in scientific trials.
- Neural Effect Search, a recursive statistical procedure addressing multiple-testing issues and effect entanglement
- Progressive stratification ensuring robust causal effect identification at neural level
- First successful unsupervised causal effect identification on real-world scientific trial
- Sparse autoencoders
- Foundation models
- Statistical hypothesis testing
- Exploratory causal inference
- Causal representation learning
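The bullets above center on Neural Effect Search. A toy version of the idea — per-unit treatment-effect tests with a multiple-testing correction, followed by recursive stratification on discovered units — might look like the sketch below. The permutation test, Bonferroni correction, median split, and all names and thresholds are simplifying assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np

def permutation_pvalue(a, b, n_perm=999, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = np.random.default_rng(seed)
    obs = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        hits += abs(pooled[:len(a)].mean() - pooled[len(a):].mean()) >= obs
    return (hits + 1) / (n_perm + 1)

def significant_units(Z, t, alpha=0.05):
    """Bonferroni-corrected effect test for every neural unit."""
    d = Z.shape[1]
    return [j for j in range(d)
            if permutation_pvalue(Z[t == 1, j], Z[t == 0, j]) < alpha / d]

def neural_effect_search(Z, t, alpha=0.05, depth=0, max_depth=2, min_arm=20):
    """Toy recursive stratification: find a significant unit, split the
    samples by its activation, and search again inside each stratum to
    surface effects that are entangled or masked at the top level."""
    found = set(significant_units(Z, t, alpha))
    if not found or depth >= max_depth:
        return sorted(found)
    j = min(found)                      # stratify on one discovered unit
    split = np.median(Z[:, j])
    for mask in (Z[:, j] <= split, Z[:, j] > split):
        # recurse only if both arms keep enough samples in the stratum
        if min((t[mask] == 1).sum(), (t[mask] == 0).sum()) >= min_arm:
            found.update(neural_effect_search(Z[mask], t[mask], alpha,
                                              depth + 1, max_depth, min_arm))
    return sorted(found)

rng = np.random.default_rng(1)
n, d = 240, 5
t = rng.integers(0, 2, n)               # randomized treatment assignment
Z = rng.normal(size=(n, d))             # stand-in for SAE activations
Z[t == 1, 2] += 1.5                     # treatment shifts unit 2
units = neural_effect_search(Z, t)
```

Dividing `alpha` by the number of units guards against the multiple-testing issue the paper highlights, and the recursion into activation strata is a crude stand-in for its progressive stratification.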
- Assumes the observed variables X adequately capture information about the unknown outcome Y (a data-sufficiency assumption)
- Assumes foundation models encode concepts linearly and that SAEs can approximately recover the effects
- The identifiability assumption is the strongest: SAE identifiability theory is not currently as well understood as that of causal representations
- Until proper annotations and a rationalist approach are applied, domain experts can only use the method "as a rescue system for hypotheses they may have missed"
The authors did not state explicit future directions.
Author keywords
- Randomized Controlled Trials
- Sparse Auto Encoder
- Interpretability
- Causal Inference
Related orals
Verifying Chain-of-Thought Reasoning via Its Computational Graph
CRV uses attribution graphs as execution traces to verify chain-of-thought reasoning with white-box mechanistic analysis of computation failures.
Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability
Temporal Sparse Autoencoders incorporate a contrastive loss that encourages consistent feature activations across adjacent tokens to discover semantic concepts.
Temporal superposition and feature geometry of RNNs under memory demands
Studies temporal superposition in RNNs, showing how memory demands affect representational geometry and how RNNs learn different encoding strategies.
Addressing divergent representations from causal interventions on neural networks
A study of causal interventions showing that they produce out-of-distribution representations, proposing a Counterfactual Latent loss to mitigate harmful divergences.