Sequences of Logits Reveal the Low Rank Structure of Language Models
Noah Golowich, Allen Liu, Abhishek Shetty
We exploit the low-rank structure of the logit matrices of LLMs to draw new empirical and theoretical conclusions.
Abstract
A major problem in the study of large language models is to understand their inherent low-dimensional structure. We introduce an approach to study the low-dimensional structure of language models at a model-agnostic level: as sequential probabilistic models. We first empirically demonstrate that a wide range of modern language models exhibit low-rank structure: in particular, matrices built from the model's logits for varying sets of prompts and responses have low approximate rank. We then show that this low-rank structure can be leveraged for generation --- in particular, we can generate a response to a target prompt using a linear combination of the model's outputs on unrelated, or even nonsensical prompts.
On the theoretical front, we observe that studying the approximate rank of language models in the sense discussed above yields a simple universal abstraction whose theoretical predictions parallel our experiments. We then analyze the representation power of the abstraction and give provable learning guarantees.
Extended logit matrices reveal the low-rank structure of language models, enabling linear generation from unrelated prompts.
- Demonstrates that a wide range of modern LLMs exhibits low-rank structure in logit matrices across prompts and responses
- Shows low-rank structure can be leveraged for generation via linear combinations of model outputs
- Develops simple universal abstraction with theoretical predictions paralleling experiments
- Analyzes representation power and provides provable learning guarantees
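The second bullet can be sketched in a toy setting. This is our own illustration under a strong assumption (exactly low-rank, noiseless logits), not the paper's LINGEN procedure: if logit vectors lie in a low-dimensional subspace, the logits for a target prompt can be recovered as a least-squares linear combination of the model's logits on unrelated "basis" prompts, and greedy decoding from the combination matches decoding from the true logits.

```python
import numpy as np

# Synthetic shared low-rank structure (rank r) across all prompts' logits.
rng = np.random.default_rng(1)
vocab, r = 500, 6
factors = rng.normal(size=(r, vocab))

basis_logits = rng.normal(size=(20, r)) @ factors   # outputs on 20 unrelated prompts
target_logits = rng.normal(size=r) @ factors        # outputs on the target prompt

# Solve for mixing weights by least squares, then reconstruct the target logits.
w, *_ = np.linalg.lstsq(basis_logits.T, target_logits, rcond=None)
reconstructed = w @ basis_logits

# Greedy next-token choice from the linear combination agrees with the original.
print(np.argmax(reconstructed) == np.argmax(target_logits))  # True
```

With real models the subspace is only approximately low-rank, so the reconstruction (and hence generation) is approximate rather than exact.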
- Low-rank matrix analysis
- Spectral methods
Authors did not state explicit limitations.
- Better understand how singular value decay evolves during training and use it as a diagnostic for training progress (from the paper)
- Extract concepts and features from the low-rank representation space for model-agnostic interpretability (from the paper)
- Develop techniques to bypass safety guardrails using LINGEN and related frameworks (from the paper)
- Explore safeguarding techniques against attacks suggested by the framework (from the paper)
- Extend theoretical results to approximately low-rank models and investigate notions of approximation (from the paper)
Author keywords
- Large language models
- low-rank structure
Related orals
Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models
Benchmarks practical privacy risks in differential privacy-adapted LLMs, revealing that distribution shifts and model choice affect effectiveness.
Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
Proposes Recursive Likelihood Ratio optimizer for efficient fine-tuning of diffusion models with lower variance gradient estimation.
Invisible Safety Threat: Malicious Finetuning for LLM via Steganography
Demonstrates LLMs can be finetuned to generate harmful steganographically-hidden outputs while appearing benign to safety systems.
Reducing Belief Deviation in Reinforcement Learning for Active Reasoning of LLM Agents
Proposes T3 algorithm to detect belief deviation in LLM agents and truncate trajectories for improved reinforcement learning in active reasoning tasks.
RefineStat: Efficient Exploration for Probabilistic Program Synthesis
RefineStat enforces semantic constraints and applies diagnostic-aware refinement for synthesizing valid probabilistic programs from smaller language models.