HATSolver: Learning Gröbner Bases with Hierarchical Attention Transformers
Mohamed Malhou, Ludovic Perret, Kristin E. Lauter
Efficient hierarchical attention transformers for learning to solve non-linear equations by computing Gröbner bases.
Abstract
At NeurIPS 2024, Kera (2311.12904) introduced the use of transformers for computing Gröbner bases, a central object in computer algebra with numerous practical applications. In this paper, we improve this approach by applying Hierarchical Attention Transformers (HATs) to solve systems of multivariate polynomial equations via Gröbner bases computation. The HAT architecture incorporates a tree-structured inductive bias that enables the modeling of hierarchical relationships present in the data and thus achieves significant computational savings compared to conventional flat attention models. We generalize to arbitrary depths and include a detailed computational cost analysis. Combined with curriculum learning, our method solves instances that are much larger than those in Kera (2311.12904).
HATSolver uses hierarchical attention transformers to compute Gröbner bases for multivariate polynomial systems more efficiently than flat attention models.
- Applies Hierarchical Attention Transformers to solve multivariate polynomial systems via Gröbner basis computation
- HAT architecture incorporates tree-structured inductive bias for modeling hierarchical relationships in polynomial data
- Demonstrates substantial computational improvements and superior scalability compared to standard transformer baseline
- Hierarchical Attention Transformers
- Curriculum learning
- Gröbner basis computation
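The summary above describes the key idea behind the HAT architecture: a tree-structured attention pattern that replaces flat all-to-all attention. The following is a minimal NumPy sketch of one plausible two-level variant (illustrative only, not the authors' implementation): tokens attend locally within fixed-size chunks, chunk summaries attend globally, and the global context is broadcast back. All function names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # standard scaled dot-product attention
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

def hierarchical_attention(x, chunk):
    """Two-level hierarchical attention over a (n, d) token matrix.

    Cost is O(n*chunk + (n/chunk)^2) score entries instead of the
    O(n^2) of flat attention.
    """
    n, d = x.shape
    assert n % chunk == 0, "sketch assumes n divisible by chunk size"
    chunks = x.reshape(n // chunk, chunk, d)
    # level 1: local attention within each chunk
    local = np.stack([attention(c, c, c) for c in chunks])
    # summarize each chunk (mean pooling; a learned summary token also works)
    summaries = local.mean(axis=1)
    # level 2: attention among chunk summaries
    global_ctx = attention(summaries, summaries, summaries)
    # broadcast global context back to the tokens of each chunk
    out = local + global_ctx[:, None, :]
    return out.reshape(n, d)
```

For a sequence of 16 tokens with chunk size 4, flat attention computes 256 score entries, while this sketch computes 4 local 4×4 blocks plus one 4×4 summary block (80 entries), which is the kind of saving the abstract attributes to the tree-structured bias.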
Token count grows rapidly with number of variables and total degree of equations, making sequence-to-sequence training challenging
from the paper
Investigate whether models trained on any finite field can generalize to unseen fields including non-prime fields used in cryptography
from the paper
Align training objective with computational steps of Gröbner basis algorithm
from the paper
Investigate whether HATSolver implicitly learns algorithmic primitives or requires explicit supervision on intermediate steps
from the paper
Author keywords
- Hierarchical Attention Transformer
- Groebner Basis
- Symbolic Computation
- Multivariate Polynomial Equations
Related orals
Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models
Benchmarks practical privacy risks in differential privacy-adapted LLMs, revealing that distribution shifts and model choice impact effectiveness.
Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
Proposes Recursive Likelihood Ratio optimizer for efficient fine-tuning of diffusion models with lower variance gradient estimation.
Invisible Safety Threat: Malicious Finetuning for LLM via Steganography
Demonstrates LLMs can be finetuned to generate harmful steganographically-hidden outputs while appearing benign to safety systems.
Reducing Belief Deviation in Reinforcement Learning for Active Reasoning of LLM Agents
Proposes T3 algorithm to detect belief deviation in LLM agents and truncate trajectories for improved reinforcement learning in active reasoning tasks.
RefineStat: Efficient Exploration for Probabilistic Program Synthesis
RefineStat enforces semantic constraints and applies diagnostic-aware refinement for synthesizing valid probabilistic programs from smaller language models.