ICLR 2026 Orals

HATSolver: Learning Gröbner Bases with Hierarchical Attention Transformers

Mohamed Malhou, Ludovic Perret, Kristin E. Lauter

LLMs & Reasoning · Fri, Apr 24 · 11:30 AM–11:40 AM · 204 A/B · Avg rating: 4.67 (4–6)
Author-provided TL;DR

Efficient hierarchical attention transformers for learning to solve non-linear equations by computing Gröbner bases.

Abstract

At NeurIPS 2024, Kera (2311.12904) introduced the use of transformers for computing Gröbner bases, a central object in computer algebra with numerous practical applications. In this paper, we improve this approach by applying Hierarchical Attention Transformers (HATs) to solve systems of multivariate polynomial equations via Gröbner bases computation. The HAT architecture incorporates a tree-structured inductive bias that enables the modeling of hierarchical relationships present in the data and thus achieves significant computational savings compared to conventional flat attention models. We generalize to arbitrary depths and include a detailed computational cost analysis. Combined with curriculum learning, our method solves instances that are much larger than those in Kera (2311.12904).
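For readers unfamiliar with the object being learned: a Gröbner basis in lexicographic order triangularizes a polynomial system, much as Gaussian elimination does for linear systems. The computation the model is trained to imitate can be reproduced with an off-the-shelf computer-algebra system. A minimal SymPy sketch on a toy system over the rationals (the paper's instances are over finite fields and far larger; this example is illustrative only):

```python
from sympy import groebner, symbols

# A small system of multivariate polynomial equations:
#   x^2 + y - 1 = 0
#   x + y^2 - 1 = 0
x, y = symbols("x y")
system = [x**2 + y - 1, x + y**2 - 1]

# A lexicographic Groebner basis eliminates variables: the last
# element is univariate in y, so the system can be solved by
# back-substitution.
G = groebner(system, x, y, order="lex")
print(G.exprs)  # e.g. [x + y**2 - 1, y**4 - 2*y**2 + y]

# Sanity check: every input polynomial reduces to zero modulo the basis.
for f in system:
    assert G.reduce(f)[1] == 0
```

Sequence-to-sequence training then amounts to mapping a tokenized input system to a tokenized basis such as the one printed above.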

One-sentence summary · Auto-generated by claude-haiku-4-5-20251001

HATSolver uses hierarchical attention transformers to compute Gröbner bases for multivariate polynomial systems more efficiently than flat attention models.

Contributions · Auto-generated by claude-haiku-4-5-20251001
  • Applies Hierarchical Attention Transformers to solve multivariate polynomial systems via Gröbner basis computation
  • The HAT architecture incorporates a tree-structured inductive bias for modeling hierarchical relationships in polynomial data
  • Demonstrates substantial computational improvements and superior scalability compared to the standard transformer baseline
Methods used · Auto-generated by claude-haiku-4-5-20251001
  • Hierarchical Attention Transformers
  • Curriculum learning
  • Gröbner basis computation
Limitations (author-stated) · Auto-generated by claude-haiku-4-5-20251001
  • Token count grows rapidly with the number of variables and the total degree of the equations, making sequence-to-sequence training challenging
Future work (author-stated) · Auto-generated by claude-haiku-4-5-20251001
  • Investigate whether models trained on a given finite field can generalize to unseen fields, including the non-prime fields used in cryptography
  • Align the training objective with the computational steps of a Gröbner basis algorithm
  • Investigate whether HATSolver implicitly learns algorithmic primitives or requires explicit supervision on intermediate steps

Author keywords

  • Hierarchical Attention Transformer
  • Gröbner Basis
  • Symbolic Computation
  • Multivariate Polynomial Equations
