ICLR 2026 Orals

HATSolver: Learning Gröbner Bases with Hierarchical Attention Transformers

Mohamed Malhou, Ludovic Perret, Kristin E. Lauter

LLMs & Reasoning · Fri, Apr 24 · 11:30 AM–11:40 AM · 204 A/B · Avg rating: 4.67 (4–6)
Author-provided TL;DR

Efficient hierarchical attention transformers for learning to solve non-linear equations by computing Gröbner bases.

Abstract

At NeurIPS 2024, Kera (2311.12904) introduced the use of transformers for computing Gröbner bases, a central object in computer algebra with numerous practical applications. In this paper, we improve this approach by applying Hierarchical Attention Transformers (HATs) to solve systems of multivariate polynomial equations via Gröbner bases computation. The HAT architecture incorporates a tree-structured inductive bias that enables the modeling of hierarchical relationships present in the data and thus achieves significant computational savings compared to conventional flat attention models. We generalize to arbitrary depths and include a detailed computational cost analysis. Combined with curriculum learning, our method solves instances that are much larger than those in Kera (2311.12904).
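For readers unfamiliar with the object being learned: a Gröbner basis in lexicographic order triangularizes a polynomial system, much as Gaussian elimination does for linear systems. The computation the model is trained to imitate can be reproduced with an off-the-shelf computer-algebra system. A minimal SymPy sketch on a toy system over the rationals (the paper's instances are over finite fields and far larger; this example is illustrative only):

```python
from sympy import groebner, symbols

# A small system of multivariate polynomial equations:
#   x^2 + y - 1 = 0
#   x + y^2 - 1 = 0
x, y = symbols("x y")
system = [x**2 + y - 1, x + y**2 - 1]

# A lexicographic Groebner basis eliminates variables: the last
# element is univariate in y, so the system can be solved by
# back-substitution.
G = groebner(system, x, y, order="lex")
print(G.exprs)  # e.g. [x + y**2 - 1, y**4 - 2*y**2 + y]

# Sanity check: every input polynomial reduces to zero modulo the basis.
for f in system:
    assert G.reduce(f)[1] == 0
```

Sequence-to-sequence training then amounts to mapping a tokenized input system to a tokenized basis such as the one printed above.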

One-sentence summary · Auto-generated by claude-haiku-4-5-20251001

HATSolver uses hierarchical attention transformers to compute Gröbner bases for multivariate polynomial systems more efficiently than flat attention models.

Contributions · Auto-generated by claude-haiku-4-5-20251001
  • Applies Hierarchical Attention Transformers to solve multivariate polynomial systems via Gröbner basis computation
  • The HAT architecture incorporates a tree-structured inductive bias for modeling hierarchical relationships in polynomial data
  • Demonstrates substantial computational improvements and superior scalability compared to the standard transformer baseline
Methods used · Auto-generated by claude-haiku-4-5-20251001
  • Hierarchical Attention Transformers
  • Curriculum learning
  • Gröbner basis computation
Limitations (author-stated) · Auto-generated by claude-haiku-4-5-20251001
  • Token count grows rapidly with the number of variables and the total degree of the equations, making sequence-to-sequence training challenging
Future work (author-stated) · Auto-generated by claude-haiku-4-5-20251001
  • Investigate whether models trained on a given finite field can generalize to unseen fields, including the non-prime fields used in cryptography
  • Align the training objective with the computational steps of a Gröbner basis algorithm
  • Investigate whether HATSolver implicitly learns algorithmic primitives or requires explicit supervision on intermediate steps

Author keywords

  • Hierarchical Attention Transformer
  • Gröbner Basis
  • Symbolic Computation
  • Multivariate Polynomial Equations
