p-less Sampling: A Robust Hyperparameter-Free Approach for LLM Decoding
p-less sampling dynamically sets the truncation threshold using information theory, enabling hyperparameter-free LLM decoding with robust quality at high temperatures.
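To make the idea concrete, here is a minimal sketch of an entropy-derived truncation rule in the spirit of the summary above. The threshold exp(-H(p)) keeps exactly the tokens whose surprisal is at most the distribution's entropy; it is hyperparameter-free, but it is an illustrative choice, not necessarily the paper's exact criterion, and the function name is hypothetical.

```python
import numpy as np

def entropy_truncate_sample(logits, temperature=1.0, rng=None):
    """Sample with an entropy-derived truncation threshold (sketch).

    Keeps tokens with p_i >= exp(-H(p)), i.e. surprisal at most the
    entropy. Since H >= -log(max_i p_i), the kept set is never empty.
    Illustrative of information-theoretic truncation, not necessarily
    the paper's exact rule.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()                        # numerical stability
    p = np.exp(z) / np.exp(z).sum()
    h = -(p * np.log(p + 1e-12)).sum()     # Shannon entropy (nats)
    tau = np.exp(-h)                       # dynamic, hyperparameter-free
    q = np.where(p >= tau, p, 0.0)
    q = q / q.sum()                        # renormalize kept mass
    return int(rng.choice(len(p), p=q))
```

A rule of this shape tightens the threshold as the distribution sharpens and relaxes it as the distribution flattens, so it degrades gracefully at high temperatures, which is the robustness property the summary highlights.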
PhyWorldBench evaluates text-to-video models on physics adherence across fundamental, composite, and anti-physics scenarios.
Representer theorem for Hawkes processes shows dual coefficients are analytically fixed to unity via penalized least squares.
FFDP framework scales image registration to 100μm human brain MRI volumes using IO-aware kernels and distributed tensor sharding.
Large-scale study comparing LLM-graph interaction modes for node classification, finding code generation outperforms prompting on long-text and high-degree graphs.
AdAEM dynamically generates value-assessment questions for LLMs by probing internal value boundaries using in-context optimization.
Study of causal interventions showing they produce out-of-distribution representations, proposing Counterfactual Latent loss to mitigate harmful divergences.
ADP, a lightweight protocol, unifies 13 heterogeneous agent datasets into a single training schema, achieving a 20% average performance gain over base models.
Presents unified RL framework for training LLM agents on long-horizon decision-making with staged interaction scaling.
AnyUp inference-time feature upsampler generalizes across different feature types and resolutions without encoder-specific retraining.
Presents AstaBench, comprehensive benchmark suite with production-grade tools for rigorous evaluation of AI agents on scientific research tasks.
AutoEP uses LLM reasoning with real-time landscape analysis to dynamically control metaheuristic algorithms without training.
Benchmarks practical privacy risks in differential privacy-adapted LLMs, revealing distribution shifts and model choice impact effectiveness.
Framework detects self-initiated deception in LLMs via statistical metrics showing both deceptive intention and behavior correlate with task difficulty.
BioX-Bridge enables parameter-efficient cross-modal knowledge transfer across biosignals using lightweight prototype-based bridge networks between foundation models.
BIRD-INTERACT benchmark evaluates LLMs on dynamic multi-turn text-to-SQL tasks with function-driven user simulator and dual interaction settings.
Generates diverse synthetic time series for pretraining foundation models with clear scaling laws.
Develops causal structure learning framework for Hawkes processes identifying latent confounder subprocesses.
Theoretical bounds on polyhedral complex connectivity and diameter reveal fundamental ReLU network geometry properties.
WebDevJudge benchmark reveals significant LLM-as-judge gaps due to failures in functional equivalence and feasibility verification.
CoCo framework captures compactness and consistency in graph neural network representations for improved deep graph clustering.
Introduces CDGS integrating compositional diffusion with guided search for coherent long-horizon plan generation.
CRC optimizes prediction set construction under explicit robustness constraints instead of coverage for more efficient robust decisions.
CounselBench large-scale benchmark with 2000 expert evaluations and 120 adversarial questions for evaluating LLMs in mental health question answering.
Expert-Router Coupling loss tightly couples MoE router decisions with expert capabilities by treating router embeddings as proxy tokens.
Cross-domain lossy compression unifies rate and classification constraints via optimal transport framework.
CyberGym benchmarks AI agents on 1,507 real-world vulnerabilities discovering 34 zero-days, showing top models achieve only 22% success on PoC generation.
Distills AlphaFold3 into single-step sampler with temporal geodesic matching achieving 15x inference acceleration.
CoTAR replaces transformer attention with centralized MLP module for efficient medical time series modeling, reducing complexity to linear.
DA3 predicts spatially consistent 3D geometry from arbitrary camera views using plain transformer and depth-ray targets.
DepthLM shows VLMs can match pure vision models in metric depth estimation with text-based supervised finetuning and visual prompting without architecture changes.
DiffMPC provides GPU-accelerated differentiable MPC solver leveraging problem structure for efficient parallelization.
WGM-based methods provide efficient domain discovery with near-optimal guarantees for missing mass on Zipfian data.
EBTs frame System 2 thinking as energy minimization enabling inference-time reasoning emergence across modalities.
Prophet identifies early answer convergence in diffusion language models to accelerate decoding by 3.4x on reasoning tasks.
DiffusionNFT enables efficient online reinforcement learning for diffusion models via forward process optimization with up to 25x efficiency gains.
Proposes Discount Model Search for quality diversity optimization in high-dimensional measure spaces.
Characterizes distributional equivalence for linear non-Gaussian latent-variable cyclic causal models without structural assumptions.
DTO-KD uses multi-objective optimization to dynamically balance task and distillation losses at gradient level for better knowledge distillation.
Introduces EditBench benchmark for real-world LLM code editing with 545 problems from actual developer usage.
PAPL aligns discrete diffusion training with planning-based inference via planned ELBO for improved text and protein generation.
WASI applies subspace-based training to transformer models reducing memory by 62x and FLOPs by 2x while maintaining accuracy on edge devices.
EigenBench measures language model value alignment using model ensemble judgments aggregated with EigenTrust without ground truth labels.
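EigenTrust itself is a classic reputation algorithm, so the aggregation step is easy to sketch: the global trust vector is the principal left eigenvector of the row-normalized peer-judgment matrix, found by power iteration. A generic version follows (not EigenBench's code; the judge-on-judge score matrix is a hypothetical stand-in):

```python
import numpy as np

def eigentrust(local_trust: np.ndarray, iters: int = 100) -> np.ndarray:
    """Classic EigenTrust aggregation by power iteration (sketch).

    local_trust[i, j] = how favorably judge i rates judge j's outputs
    (nonnegative; each row assumed to have positive sum). Returns
    global trust weights summing to 1.
    """
    C = local_trust / local_trust.sum(axis=1, keepdims=True)
    t = np.full(C.shape[0], 1.0 / C.shape[0])  # uniform prior
    for _ in range(iters):
        t = C.T @ t                            # t_{k+1} = C^T t_k
    return t
```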
EmotionThinker reformulates speech emotion recognition as deep reasoning with prosody enhancement and specialized reinforcement learning.
Common Corpus releases 2 trillion permissively-licensed tokens for open-science LLM pre-training covering diverse languages.
AIGB-Pearl enhances generative auto-bidding with trajectory evaluator and KL-Lipschitz-constrained optimization for safe exploration beyond offline data.
Ellipse signatures function as forgery-resistant model output identifiers based on high-dimensional geometric constraints.
Graph embeddings exhibit exchangeability property, enabling efficient graph retrieval via transport-based similarity approximation with locality-sensitive hashing.
Uses sparse autoencoders and foundation models to discover unknown causal effects in scientific trials.
Proposes ExDM using diffusion models for exploration and policy learning in unsupervised reinforcement learning.
ReaSyn iteratively refines synthetic pathways bidirectionally with discrete flow models for synthesizable molecular design.
Reveals long sequence modeling degrades gene expression prediction; proximal epigenomic signals with confounding mitigation suffice.
Triple-BERT addresses order dispatching via centralized SARL with action decomposition and BERT-based attention.
Analyzes phase retrieval learning dynamics with anisotropic data, deriving explicit scaling laws and three-phase trajectories.
Frozen-PINNs employ space-time separation with random features for fast, accurate PDE solving without gradient descent.
FIRE balances stability-plasticity tradeoff using Frobenius error and isometry deviation constraints without heavy hyperparameter tuning.
Accelerates video LLMs via training-free spatiotemporal token merging, retaining 99.1% performance with 10% of tokens.
Proposes FlashWorld generating high-quality 3D scenes in seconds using dual-mode diffusion with cross-mode distillation.
UFEval provides unified fine-grained evaluation of multimodal LLM outputs with aspect and task generalization.
Characterizes in-context learning capabilities of Mamba, showing it learns optimal Laplacian smoothing estimator.
RNN models of hippocampus reveal how locomotor development statistics shape emergence of spatial neural representations.
Gaia2 benchmarks LLM agents in asynchronous dynamic environments with action-level verification for RL training.
Analyzes machine unlearning in high dimensions, showing a single noisy Newton step with Gaussian noise suffices for the privacy-accuracy trade-off.
EditVerse unifies image and video generation/editing via token sequences enabling cross-modal knowledge transfer.
Introduces distribution-over-distribution model combining geometry distributions with two-stage flow matching for human 3D generation.
OmniVerifier provides universal visual verification for multimodal reasoning and introduces sequential test-time scaling for image generation and editing.
GEPA uses genetic-Pareto selection with natural language reflection to outperform RL-based prompt optimization with 35x fewer rollouts.
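The "Pareto" half of that selection scheme is a standard non-dominated filter, worth seeing once. A generic sketch (not GEPA's implementation), assuming each candidate prompt carries a vector of per-task scores where higher is better:

```python
from typing import Sequence

def pareto_front(scores: Sequence[Sequence[float]]) -> list[int]:
    """Return indices of non-dominated candidates (higher is better).

    Candidate i is dominated if some j scores >= on every objective
    and strictly > on at least one. Generic sketch, not GEPA's code.
    """
    front = []
    for i, si in enumerate(scores):
        dominated = any(
            all(a >= b for a, b in zip(sj, si)) and
            any(a > b for a, b in zip(sj, si))
            for j, sj in enumerate(scores) if j != i
        )
        if not dominated:
            front.append(i)
    return front

# e.g. per-task accuracies of three candidate prompts:
# pareto_front([[0.8, 0.2], [0.5, 0.9], [0.4, 0.1]]) -> [0, 1]
```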
GLASS Flows samples Markov transitions via inner flow matching models to improve inference-time reward alignment in flow and diffusion models.
Solves optimal multi-draft speculative sampling via convex optimization achieving 90% acceptance rates.
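For background, the single-draft acceptance rule that multi-draft work generalizes is short enough to state in code: accept a drafted token with probability min(1, p_target/p_draft), and resample from the clipped residual on rejection. This preserves the target distribution exactly; the paper's contribution is choosing the optimal acceptance rule when several drafts are available, which this sketch does not cover.

```python
import numpy as np

def accept_or_resample(p_tgt, p_drf, draft_tok, rng):
    """Standard single-draft speculative sampling step (background,
    not the paper's multi-draft method). Preserves the target
    distribution exactly regardless of draft quality. Assumes the
    drafted token has nonzero draft probability (true by construction).
    """
    if rng.random() < min(1.0, p_tgt[draft_tok] / p_drf[draft_tok]):
        return draft_tok                       # accept drafted token
    residual = np.maximum(p_tgt - p_drf, 0.0)  # clipped residual
    residual = residual / residual.sum()
    return rng.choice(len(p_tgt), p=residual)  # resample on rejection
```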
Proposes Recursive Likelihood Ratio optimizer for efficient fine-tuning of diffusion models with lower variance gradient estimation.
Gradient-aware diagnostic tool using saliency to identify hallucination patterns, proposing SGRS and LocoRE interventions to reduce output errors.
HATSolver uses hierarchical attention transformers to compute Gröbner bases for multivariate polynomial systems more efficiently than flat attention models.
Demonstrates that a covariance matching procedure yields better synthetic data for training neural networks than mean shift or other approaches.
Gradient leading-term analysis reveals how semantic associations emerge in transformers as compositions of bigram, interchangeability, and context mapping functions.
Study reveals incompatibility between ascending quality curriculum and decaying learning rate in LLM pretraining, proposing moderated decay and model averaging solutions.
Work establishes meta-evaluation measures showing many micro-benchmarks cannot reliably rank similar-performing models.
Releases Hubble suite of open-source LLMs with controlled perturbed variants to systematically study memorization risks.
HGM identifies metaproductivity-performance mismatch and uses clade-based lineage metrics to guide self-improving coding agents.
Hyperparameter Trajectory Inference uses conditional Lagrangian optimal transport to reconstruct neural network outputs across hyperparameter spectra without expensive retraining.
Capacity manipulation improves diffusion models' handling of class-imbalanced data by reserving capacity for minority classes via low-rank decomposition.
In-Place TTT framework enables LLMs to perform test-time training by adapting MLP projection matrices with alignment to next-token prediction.
AgentFlow trainable in-the-flow agentic system using Flow-GRPO for on-policy learning with long-horizon sparse rewards.
Shows InfoNCE loss induces Gaussian distribution in contrastive representations, providing principled explanation for observed Gaussianity.
Proposes information-theoretic Lagrangian formulation to balance simplicity and expressiveness in Koopman representation learning for dynamical systems.
InfoTok achieves adaptive video tokenization using information-theoretic compression and ELBO-based routing.
Avatar generation framework using MLLM semantic planning and specialized MMDiT for coherent character animations aligned with multimodal context.
Theory of context length scaling via Intrinsic Entropy, explaining the relationship between optimal context length and training dataset size.
Demonstrates LLMs can be finetuned to generate harmful steganographically-hidden outputs while appearing benign to safety systems.
Detects implicit reward hacking by measuring reasoning effort through truncated CoT analysis.
einx is a universal notation for tensor operations using vectorization, reducing large APIs to small, consistent operation sets.
LatentFT provides frequency-domain controls for generative music via diffusion autoencoder with latent-space Fourier transform enabling timescale-based manipulation.
LPWM enables self-supervised object-centric world modeling with latent action module for stochastic video generation and control.
Aggregates speech tokens into latent patches for efficient speech-text modeling with cross-modal alignment.
Systematic study reveals LLMs acquire visual perception priors from diverse data and reasoning priors from code/math corpora.
L2Seg accelerates vehicle routing solvers 2-7x by learning to identify stable and unstable solution segments.
Proposes framework to handle noisy entity-attribute and inter-graph correspondences in multi-modal entity alignment.
HyCa uses hybrid ODE solvers with dimension-wise caching strategies to accelerate diffusion transformers by 5-6x without retraining.
LLM DNA low-dimensional functional representation reveals evolutionary relationships among 305 LLMs through phylogenetic analysis.
Introduces semantically conditioned watermarks for stealthy LLM fingerprinting that remains robust across deployment scenarios.
Study showing LLMs exhibit 39% average performance drop in multi-turn conversations, failing to recover from wrong contextual assumptions.
Introduces parallel decoding for autoregressive image generation with flexible ordering achieving 3.4x latency reduction.
LongWriter-Zero applies RL from scratch to achieve ultra-long text generation without synthetic training data.
LoongRL uses emergent plan-retrieve-reason-recheck pattern trained on long-context tasks to generalize beyond training length.
Mamba-3 achieves 1.8 percentage point accuracy gain over Mamba-2 via expressive recurrence, complex-valued state updates, and MIMO formulation.
SparseRL leverages deep RL and pretrained models to generate high-performance CUDA code for sparse matrix operations.
MC-Search benchmark evaluates multimodal agentic RAG with step-wise reasoning chains and introduces Search-Align for improved planning.
mCLM uses modular chemical language combining natural language and molecular building blocks for function-aware synthesis.
MVP achieves fastest one-step action generation with instantaneous velocity constraint providing high expressiveness for robotic control.
MedAgentGym provides scalable sandbox environment with 72K biomedical tasks for training code-centric LLM agents with RL.
MemAgent uses RL-trained memory modules to enable LLMs to extrapolate from 8K to 3.5M token contexts with minimal performance degradation.
MetaEmbed uses learnable meta tokens with matryoshka training to enable test-time scaling for multimodal retrieval balancing quality and efficiency.
MoEs with optimal activation rates surpass dense LLMs under equal resource constraints (parameters, compute, data) when paired with a data reuse strategy.
MF-GIA framework enables graph neural networks to perform in-context learning across heterogeneous domains without modality assumptions using gradient fingerprints.
MomaGraph learns unified task-oriented scene representations integrating spatial-functional relationships for embodied agents to perform planning and manipulation.
RoSE estimates surface normals via shading sequence prediction, addressing 3D misalignment in monocular normal estimation.
Introduces MotionStream enabling sub-second latency motion-controlled infinite-length video generation via causal diffusion.
MrRoPE generalizes RoPE-extension via radix system conversion, achieving train-short-test-long with doubled effective context window.
GraphGlue uses Riemannian geometry to merge multi-domain graphs into unified manifolds, enabling knowledge transfer across graph domains.
MASK aligns semantic knowledge between images and text using word embeddings as bridges to match out-of-distribution words in unpaired matching.
MNPO extends Nash learning to multiplayer regime for aligning LLMs with heterogeneous human preferences via n-player game formulation.
Interprets neural autoencoders as dynamical systems with latent vector fields to analyze generalization, memorization, and out-of-distribution detection.
Neon inverts model degradation from self-training by extrapolating away from it, improving generative models with minimal compute.
NextStep-1 achieves state-of-the-art autoregressive text-to-image generation by modeling continuous image tokens with lightweight flow matching instead of diffusion.
Provides first finite-confidence analysis of Track-and-Stop and Sticky Track-and-Stop algorithms for pure exploration problems.
Develops efficient federated optimization algorithm with cost-aware client selection achieving best communication and local complexity.
Omni-Reward addresses modality imbalance and preference rigidity with omni-modal reward modeling framework.
Camera-Aware MLLM framework improves spatial reasoning by injecting camera parameters and using geometric augmentation.
Theoretical analysis shows difficult examples hurt unsupervised contrastive learning generalization more than supervised settings.
Shows decentralized learning with single global merging achieves convergence rates matching parallel SGD under data heterogeneity.
Geodesic PCA for probability distributions using Wasserstein geometry with neural network parametrization for continuous distributions.
Unified framework for imbalanced graph classification using dynamic balanced prototypes and prototype load-balancing optimization.
MRT systematically stress-tests LLM agent monitoring, revealing that agent awareness dominates and hybrid scaffolding enables weak-to-strong monitoring.
OpenApps testbed reveals UI agent reliability varies drastically across app variations despite stable within-environment performance.
OpenThoughts releases open-source datasets and models for training reasoning tasks, achieving state-of-the-art on AIME and code benchmarks.
MoE sparsity investigation reveals optimal balance between active FLOPs and tokens-per-parameter for reasoning versus memorization.
OpTI-BFM uses optimistic decision criterion modeling uncertainty over reward functions to enable efficient task inference for behavior foundation models.
Hierarchical Speculative Decoding uses lossless verification to maximize accepted tokens while preserving target distribution fidelity.
Analyzes how overparametrization shifts BBP transition point in loss landscape, bending geometric properties.
DECS framework reduces reasoning model overthinking by decoupling necessary from redundant tokens via curriculum scheduling.
P-GenRM transforms user preferences into adaptive personas and scoring rubrics with test-time scaling for personalized reward modeling.
Enables parallel training of nonlinear RNNs via Newton's method achieving 665x speedup over sequential application.
Pareto-Conditioned Diffusion formulates offline multi-objective optimization as conditional sampling problem avoiding explicit surrogate models.
Partition Generative Models replace masking with partitioning for efficient parallel generation, achieving higher throughput than masked generative models.
PATEGAIL++ privacy-preserving trajectory generation framework using sensitivity-aware noise allocation for improved privacy-utility trade-off.
Enforces convex output constraints via operator splitting enabling fast parametric optimization solving.
Theoretical characterization shows MDMs are expressively equivalent to padded looped transformers, more efficient for parallel problems.
Proposes CompSLOT framework extracting interpretable concepts from vision transformers to enhance continual learning.
Shows optimal weight decay is 30x larger than standard practice; ensembling achieves lower loss asymptote enabling data-efficient pre-training at scale.
LeanHammer combines neural premise selection with symbolic automation for first end-to-end hammer in Lean proof assistant.
Proposes probabilistic kernel functions for angle testing enabling efficient approximate nearest neighbor search.
Q-RAG fine-tunes embedders for multi-step retrieval using reinforcement learning, achieving state-of-the-art on long-context QA.
Quantitative bounds show training length required for length generalization depends on periodicity, locality, alphabet size, and model norms.
Quotient-space diffusion models reduce learning difficulty for molecular structure generation via SE(3) symmetry handling.
RadioGS introduces radiometric consistency supervision for inverse rendering to accurately model indirect illumination in Gaussian-based representations.
Proposes RAIN-Merging to merge instruction-tuned and reasoning models while preserving structured thinking format.
RealPDEBench first benchmark integrating real-world measurements with paired simulations across five physical systems for scientific ML evaluation.
RALI framework aligns images to text representations from reasoning MLLMs using contrastive learning, achieving comparable image quality assessment performance with <5% of the parameters.
Power sampling algorithm elicits strong reasoning from base models at inference time via MCMC without additional training.
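The core trick is sketchable: to sample from pi(x) proportional to p(x)^alpha over whole completions, run Metropolis-Hastings with fresh base-model samples as independence proposals, so the acceptance ratio collapses to (p(x')/p(x))^(alpha-1). The paper's actual kernel likely differs (e.g., block-wise resampling); `sample_and_logprob` below is a hypothetical stub standing in for the base model.

```python
import math
import random

def power_sample(sample_and_logprob, alpha=2.0, steps=50):
    """Metropolis-Hastings targeting pi(x) ~ p(x)**alpha (sketch).

    With independence proposals q = p, the MH acceptance ratio
    simplifies to (p(x')/p(x))**(alpha - 1). `sample_and_logprob`
    is a hypothetical stub: it draws a full completion from the
    base model and returns (text, log p(text)).
    """
    x, logp_x = sample_and_logprob()
    for _ in range(steps):
        x_new, logp_new = sample_and_logprob()
        log_accept = (alpha - 1.0) * (logp_new - logp_x)
        if math.log(random.random()) < log_accept:
            x, logp_x = x_new, logp_new
    return x
```

Raising alpha sharpens the sequence-level distribution toward high-likelihood completions without touching the weights, which is why no additional training is needed; note that sequence-level sharpening is not the same as lowering the per-token temperature.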
Introduces RedTeamCUA framework with hybrid web-OS sandbox for adversarial testing of computer-use agents.
Proposes T3 algorithm to detect belief deviation in LLM agents and truncate trajectories for improved reinforcement learning in active reasoning tasks.
RefineStat enforces semantic constraints and applies diagnostic-aware refinement for synthesizing valid probabilistic programs from smaller language models.
MetamerGen generates scene metamers aligned with human perception using foveal/peripheral features and latent diffusion.
Revela enables self-supervised retriever learning by adapting language modeling objectives, achieving unsupervised SoTA on multiple retrieval benchmarks.
Rodrigues Networks inject kinematics-aware inductive biases for improved action learning in articulated robot tasks.
SafeDPO reformulates safety alignment as closed-form objective, achieving strong safety-helpfulness trade-offs without auxiliary models.
SGF unifies negative guidance in safe generation via MMD potentials and control barrier analysis with time-critical guidance windows.
Generates minute-long high-resolution videos efficiently with linear attention and constant-memory KV cache for block autoregression.
ScaleCUA scales open-source computer use agents with cross-platform dataset and dual-loop data pipeline.
Proteina-Complexa unifies generative modeling and hallucination for atomistic binder design via pretraining on Teddymer and test-time optimization.
Analyzes scaling laws for shallow networks with feature learning via sparse estimation and matrix compression theory.
PRISM framework projects fMRI signals into structured text space for visual stimulus reconstruction with object-centric diffusion and attribute search modules.
SSPO achieves data efficiency in preference optimization by pseudo-labeling unpaired data using theoretically-grounded reward thresholds.
Extended logit matrices reveal low-rank structure of language models enabling linear generation from unrelated prompts.
Develops methods for LMs to ask informative questions and make decisions under uncertainty using Bayesian Experimental Design.
SimuHome introduces Matter protocol-grounded smart home simulator and 600-episode benchmark evaluating LLM agents on device control and workflow scheduling.
Proves length-generalizable softmax transformers with chain-of-thought and relative positional encoding are Turing-complete.
Speculative Actions accelerates agent systems by predicting and executing likely future actions in parallel.
Watermarks diffusion models losslessly via spherical mapping preserving Gaussian prior up to third-order moments.
Generates ultra-long videos by actively correcting self-generated errors through error-recycling fine-tuning.
Framework studying strategic control of social learning by algorithmic information mediators with theoretical analysis and LLM-based simulations.
Structured Flow Autoencoders integrate flow matching with graphical models for structured representation learning.
SwingArena evaluates LLMs on GitHub issue solving via adversarial framework modeling submitter-reviewer collaboration with retrieval-augmented code generation.
TabStruct benchmark evaluates tabular data generators on structural fidelity and conventional dimensions using global utility metric without ground-truth causal structures.
LoRA-Pre low-rank optimizer reduces momentum matrix memory via online linear learner decomposition while maintaining optimization performance.
ABOM performs task-free adaptive meta black-box optimization using online parameter adaptation without predefined task distributions.
Learns zero-shot RL representations via temporal difference latent prediction recovering successor factorization.
Temporal Sparse Autoencoders incorporate contrastive loss encouraging consistent feature activations over adjacent tokens to discover semantic concepts.
Studies temporal superposition in RNNs showing how memory demands affect representational geometry and RNNs learn different encoding strategies.
VIST3A stitches text-to-video models with 3D reconstruction systems and aligns them via reward finetuning for high-quality text-to-3D generation.
ScaleRL provides principled framework for predicting RL compute scaling in LLMs through 400,000 GPU-hour study.
Develops theory linking pre-training coverage to post-training success through model scaling and practical algorithms.
Polar Express computes polar decomposition with minimax-optimized update rules for efficient GPU-friendly training.
Uses persistent homology to characterize topological compression in LLM latent spaces induced by adversarial inputs.
Spacetime perspective views diffusion latent spaces as Fisher-Rao metric manifolds enabling efficient geodesic computation without simulation.
Compresses KV cache in reasoning models via thought-adaptive quantization and eviction achieving near-lossless accuracy.
VC-STaR mitigates visual hallucinations through contrastive VQA pairs for self-improving visual reasoning.
TileLang enables hardware-aware fused kernel programming with tile inference and recommendation achieving 5-6x speedup.
Shows tool-use enables state space models to achieve length generalization previously limited by fixed-size memory.
Proposes token-importance guided DPO with gradient attribution weighting and triplet loss for fine-grained LLM alignment.
TRACE reveals diffusion models encode hidden instance boundary priors and leverages them for unsupervised instance segmentation without dense annotations.
Proposes train-before-test approach showing model potential rankings transfer across benchmarks better than direct evaluation.
Proves transformers with unique-hard attention are exponentially more succinct than finite automata and LTL formulas but verification is EXPSPACE-complete.
Characterizes online learning with ranking feedback showing sublinear regret impossible in general, possible with variation bounds.
TROLL replaces PPO clip objective with differentiable trust region projection for more stable and efficient LLM reward fine-tuning.
Presents XFactor, first geometry-free self-supervised model for transferable novel view synthesis without 3D inductive biases.
TTSDS2 metric robustly correlates with human judgments for TTS evaluation across diverse speech domains maintaining >0.5 Spearman correlation.
UALM unified audio language model handles understanding, text-to-audio generation, and multimodal reasoning in a single model, with UALM-Reason for cross-modal generative reasoning.
Proposes CorreGen, generative framework for multi-view clustering under noisy correspondence using EM algorithm.
RealUID provides universal distillation for matching models without GANs, incorporating real data into one-step generator training.
CRV uses attribution graphs as execution traces to verify chain-of-thought reasoning with white-box mechanistic analysis of computation failures.
Veritas deepfake detector uses pattern-aware reasoning via MLLMs to achieve superior generalization across unseen forgery techniques and data domains.
Presents VibeVoice for zero-shot expressive long-form multi-speaker podcast generation using next-token diffusion.
Vid-LLM is a video-based 3D multimodal LLM that extracts geometric cues from videos without external 3D data for 3D scene understanding.
Proposes visual planning paradigm using purely visual representations for reasoning in spatially-grounded tasks.
VLMs employ position IDs as content-independent spatial indices to solve visual binding across object features.
WAFT replaces cost volumes with high-resolution warping for optical flow, ranking first on Spring, Sintel, and KITTI with 1.3-4.1x faster inference.
FAB enables adversaries to create compromised LLMs that exhibit dormant adversarial behaviors triggered only during downstream finetuning.
Creates first unified audio-visual embedding space for text, audio, and video with hierarchical fusion and prompt-awareness.
FALCON enables few-step flow-based sampling with accurate likelihoods for efficient Boltzmann distribution sampling.
WIMHF uses sparse autoencoders to extract human-interpretable features from preference data, enabling better understanding and curation of human feedback.
AuxDPO introduces auxiliary variables mitigating DPO misspecification and moving toward RLHF solutions.
Analyzes low-precision flash attention training failure caused by low-rank representations and biased BF16 rounding errors.
Introduces closed-loop benchmark evaluating generative world models on embodied task performance rather than visual quality.
WSM establishes theoretical connection between LR decay and model merging for improved LLM pre-training.