Non-Convex Federated Optimization under Cost-Aware Client Selection
Xiaowen Jiang, Anton Rodomanov, Sebastian U. Stich
We propose a composite gradient method with a SAGA-based recursive gradient estimator for non-convex federated optimization that exploits second-order similarity and achieves high communication and local-computation efficiency.
Abstract
Different federated optimization algorithms typically employ distinct client-selection strategies: some methods communicate only with a randomly sampled subset of clients at each round, while others periodically communicate with all clients, or use a hybrid scheme combining both. However, existing metrics for comparing optimization methods typically do not distinguish between these strategies, even though they often incur very different communication costs in practice. To address this disparity, we introduce a simple and natural model of federated optimization that quantifies communication and local-computation complexities. This model accommodates several commonly used client-selection strategies and explicitly associates each with a distinct cost. Within this setting, we propose a new algorithm that achieves the best-known communication and local complexities among existing federated optimization methods for non-convex optimization. The algorithm is based on the inexact composite gradient method with a carefully constructed gradient estimator and a special procedure for solving the auxiliary subproblem at each iteration. The gradient estimator builds on SAGA, a popular variance-reduced estimator, for which we first derive a new variance bound showing that SAGA can exploit functional similarity. We then introduce the Recursive-Gradient technique as a general way to potentially improve the error bound of a given conditionally unbiased gradient estimator, including both SAGA and SVRG. Applying this technique to SAGA yields a new estimator, RG-SAGA, with an improved error bound compared to the original estimator.
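For intuition about the variance-reduced estimator underlying the paper's construction, here is a minimal sketch of a standard SAGA gradient estimate. The helper name `saga_estimate` and the setup are illustrative, not from the paper; the paper's federated variant and its new variance bound are not reproduced here.

```python
import numpy as np

def saga_estimate(x, table, grad_fns, rng):
    """One standard SAGA gradient estimate over n components (illustrative).

    `table[i]` caches the most recent gradient computed for component i;
    the returned estimate is conditionally unbiased for the full gradient.
    """
    n = len(grad_fns)
    i = rng.integers(n)                         # sample one component uniformly
    g_new = grad_fns[i](x)                      # fresh gradient of component i
    g = g_new - table[i] + table.mean(axis=0)   # variance-reduced estimate
    table[i] = g_new                            # refresh the memory slot
    return g

# Example: three scalar components with gradients a_i * x.
rng = np.random.default_rng(0)
grad_fns = [lambda x, a=a: a * x for a in (1.0, 2.0, 3.0)]
x = np.array([2.0])
table = np.zeros((3, 1))
for _ in range(200):
    est = saga_estimate(x, table, grad_fns, rng)
```

Once every memory slot has been refreshed at a fixed point `x`, the estimate coincides exactly with the full gradient, which is the mechanism by which SAGA's variance vanishes near convergence.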
Develops an efficient federated optimization algorithm with cost-aware client selection, achieving the best-known communication and local-computation complexity.
- Introduces new model for comparing distributed optimization algorithms with non-uniform client-selection costs
- Develops algorithm based on inexact composite gradient method with recursive gradient estimator
- Derives new variance bound for SAGA estimator exploiting functional similarity
- Introduces Recursive-Gradient technique to improve error bounds for variance-reduced gradient estimators
- SAGA
- SVRG
- Composite gradient method
- Variance reduction
Authors did not state explicit limitations.
Authors did not state explicit future directions.
Author keywords
- Client Sampling
- SAGA
- Second-order Similarity
- Composite Gradient Method
- Variance Reduction
Related orals
On The Surprising Effectiveness of a Single Global Merging in Decentralized Learning
Shows decentralized learning with single global merging achieves convergence rates matching parallel SGD under data heterogeneity.
Fast Escape, Slow Convergence: Learning Dynamics of Phase Retrieval under Power-Law Data
Analyzes phase retrieval learning dynamics with anisotropic data, deriving explicit scaling laws and three-phase trajectories.
A Representer Theorem for Hawkes Processes via Penalized Least Squares Minimization
Representer theorem for Hawkes processes shows dual coefficients are analytically fixed to unity via penalized least squares.
Quantitative Bounds for Length Generalization in Transformers
Quantitative bounds show training length required for length generalization depends on periodicity, locality, alphabet size, and model norms.
Difficult Examples Hurt Unsupervised Contrastive Learning: A Theoretical Perspective
EBTs frame System 2 thinking as energy minimization enabling inference-time reasoning emergence across modalities.