1. Sun H, Shen L, Zhong Q, Ding L, Chen S, Sun J, Li J, Sun G, Tao D. AdaSAM: Boosting sharpness-aware minimization with adaptive learning rate and momentum for training deep neural networks. Neural Netw 2024;169:506-519. PMID: 37944247. DOI: 10.1016/j.neunet.2023.10.044.
Abstract
The sharpness-aware minimization (SAM) optimizer has been extensively explored because it generalizes better when training deep neural networks: it introduces an extra perturbation step that flattens the loss landscape of deep learning models. Integrating SAM with an adaptive learning rate and momentum acceleration, dubbed AdaSAM, has already been explored empirically for training large-scale deep neural networks, but without theoretical guarantees, owing to the threefold difficulty of analyzing the coupled perturbation step, adaptive learning rate, and momentum step. In this paper, we analyze the convergence rate of AdaSAM in the stochastic non-convex setting. We show theoretically that AdaSAM admits an O(1/√(bT)) convergence rate, which achieves the linear speedup property with respect to the mini-batch size b. Specifically, to decouple the stochastic gradient steps from the adaptive learning rate and the perturbed gradient, we introduce a delayed second-order momentum term that decomposes them and makes them independent when taking expectations in the analysis. We then bound them by showing that the adaptive learning rate lies in a limited range, which makes our analysis feasible. To the best of our knowledge, we are the first to provide a non-trivial convergence rate for SAM with an adaptive learning rate and momentum acceleration. Finally, we conduct experiments on several NLP tasks and a synthetic task, which show that AdaSAM achieves superior performance compared with the SGD, AMSGrad, and SAM optimizers.
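The update the abstract describes (a SAM ascent perturbation followed by an adaptive, momentum-accelerated step on the perturbed gradient) can be sketched as below. This is a minimal illustrative NumPy sketch, not the authors' implementation; the function name `adasam_step` and the default hyperparameters (`rho`, `beta1`, `beta2`, `eps`) are assumptions, following common SAM and Adam conventions.

```python
import numpy as np

def adasam_step(w, grad_fn, m, v, t, lr=1e-3, rho=0.05,
                beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdaSAM-style update: a SAM perturbation step, then an
    Adam-style adaptive/momentum step on the perturbed gradient.
    Illustrative sketch only; hyperparameter defaults are assumed."""
    g = grad_fn(w)                                   # gradient at the current weights
    perturb = rho * g / (np.linalg.norm(g) + eps)    # ascent step toward the "sharp" point
    g_sam = grad_fn(w + perturb)                     # gradient at the perturbed weights
    m = beta1 * m + (1 - beta1) * g_sam              # first moment (momentum)
    v = beta2 * v + (1 - beta2) * g_sam ** 2         # second moment (adaptive learning rate)
    m_hat = m / (1 - beta1 ** t)                     # bias corrections, t starts at 1
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)      # adaptive update
    return w, m, v
```

As a quick sanity check, iterating this step on a simple quadratic f(w) = ||w||² (gradient 2w) drives the weights toward zero, consistent with the role of each piece: the perturbation probes the sharp direction, while the moment estimates scale the descent step per coordinate.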
Affiliation(s)
- Hao Sun: School of Computer Science, University of Science and Technology of China, Hefei, 230026, Anhui, China
- Li Shen: JD.com, Beijing, 100000, China
- Qihuang Zhong: School of Computer Science, Wuhan University, Wuhan, 430072, Hubei, China
- Shixiang Chen: School of Mathematical Science, University of Science and Technology of China, Hefei, 230026, Anhui, China
- Jingwei Sun: School of Computer Science, University of Science and Technology of China, Hefei, 230026, Anhui, China
- Jing Li: School of Computer Science, University of Science and Technology of China, Hefei, 230026, Anhui, China
- Guangzhong Sun: School of Computer Science, University of Science and Technology of China, Hefei, 230026, Anhui, China
- Dacheng Tao: School of Computer Science, University of Sydney, Sydney, 2006, New South Wales, Australia
2. Li Z, Xie Y, Zeng K, Xie S, Kumara BT. Adaptive sparsity-regularized deep dictionary learning based on lifted proximal operator machine. Knowl Based Syst 2022. DOI: 10.1016/j.knosys.2022.110123.
3. Liang N, Yang Z, Li Z, Han W. Incomplete multi-view clustering with incomplete graph-regularized orthogonal non-negative matrix factorization. Appl Intell 2022. DOI: 10.1007/s10489-022-03551-y.
4. Chen X, Li Y, Ding S, Tan B, Jiang Y. A Novel Nonlinear Dictionary Learning Algorithm Based on Nonlinear-KSVD and Nonlinear-MOD. Artif Intell 2022. DOI: 10.1007/978-3-031-20503-3_14.