
# AdaRankGrad: Adaptive Gradient Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning

Official implementation of our ICLR 2025 paper.

| Feature | AdaRankGrad | GaLore | LoRA |
|---|---|---|---|
| Weights | $nm$ | $nm$ | $nm + nr + mr$ |
| Optim. states | $n r_{\text{adap}} + 2 m r_{\text{adap}}$ (with $r_{\text{adap}} < r$) | $nr + 2mr$ | $2nr + 2mr$ |
| Multi-subspace | | | |
| Adaptive subspace dimension | | | |
| Adaptive subspace updates | | | |
| Pre-training | | | |
| Fine-tuning | | | |
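As a quick sanity check of the memory formulas in the table above, the following sketch computes the weight and optimizer-state entry counts for each method on an $n \times m$ layer. The helper name and example dimensions are illustrative, not part of the official code.

```python
def memory_counts(n, m, r, r_adap):
    """Entry counts per the comparison table: weights and optimizer states
    for a single n x m weight matrix. r is the fixed projection rank
    (GaLore/LoRA); r_adap < r is AdaRankGrad's adaptive rank."""
    assert r_adap < r, "AdaRankGrad assumes an adaptive rank below the fixed rank"
    return {
        "AdaRankGrad": {"weights": n * m, "optim": n * r_adap + 2 * m * r_adap},
        "GaLore":      {"weights": n * m, "optim": n * r + 2 * m * r},
        "LoRA":        {"weights": n * m + n * r + m * r, "optim": 2 * n * r + 2 * m * r},
    }

# Example: a 4096 x 4096 layer with rank 128 and adaptive rank 32.
counts = memory_counts(4096, 4096, r=128, r_adap=32)
```

Because the optimizer-state terms scale linearly in the rank, shrinking $r_{\text{adap}}$ below $r$ reduces AdaRankGrad's state memory proportionally relative to GaLore.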

Link to the paper: [OpenReview](https://openreview.net/forum?id=LvNROciCne)

Authors: Yehonathan Refael, Jonathan Svirsky, Boris Shustin, Wasim Huleihel, Ofir Lindenbaum

## Citation

If you use this code, please cite our paper:

```bibtex
@inproceedings{
refael2025adarankgrad,
title={AdaRankGrad: Adaptive Gradient Rank and Moments for Memory-Efficient {LLM}s Training and Fine-Tuning},
author={Yehonathan Refael and Jonathan Svirsky and Boris Shustin and Wasim Huleihel and Ofir Lindenbaum},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=LvNROciCne}
}
```