[Spotlight] Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models

Published in NeurIPS, 2024

Recommended citation: F. Kunstner, A. Milligan, R. Yadav, M. Schmidt, and A. Bietti, "Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models," NeurIPS, 2024.