Global Convergence of SGD On Two Layer Neural Nets

Pulkit Gopalani, Anirbit Mukherjee*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

In this note, we consider the appropriately regularized ℓ2-empirical risk of depth-2 nets with any number of gates and show bounds on how the empirical loss evolves for SGD iterates on it, for arbitrary data and for activations that are adequately smooth and bounded, such as sigmoid and tanh. This in turn leads to a proof of global convergence of SGD for a special class of initializations. We also prove an exponentially fast convergence rate for continuous-time SGD, which also applies to smooth unbounded activations like SoftPlus. Our key idea is to show the existence of Frobenius-norm-regularized loss functions on constant-sized neural nets which are "Villani functions", and hence to build on recent progress in analyzing SGD on such objectives. Most critically, the amount of regularization required for our analysis is independent of the size of the net.
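
For illustration only, the following is a minimal sketch (not the paper's code or its constants) of the setup the abstract describes: plain SGD on the Frobenius-norm-regularized ℓ2 empirical risk of a depth-2 sigmoid net. The width, step size, and the regularization strength `lam` are illustrative choices of ours, not values taken from the paper.

```python
# Sketch: SGD on a Frobenius-norm-regularized l2 empirical risk of a depth-2 sigmoid net.
# All hyperparameters below are illustrative assumptions, not the paper's settings.
import numpy as np

rng = np.random.default_rng(0)

d, width, n = 5, 16, 200            # input dimension, number of gates, sample size
X = rng.normal(size=(n, d))         # arbitrary data, as allowed by the result
y = rng.normal(size=n)

W = 0.1 * rng.normal(size=(width, d))   # inner-layer weights
a = 0.1 * rng.normal(size=width)        # outer-layer weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def net(W, a, x):
    # depth-2 net: a . sigmoid(W x)
    return a @ sigmoid(W @ x)

lam = 0.1    # Frobenius-norm regularization strength (illustrative)
lr = 0.05    # SGD step size (illustrative)

for step in range(2000):
    i = rng.integers(n)                  # single-sample SGD
    x, t = X[i], y[i]
    h = sigmoid(W @ x)
    err = net(W, a, x) - t
    # gradients of 0.5*err^2 + 0.5*lam*(||W||_F^2 + ||a||^2)
    grad_a = err * h + lam * a
    grad_W = err * np.outer(a * h * (1 - h), x) + lam * W
    a -= lr * grad_a
    W -= lr * grad_W

reg_risk = 0.5 * np.mean([(net(W, a, x) - t) ** 2 for x, t in zip(X, y)]) \
           + 0.5 * lam * (np.sum(W ** 2) + np.sum(a ** 2))
print(f"regularized empirical risk after SGD: {reg_risk:.4f}")
```
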
Original language: English
Number of pages: 22
Journal: Information and Inference: a Journal of the IMA
DOIs
Publication status: Published - 20 Jan 2025

Keywords

  • deep-learning
  • stochastic optimization
  • stochastic differential equations
  • functional analysis

Research Beacons, Institutes and Platforms

  • Institute for Data Science and AI
  • Christabel Pankhurst Institute
