Claudia Giesbert

Oberseminar Mathematics of Deep Learning: Prof. Stephan Wojtowytsch (Texas A&M University, USA) via ZOOM: Optimal bump functions for shallow ReLU networks: Weight decay, depth separation and the curse of dimensionality

Friday, 13.01.2023 07:15

Mathematik und Informatik

We study how neural networks with a single hidden layer and ReLU activation interpolate data drawn from a radially symmetric distribution with target label 1 at the origin and 0 outside the unit ball, when no labels are given inside the unit ball. With weight decay regularization, and in the limit of infinitely many neurons and data points, we prove that a unique radially symmetric minimizer exists, whose weight decay regularizer and Lipschitz constant grow as the dimension d and sqrt(d), respectively. We furthermore show that the weight decay regularizer grows exponentially in d if the label 1 is imposed on a ball of radius epsilon > 0 rather than only at the origin. By comparison, a neural network with two hidden layers can approximate the target function without encountering the curse of dimensionality. As applications, we discuss approximation rates obtained by mollification and the empirical study of optimization algorithms.
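To fix notation (the abstract does not give any, so the parametrization below is an assumed standard convention, not quoted from the talk): a shallow ReLU network with m neurons and its weight decay regularizer can be written as

    f_\theta(x) = \sum_{k=1}^{m} a_k \,\mathrm{ReLU}(w_k \cdot x + b_k) + c,
    \qquad
    R(\theta) = \frac{1}{2} \sum_{k=1}^{m} \bigl( a_k^2 + \lVert w_k \rVert^2 \bigr).

In this sketch, the minimizer described above solves min_\theta R(\theta) subject to f_\theta(0) = 1 and f_\theta(x) = 0 for \lVert x \rVert \ge 1. Since the ReLU is positively homogeneous, the rescaling (a_k, w_k, b_k) -> (t a_k, w_k / t, b_k / t) leaves f_\theta unchanged, and optimizing over t > 0 via the AM-GM inequality shows that minimizing R(\theta) is equivalent to minimizing the path norm \sum_k |a_k| \, \lVert w_k \rVert, a standard reduction in this setting.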



Created on Thursday, 12.01.2023 13:25 by Claudia Giesbert
Modified on Thursday, 12.01.2023 13:25 by Claudia Giesbert

Angewandte Mathematik Münster
Oberseminar Angewandte Mathematik