Weekly Seminar
Convex optimization on measures with over-parameterized gradient descent
Lénaïc Chizat (Université Paris-Sud & CNRS, France)
Thu, 23 Jan 2020 • 10:30-11:30h • Templergraben 55, Room 114

Abstract

Minimizing a convex function of a signed measure is a typical problem in signal processing and machine learning, arising e.g. in gridless sparse spike deconvolution or in the training of two-layer neural networks. It is a difficult problem: the convex algorithms traditionally studied for it often exhibit undesirable behavior, such as high computational complexity or low precision. In this talk, I will present an alternative non-convex approach: discretize the measure into “particles” and run gradient descent on their positions and weights. This method is simple to implement and often efficient in practice, but its theoretical analysis is challenging because of the non-convexity. I will first present a general consistency result: when properly initialized, this method converges to global minimizers as the number of particles goes to infinity. I will then present quantitative convergence results for problems with a sparsity-inducing regularization. The analysis involves tools and ideas from the theories of unbalanced optimal transport and Wasserstein gradient flows. This talk is based on joint work with Francis Bach.
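
The particle discretization described in the abstract can be illustrated on a toy problem. Below is a minimal sketch, not code from the talk or the referenced papers, assuming a 1-D gridless sparse spike deconvolution setup with a Gaussian measurement kernel; all names and constants (psi, lam, the particle count m, the learning rates) are illustrative choices.

```python
# Minimal sketch: over-parameterized gradient descent on the positions and
# weights of particles that discretize a signed measure.
import numpy as np

rng = np.random.default_rng(0)

# Ground truth: a signed measure made of three spikes on [0, 1].
true_pos = np.array([0.2, 0.5, 0.8])
true_w = np.array([1.0, -0.7, 0.5])

# Forward operator: each Dirac at position x is observed through a Gaussian
# kernel sampled on a fixed measurement grid.
grid = np.linspace(0.0, 1.0, 100)
sigma = 0.05

def psi(x):
    """Feature vectors of Diracs at positions x, shape (len(x), len(grid))."""
    return np.exp(-0.5 * ((grid[None, :] - np.asarray(x)[:, None]) / sigma) ** 2)

y = true_w @ psi(true_pos)  # noiseless observations

# Objective on the discretized measure mu = sum_i w_i * delta_{x_i}:
#   F(w, x) = 0.5 * ||sum_i w_i psi(x_i) - y||^2 + lam * sum_i |w_i|
lam = 1e-3

# Over-parameterization: many more particles than true spikes, positions
# spread over the domain, weights initialized near zero.
m = 50
x = np.linspace(0.0, 1.0, m)
w = 1e-3 * rng.standard_normal(m)

lr_w, lr_x = 1e-2, 1e-3
for _ in range(20000):
    feats = psi(x)                       # (m, len(grid))
    residual = w @ feats - y             # (len(grid),)
    grad_w = feats @ residual + lam * np.sign(w)
    dfeats = feats * (grid[None, :] - x[:, None]) / sigma**2   # d psi / d x
    grad_x = w * (dfeats @ residual)
    w -= lr_w * grad_w
    x -= lr_x * grad_x

# Particles carrying non-negligible weight should concentrate near the true spikes.
keep = np.abs(w) > 1e-2
print("recovered positions:", np.round(x[keep], 3))
print("recovered weights  :", np.round(w[keep], 3))
```

In this kind of sketch, the over-parameterization (many particles with small initial weights spread over the domain) is what allows plain gradient descent to find good solutions despite the non-convexity in the positions; this is the regime the consistency result in the abstract refers to.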

References

L. Chizat and F. Bach, "On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport", https://arxiv.org/abs/1805.09545
L. Chizat, "Sparse Optimization on Measures with Over-parameterized Gradient Descent", https://arxiv.org/abs/1907.10300