Abstract
We investigate a family of regression problems in a semisupervised setting. The task is to assign real-valued labels to a set of n sample points, given a small training subset of N labeled points. A goal of semisupervised learning is to take advantage of the (geometric) structure provided by the large number of unlabeled data points when assigning labels. We consider random geometric graphs, with connection radius ε(n), to represent the geometry of the data set. Functionals which model the task reward the regularity of the estimator function and impose or reward agreement with the training data. Here we consider discrete p-Laplacian regularization. We investigate the asymptotic behavior in the limit where the number of unlabeled points increases while the number of training points remains fixed. We uncover a delicate interplay between the regularizing nature of the functionals and the nonlocality inherent to the graph constructions. We obtain rigorous, almost optimal, ranges on the scaling of ε(n) for asymptotic consistency; within these admissible ranges, the minimizers of the discrete functionals converge uniformly to the desired continuum limit. Furthermore, we discover that, for the standard model, there is a restrictive upper bound on how quickly ε(n) must converge to zero as n tends to infinity. We introduce a new model, which is as simple as the original one, but overcomes this restriction.
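To make the objective concrete, a minimal sketch of a constrained graph p-Dirichlet energy of the kind the abstract refers to is given below. The weight kernel η and the normalization by ε^p n² are illustrative assumptions consistent with standard constructions in this literature, not details taken from the abstract itself.

```latex
% Schematic discrete p-Laplacian regularization on a random geometric graph
% with n sample points x_1, ..., x_n and connection radius epsilon.
% The kernel \eta and the normalization are assumed conventions, for illustration.
\[
  E_{n,\varepsilon}(u)
  \;=\;
  \frac{1}{\varepsilon^{p}\, n^{2}}
  \sum_{i,j=1}^{n}
  \eta\!\left(\frac{\lvert x_i - x_j\rvert}{\varepsilon}\right)
  \bigl\lvert u(x_i) - u(x_j) \bigr\rvert^{p},
  \qquad
  \text{subject to } u(x_i) = y_i \text{ on the } N \text{ labeled points.}
\]
```

Minimizing such a functional rewards regularity of the estimator u across graph edges while the constraint imposes agreement with the training data; the paper's asymptotic analysis concerns the behavior of these minimizers as n → ∞ with N fixed.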
| Original language | English |
| --- | --- |
| Pages (from-to) | 2085-2120 |
| Journal | SIAM Journal on Mathematical Analysis |
| Volume | 51 |
| Issue number | 3 |
| Early online date | 23 May 2019 |
| DOIs | |
| Publication status | Published - 23 May 2019 |