In this work, we study the possibility of defending against "data-poisoning" attacks while learning a neural net. We focus on the supervised learning setup for a class of finite-sized depth-2 nets, which includes the standard single-filter convolutional nets. In this setup we attempt to learn the true label-generating weights in the presence of a malicious oracle that applies stochastic, bounded, additive adversarial distortions to the true labels accessed by the algorithm during training. For the non-gradient stochastic algorithm that we instantiate, we prove (worst-case nearly optimal) trade-offs among the magnitude of the adversarial attack, the accuracy, and the confidence achieved by the proposed algorithm. Additionally, our algorithm uses mini-batching, and we track how the mini-batch size affects convergence.
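Since the abstract describes the setup only at a high level, the following is a minimal runnable sketch of how the pieces could fit together: a depth-2 single-filter net generating the true labels, an oracle applying stochastic, bounded, additive distortions to those labels, and a mini-batch non-gradient (Tron-style) iterative update. The architecture `net`, the matrices `A` and `M`, the corruption budget `theta`, and all hyperparameters are illustrative assumptions for exposition, not the paper's exact construction or guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Illustrative setup (assumptions, not the paper's construction) ---
d, k = 10, 5                         # input dim, number of gates in the depth-2 net
A = rng.standard_normal((k, d, d))   # fixed per-gate linear maps A_i (shared-filter structure)
w_star = rng.standard_normal(d)      # true label-generating filter weights
theta = 0.1                          # adversary's additive corruption budget

def relu(z):
    return np.maximum(z, 0.0)

def net(w, x):
    """Depth-2, single-filter net: average of ReLU gates sharing the weight w."""
    return np.mean([relu(w @ (A[i] @ x)) for i in range(k)])

def poisoned_label(x):
    """True label plus a stochastic, bounded, additive adversarial distortion."""
    xi = rng.uniform(-theta, theta)  # |xi| <= theta
    return net(w_star, x) + xi

# --- Mini-batch, Tron-style (non-gradient) iterative update ---
w = np.zeros(d)
eta, batch, steps = 0.05, 32, 2000
M = np.mean(A, axis=0)               # illustrative choice of the update matrix
for _ in range(steps):
    X = rng.standard_normal((batch, d))
    resid = np.array([poisoned_label(x) - net(w, x) for x in X])
    # Residual-weighted update; note this is not the gradient of a loss.
    w = w + eta * (M @ X.T @ resid) / batch

print(f"||w - w*|| = {np.linalg.norm(w - w_star):.3f}")
```

As a sanity check on the threat model, the recovery error can only be meaningful up to a floor governed by `theta`: with bounded label corruption, no algorithm can localize `w_star` more finely than the corruption allows, which is the flavor of the attack-magnitude/accuracy trade-off the abstract refers to.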
| Field | Value |
| --- | --- |
| Original language | Undefined |
| Number of pages | 21 |
| Publication status | Submitted - 2022 |
| Externally published | Yes |
- cs.LG
- cs.IT
- math.IT
- stat.ML
- 90C15
- 68W40
- 68T05