Adversarial Defense using Targeted Manifold Manipulation

Research output: Contribution to conference › Paper

Abstract

Adversarial attacks on deep models can often find a small, innocuous perturbation that easily alters the class label of a test input. We propose a novel Targeted Manifold Manipulation approach that redirects gradients from the genuine data manifold towards carefully planted trapdoors during such adversarial attacks. The trapdoors are assigned an additional class label (Trapclass), making attacks that fall into them easily identifiable. While low-perturbation-budget attacks will necessarily end up in the trapdoors, high-perturbation-budget attacks may escape, but only by landing far from the data manifold. Since our manifold manipulation is enforced only locally, we show that such out-of-distribution data can be easily detected by noting the absence of trapdoors around them. Our detection algorithm avoids learning a separate model for attack detection and thus remains semantically aligned with the original classifier. Furthermore, because we manipulate the adversarial distribution, the method avoids the fundamental difficulty posed by the overlapping distributions of clean and attack samples in usual, unmanipulated models. We evaluate the proposed defense against six state-of-the-art adversarial attacks on four well-known image datasets. Our results show that the proposed method detects ~99% of attacks without a significant drop in clean accuracy, while also being robust to semantics-preserving, non-attack perturbations.
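The detection rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; the model interface, the extra Trapclass index, and all thresholds (`n_probes`, `radius`, `trap_ratio`) are illustrative assumptions. An input is flagged if it either already lands in the trap class (a low-budget attack caught by a trapdoor) or if small random probes around it rarely reach the trap class, indicating no trapdoors nearby and hence an off-manifold, high-budget attack.

```python
import numpy as np

TRAP_CLASS = 10  # hypothetical extra "Trapclass" appended to a 10-class task


def predict(model, x):
    """model: callable mapping an input vector to logits over (num_classes + 1) labels."""
    return int(np.argmax(model(x)))


def is_attack(model, x, n_probes=32, radius=0.05, trap_ratio=0.25, seed=None):
    """Flag x as adversarial if (a) it already lands in the trap class, or
    (b) too few small random perturbations of x reach the trap class,
    i.e. no trapdoors are planted around x (out-of-distribution)."""
    rng = np.random.default_rng(seed)
    if predict(model, x) == TRAP_CLASS:
        return True  # low-budget attack fell into a trapdoor
    hits = sum(
        predict(model, x + rng.normal(scale=radius, size=x.shape)) == TRAP_CLASS
        for _ in range(n_probes)
    )
    # Clean, on-manifold inputs sit surrounded by trapdoors; an absence of
    # trap-class hits around x signals an off-manifold (attacked) sample.
    return hits / n_probes < trap_ratio
```

Note that the detector reuses the defended classifier itself, matching the abstract's claim that no separate detection model is learned.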
Original language: English
Publication status: Submitted - 22 Sept 2023

Keywords

  • adversarial defense
  • backdoor
  • deep learning

