An Adaptive Critic Learning Approach for Nonlinear Optimal Control Subject to Excitation and Weight Constraints

Anthony Siming Chen, Guido Herrmann

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review


Abstract

We propose a novel adaptive critic learning algorithm for a continuous-time nonlinear system subject to excitation and weight constraints. The algorithm learns the optimal control in real time under only finite excitation and without requiring a priori knowledge of the system model, i.e. the Hamilton-Jacobi-Bellman (HJB) equation is approximately solved online by the adaptive critic learning of a nonlinear Q-function. The contributions of this paper are twofold. First, we present an optimisation-based derivation of a weight-error-driven adaptive law that guarantees exponential convergence of the critic weight. This formulation enables a new P-projection operator that enhances the convergence property, i.e. the weight estimate always stays in a bounded convex set that contains the true weight. Second, we adopt a new measure for building the information matrix so that it retains the richness of the incoming data, which relaxes the standard persistent excitation (PE) condition to a finite excitation (FE) condition. In this way, convergence of the critic weight is guaranteed without persistently injecting exploration noise. We show that the method is model-free and achieves semi-global stability. A numerical example demonstrates the effectiveness of the theoretical results.
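The abstract does not state the update equations, so what follows is only a minimal NumPy sketch of the three ingredients it names, applied to a toy linear-in-parameters regression y = φ(t)ᵀW rather than to the paper's Q-function critic. Everything here is a hypothetical stand-in, not the authors' method: the regressor `phi_fn`, the leaky integration with factor `forget`, the use of the minimum eigenvalue as the "richness" measure for storing the information matrix Ω and vector b, the gain Γ = γP⁻¹, and a projection onto the P-norm ball {W : WᵀPW ≤ r²} in place of the paper's P-projection operator. Because b = ΩW for noise-free data, the update −Γ(ΩŴ − b) equals −ΓΩ(Ŵ − W), so it is driven by the weight error even though W itself is unknown.

```python
import numpy as np


def project_p_ball(w, P, radius):
    """Project w onto {v : v^T P v <= radius^2} in the P-weighted norm.

    The set is a ball in the P-inner product, so the P-metric projection is a
    radial rescaling; it never increases the P-norm distance to any point
    inside the set (including the true weight, if it lies there).
    """
    norm_p = float(np.sqrt(w @ P @ w))
    if norm_p <= radius:
        return w
    return w * (radius / norm_p)


def estimate_weights(phi_fn, w_true, t_excite, t_final, dt=1e-3,
                     gamma=20.0, forget=0.5, P=None, radius=10.0):
    """Toy finite-excitation estimator for y = phi(t)^T w_true (hypothetical).

    phi_fn(t) is used only for t < t_excite; afterwards the regressor is zero,
    so excitation is finite rather than persistent.
    """
    n = w_true.size
    P = np.eye(n) if P is None else P
    Gamma = gamma * np.linalg.inv(P)        # gain matched to the P-metric
    omega_run, b_run = np.zeros((n, n)), np.zeros(n)
    omega_best, b_best, lam_best = np.zeros((n, n)), np.zeros(n), 0.0
    w_hat = np.zeros(n)
    for k in range(int(round(t_final / dt))):
        t = k * dt
        phi = phi_fn(t) if t < t_excite else np.zeros(n)
        y = phi @ w_true                    # noise-free measurement
        # Leaky integration (Euler): Omega' = -forget*Omega + phi phi^T, b' alike
        omega_run += dt * (-forget * omega_run + np.outer(phi, phi))
        b_run += dt * (-forget * b_run + phi * y)
        # Hypothetical richness measure: keep the pair whose Omega has the
        # largest minimum eigenvalue seen so far (eigvalsh sorts ascending)
        lam_run = np.linalg.eigvalsh(omega_run)[0]
        if lam_run > lam_best:
            lam_best = lam_run
            omega_best, b_best = omega_run.copy(), b_run.copy()
        # Since b = Omega w_true, this step equals -Gamma Omega (w_hat - w_true)
        w_hat = w_hat - dt * Gamma @ (omega_best @ w_hat - b_best)
        w_hat = project_p_ball(w_hat, P, radius)
    return w_hat, omega_best


if __name__ == "__main__":
    w_true = np.array([1.5, -0.5])
    w_hat, omega = estimate_weights(
        lambda t: np.array([np.sin(t), np.cos(t)]), w_true,
        t_excite=3.0, t_final=10.0, P=np.diag([1.0, 4.0]))
    print("estimate:", w_hat, "min eig of stored Omega:", np.linalg.eigvalsh(omega)[0])
```

In the demo, excitation stops at t = 3 s, yet the estimate keeps converging because the stored Ω stays positive definite. The hypothetical choice Γ = γP⁻¹ makes V = (Ŵ − W)ᵀP(Ŵ − W) satisfy V̇ = −2γ(Ŵ − W)ᵀΩ(Ŵ − W), and projecting in the same P-norm onto a convex set that contains W cannot increase V.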
Original language: English
Title of host publication: 2023 62nd IEEE Conference on Decision and Control (CDC), Singapore
Publication status: Published - 13 Dec 2023

Keywords

  • adaptive optimal control
  • adaptive critic
  • projection operator
  • finite excitation
  • Q-learning
