Abstract
We propose a novel adaptive critic learning algorithm for continuous-time nonlinear systems subject to excitation and weight constraints. The algorithm learns the optimal control in real time under only finite excitation and without a priori knowledge of the system model; that is, the Hamilton-Jacobi-Bellman (HJB) equation is approximately solved online through adaptive critic learning of a nonlinear Q-function. The main contribution of this paper is twofold. First, we present an optimisation-based approach to deriving a weight-error-driven adaptive law that guarantees exponential convergence of the critic weight. This formulation admits a new P-projection operator that strengthens the convergence property: the weight estimate always stays in a bounded convex set that contains the true weight. Second, we adopt a new measure for building the information matrix, which accumulates the richness of the incoming data, so that the standard persistent excitation (PE) condition is relaxed to a finite excitation (FE) condition. In this way, convergence of the critic weight is guaranteed without persistently injecting exploration noise. We show that the method is model-free and can achieve semi-global stability. A numerical example demonstrates the effectiveness of the theoretical result.
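The two ideas named in the abstract, a weight-error-driven update and an information matrix built from a finite window of excitation, can be illustrated on a toy linear regression. The sketch below is not the paper's algorithm: the regressor, the true weight, the gains, the forward-Euler discretisation, and the plain Euclidean projection onto a ball (used here in place of the paper's P-projection operator) are all assumptions chosen for illustration.

```python
import math


def simulate(steps=4000, dt=0.01, excite_until=2.0, gain=2.0, radius=5.0):
    """Toy finite-excitation estimator for y = phi^T w_true (all values hypothetical)."""
    w_true = [1.5, -0.7]            # assumed "true" weight, unknown to the update law
    w = [0.0, 0.0]                  # weight estimate
    M = [[0.0, 0.0], [0.0, 0.0]]    # information matrix: running integral of phi phi^T
    N = [0.0, 0.0]                  # running integral of phi * y
    max_norm = 0.0

    for k in range(steps):
        t = k * dt
        # Excitation is only present on a finite interval; afterwards M and N are frozen.
        if t < excite_until:
            phi = [math.sin(t), math.cos(2.0 * t)]
            y = phi[0] * w_true[0] + phi[1] * w_true[1]   # measured output
            for i in range(2):
                N[i] += dt * phi[i] * y
                for j in range(2):
                    M[i][j] += dt * phi[i] * phi[j]

        # Since N = M w_true, the term M w - N equals M (w - w_true):
        # the update is driven by the weight error, not by the instantaneous data.
        err = [M[i][0] * w[0] + M[i][1] * w[1] - N[i] for i in range(2)]
        w = [w[i] - dt * gain * err[i] for i in range(2)]

        # Euclidean projection onto a ball assumed to contain w_true
        # (a simple stand-in for a projection onto a bounded convex set).
        norm = math.hypot(w[0], w[1])
        if norm > radius:
            w = [radius * x / norm for x in w]
            norm = radius
        max_norm = max(max_norm, norm)

    return w, w_true, M, max_norm


if __name__ == "__main__":
    w, w_true, M, max_norm = simulate()
    print("estimate:", w, "true:", w_true)
```

Once the stored matrix `M` becomes positive definite, the estimate keeps converging after the excitation has stopped, which is the intuition behind replacing a persistent excitation requirement with a finite one in this simplified setting.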
Original language | English |
---|---|
Title of host publication | 2023 62nd IEEE Conference on Decision and Control (CDC), Singapore |
Publication status | Published - 13 Dec 2023 |
Keywords
- adaptive optimal control
- adaptive critic
- projection operator
- finite excitation
- Q-learning