TY - JOUR
T1 - Model-based Data Efficient Reinforcement Learning for Active Pantograph Control in High-Speed Railways
AU - Wang, Hui
AU - Liu, Zhigang
AU - Wang, Xufan
AU - Meng, Xiangyu
AU - Wu, Yanbo
AU - Han, Zhiwei
PY - 2023/8/18
Y1 - 2023/8/18
N2 - The active pantograph is a promising technology for suppressing contact force fluctuation in pantograph catenary systems (PCS). Recently, the rapid development of reinforcement learning techniques has dramatically facilitated the design of controllers for complex systems. However, low data efficiency is a critical problem because data collection is costly. In this paper, we propose the Ensemble Q-functions Model-based Reinforcement Learning algorithm (EQ-MBRL) to achieve data-efficient reinforcement learning. First, we introduce an ensemble probabilistic neural network to estimate the distribution and uncertainty of the dynamics model and adopt a multi-step loss to constrain accumulated error in long-horizon model rollouts. Second, we employ short-term model rollouts to trade off the ease of data generation against the error of model-generated data. Finally, we propose ensemble Q-functions and in-target minimization techniques to help stabilize the training of value functions and improve the accuracy of value estimation. In addition, we discuss the appropriate model-based rollout length and explore the performance of different network update-rate strategies. The experimental results demonstrate that the proposed approach outperforms the compared algorithms and delivers state-of-the-art performance on the PCS benchmark. The controller learned robust motion patterns using only 50K collected transitions, more than ten times faster than the compared baselines.
AB - The active pantograph is a promising technology for suppressing contact force fluctuation in pantograph catenary systems (PCS). Recently, the rapid development of reinforcement learning techniques has dramatically facilitated the design of controllers for complex systems. However, low data efficiency is a critical problem because data collection is costly. In this paper, we propose the Ensemble Q-functions Model-based Reinforcement Learning algorithm (EQ-MBRL) to achieve data-efficient reinforcement learning. First, we introduce an ensemble probabilistic neural network to estimate the distribution and uncertainty of the dynamics model and adopt a multi-step loss to constrain accumulated error in long-horizon model rollouts. Second, we employ short-term model rollouts to trade off the ease of data generation against the error of model-generated data. Finally, we propose ensemble Q-functions and in-target minimization techniques to help stabilize the training of value functions and improve the accuracy of value estimation. In addition, we discuss the appropriate model-based rollout length and explore the performance of different network update-rate strategies. The experimental results demonstrate that the proposed approach outperforms the compared algorithms and delivers state-of-the-art performance on the PCS benchmark. The controller learned robust motion patterns using only 50K collected transitions, more than ten times faster than the compared baselines.
KW - Adaptation models
KW - Data models
KW - Finite element analysis
KW - Force
KW - High-speed railways
KW - Predictive models
KW - Training
KW - Uncertainty
KW - active pantograph control
KW - contact force
KW - model-based reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85168714833&partnerID=8YFLogxK
UR - https://www.mendeley.com/catalogue/9f4beee4-a4cb-3ad3-80da-0481a0326e08/
U2 - 10.1109/TTE.2023.3304018
DO - 10.1109/TTE.2023.3304018
M3 - Article
SN - 2332-7782
JO - IEEE Transactions on Transportation Electrification
JF - IEEE Transactions on Transportation Electrification
ER -