
QVDDPG: QV Learning with Balanced Constraint in Actor-Critic Framework



Abstract:

The actor-critic framework has achieved tremendous success in a great many decision-making scenarios. Nevertheless, when updating the values of new states and actions over long horizons, these methods suffer from misestimation and high gradient variance, which significantly reduce the convergence speed and robustness of the policy and severely limit their range of application. In this paper, we propose QVDDPG, a deep RL algorithm based on an iterative target-value update process. The QV learning method alleviates misestimation by combining the guidance of the Q value with the fast convergence of the V value, thereby accelerating convergence. In addition, the actor uses a constrained, balanced gradient and maintains a hidden state for the continuous-action-space network to improve the robustness of the model. We derive the update relations among the value functions and the constraint conditions for gradient estimation. We evaluate our method on PyBullet, where it achieves state-of-the-art performance, and we demonstrate that it has higher robustness and faster convergence across different tasks than other algorithms.
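To make the target-value relations described above concrete, the following PyTorch-style sketch (illustrative only, not the authors' implementation; the network sizes, the name qv_critic_step, and all hyperparameters are assumptions) shows one way a QV-style critic update can sit inside a DDPG-like agent: the state-value network V is trained with an ordinary TD target, and the action-value network Q is regressed toward the same V-based target, so the actor is still guided by Q while bootstrapping goes through the faster-converging, action-independent V. The paper's balanced-constraint gradient and hidden-state mechanism are not reproduced here.

import torch
import torch.nn as nn

class VNet(nn.Module):
    # State-value network V(s); one hidden layer is enough for illustration.
    def __init__(self, s_dim, h=64):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(s_dim, h), nn.ReLU(), nn.Linear(h, 1))
    def forward(self, s):
        return self.f(s).squeeze(-1)

class QNet(nn.Module):
    # Action-value network Q(s, a) over a continuous action space.
    def __init__(self, s_dim, a_dim, h=64):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(s_dim + a_dim, h), nn.ReLU(), nn.Linear(h, 1))
    def forward(self, s, a):
        return self.f(torch.cat([s, a], dim=-1)).squeeze(-1)

def qv_critic_step(q_net, v_net, v_target, q_opt, v_opt, s, a, r, s2, done, gamma=0.99):
    with torch.no_grad():
        # One shared bootstrap target built from the slow-moving target V network;
        # no max over actions appears here, which tempers Q overestimation.
        y = r + gamma * (1.0 - done) * v_target(s2)
    # V-learning step: regress V(s) toward r + gamma * V(s').
    v_loss = nn.functional.mse_loss(v_net(s), y)
    v_opt.zero_grad(); v_loss.backward(); v_opt.step()
    # Q-learning step: regress Q(s, a) toward the same V-based target,
    # so a DDPG-style actor can still be updated from Q.
    q_loss = nn.functional.mse_loss(q_net(s, a), y)
    q_opt.zero_grad(); q_loss.backward(); q_opt.step()
    return v_loss.item(), q_loss.item()

# Illustrative usage with random data (state dim 8, action dim 2, batch 32).
s_dim, a_dim, B = 8, 2, 32
v_net, v_target, q_net = VNet(s_dim), VNet(s_dim), QNet(s_dim, a_dim)
v_target.load_state_dict(v_net.state_dict())
v_opt = torch.optim.Adam(v_net.parameters(), lr=1e-3)
q_opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
batch = (torch.randn(B, s_dim), torch.randn(B, a_dim), torch.randn(B),
         torch.randn(B, s_dim), torch.zeros(B))
print(qv_critic_step(q_net, v_net, v_target, q_opt, v_opt, *batch))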
Date of Conference: 18-23 June 2023
Date Added to IEEE Xplore: 02 August 2023

Conference Location: Gold Coast, Australia


I. Introduction

Deep reinforcement learning (DRL) algorithms have been applied to a wide range of challenging fields, from intelligent chess and card games [1], knowledge reasoning [2], recommendation systems [3], [4], and causal reasoning [5] to robotics [6]. Still, they often fail to solve practical problems well because of slow policy convergence and low robustness. In continuous control tasks for various robots, an algorithm that converges to a good performance level earlier improves the robots' working efficiency in complex environments. Meanwhile, if the algorithm is not robust, its fluctuations may cause large losses [7]. Unfortunately, in the presence of value misestimation and high gradient variance, existing models cannot maintain both fast convergence and robustness, which leads to suboptimal policy updates. Both of these challenges degrade the performance of reinforcement learning algorithms.
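To make the misestimation problem concrete, the short numerical sketch below (illustrative only, not taken from the paper; the number of actions and the noise scale are arbitrary assumptions) reproduces the effect analyzed in [22] and [28]: a bootstrap target that maximizes over zero-mean noisy Q estimates is biased upward even though every individual estimate is unbiased. Bootstrapping through a state-value estimate, as in the QV-style update sketched after the abstract, is one way to avoid this maximization over the action space.

import numpy as np

rng = np.random.default_rng(0)
true_q = np.zeros(10)                              # ten actions, all truly worth 0
noise = rng.normal(0.0, 1.0, size=(100_000, 10))   # zero-mean estimation error
bootstrapped = (true_q + noise).max(axis=1)        # target built from a max over noisy estimates

print(true_q.max())          # 0.0   -> the true optimal value
print(bootstrapped.mean())   # ~1.54 -> systematic overestimation of the target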

References
[1] D. Zha, J. Xie, W. Ma, S. Zhang, X. Lian, X. Hu, et al., "DouZero: Mastering DouDizhu with self-play deep reinforcement learning", International Conference on Machine Learning, pp. 12333-12344, 2021.
[2] W. Xiong, T. Hoang and W. Y. Wang, "DeepPath: A reinforcement learning method for knowledge graph reasoning", arXiv preprint, 2017.
[3] R. Zhang, T. Yu, Y. Shen and H. Jin, "Text-Based Interactive Recommendation via Offline Reinforcement Learning", Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 10, pp. 11694-11702, 2022.
[4] X. Chen, L. Yao, J. McAuley, G. Zhou and X. Wang, "A Survey of Deep Reinforcement Learning in Recommender Systems: A Systematic Review and Future Directions", arXiv preprint, 2021.
[5] T. Herlau and R. Larsen, "Reinforcement Learning of Causal Variables Using Mediation Analysis", Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 06, pp. 6910-6917, 2022.
[6] S. Gu, E. Holly, T. Lillicrap and S. Levine, "Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates", 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 3389-3396, 2017.
[7] F. Pan, J. He, D. Tu and Q. He, "Trust the Model When It Is Confident: Masked Model-based Actor-Critic", arXiv preprint, 2020.
[8] X. Zang, H. Yao, G. Zheng, N. Xu, K. Xu and Z. Li, "MetaLight: Value-based meta-reinforcement learning for traffic signal control", Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 01, pp. 1153-1160, 2020.
[9] M. Mohammadi, M. M. Arefi, P. Setoodeh and O. Kaynak, "Optimal tracking control based on reinforcement learning value iteration algorithm for time-delayed nonlinear systems with external disturbances and input constraints", Information Sciences, vol. 554, pp. 84-98, 2021.
[10] C. J. Watkins and P. Dayan, "Q-learning", Machine Learning, vol. 8, no. 3, pp. 279-292, 1992.
[11] Y. Zhang and K. W. Ross, "On-policy deep reinforcement learning for the average-reward criterion", International Conference on Machine Learning, pp. 12535-12545, 2021.
[12] A. X. Lee, A. Nagabandi, P. Abbeel and S. Levine, "Stochastic latent actor-critic: Deep reinforcement learning with a latent variable model", Advances in Neural Information Processing Systems, vol. 33, pp. 741-752, 2020.
[13] R. S. Sutton, D. McAllester, S. Singh and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation", Advances in Neural Information Processing Systems, vol. 12, 1999.
[14] J. Schulman, F. Wolski, P. Dhariwal, A. Radford and O. Klimov, "Proximal policy optimization algorithms", arXiv preprint, 2017.
[15] G. Barth-Maron, M. W. Hoffman, D. Budden, W. Dabney, D. Horgan, D. Tb, et al., "Distributed distributional deterministic policy gradients", arXiv preprint, 2018.
[16] D. Horgan, J. Quan, D. Budden, G. Barth-Maron, M. Hessel, H. van Hasselt, et al., "Distributed prioritized experience replay", arXiv preprint, 2018.
[17] L. Li and A. A. Faisal, "Bayesian distributional policy gradients", Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 10, pp. 8429-8437, 2021.
[18] J. Schulman, S. Levine, P. Abbeel, M. Jordan and P. Moritz, "Trust region policy optimization", International Conference on Machine Learning, pp. 1889-1897, 2015.
[19] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, et al., "Asynchronous methods for deep reinforcement learning", International Conference on Machine Learning, pp. 1928-1937, 2016.
[20] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra and M. Riedmiller, "Deterministic policy gradient algorithms", International Conference on Machine Learning, pp. 387-395, 2014.
[21] S. Whiteson, "Loaded DiCE: Trading off Bias and Variance in Any Order Score Function Estimators for Reinforcement Learning", arXiv preprint, 2019.
[22] S. Thrun and A. Schwartz, "Issues in using function approximation for reinforcement learning", Proceedings of the 1993 Connectionist Models Summer School, Hillsdale, vol. 6, pp. 1-9, 1993.
[23] S. Mannor, D. Simester, P. Sun and J. N. Tsitsiklis, "Bias and variance approximation in value function estimates", Management Science, vol. 53, no. 2, pp. 308-322, 2007.
[24] R. S. Sutton, "Learning to predict by the methods of temporal differences", Machine Learning, vol. 3, no. 1, pp. 9-44, 1988.
[25] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, et al., "Continuous control with deep reinforcement learning", arXiv preprint, 2015.
[26] J. Schulman, P. Moritz, S. Levine, M. Jordan and P. Abbeel, "High-dimensional continuous control using generalized advantage estimation", arXiv preprint, 2015.
[27] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, et al., "Human-level control through deep reinforcement learning", Nature, vol. 518, no. 7540, pp. 529-533, 2015.
[28] H. van Hasselt, A. Guez and D. Silver, "Deep reinforcement learning with double Q-learning", Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30, no. 1, 2016.
[29] S. Fujimoto, H. van Hoof and D. Meger, "Addressing function approximation error in actor-critic methods", International Conference on Machine Learning, pp. 1587-1596, 2018.
[30] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, et al., "OpenAI Gym", arXiv preprint, 2016.
