Abstract:
The actor-critic framework has achieved tremendous success in a wide range of decision-making scenarios. Nevertheless, when updating the values of new states and actions over long horizons, these methods suffer from value misestimation and high gradient variance, which significantly reduce the convergence speed and the robustness of the learned policy and severely limit their scope of application. In this paper, we propose QVDDPG, a deep RL algorithm built on an iterative target-value update process. Its QV learning scheme alleviates misestimation by combining the guidance of the Q value with the fast convergence of the V value, thereby accelerating convergence. In addition, the actor uses a constrained, balanced gradient and maintains a hidden state for the continuous-action-space network to improve the robustness of the model. We derive the update relation among the value functions and the constraint conditions on the gradient estimate. We evaluate our method on PyBullet, where it achieves state-of-the-art performance, and we demonstrate that it attains higher robustness and faster convergence across different tasks than competing algorithms.
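The abstract only summarizes the method, but the QV-learning idea it relies on (a state-value function V regressed toward a fast TD target, with the action-value function Q guided by V rather than by a bootstrapped Q target) can be illustrated with a short sketch. The code below assumes PyTorch and hypothetical names (q_net, v_net, target_v_net, q_opt, v_opt, batch); it is a minimal illustration of a QV-style critic step under those assumptions, not the paper's exact QVDDPG update.

```python
# Minimal sketch of a QV-style critic update (assumed PyTorch API;
# q_net, v_net, target_v_net, q_opt, v_opt, batch are hypothetical names,
# not taken from the paper).
import torch
import torch.nn.functional as F

def qv_critic_update(q_net, v_net, target_v_net,
                     q_opt, v_opt, batch, gamma=0.99):
    """One QV-learning-style critic step: V is regressed toward a fast
    TD target, and Q is guided by V instead of by a target Q network."""
    # Tensors: state, action, reward, next state, done flag (float 0/1).
    s, a, r, s2, done = batch

    with torch.no_grad():
        # Shared TD target built from the slowly-updated target V network.
        # Using V(s') avoids the action maximization that drives
        # overestimation in Q-only targets.
        td_target = r + gamma * (1.0 - done) * target_v_net(s2).squeeze(-1)

    # V-function loss: fast-converging state-value regression.
    v_loss = F.mse_loss(v_net(s).squeeze(-1), td_target)
    v_opt.zero_grad()
    v_loss.backward()
    v_opt.step()

    # Q-function loss: the action value is pulled toward the same target,
    # so Q inherits V's stability while still ranking actions for the actor.
    q_loss = F.mse_loss(q_net(s, a).squeeze(-1), td_target)
    q_opt.zero_grad()
    q_loss.backward()
    q_opt.step()

    return v_loss.item(), q_loss.item()
```

In this sketch both critics regress toward the same r + γV(s') target; the design intuition is that V(s') is cheap to learn and free of the action maximization that inflates Q-only targets, which matches the abstract's claim of combining the guidance of Q with the fast convergence of V.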
Date of Conference: 18-23 June 2023
Date Added to IEEE Xplore: 02 August 2023
- IEEE Keywords / Index Terms: Actor-critic Framework, Value Function, Actual Values, Convergence Rate, Target Value, State Value, Continuous Action, Gradient Approximation, Hidden State, Deep Reinforcement Learning, Guide For Use, Updated Values, Update Process, Iterative Update, Robust Policy, Deep Reinforcement Learning Algorithm, Neural Network, Deep Learning, Learning Algorithms, Expected Value, Policy Gradient, Deterministic Policy Gradient, Policy Gradient Method, Policy Network, Policy Gradient Algorithm, Reinforcement Learning Algorithm, Markov Decision Process, Deterministic Policy, Temporal Difference Learning, Policy Update