Variational value learning in advantage actor-critic reinforcement learning | IEEE Conference Publication | IEEE Xplore