Soft Policy Optimization using Dual-Track Advantage Estimator | IEEE Conference Publication | IEEE Xplore