Kim Van Den Houten - IEEE Xplore Author Profile


Results

We study a highly complex scheduling problem that requires generating and optimizing production schedules for a multi-product biomanufacturing system with continuous and batch processes. There are two main objectives, makespan and lateness, which are combined into a weighted-sum cost function. An additional complexity comes from the long horizons considered (up to a full year),...
Policy gradient methods are successful for a wide range of reinforcement learning tasks. Traditionally, such methods use the score function as a stochastic gradient estimator. We investigate the effect of replacing the score function with a measure-valued derivative within an on-policy actor-critic algorithm. The hypothesis is that measure-valued derivatives reduce the need for score function va...