I. Introduction
The demand for machine learning applications, especially in practice, has increased considerably in recent years. This demand far exceeds the available supply, because developing such applications successfully requires developers with a deep understanding of and experience in machine learning [1]. Applications range from small classification and regression tasks to anomaly detection and the optimized control of complex processes by intelligent agents. The latter has produced promising research results in recent years, for example AlphaGo [2] and helicopter flight maneuvers that are hardly manageable by human pilots [3], [4]. These results rely heavily on reinforcement learning methods, which make it possible to build systems that independently discover new strategies and solutions by optimizing their behavior against a given reward signal. The learned strategies can be better than the solution strategies known to the developer and can thus potentially outperform humans. The success of reinforcement learning algorithms depends strongly on the chosen hyperparameters and on the design of the reward function [5], and therefore on the experience of the development team.
Despite these research successes, transfer into practice remains rare, and in smaller companies in particular it does not take place today. With advancing automation in industry and the rapidly growing field of autonomous systems, there are many potential applications for intelligent agents. However, many of these use cases are highly critical and are likely to involve high costs. Beyond the complexity that developing such systems entails, these application areas therefore place increased demands on quality attributes and productivity when moving towards industrial product development.