I. Introduction
In recent years, deep learning (DL) has been widely adopted in many application domains, such as computer vision [1], speech recognition [2], and natural language processing [3]. Like many traditional software systems, DL models are highly configurable via a set of configuration options for hyperparameters (e.g., batch size and dropout rate) and neural architectures (e.g., the number of layers). To search for the optimal configuration of a DL model that satisfies specific requirements, developers usually run a large number of training jobs (e.g., via automated machine learning tools) to explore diverse configurations.
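To make the configuration-search setting concrete, the sketch below enumerates a small configuration space with grid search. It is purely illustrative and not a method from this paper; the option names (`batch_size`, `dropout`, `num_layers`) and the stub `train_and_evaluate` are hypothetical stand-ins for real training jobs.

```python
import itertools

# Hypothetical configuration space covering hyperparameters (batch size,
# dropout rate) and an architecture option (number of layers).
search_space = {
    "batch_size": [32, 64, 128],
    "dropout": [0.1, 0.3, 0.5],
    "num_layers": [2, 4],
}

def train_and_evaluate(config):
    """Stand-in for a real training job.

    A real AutoML tool would launch a training run with `config` and
    report a measured validation score; here a toy scoring function
    keeps the sketch runnable without a DL framework.
    """
    return 1.0 / (config["batch_size"] * config["dropout"] * config["num_layers"])

# Enumerate every configuration (grid search) and keep the best one.
keys = list(search_space)
configs = [dict(zip(keys, values))
           for values in itertools.product(*search_space.values())]
best = max(configs, key=train_and_evaluate)
print(len(configs), best)
```

Even this tiny space yields 18 configurations, each requiring a full training run in practice, which is why such searches are typically automated and parallelized.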