I. Introduction
Training a neural network means optimizing the model's parameters to minimize a predefined loss function, typically with gradient-based methods in which backpropagation computes the required gradients [1]. While this approach has delivered impressive results, designing an optimal neural network architecture by hand remains a formidable challenge: manually crafted architectures explore only a small fraction of the vast design space and can therefore converge to suboptimal solutions.
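To make the training loop referred to above concrete, the following minimal sketch performs gradient descent on a linear model with a mean squared error loss; all names here are illustrative and not from the original text, and the analytic gradient stands in for what backpropagation computes automatically in deep networks.

```python
import numpy as np

def loss(w, X, y):
    # Mean squared error of a linear model X @ w
    return np.mean((X @ w - y) ** 2)

def grad(w, X, y):
    # Analytic gradient of the MSE loss with respect to w
    # (for deep networks, backpropagation computes this automatically)
    return 2.0 * X.T @ (X @ w - y) / len(y)

def train(X, y, lr=0.1, steps=200):
    # Plain gradient descent: repeatedly step against the gradient
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * grad(w, X, y)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
true_w = np.array([1.5, -2.0])
y = X @ true_w
w = train(X, y)
```

Note that this sketch fixes the architecture (a single linear layer) in advance; the architecture search problem discussed next asks how to choose that structure itself rather than only its parameters.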