I. Introduction
Over the past several decades, the backpropagation (BP) algorithm has been used to train deep multilayer neural networks (DNNs), which are inspired by the architectural depth of the brain [1]. However, it is well known that training DNNs with the BP algorithm from randomly initialized parameters typically yields models with poor prediction performance [2]–[5].