1. Introduction
The deployment of efficient convolutional neural networks (CNNs) has enabled immense progress [23], [29], [30], [31], [4], [9], [41] in vision detectors for edge devices. These works consistently reduce parameter counts and increase inference speed while improving accuracy. However, these metrics do not correlate well with a model's efficiency in terms of energy. Metrics such as parameter count do not account for a model's energy cost, so optimizing for them alone can have a nontrivial, unaccounted-for effect on the energy cost of detectors. Within the same architecture, the number of parameters is positively correlated with energy cost (shown in Table 2). However, for models with the same number of parameters, energy consumption may be negatively correlated with, or even unrelated to, the parameter count (shown in Table 1). This is because different activation functions, convolution operators, and feature fusion structures may not increase the number of parameters but can still incur additional energy cost. Similarly, inference speed does not correlate well with energy, since speed can be improved by increasing the degree of parallelism, which does not necessarily reduce energy consumption. Because of these disconnects, no customized efficient detectors are available for deployment under severe energy constraints, such as in always-on surveillance cameras.