Introduction
A driving pattern is typically defined as the driving cycle of a vehicle in a particular environment [1], [2]. Since the current driving pattern has a great impact on the energy management strategy of a hybrid electric vehicle (HEV) [3], [4], it is effective to use prior knowledge of the driving cycle to achieve real time driving pattern recognition (RTDPR) and enhance the control performance of the HEV [5], [6]. There have been many studies on RTDPR [2], [7]–[10]. The conventional approach is to manually extract features from the historical speed data to characterize the driving patterns [2]. Classical machine learning models such as k-means [7], hidden Markov models [8], fuzzy c-means [9], and their variants [10] are then used to classify the extracted features into different categories. Therefore, the quality of the feature extraction algorithm has a great impact on the classification accuracy. However, the manually extracted features usually include average speeds, average accelerations and other features that are directly calculated using physical models [11], while more complex, high-level features are hard to represent. In practice, those low-level features are unable to effectively characterize complex driving patterns. Additionally, to reduce the time cost of RTDPR, only a limited number of features, e.g., 16, are selected to characterize the driving patterns [12]. Based on the above analysis, it can be concluded that the recognition accuracy of the conventional methods is significantly affected by the selected features. Recently, with the development of deep learning and its strong classification ability [13]–[15], the convolutional neural network (CNN) has been widely used in pattern recognition [16]–[18] and has achieved good performance. The CNN can achieve end-to-end recognition without manual feature extraction, but it has not yet been widely applied to RTDPR, partially due to the lack of large numbers of training samples.
Motivated by the CNN, we do not manually generate feature vectors from the historical speed data to build the model. Instead, the model learns to extract the features itself from the dataset [19]. During training, the model learns to select the most representative features, and their number, automatically. The simulation results indicate that the features selected automatically by the model are more representative than manually designed ones. The standard CNN is a nonlinear model with typically thousands of parameters, which can easily overfit when the training samples are insufficient [20]. Most of the parameters are concentrated in the fully-connected layers, which contain much redundancy. To solve this problem, we design an automated feature extractor that retains the front part of the CNN and removes the fully-connected layer. A kernel PCA (KPCA) layer is then added to further refine the features, so that the redundancy is removed and classification is simplified. Additionally, we apply a linear shift to the speed data to expand the dataset, which also proves very effective in avoiding overfitting.
In this work, we first collect the training samples from the historical speed data with a sliding window. The size and step of the window are adjusted during training. Second, we transform the training samples into a two-dimensional dataset so that the CNN-based model can effectively handle the speed information. Third, the two-dimensional dataset is divided into batches to fit the feature extractor. Finally, the extracted features are used for RTDPR. The specific contributions of this paper are as follows: (1) We improve the generalization of the standard CNN on small datasets by adding the KPCA layer. (2) We achieve an end-to-end strategy for RTDPR instead of manually designing features. (3) The historical speed sequence is transformed into two dimensions to extract spatial features. (4) We achieve state-of-the-art accuracy for RTDPR.
The structure of this paper is as follows. The details of the CNN + KPCA architecture are described in Section II. The building of our model based on CNN + KPCA is reported in Section III. Section IV presents applications to four typical driving patterns (congested urban, flowing urban, suburban and highway) and to a real environment, and compares the results with those of other typical classifiers. Finally, Section V gives the conclusions of this paper.
The CNN + KPCA Architecture
A. The Standard CNN Classifier
The CNN model is a complex nonlinear function that maps the input samples to the corresponding driving patterns. The overall structure of the CNN is shown in Fig. 1; it includes one input layer, several middle layers and one output layer. The input layer of the CNN receives the two-dimensional samples. The middle layers include the convolution layers and a fully-connected layer. Within each convolution layer, the convolution operation is performed and is immediately followed by the max-pooling operation. The outputs of the last convolution layer are then flattened to one dimension and used as the inputs of the fully-connected layer for a further nonlinear transformation. The output layer contains four neurons that represent the different driving patterns. The details of the calculation process are described as follows.
Provided that we have an n × n input sample \begin{align*} \boldsymbol {x}=\{x_{0,0}\ldots x_{0,n-1};x_{1,0}\ldots x_{1,n-1};\ldots ;x_{n-1,0}\ldots x_{n-1,n-1}\} \tag{1}\end{align*}
and an m × m convolution kernel \begin{align*} \boldsymbol {w}=\{w_{0,0}\ldots w_{0,m-1};w_{1,0}\ldots w_{1,m-1};\ldots ;w_{m-1,0}\ldots w_{m-1,m-1}\}, \tag{2}\end{align*}
the output of the k-th feature map of the first convolution layer is \begin{equation*} C^{1}_{i,j,k}=f\left({b_{k}+\sum \limits _{a_{1}=0}^{m-1}\sum \limits _{a_{2}=0}^{m-1}w_{a_{1},a_{2}}x_{i+a_{1},j+a_{2}}}\right) \tag{3}\end{equation*}
where b_k is the bias of the k-th feature map and f is the rectified linear activation function \begin{equation*} f(x)=\max (0,x).\tag{4}\end{equation*}
The max-pooling operation condenses the feature maps produced by the first part of the convolution layer. The condensed feature maps are then stacked to form the latter part of the convolution layer. Each neuron in the pooling operation takes the maximum activation over an e × e region: \begin{equation*} M^{1}_{i,j,k}=\max \{C^{1}_{ei+\sigma _{1},ej+\sigma _{2},k}~|~0\leq \sigma _{1},\sigma _{2} < e\}. \tag{5}\end{equation*}
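The convolution, activation and pooling steps in (3)–(5) can be illustrated with a minimal NumPy sketch for a single feature map. The function names `conv_relu` and `max_pool`, the 6 × 6 toy input, the averaging kernel and the pooling size are illustrative assumptions and are not taken from the model described in this paper.

```python
import numpy as np

def conv_relu(x, w, b):
    """Valid convolution of an n x n input with an m x m kernel, then ReLU, as in (3)-(4)."""
    n, m = x.shape[0], w.shape[0]
    out = np.empty((n - m + 1, n - m + 1))
    for i in range(n - m + 1):
        for j in range(n - m + 1):
            # b + sum over a1, a2 of w[a1, a2] * x[i + a1, j + a2]
            out[i, j] = b + np.sum(w * x[i:i + m, j:j + m])
    return np.maximum(0.0, out)

def max_pool(c, e):
    """Non-overlapping e x e max-pooling, as in (5)."""
    rows, cols = c.shape[0] // e, c.shape[1] // e
    trimmed = c[:rows * e, :cols * e]
    return trimmed.reshape(rows, e, cols, e).max(axis=(1, 3))

# Toy data (assumed for illustration only).
x = np.arange(36, dtype=float).reshape(6, 6)  # 6 x 6 input
w = np.ones((3, 3)) / 9.0                     # 3 x 3 averaging kernel
feature_map = conv_relu(x, w, b=0.0)          # 4 x 4 feature map
pooled = max_pool(feature_map, e=2)           # 2 x 2 condensed feature map
```

In a full layer, one such feature map is computed per kernel and the pooled maps are stacked along the index k.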
The second convolution layer is obtained by performing the convolution operation
Finally, the fully connected layer connects every neuron of the flattened max-pooling layer to each of the 4 output neurons [21]. Assume that there are h neurons in the flattened layer; the fully connected layer and the output layer are then computed as \begin{align*} F_{i}=&f\left({\varphi ^{1}_{i}+\sum ^{h-1}_{j=0}\theta ^{1}_{ji}M^{2}_{j}}\right),\quad i\in [0,h)\tag{6}\\ Y_{i}=&s\left({\varphi ^{2}_{i}+\sum ^{h-1}_{j=0}\theta ^{2}_{ji}F_{j}}\right),\quad i\in [0,4) \tag{7}\end{align*}
where s is the softmax function \begin{equation*} s(z)_{j}=\frac {e^{z_{j}}}{\sum ^{3}_{i=0} e^{z_{i}}},\quad j\in [0,4) \tag{8}\end{equation*}
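As a hedged illustration of (6)–(8), the following NumPy sketch evaluates the fully connected layer and the softmax output for one flattened feature vector. The layer width `h = 8` and the random weights are placeholders chosen for the example, not trained values from this work.

```python
import numpy as np

def softmax(z):
    """Softmax of (8); the maximum is subtracted for numerical stability."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def classify(m2, theta1, phi1, theta2, phi2):
    """Forward pass of (6)-(7): ReLU hidden layer followed by a softmax output."""
    f = np.maximum(0.0, phi1 + theta1.T @ m2)  # F_i = f(phi1_i + sum_j theta1_ji * M2_j)
    return softmax(phi2 + theta2.T @ f)        # Y_i = s(phi2_i + sum_j theta2_ji * F_j)

# Placeholder parameters (assumed, untrained).
rng = np.random.default_rng(0)
h = 8
m2 = rng.normal(size=h)                        # flattened max-pooling output
theta1, phi1 = rng.normal(size=(h, h)), np.zeros(h)
theta2, phi2 = rng.normal(size=(h, 4)), np.zeros(4)

y = classify(m2, theta1, phi1, theta2, phi2)   # probabilities of the four patterns
pattern = int(np.argmax(y)) + 1                # patterns are labeled 1 to 4
```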
B. The CNN + KPCA Feature Extractor
As shown in Fig. 2, the number of the parameters in the first convolution layer is \begin{equation*} K_{ij}=k(M^{2}_{i},M^{2}_{j}) \tag{9}\end{equation*}
\begin{equation*} \overline {K}=K-1_{N}K-K1_{N}+1_{N}K1_{N} \tag{10}\end{equation*}
\begin{equation*} a_{i}\leftarrow \frac {a_{i}}{\sqrt {\lambda _{i}}} \tag{11}\end{equation*}
\begin{equation*} E=\frac {\sum _{i=1}^{l}\lambda _{i}}{\sum _{i=1}^{n}\lambda _{i}} \tag{12}\end{equation*}
\begin{equation*} \boldsymbol {M^{2}_{kpca}}= \overline {K}\boldsymbol {\alpha } \tag{13}\end{equation*}
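The KPCA steps in (9)–(13) can be sketched in NumPy as follows. This is only a minimal illustration: the Gaussian (RBF) kernel, the value of `gamma`, the 99% threshold on the ratio E in (12), and the random stand-in features are all assumptions made for the example, and 1_N is taken to be the N × N matrix whose entries all equal 1/N.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix (an assumed choice of k in (9))."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kpca_fit_transform(features, gamma=0.1, energy=0.99):
    n = features.shape[0]
    k = rbf_kernel(features, features, gamma)                # (9)
    one_n = np.full((n, n), 1.0 / n)
    k_bar = k - one_n @ k - k @ one_n + one_n @ k @ one_n    # (10)
    lam, a = np.linalg.eigh(k_bar)                           # eigenvalues in ascending order
    lam, a = lam[::-1], a[:, ::-1]                           # sort in descending order
    keep = lam > 1e-10                                       # drop numerically zero eigenvalues
    lam, a = lam[keep], a[:, keep]
    a = a / np.sqrt(lam)                                     # (11)
    ratio = np.cumsum(lam) / np.sum(lam)                     # (12), over the kept eigenvalues
    l = min(int(np.searchsorted(ratio, energy)) + 1, lam.size)
    return k_bar @ a[:, :l], ratio[l - 1]                    # (13)

rng = np.random.default_rng(0)
features = rng.normal(size=(20, 5))   # stand-in for the flattened CNN features
projected, explained = kpca_fit_transform(features)
```

The number of retained components l is the smallest value for which the ratio in (12) reaches the chosen threshold.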
Provided that we have a speed sequence of a driving vehicle with an unknown pattern, our goal is to recognize the driving pattern. The CNN + KPCA model is applied as a classifier to provide the probability distribution of the driving patterns for each input sequence, which is illustrated in Fig. 3. Each neuron of the output layer represents a driving pattern. The driving pattern with the maximum probability is the final recognized result.
Model Building
Since fitting the KPCA requires the projection space to remain unchanged, model building consists of two separate stages. In the first stage, we train the standard CNN model with a stochastic gradient descent method. After the parameters of the CNN are optimized, we add the KPCA layer between the convolution layer and the fully-connected layer of the CNN. In the second stage, we fine-tune the fully-connected layer with the convolution layer frozen. The parameters of the fully-connected layer are updated iteratively by gradient descent on the whole training batch. The computation is described in detail in (14)–(20).
A. Typical CNN Model Building
We use the categorical cross-entropy as the loss function for the multi-class classification problem [23], as detailed in (14). \begin{equation*} L(\boldsymbol {z})=-\sum _{i=0}^{3}G_{i}\log (Y_{i}) \tag{14}\end{equation*}
\begin{equation*} \boldsymbol {g}_{\boldsymbol {z},t}=\nabla _{\boldsymbol {z}}L(\boldsymbol {z}_{t}) \tag{15}\end{equation*}
\begin{align*} \boldsymbol {m}_{\boldsymbol {z},t}=&\beta _{1}\boldsymbol {m}_{\boldsymbol {z},t-1}+(1-\beta _{1})\boldsymbol {g}_{\boldsymbol {z},t}\tag{16}\\ \boldsymbol {v}_{\boldsymbol {z},t}=&\beta _{2}\boldsymbol {v}_{\boldsymbol {z},t-1}+(1-\beta _{2})\boldsymbol {g}_{\boldsymbol {z},t}^{2} \tag{17}\end{align*}
\begin{align*} \overline {\boldsymbol {m}}_{\boldsymbol {z},t}=&\frac {\boldsymbol {m}_{\boldsymbol {z},t}}{1-\beta _{1}}\tag{18}\\ \overline {\boldsymbol {v}}_{\boldsymbol {z},t}=&\frac {\boldsymbol {v}_{\boldsymbol {z},t}}{1-\beta _{2}} \tag{19}\end{align*}
\begin{equation*} \boldsymbol {z}_{t+1}=\boldsymbol {z}_{t}-\alpha \frac {\overline {\boldsymbol {m}}_{\boldsymbol {z},t}}{\sqrt {\overline {\boldsymbol {v}}_{\boldsymbol {z},t}+\epsilon }} \tag{20}\end{equation*}
It is worth mentioning that the learning processes in the two training phases are very similar. They differ only slightly in the parameters being optimized.
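The loss in (14) and the moment-based update in (15)–(20) can be illustrated with the sketch below. It assumes that the update follows the standard Adam optimizer, in which the bias-correction denominators are 1 − β1^t and 1 − β2^t and ε is added outside the square root. The step size and decay rates are common defaults rather than values from this paper, and the toy problem treats the four logits directly as the parameters z.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(g, y):
    """Categorical cross-entropy of (14); g is the one-hot target, y the prediction."""
    return -np.sum(g * np.log(y + 1e-12))

def adam_step(z, grad, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update (assumed form of (16)-(20))."""
    m = beta1 * m + (1 - beta1) * grad         # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad**2      # second-moment estimate
    m_hat = m / (1 - beta1**t)                 # bias-corrected moments
    v_hat = v / (1 - beta2**t)
    z = z - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return z, m, v

# Toy problem: fit four logits to a one-hot target (class index 1).
g = np.array([0.0, 1.0, 0.0, 0.0])
z, m, v = np.zeros(4), np.zeros(4), np.zeros(4)
initial_loss = cross_entropy(g, softmax(z))
for t in range(1, 201):
    grad = softmax(z) - g                      # gradient of (14) w.r.t. the logits, as in (15)
    z, m, v = adam_step(z, grad, m, v, t, alpha=0.05)
final_loss = cross_entropy(g, softmax(z))
```

In the two training stages, the same loop applies; only the set of parameters collected in z changes.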
B. CNN + KPCA Model Building
After the standard CNN has been built, we extract the convolution layer and combine it with the KPCA layer to form the feature extractor. We then obtain all the features from the dataset, which are used to develop the classifier. The training process is the same as that of the standard CNN described in (14)–(20), except for the input and the parameters to be optimized. The feature extractor remains fixed in this development phase, and the parameters in (21) are optimized in the way described in (14)–(20).\begin{equation*} \boldsymbol {z}=[\boldsymbol {\theta ^{1}},\boldsymbol {\theta ^{2}},\boldsymbol {\varphi ^{1}},\boldsymbol {\varphi ^{2}}] \tag{21}\end{equation*}
Case Study
A. Typical Driving Pattern
The speed-time sequences under different driving conditions are sampled to implement driving pattern recognition. The Environmental Protection Agency (EPA) has classified four typical real-world driving patterns: congested urban roads, flowing urban roads, suburban roads and highways. The corresponding driving cycles are the Manhattan bus drive cycle (MBDC), the EPA urban dynamometer driving schedule (UDDS), the West Virginia suburban driving schedule (WVUSUB) and the US EPA highway fuel economy certification test (HWFET), which are labeled 1 to 4, respectively. The characteristics of the four driving conditions are described in Table 1, and the speed-time sequences are shown in Fig. 4.
B. The Dataset Process
To expand the dataset, a sliding window is defined as shown in Fig. 5, where the
In the one-dimension sequence, only the speed information is taken into consideration. To exploit the spatial structure of the speed distribution, we take a close look at the sequences on the pixel-wise level. A two-dimension array is used to reconstruct the sequence as shown in Fig. 5, where the 1s represent the corresponding pixels that contain the speed information in the 1-dimension sequence, and the 0s represent the corresponding pixels without the speed information in the 1-dimension sequence. The resolution of the 2-dimension data is defined by
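One possible reading of the sliding-window sampling and the two-dimensional reconstruction is sketched below. The window size, step, image height, maximum speed `v_max`, the rounding rule that maps a speed to a pixel row, and the synthetic speed trace are all assumptions made for illustration; they are not the settings used in this paper.

```python
import numpy as np

def sliding_windows(speed, size, step):
    """Cut a 1-D speed sequence into overlapping windows of length `size`."""
    starts = range(0, len(speed) - size + 1, step)
    return np.stack([speed[s:s + size] for s in starts])

def to_image(window, rows, v_max):
    """Binary image: in each column (time step), set the pixel of the sampled speed to 1."""
    img = np.zeros((rows, window.size), dtype=np.uint8)
    bins = np.clip((window / v_max * (rows - 1)).round().astype(int), 0, rows - 1)
    img[rows - 1 - bins, np.arange(window.size)] = 1  # row 0 corresponds to the highest speed
    return img

# Synthetic speed trace in km/h (assumed data).
speed = 30.0 + 20.0 * np.sin(np.linspace(0.0, 6.0 * np.pi, 300))
windows = sliding_windows(speed, size=32, step=8)
images = np.stack([to_image(w, rows=32, v_max=60.0) for w in windows])
```

A smaller step yields more, more strongly overlapping samples from the same sequence, which is how the window expands the dataset.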
C. Hyper Parameters
Table 2 gives the hyper parameters of the typical benchmark models and the proposed CNN + KPCA model, respectively, where
D. Results Analysis
We divide the dataset equally into training and test sets to train and test the CNN model separately. Generally, the larger the dataset, the stronger the generalization of the model. Although our dataset is relatively small, we have successfully avoided overfitting by adding the KPCA layer to the architecture and have achieved state-of-the-art results. The accuracy reaches 100% on the training set and 97.40% on the test set, which outperforms the other models based on different machine learning methods. The simulation was implemented in TensorFlow and run on a laptop with an Intel Core i5 @ 2.3 GHz and 8 GB of RAM. The training and testing classification results are illustrated in Fig. 6, where only a small number of samples in class 2 (UDDS) and class 3 (WVUSUB) are misidentified on the testing set.
Knowing how the classifier performs on individual classes is important, as it helps to refine the system design. A receiver operating characteristic (ROC) curve is plotted from the confusion matrix [26] to assess the performance of the classifier on the individual classes. By computing the area under the ROC curve (AUC), the quality of the classifier is comprehensively evaluated. Fig. 7 shows the ROC curves of the CNN + KPCA model on the testing set, where the AUCs of classes 1 and 4 are both 1, which means that the model has outstanding performance on those classes. The micro-average and macro-average ROC curves are calculated to evaluate the generality over the four classes.
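As a hedged illustration of the per-class, macro-average and micro-average AUC, the sketch below computes one-vs-rest AUC values directly from predicted class probabilities, using the rank interpretation of the AUC rather than an explicit curve. The labels and probabilities are synthetic examples and are not results from this paper; class indices 0 to 3 stand for patterns 1 to 4.

```python
import numpy as np

def binary_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)

def multiclass_auc(y_true, probs):
    n_classes = probs.shape[1]
    one_hot = np.eye(n_classes, dtype=int)[y_true]
    per_class = np.array([binary_auc(one_hot[:, k], probs[:, k]) for k in range(n_classes)])
    macro = per_class.mean()                             # unweighted mean over classes
    micro = binary_auc(one_hot.ravel(), probs.ravel())   # all (label, score) pairs pooled
    return per_class, macro, micro

# Synthetic predictions (assumed data): two samples per class.
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3])
probs = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.60, 0.20, 0.10, 0.10],
    [0.10, 0.60, 0.20, 0.10],
    [0.10, 0.30, 0.50, 0.10],
    [0.10, 0.20, 0.60, 0.10],
    [0.10, 0.40, 0.40, 0.10],
    [0.10, 0.10, 0.10, 0.70],
    [0.05, 0.05, 0.10, 0.80],
])
per_class, macro, micro = multiclass_auc(y_true, probs)
```

The macro-average weights every class equally, whereas the micro-average weights every sample equally, so the two differ when classes are imbalanced.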
To visualize the effectiveness of the KPCA layer, the first two components of the CNN feature extractor are used to form scatter plots on both the training and testing sets, as shown in Fig. 8. In (a) and (c), the space is formed by the first two dimensions of the features originally extracted by the CNN on the training and testing sets, respectively. In (b) and (d), the space is formed by the first two components of the features projected by KPCA. The projection makes the features linearly separable, which helps to simplify the subsequent fully-connected layer and improve the classification accuracy.
Fig. 8. The first two components of the features extracted by the CNN with or without KPCA projection. Training set: (a) original space, (b) projection by KPCA; testing set: (c) original space, (d) projection by KPCA.
E. Comparison With Other Classifiers
To evaluate the necessity and effectiveness of the KPCA strategy in the CNN feature extractor, two control strategies, the standard CNN and the CNN combined with PCA, are applied to the same dataset.
1) Typical CNN
A simplified LeNet-5 with only one fully-connected layer (128 hidden neurons) and dropout (0.5) was selected as the classifier for real time driving pattern recognition. The structure of the standard LeNet-5 was slightly adjusted to fit the dataset. The training and testing results on the regular dataset are illustrated in Fig. 9, where the accuracy rates are 100% and 79.78%, respectively. Although the classifier achieved 100% accuracy on the training dataset, the testing accuracy fell far behind, which means that the classifier lacked the generalization ability to handle data points outside the training dataset, i.e., overfitting occurred. The ROC curves on the test dataset are shown in Fig. 10, where the standard CNN classifier is easily confused between classes 2 and 3. In addition, the average accuracy on the test dataset is much lower than that of the proposed strategy.
2) CNN + PCA
The automated feature extractor based on CNN and PCA was also evaluated for real time driving pattern recognition. The structure of the CNN part of the extractor was the same as in the proposed extractor. In the reduction part, we replace KPCA with PCA to avoid overfitting. The classification results are illustrated in Fig. 11 and Fig. 12. The training and testing accuracies are 100% and 94.34%, respectively, which shows that overfitting is overcome by applying the PCA strategy. The accuracy on the testing dataset is slightly lower than that of the proposed strategy, as KPCA is better at extracting nonlinear features.
The two control strategies show that the standard CNN easily suffers from overfitting when the training dataset is not large enough. By slightly adjusting the structure of the CNN and extracting the effective neurons of the fully-connected layer, overfitting can be effectively reduced. Table 3 and Fig. 13 summarize the metrics of the three models, where there is a big gap between the training and testing accuracy of the standard CNN strategy due to its numerous parameters. In addition, the AUC of the standard CNN model is lower than that of the proposed strategies, which further proves the necessity and effectiveness of KPCA in the CNN-based extractor.
To prove the superiority of the proposed classifier based on automated feature extraction over the classical classifiers based on manually designed features, a variety of the traditional classifiers were trained and tested on the same dataset. Wang et al. [28] and Wei et al. [29] have analyzed the 12 motion features that can best distinguish the driving patterns, as shown in Table 4. The following three classical classifiers were evaluated with the 12 manually designed features.
3) K-Nearest Neighbor
K-Nearest Neighbor (KNN) is one of the popularly used classifiers. This analysis generates speed feature vectors with
The parameter k of KNN is a hyper parameter, which was chosen so that the model performs best. The training and testing KNN classification results are illustrated in Fig. 14. The ROC curves of the KNN model on the testing set are given in Fig. 15, which shows that the classifier has poor precision on classes 2 and 3.
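For reference, a KNN classifier over manually designed features can be sketched as below. The two features (mean speed and mean absolute acceleration), their values, and k = 3 are hypothetical placeholders; the actual 12 features are those listed in Table 4.

```python
import numpy as np

def knn_predict(train_x, train_y, query_x, k):
    """Assign each query the majority label among its k nearest training samples."""
    dist = np.linalg.norm(query_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    votes = train_y[nearest]
    n_labels = int(train_y.max()) + 1
    return np.array([np.bincount(v, minlength=n_labels).argmax() for v in votes])

# Hypothetical feature vectors: [mean speed (km/h), mean absolute acceleration (m/s^2)].
train_x = np.array([[10.0, 0.5], [12.0, 0.6], [11.0, 0.4],
                    [80.0, 0.2], [82.0, 0.3], [78.0, 0.1]])
train_y = np.array([1, 1, 1, 4, 4, 4])   # pattern labels 1 (MBDC) and 4 (HWFET)
query_x = np.array([[13.0, 0.5], [79.0, 0.2]])
predicted = knn_predict(train_x, train_y, query_x, k=3)
```

In practice the features would usually be standardized first, since Euclidean distance is otherwise dominated by the feature with the largest scale.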
4) Multilayer NN
Another classifier that we tested was a fully connected multilayer NN (MNN) with 3 layers. The weights of the hidden layer were obtained by training with back-propagation [29]. 12 standard features were extracted from the speed distribution with
5) Kernel PCA Based Multilayer NN
In [30], a preprocessing stage was constructed that computed the projection of the input pattern onto the principal components and used it as the extracted feature vector. To compute the principal components, the mean of the input components was first computed and subtracted from the training vectors. A kernel function was then chosen to map the resulting vectors into a high-dimensional space. The covariance matrix of the high-dimensional vectors was then computed and diagonalized by singular value decomposition. The selected principal components, represented by 5-dimensional feature vectors, were used as the inputs of a multilayer classifier with 9 hidden neurons. The selected 5 principal components contained 99% of the information of the original data, and the number of hidden neurons was chosen so that the model performs best. The accuracy was 91.93% on the test dataset and 96.88% on the training dataset. The training and testing classification results of the kernel PCA (KPCA) based MNN (KPCAMNN) are illustrated in Fig. 18. The KPCAMNN achieves better overall performance than the former two classifiers, as shown in Fig. 19.
Compared with the traditional driving pattern recognition methods, the CNN + KPCA model achieves state-of-the-art accuracy, reaching 100% on the training set and 97.40% on the testing set, whereas the best testing result among the methods based on feature extraction algorithms is 91.93%. This is a substantial improvement. The model metrics, including the accuracy and AUC of the classifiers mentioned above, are given in Table 5, from which we can conclude that the proposed framework outperforms the other methods in both accuracy and AUC.
F. Real Driving Pattern Recognition
Bus line 335 from Cicheng to Panhuo in Ningbo, China, with 41 bus stops, 11 traffic lights and a 27.2 km journey crossing most of the main area of the city, was selected as the UDDS cycle to be recognized. The route map is shown in Fig. 20. The speed data were sampled every 10 s from 6:00 AM to 2:30 PM, excluding the bus idle time, as shown in Fig. 21. The classification results in Fig. 22 show that 95.81% of the samples were recognized as class 2 (i.e., the UDDS cycle), with only a small part misclassified as class 3 (WVUSUB) or class 1 (MBDC).
Conclusion
In this study, we have proposed a novel end-to-end approach to real time driving pattern recognition from speed data. Our results show that the proposed CNN + KPCA model effectively overcomes the bottleneck of traditional driving pattern classifiers based on manually extracted features. In addition, the proposed model successfully avoids overfitting by adding the KPCA layer to the network architecture and by expanding the dataset, which is promising when the number of examples in the training set is not large enough. Our future goal is to apply this method to the energy management strategy of HEVs to achieve efficient system control and thereby save energy.