Introduction
With rising living standards, chronic diseases caused by energy metabolism imbalance, including obesity, diabetes, hyperlipidemia, and cardiovascular disease, have become a worldwide concern [1]. Active health management, including scientific control of dietary energy intake and of energy expenditure (EE) during physical activity, provides an effective way to prevent chronic diseases and support rehabilitation [2]. Because the human body is a complex, time-varying, nonlinear system, EE during physical activity is affected by many factors, including activity intensity, individual physiological and psychological state, environment (e.g., temperature, humidity, and barometric pressure), and anthropometric features (e.g., height, weight, and age), which makes real-time, accurate EE estimation a challenging task.
Although traditional clinical EE measurement methods, including direct calorimetry [3] and indirect calorimetry [4], are highly accurate, their large size, complex operation, and high cost make them unsuitable for EE measurement under free-living conditions. With the development of microelectronics, micro-electro-mechanical systems (MEMS), and computer technology, wearable devices with powerful sensing and computing functions have been widely used for health and activity monitoring.
However, the EE reported by most commercial wearable devices is computed from heart rate (HR), step count, and anthropometric features. Because these discrete features carry limited information, the EE reported by commercial wearable devices is not accurate enough for applications such as rehabilitation exercise after a heart attack [5] or heart surgery, or professional sports training [6]. In addition, a 2020 systematic review [7] of the validity and reliability of commercial wearables for measuring energy expenditure concluded that the EE estimates of the studied devices, including Fitbit, Garmin, Polar, Apple Watch, and Samsung, were not reliable. Therefore, the goal of our research is to improve the accuracy of EE estimation by designing a new estimation algorithm.
Numerous efforts have been made to improve the accuracy of EE estimation [8]–[24]. However, most existing methods are based on machine learning algorithms that require features to be designed and selected manually, and their EE estimation accuracy remains unsatisfactory. Hand-crafted features depend heavily on the researchers' professional knowledge and cannot fully capture the useful information in the raw signal. Deep learning algorithms, which extract deep features automatically without hand-crafted feature design, were therefore proposed for EE estimation [19], [20], [27].
The convolutional neural network (CNN), one of the most widely used deep learning architectures, has proved effective for processing time series signals in applications including activity recognition [22], computer-aided diagnosis [25], and gait analysis [26]. Recent studies [19], [20] also showed promising performance of CNNs for EE estimation.
Motion signals collected by inertial measurement unit (IMU) sensors and HR calculated from electrocardiogram (ECG) signals are the most commonly used inputs of current EE estimation methods [13]. However, HR contains limited information compared with the raw ECG signal, which limits the accuracy of current HR-based EE estimation methods. With deep learning algorithms, comprehensive deep-level features of ECG signals can be learned and extracted automatically.
Based on the state of research and the application requirements of EE estimation, we explored the feasibility of improving EE estimation accuracy for scenarios such as clinical rehabilitation exercise and professional sports training monitoring by fusing information from multiple sources. The main contributions of this study are summarized as follows:
We proposed a deep multi-branch CNN for automatic multi-scale feature extraction. The feature extractor integrates multiple CNN branches with different kernel sizes, through which multi-scale information is extracted from the input ECG and inertial signals.
We proposed a novel two-stage regression method to predict EE accurately. A soft-label-based ordinal regression first produces a coarse-grained estimate of EE, and a linear regression then refines the output of the first stage.
To the best of our knowledge, this is the first work to use raw ECG signals instead of HR for high-accuracy EE estimation.
We performed experiments to study the contribution of different input signals to the EE estimation model and verified that the raw ECG signal contributes more to EE estimation performance than HR.
The rest of this paper is organized as follows. Section II reviews related studies on EE estimation. Section III introduces the proposed DMTRN model. Section IV presents the performance evaluation experiments and analyzes the results. Section V discusses the performance of the proposed model. Finally, Section VI concludes the paper.
Related Works
In recent years, with the growing demand for accurate EE estimation, many studies have sought to improve EE estimation performance.
Motion signals collected by IMU sensors were the first to be used for EE estimation. The earliest attempts were made by Montoye et al. [8] and Chen et al. [9], who used acceleration signals from a single fixed sensor to estimate EE through a linear regression model. Because the EE level varies with physical activity, multiple regression models are better suited to EE estimation. Choi et al. [10] proposed a multiple linear regression method to estimate EE during walking and running separately. Crouter et al. [11] proposed a two-regression model that improves EE estimation accuracy by first recognizing the physical activity.
As the factors affecting EE became better understood, physiological parameters including HR and heart rate variability (HRV) were fused with motion signals to estimate EE. Charlot et al. [12] improved the accuracy of EE estimation during running by using anthropometric parameters, HR, and running speed as the model input. Brage et al. [13] used HR and acceleration signals to predict EE; their findings suggested that using both acceleration and HR outperforms using either parameter alone. To reduce the effect of inter-individual physiological differences on EE estimation accuracy, Altini et al. [14] proposed an HR normalization method and used the normalized HR, activity intensity, and anthropometric characteristics to estimate EE.
Moreover, the study in [15] suggested that ECG can provide additional information for better prediction of EE. Its authors calculated not only HR from ECG but also various HRV indicators as predictors. The results showed that adding HRV to the input parameters improves EE estimation accuracy. Inspired by these results, we speculate that raw ECG signals contain further valuable information beyond HR and HRV. We therefore took the raw ECG signals as the input of the proposed EE estimation model.
In terms of algorithms, nonlinear machine learning models have increasingly been explored for EE estimation with the development of artificial intelligence technology. Staudenmayer et al. [16] proposed two artificial neural networks (ANNs), one for physical activity recognition and one for EE estimation. Catal et al. [17] combined the boosted decision tree regression (BDTR) algorithm and the median aggregation algorithm to improve EE estimation accuracy. Cvetković et al. [18] proposed a real-time activity monitoring and EE estimation algorithm using a smartphone and a wristband, based on the random forest (RF) algorithm, which took variations in sensor location and orientation into consideration. However, because manually designed and selected features contain limited information, the EE estimation accuracy of these machine learning models is limited.
Zhu et al. [19] were the first to propose a deep learning method for EE estimation: raw acceleration signals were fed into a CNN to estimate EE without any feature extraction or selection steps. Their experimental results showed that the CNN significantly improved EE estimation performance compared with the activity-specific linear regression model and the ANN model. The long short-term memory network [27] has also been applied to EE estimation. Nevertheless, the performance of these models still has room for improvement because of their simple network architectures and their use of HR alone as the physiological input.
From this review of related studies, it can be inferred that the raw ECG signals may contain deep features related to EE beyond HR and HRV. Based on this hypothesis, we developed a deep learning architecture named DMTRN, which uses the raw ECG and 6-axis inertial signals for accurate EE estimation. Through ablation experiments, we verified the effectiveness of the raw ECG signal for EE estimation. The superior performance of the proposed DMTRN method was also verified through comparative studies with previous works.
Materials and Methods
A. Data Collection
A total of 33 healthy participants were recruited for the experiments. Their anthropometric characteristics are summarized in Table I. During the experiment, the room temperature was maintained between 25 °C and 26 °C. The participants performed a modified Bruce treadmill test [28] so that their EE could be collected at activity intensity levels ranging from rest to the individual's maximum activity intensity. The experimental procedure is shown in Table II. It started with a 5-minute pre-rest, during which the participants stood still on the treadmill. In the exercise stage that followed, the participants began to run at a speed of 3 km/h, and the speed increased to the next preset value every 5 minutes until it reached the maximum preset speed (11.6 km/h); the participants then ran at this maximum speed until they were physically exhausted. Reaching the maximum speed was not required: the exercise could be terminated at any time when the participant's HR reached the maximum HR or the participant signaled exhaustion. The exercise stage was followed by a 3-minute recovery stage and a 3-minute post-rest stage.
The data collection setup is shown in Fig. 1. The participants wore 12-lead ECG sensors (GE Medical System Information Technologies, INC, Cardiac Testing System) on the body and an inertial measurement unit (IMU) sensor (Shimmer, Shimmer3 IMU unit) on the waist. An indirect calorimeter (MasterScreen CPX, Jaeger, Germany) with a mask worn on the participant's face was used to collect the reference EE. Because of the high signal quality of lead V3 among the 12 leads, we used the V3-lead ECG as the input ECG data. The sampling rates of the indirect calorimeter, IMU sensor, and ECG sensor were 0.2 Hz, 100 Hz, and 200 Hz, respectively. Each participant took part in at least one and at most three data acquisition sessions, with an interval of 1 week between sessions. Each session lasted about 30 minutes, and a total of 60 sessions were collected.
The study was approved by the Institutional Review Board of Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences. All participants signed the written informed consent before the experiments.
B. Preprocessing
First, missing data were filled in by interpolation. Filters were then applied to remove noise from the collected data. For IMU data, a Butterworth low-pass filter with a 10 Hz cutoff frequency and a Wiener filter [28] with a window size of 1 second were used. For ECG data, a low-pass filter with a cutoff frequency of 50 Hz and a nine-level wavelet decomposition were used.
Next, to reduce the effect of changes in inertial sensor position on EE estimation performance, the magnitude vectors of the accelerometer and gyroscope signals were calculated using the method proposed in [29]. These two magnitude channels were combined with the 6-axis raw signals to form the 8-channel IMU input.
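As a minimal sketch of this step, the snippet below appends the two magnitude channels to one 6-axis sample. It assumes that the magnitude is the Euclidean norm of the three axes; the exact method of [29] is not reproduced in the text, and the function name `add_magnitude_channels` is hypothetical.

```python
import math


def add_magnitude_channels(sample):
    """Append accelerometer and gyroscope magnitudes to one 6-axis IMU sample.

    Assumption: the magnitude is the Euclidean norm of the three axes.
    Input order is (ax, ay, az, gx, gy, gz); output has 8 values.
    """
    ax, ay, az, gx, gy, gz = sample
    acc_mag = math.sqrt(ax * ax + ay * ay + az * az)
    gyro_mag = math.sqrt(gx * gx + gy * gy + gz * gz)
    return (ax, ay, az, gx, gy, gz, acc_mag, gyro_mag)
```

Under this assumption the magnitude channels do not change when the sensor axes are swapped or rotated, which is consistent with the stated aim of reducing sensitivity to sensor position.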
Previous work [24] showed that the longer the sliding window, the smaller the EE estimation error. In addition, our test results showed that the EE of the human body fluctuates little within one minute, and a 1-minute window has been used in [21], [30], [31]. To balance the real-time performance and the accuracy of the model, our study adopted a 1-minute sliding window. A 1-minute sliding window without overlap was therefore applied to the IMU data, ECG data, and reference EE data for segmentation. After segmentation, IMU input vectors of size 6000×8, ECG input vectors of size 12000×1, and reference EE vectors of size 12×1 were obtained. The IMU and ECG input vectors were fed directly into the IMU CNN branch and the ECG CNN branch of the proposed model, respectively, and the mean of each reference EE vector was used as the final EE reference label.
Moreover, five anthropometric features (sex, age, height, weight, and waistline) were also input into the proposed model after standardization and one-hot encoding.
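The window sizes follow directly from the sampling rates (100 Hz × 60 s = 6000, 200 Hz × 60 s = 12000, 0.2 Hz × 60 s = 12). A minimal sketch of the segmentation is given below; the helper names `segment` and `window_labels` are hypothetical, and dropping a trailing partial window is an assumption.

```python
def segment(signal, rate_hz, window_s=60):
    """Split a sequence into non-overlapping windows of window_s seconds.

    Assumption: a trailing partial window is dropped.
    """
    n = int(round(rate_hz * window_s))
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def window_labels(ee_windows):
    """Use the mean of each reference EE window as its label."""
    return [sum(w) / len(w) for w in ee_windows]
```

For example, two minutes of reference EE sampled at 0.2 Hz give two windows of 12 values each, and therefore two labels.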
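The following sketch illustrates one plausible reading of this step, assuming that the continuous features (age, height, weight, waistline) are z-score standardized and that sex is one-hot encoded; the text does not state which feature receives which treatment. The function names are hypothetical.

```python
import statistics


def standardize(values):
    """Z-score standardization of one continuous feature (assumed scheme)."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def one_hot(value, categories):
    """One-hot encode a categorical value against a fixed category list."""
    return [1.0 if value == c else 0.0 for c in categories]
```

In a cross-validation setting, the mean and standard deviation would normally be computed on the training folds only and reused for the test fold.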
C. Proposed Method
The architecture of the proposed network model, DMTRN, is shown in Fig. 2, and pseudo-code describing the algorithm is given in Algorithm 1. This section introduces the proposed EE estimation method in detail. First, the overall structure of the feature extractor for automatic feature extraction is introduced. Then, we explain how the two-stage regression module is embedded into a deep regression model.
Fig. 2. Illustration of the network architecture. The network consists of a feature extractor and a two-stage regression module. The Convs components in the feature extractor are branches with different kernel sizes. The two multi-branch CNNs of the feature extractor extract motion features and ECG features from the IMU data and ECG data, respectively. The two-stage regression module generates the final EE from the extracted features and the anthropometric features.
Algorithm 1: DMTRN for EE Estimation.
Input: training data (XECG, XIMU, XANT, yi) and testing data (X’ECG, X’IMU, X’ANT)
Output: the final predicted EE ŷ’ for testing data
# Training phase
Initialize hyperparameters K and λ
Initialize feature extractor's weights Wextract and two-stage regression's weights W1, b1, W2, b2
Initialize maximum number of epochs N
for k = 1 → N do
    Load XECG, XIMU, XANT, yi
    Extract features X through the feature extractor based on Eq. (3)
    Calculate the predicted EE distribution ŷ (first stage) and the predicted EE value ŷ (second stage) based on Eq. (4)
    Update the weights of DMTRN with the total loss function of Eq. (7)
end for
# Testing phase
Load X’ECG, X’IMU, X’ANT
Load the trained weights of DMTRN
1) Feature Extractor
Two multi-branch CNNs were designed, one for motion feature extraction and one for ECG feature extraction. Each multi-branch CNN contains three branches, and each branch uses convolutional kernels of a different size. Because kernels of different sizes capture information at different time scales, multi-scale context features can be extracted by the proposed multi-branch CNNs.
The architecture and parameters of a branch with kernel size k are shown in Table III. Each branch for ECG feature extraction consists of 8 convolution layers and 5 pooling layers, while each branch for motion feature extraction consists of 10 convolution layers and 6 pooling layers. For motion feature extraction, the kernel sizes of the three branches were 3, 5, and 7; for ECG feature extraction, they were 5, 7, and 9. ReLU [32] was used as the activation function, and batch normalization [33] was applied after each convolution layer to alleviate internal covariate shift and speed up training. A dropout layer [34] was added to prevent overfitting. At each layer, multiple feature maps were generated according to the specified number of filters and fed into the next layer, so that deep-level features were learned by cascading the layers.
Furthermore, we combined the deep-level features extracted by the multi-branch CNNs with the anthropometric features through a feedforward neural network (FNN) with one hidden layer of 128 units, which improves the generalization of the model when estimating EE for different subjects.
2) Two-Stage Regression
The essence of ordinal regression [35] is to transform a regression task into a multi-class classification task through label discretization. With the development of deep learning techniques, ordinal regression combined with CNNs has attracted increasing attention and has been successfully applied to age estimation [36], depth estimation [37], and head pose estimation [38].
For the first stage regression, ordinal regression was used to estimate a coarse-grained EE. The uniform discretization method was used to quantize a continuous EE value into a discrete value.
A continuous reference EE value yi in the interval [a, b] is quantized into one of K discrete interval values ri as
\begin{equation*}
{r_i} = \left\lfloor {\frac{{(K - 1)({y_i} - a)}}{{b - a}}} \right\rfloor \tag{1}
\end{equation*}
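Eq. (1) can be sketched in a few lines. The snippet assumes that a and b are the lower and upper bounds of the EE range, which the surrounding text does not define explicitly; clamping out-of-range values to [a, b] is an additional assumption, and the numeric bounds in the usage note are hypothetical.

```python
import math


def discretize(y, a, b, K):
    """Uniform discretization of a continuous EE value, following Eq. (1).

    Assumptions: a < b are the bounds of the EE range, and values outside
    [a, b] are clamped so that the result stays in 0 .. K-1.
    """
    if not a < b:
        raise ValueError("expected a < b")
    y = min(max(y, a), b)
    return math.floor((K - 1) * (y - a) / (b - a))
```

With the hypothetical range a = 0, b = 20 kcal/min and K = 50, an EE of 10 kcal/min maps to interval 24, and only y = b reaches the last interval K - 1 = 49.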
The discrete interval value ri of the reference EE was then encoded as a soft label vector yi [39] with the dimension of 1×K. The j-th element of the vector is defined as
\begin{equation*}
{\mathbf{y}_{ij}} = \frac{{{e^{ - \phi ({r_i},{r_j})}}}}{{\sum\nolimits_{k = 1}^K {{e^{ - \phi ({r_i},{r_k})}}} }}\;{\rm{ }}\forall \;{r_j} \in [{r_1},{r_2},\ldots,{r_K}] \tag{2}
\end{equation*}
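A minimal sketch of the soft-label encoding in Eq. (2) follows. The metric φ is not specified in the text, so the absolute difference between interval indices is used here purely as an assumption; the K levels are taken as 0, 1, ..., K-1 to match the output range of Eq. (1).

```python
import math


def soft_label(r_i, K, phi=lambda r, s: abs(r - s)):
    """Soft label vector of Eq. (2): a softmax over negative distances.

    Assumption: phi defaults to the absolute index difference; the source
    does not state which metric was used.
    """
    weights = [math.exp(-phi(r_i, r_j)) for r_j in range(K)]
    total = sum(weights)
    return [w / total for w in weights]
```

The resulting vector sums to one, peaks at the true interval, and decays for intervals farther away, so near misses are penalized less than distant ones in the first-stage loss.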
Let X denote the features from the feature extractor,
\begin{equation*}
\mathbf{X} = \Phi \left({{X_{ECG}},{X_{IMU}},{X_{ANT}},{\mathbf{W}_{extract}}} \right) \tag{3}
\end{equation*}
As can be seen from Fig. 2, the mapping from the features X to the final EE prediction can be divided into two stages: the first stage predicts the discrete EE distribution ŷ (a vector over the K intervals), and the second stage predicts the continuous EE value ŷ (a scalar). The whole process can be expressed as Eq. (4):
\begin{align*}
{\hat{\bf{y}}} &= g({\mathbf{W}_\mathbf{1}},\mathbf{X}) + {\mathbf{b}_\mathbf{1}}\\
\hat{y} &= {\mathbf{W}_\mathbf{2}}\sigma ({\hat{\bf{y}}}) + {b_2}\tag{4}
\end{align*}
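To make the data flow of Eq. (4) concrete, the sketch below implements both stages for a single feature vector. It assumes that g is a single linear layer and that σ is the softmax function; neither is defined in the text, so this is an illustration of the structure rather than the exact layers used.

```python
import math


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def two_stage(x, W1, b1, W2, b2):
    """Two-stage regression following the structure of Eq. (4).

    Assumptions: g is a linear map (W1 has K rows) and sigma is softmax.
    Returns the K-bin distribution and the scalar EE estimate.
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(W1, b1)]
    dist = softmax(logits)                       # stage 1: coarse estimate
    y_hat = sum(w * p for w, p in zip(W2, dist)) + b2  # stage 2: refinement
    return dist, y_hat
```

The second stage is a learned weighted sum over the interval probabilities, which lets the network correct the quantization error of the first stage.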
Furthermore, two losses are defined for the proposed two-stage regression. The first loss is used to measure the discrepancy between the predicted EE distribution and the reference EE distribution, and finally controls the interval classification accuracy of EE. We adopt KL-Divergence (Eq. (5)) as the first loss function,
\begin{equation*}
{L_{ord}} = \frac{1}{N}\sum\limits_{i = 1}^N {\sum\limits_{j = 1}^K {{\mathbf{y}_{ij}}} } \log \frac{{{\mathbf{y}_{ij}}}}{{{{{\hat{\bf{y}}}}_{ij}}}} \tag{5}
\end{equation*}
The second loss controls the prediction accuracy of the final EE and L1 loss (Eq. (6)) was adopted in this study.
\begin{equation*}
{L_{reg}} = \frac{1}{N}\sum\limits_{i = 1}^N {\left| {{y_i} - {{\hat{y}}_i}} \right|} \tag{6}
\end{equation*}
During the training phase, the two losses of the two regression stages were merged through Eq. (7) into a total loss used to train the whole model,
\begin{equation*}
L = \lambda {L_{ord}} + {L_{reg}} \tag{7}
\end{equation*}
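Eqs. (5) to (7) can be checked numerically with the sketch below, written in plain Python for clarity rather than as the training code. The function names are hypothetical, and the small `eps` floor on predicted probabilities is an added safeguard, not part of the equations.

```python
import math


def kl_loss(y_true, y_pred, eps=1e-12):
    """Eq. (5): mean KL divergence between soft labels and predictions."""
    total = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t > 0.0:  # terms with y_ij = 0 contribute nothing
                total += t * math.log(t / max(p, eps))
    return total / len(y_true)


def l1_loss(y, y_hat):
    """Eq. (6): mean absolute error of the final EE prediction."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def total_loss(y_true, y_pred, y, y_hat, lam=1.0):
    """Eq. (7): L = lambda * L_ord + L_reg."""
    return lam * kl_loss(y_true, y_pred) + l1_loss(y, y_hat)
```

When the predicted distribution equals the soft label, the first term vanishes and the total loss reduces to the L1 error of the second stage.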
Experiments and Results
Extensive experiments were performed to verify and evaluate the proposed DMTRN for accurate EE estimation. Firstly, detailed ablation studies on the collected dataset were performed to verify the effectiveness of the two-stage regression module, the multi-branch module, and the extracted features. Then the EE estimation performance of our proposed model was compared with previous studies.
A. Implementation Details
A 10-fold cross validation on the collected dataset was performed to evaluate the performance of the proposed method. The average performance over the 10 folds was used as the final result. Root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) were used to evaluate the performance of the model.
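For reference, the three metrics can be computed as in the sketch below, using their standard definitions. Expressing MAPE as a percentage is an assumption about the reporting convention.

```python
import math


def rmse(ref, est):
    """Root mean square error."""
    return math.sqrt(sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref))


def mae(ref, est):
    """Mean absolute error."""
    return sum(abs(r - e) for r, e in zip(ref, est)) / len(ref)


def mape(ref, est):
    """Mean absolute percentage error, in percent.

    Undefined when a reference value is zero; reference EE is assumed
    to be strictly positive.
    """
    return 100.0 * sum(abs((r - e) / r) for r, e in zip(ref, est)) / len(ref)
```

RMSE penalizes large errors more heavily than MAE, while MAPE normalizes each error by the reference value and therefore weights errors at low EE more strongly.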
Data augmentation can not only increase the number of samples and enhance the generalization ability of the model but also add random noise to the dataset and improve the robustness of the model. Two data augmentation techniques were used in this study: 1) multiplying the amplitude of the IMU data and ECG data by a random scalar drawn from a Gaussian distribution with mean 1 and standard deviation 0.1 [40]; 2) permuting the three axes of the accelerometer or gyroscope data at random and rotating them by a random angle to simulate inertial sensors placed at different body locations [40].
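The two augmentations can be sketched as follows. The snippet covers amplitude scaling and axis permutation only; the random rotation is omitted because its parameterization is not given in the text. Drawing one scale factor per window is an assumption, and the function names are hypothetical.

```python
import random


def scale_amplitude(window, rng, sigma=0.1):
    """Multiply a window by one scalar drawn from N(1, sigma^2).

    Assumption: a single factor is drawn per window.
    """
    scale = rng.gauss(1.0, sigma)
    return [[v * scale for v in row] for row in window]


def permute_axes(window, rng):
    """Apply one random permutation of the three axes to every sample."""
    perm = [0, 1, 2]
    rng.shuffle(perm)
    return [[row[i] for i in perm] for row in window]
```

Both functions return new lists and leave the input window unchanged, so the original and augmented samples can be kept side by side in the training set.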
We used the deep learning framework PyTorch [41] to build the proposed model. Adam optimizer [42] was used in the training process. The maximum number of epochs was 50, and the batch size was set to 64. The initial learning rate and momentum were set to 0.001 and 0.9 respectively.
B. Overall Performance
The overall performance of the proposed DMTRN is presented in Fig. 3 through the correlation plot and the Bland–Altman plot of the test results of the 10-fold cross validation. In the correlation plot, most of the points lie close to the red line, indicating a close correlation (R² = 0.97) between the estimated EE and the reference EE. In the Bland–Altman plot, more than 95% of the points lie within the limits of agreement, suggesting high EE estimation accuracy of the proposed model.
Fig. 3. Correlation (a) and Bland–Altman (b) plots comparing the estimated EE with reference EE.
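The quantities behind a Bland–Altman plot can be computed as in the sketch below. It assumes the conventional limits of agreement, mean difference ± 1.96 standard deviations; the text does not state which definition was used.

```python
import statistics


def bland_altman(est, ref):
    """Bias and limits of agreement between estimated and reference EE.

    Assumption: limits are bias +/- 1.96 * sample standard deviation
    of the paired differences.
    """
    diffs = [e - r for e, r in zip(est, ref)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

A bias near zero indicates no systematic over- or underestimation, and narrow limits indicate that individual estimates stay close to the reference.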
C. Ablation Studies
1) Multi-Branch Module and Two-Stage Regression Module
To evaluate the effects of the proposed multi-branch module and two-stage regression module on EE estimation performance, we set up three models with different feature extraction and regression modules: (1) Single-branch: neither the multi-branch module nor the two-stage regression module was used; (2) Single-branch + two-stage: only the two-stage regression module was used; (3) Multi-branch + two-stage: both the multi-branch module and the two-stage regression module were used.
As shown in Fig. 4, the model with the two-stage regression module yields a lower EE estimation RMSE than the model without it, which demonstrates the effectiveness of the proposed two-stage regression method. The performance improved further when the single-branch module was replaced with the proposed multi-branch module, which indicates that the features extracted by the multi-branch CNNs are of higher quality than those extracted by the single-branch CNNs.
Fig. 4 also illustrates the sensitivity of the proposed model to the number of EE discretization intervals K in the first regression stage. When K increases from 10 to 90, the EE estimation RMSE of our model ranges from 0.71 kcal/min to 0.76 kcal/min, indicating that DMTRN is robust over a wide range of K. The RMSE increased when K was set smaller or larger than 50: too few intervals cause a large quantization error in the first-stage regression, while too many intervals reduce the effect of the first-stage regression.
We further studied the effect of the hyperparameter λ on EE estimation performance. The RMSE, MAE, and MAPE of EE estimation were evaluated with λ set to 0.1, 1, 5, and 10. The results listed in Table IV show that the best performance was achieved when λ = 1. Because λ adjusts the relative contributions of the two regression tasks, a λ that is too small or too large upsets the balance between them.
In the following experiments, the number of discretization intervals K was set to 50 and the hyperparameter λ was set to 1 unless otherwise stated.
2) Input Data
Many inputs, including anthropometric data (ANT), inertial data (IMU), ECG data (ECG), and heart rate (HR), are considered to be related to EE. In this section, we studied the effects of different input data on the proposed EE estimation model. The results are listed in Table V. First, rows No. 1, 2, and 4 show that using IMU or ECG alone as the model input gives worse EE estimation performance than using the combination of ECG and IMU. Second, row No. 6 shows that further adding ANT to the model input improves EE estimation performance, although the improvement is modest.
To compare the contributions of ECG and HR to the EE estimation model, we removed the ECG feature extractor module from DMTRN and replaced the extracted ECG features with the manually calculated HR. Comparing No. 3 with No. 4 and No. 5 with No. 6 shows that ECG is superior to HR for EE estimation.
D. Comparison Studies
To verify the advantage of the proposed EE estimation method, we compared it on our dataset with other machine learning and deep learning algorithms, including linear regression (LR) [15], [43], boosted decision tree regression (BDTR) [17], extreme gradient boosting (XGBoost) [21], random forest (RF) [18], convolutional neural network (CNN) [19], and densely connected convolutional network (DenseNet) [20]. For the machine learning algorithms, anthropometric features, the motion features designed in [18], and the HRV features designed in [15] were used to train the models with default parameters.
The results are shown in Table VI. RF performed best among the compared algorithms. Compared with RF, however, the proposed DMTRN model reduced the EE estimation error by 22.8% in terms of RMSE.
To further evaluate the proposed model, we also compared our method with other related studies. Because those studies reported results in different EE units and evaluation metrics, several units and metrics were used for ease of comparison. The input data, methods, and EE estimation performance are compared in Table VII. One clear advantage of our model is that it is based on ECG and IMU, whereas other studies are usually based on HR and IMU. Because the raw ECG signals contain more EE-related information, the proposed DMTRN model can estimate EE more accurately. The proposed DMTRN model achieves state-of-the-art performance in terms of RMSE, MAE, and MAPE compared with previous studies.
Discussion
A. Feature Visualization
For a better understanding of the proposed model, we visualized the learned features by mapping them to the raw ECG signals using the guided backpropagation approach [44]. In Fig. 5, the contribution of each part of the ECG signal to the final EE estimate is shown in different colors. The closer the color is to dark red, the more that part of the signal contributes to the EE estimate; the closer the color is to dark blue, the less it contributes.
First, the R wave of the ECG signal attracts most of the attention, and its contribution is greater than that of other parts. This can be explained by the fact that the proposed deep learning model learned and extracted HR-related features, which are closely related to EE estimation. Besides the R wave, the T wave also contributes to the model. This indicates that the model learned not only HR-related information but also morphological information near the T wave. Previous research [45] found that the amplitude of the T wave decreased significantly during exercise and increased significantly after exercise, which probably reflected anoxic and anaerobic myocardial metabolism. The T wave therefore plays an auxiliary role in EE estimation, which also helps explain why the raw ECG signal is superior to HR for EE estimation.
B. The Tracking of Individual EE Changes
To evaluate how well the proposed model tracks changes in EE, Fig. 6 shows the estimated EE and the reference EE for one participant's test session, after the first regression stage and after the second regression stage. The comparison shows that the EE estimated in the second stage was closer to the reference EE than that in the first stage. This is exactly the purpose of the two-stage regression module: the first stage makes a coarse-grained prediction that determines the range of EE, and the second stage makes a fine-grained prediction that determines the final value.
Fig. 6. EE tracking performance. Top: EE estimation results after the first stage regression. Bottom: EE estimation results after the second stage regression.
Moreover, the final EE estimation results shown in Fig. 6 after the two-stage regression module demonstrated that the proposed model can accurately track the large changes of an individual's EE, even in the case of poor ECG signal quality caused by motion artifacts at a high speed.
C. Complexity Analysis
Because DMTRN uses 1-D convolutional neural networks for EE estimation, it is lightweight and can be implemented on mobile devices for real-time EE estimation. As shown in Table VIII, the number of parameters and the model size are not large, which means low memory consumption, so the model can run on mobile systems. Apart from the number of parameters and the model size, the number of floating point operations (FLOPs) is also important. The work in [46] showed that current mobile systems on the market need about 154 ms to complete the 569 MFLOPs of MobileNet. Because our model needs 472.71 MFLOPs for one EE estimation, it can be inferred that an estimation would take less than 154 ms.
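The latency argument can be made slightly more explicit with the back-of-the-envelope calculation below. It assumes that inference time scales linearly with the FLOP count on the same hardware, which is a rough approximation because memory access patterns and layer types also affect latency; the function name is hypothetical.

```python
def scaled_latency_ms(model_mflops, ref_mflops=569.0, ref_latency_ms=154.0):
    """Estimate latency by linear scaling from a reference model.

    Assumption: latency is proportional to FLOPs on the same device.
    Reference values (569 MFLOPs, 154 ms) are the MobileNet figures
    quoted from [46].
    """
    return ref_latency_ms * model_mflops / ref_mflops


estimate = scaled_latency_ms(472.71)  # roughly 128 ms under this assumption
```

Under this assumption one EE estimate, which covers a 1-minute window, would take well under a second, leaving ample margin for real-time use.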
D. Limitations
Although the proposed approach demonstrated the feasibility of improving the accuracy of EE monitoring using inertial sensors and ECG signals, some uncertainties remain. First, our dataset was collected under a controlled laboratory environment. Data preprocessing procedures were implemented before the model's development to reduce the noise and remove signals with very poor quality. However, the signals collected in a real environment feature more noise, which may influence the stability of the proposed model. Second, the distribution of the participants was narrow and the number of participants was limited. Multiple factors such as diseases, age, and individual differences may affect the EE variations, resulting in uncertainty in the performance of the proposed model.
Conclusion
In this paper, we proposed the DMTRN model for accurate EE estimation using information from multiple sensors. The multi-branch CNN module and the two-stage regression module were developed to improve EE estimation performance. The low memory consumption and short inference time show the feasibility of the proposed model for real-time processing on mobile systems. The experiments show that DMTRN achieves state-of-the-art performance, with an RMSE of 0.71 kcal/min, a reduction of 22.8% compared with the traditional RF model. In addition, our study demonstrated for the first time that raw ECG signals contain EE-related information beyond HR.
In future work, we will first transfer our model to a wearable EE estimation scenario in which ECG and IMU data are collected by wearable devices. We will then improve the model's robustness and generality by enlarging the dataset with data from subjects across a wide range of ages performing different types of physical activity.