Introduction
In recent years, smartphones and their accessories have moved from an extremely niche market to a central role in the lives of a significant share of the global population. What used to be an obscure toy for a handful of tinkerers and executives is now our alarm clock, notebook, camera, dictionary, encyclopedia, fitness and wellness assistant, and window to the greater world. In the present work we use the imaging technologies available in a smartphone camera to measure and monitor bio-signals, with the aim of better managing physical wellness and of enabling precautionary and preventive action against medical issues.
An upcoming and fast growing field in smartphone based accessorization [1] is that of health and wellness. We now have thermometers [2], pulse monitors [3]–[6], pedometers [7], sleep trackers [8], calorie trackers [9], vein detectors [10], blood sugar monitors [11] and a plethora of other devices, either connected to or as part of smartphones. Some of these devices [1], [9], at least in part, use the sensors built inside the smartphones themselves to acquire and process the data thus needed.
Heart rate/pulse (HR) is often measured using contact based optical sensors that use photoplethysmography (PPG), i.e. the variation of transmissivity and/or reflectivity of light through the fingertip as a function of arterial pulsation [12]–[14], followed by different signal post-processing approaches [12]–[16]. This approach works because hemoglobin in the blood absorbs certain wavelengths differently from the surrounding tissue such as flesh and bone. The wavelength under consideration varies from near-infrared (NIR) [12] to red [13], and even high intensity white flashlight [17], for fingertip based sensor systems. In other systems, sound reflectivity, i.e. the Doppler effect [18], is utilized to obtain similar parameters. Non-contact based optical sensors have been used to measure HR from a video of a human face [14] by looking at the variation of the average pixel value of the green channel in the subject’s forehead.
Several techniques exist [19]–[21] to enhance accuracy and reduce error rates for signals associated with HR, for both the contact based and the non-contact based approaches. For example, to reduce movement artefacts, one can look for aperiodic components at the lower end of the spectrum [20], or consider the correlation between several signals across different channels [22]. In the case of the non-contact based approach, continuous facial detection (facial tracking) is used [23] to mitigate the error introduced by natural movement. Other common error sources in the face based approach include the effects of ambient light, skin color and real-time constraints. Camera-based methods (particularly in the case of cameraphones) [5], [18], [23], [24] have, in addition to the above problems, their own set of challenges, such as the spectral response range of the camera modules and ambient noise.
Contact based measurements of respiratory rate (RR) typically consist of electrophysiological measurements [25] analogous to electrocardiography (ECG) and/or pressure sensors [26], [27]. Non-contact measurement techniques usually utilize ultrasound [28] or microwave [29] readings. While there has been some work [12]–[14], [30], [31]–[33] in the optical and near optical frequency ranges, non-contact optical respiratory rate measurement still has potential for improvement.
In this manuscript we present a novel Hue (HSV colorspace) based observable for reflection based imaging photoplethysmography (iPPG). By tracking time dependent changes of the average Hue, we can measure arterial pulsations from the forehead region. In what follows, in Section 2 we first discuss the Hue channel based iPPG in detail, and compare it with the traditional Green channel based iPPG approach. This is followed by Sections 3 and 4, where we discuss the experimental setup and validate the performance of Hue based iPPG against standard approaches to measure heart rate and respiratory rate, using videos of the user’s face acquired with a commercial smartphone (LG G2, LG Electronics Inc., Korea). In Section 5 we summarize the new Hue based iPPG approach, and address possible applications and limitations.
Method
A. iPPG Obtained Using the Green Versus the Hue Channel
iPPG is based on the principle that arterial pulsation is the major differential component of blood flow. In an iPPG based approach, we measure arterial pulsation using a photodiode as a sensor and an LED as an illuminant with appropriate illumination frequency [15], in either transmission or reflectance mode. When a camera is used as the sensor, a particular channel such as the Green channel is used, and the corresponding iPPG signal is \begin{align} {\mathcal{ I}}_{G}^{iPPG}=&\sum _{t} \sum _{\lambda } \sum _{\overrightarrow {x},\overrightarrow {y}} P(\lambda,\overrightarrow {x},\overrightarrow {y},t) \notag \\&\qquad \qquad \qquad \times \,R(\lambda,\overrightarrow {x},\overrightarrow {y},t) {h}(\lambda,\overrightarrow {x},\overrightarrow {y},t)\qquad \end{align}
\begin{equation} P(\lambda,\overrightarrow {x},\overrightarrow {y},t) = \hat {P}(\lambda)I(\overrightarrow {x},\overrightarrow {y},t) \end{equation}
\begin{align} R(\lambda,\overrightarrow {x},\overrightarrow {y},t)=&v_{DC}(\overrightarrow {x},\overrightarrow {y},t) b_{DC}(\lambda,t) \notag \\&\qquad \qquad \qquad +\, v_{AC}(\overrightarrow {x},\overrightarrow {y},t) b_{AC}(\lambda,t)\qquad \end{align}
\begin{align}&\hspace {-1pc}{\mathcal{ I}}_{G}^{iPPG} \notag \\=&\sum _{t} I(\overrightarrow {x},\overrightarrow {y},t) \sum _{\lambda } \sum _{\overrightarrow {x},\overrightarrow {y}} \hat {P}(\lambda) {h}(\lambda,\overrightarrow {x},\overrightarrow {y},t)\notag \\&\times \, [v_{DC}(\overrightarrow {x},\overrightarrow {y},t) b_{DC}(\lambda,t) \!+\! v_{AC}(\overrightarrow {x},\overrightarrow {y},t) b_{AC}(\lambda,t)]\qquad \! \end{align}
Unlike the standard iPPG, which measures average fluctuations of the Green channel, our proposed iPPG approach measures average fluctuations of Hue values. To do this we first convert each RGB pixel in the image to the corresponding HSV pixel, and then compute the average Hue for each frame. The resulting iPPG signal is \begin{align}&\hspace {-1.1pc}{\mathcal{ I}}_{0<H<0.1}^{iPPG}\notag \\=&\sum _{t} \sum _{\lambda } \sum _{\overrightarrow {x},\overrightarrow {y}} \hat {P}(\lambda) \check {h}(\lambda,\overrightarrow {x},\overrightarrow {y},t) \notag \\&\times \, [v_{DC}(\overrightarrow {x},\overrightarrow {y},t) b_{DC}(\lambda,t) \!+\!v_{AC}(\overrightarrow {x},\overrightarrow {y},t) b_{AC}(\lambda,t)]\qquad \! \end{align}
\begin{align} {\mathcal{ I}}_{0<H<0.1}^{iPPG} \simeq \sum _{t} \sum _{0<H<0.1} \sum _{\overrightarrow {x},\overrightarrow {y}} \hat {P}(\lambda) [v_{AC}(\overrightarrow {x},\overrightarrow {y},t) b_{AC}(t)] \end{align}
Schematic representation of the extinction coefficient of hemoglobin
B. Getting Hue From RGB Pixels
As defined in the previous sub-section, we measure fluctuations in pixels that fall within a given Hue range. To do this we first transform each RGB pixel
Schematic representation of mapping between an RGB to a Hue color space. The top image is a toy model with 9 different colors with (
Figure showing the heart and respiratory rate obtained from a video of a face captured using phone-flash. (1) Image of a face corresponding to the first frame, superimposed with detected face (red box), detected eyes (blue box) and the detected forehead (green box). (2) Average Hue as a function of time for the forehead region and its corresponding frequency spectrum. (3) Average post-processed Hue (from 0 – 0.1) as a function of time for the forehead region using HR and RR IIR bandpass filters
We have further demonstrated that under normal ambient light the Hue of the vast majority of forehead pixels is in the range of 0 to 0.1, which corresponds to the color of human skin, as shown in Fig.2 (bottom panel). Detailed HSV models for color based segmentation of human skin have been implemented in [36]–[39].
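The conversion and range selection described in this sub-section can be sketched as follows. This is only an illustrative sketch and not the authors' MATLAB implementation: it assumes 8-bit RGB input, Hue scaled to the interval [0, 1) as returned by Python's standard `colorsys` module, and a strict 0 < H < 0.1 selection; the function names and the sample pixel values are hypothetical.

```python
import colorsys

# Skin-tone Hue range quoted in the text (Hue scaled to [0, 1)).
HUE_MIN, HUE_MAX = 0.0, 0.1


def rgb_to_hue(r, g, b):
    """Return the Hue in [0, 1) of one 8-bit RGB pixel."""
    h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h


def mean_hue_in_range(pixels, lo=HUE_MIN, hi=HUE_MAX):
    """Average Hue over the pixels whose Hue lies strictly inside (lo, hi).

    `pixels` is an iterable of (r, g, b) tuples, e.g. the forehead region
    of one frame. Returns None when no pixel falls inside the range.
    """
    hues = [rgb_to_hue(r, g, b) for r, g, b in pixels]
    selected = [h for h in hues if lo < h < hi]
    if not selected:
        return None
    return sum(selected) / len(selected)


# Hypothetical forehead patch: two skin-like pixels, one blue, one green.
patch = [(224, 172, 105), (198, 134, 66), (30, 60, 200), (40, 180, 60)]
print(mean_hue_in_range(patch))  # only the two skin-like pixels contribute
```

Repeating `mean_hue_in_range` for every frame yields one value per frame, which is the kind of time series the Hue based iPPG signal is built from.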
C. Facial and Forehead Detection
In this work we have used a Haar cascade based detection function to detect the face and eyes in each frame using OpenCV [40]. The function effectively returns a box representing the
Experimental Setup - Video Acquisition of the Face, and Post-Processing of the iPPG Signal Obtained Using the Hue Channel (Range 0–0.1)
Two videos of each subject’s face were acquired for 20 seconds each using the rear camera and the standard video capturing application provided with a commercial smartphone (LG G2, LG Electronics Inc., Korea), one with and one without the flash. While the videos were shot, an external pulse oximeter was attached to the subject’s finger to measure the HR (Biosync B-50DL Finger Pulse Oximeter and Heart Rate Monitor, Contec Medical Systems Co. Ltd, China). The Biosync B-50DL Finger Pulse Oximeter has a measurement accuracy of ±2 beats per minute (BPM) [43]. In addition, the subjects were asked to count their respirations over the duration of one minute. To ensure that the RR values were correct, the subjects were asked to practise estimating their RR 5–10 times, and accuracy was corroborated by visual inspection of the rise of the subject’s chest. This is consistent with the method recommended by Johns Hopkins University and Johns Hopkins Hospital [44]. The videos of the face were taken with minimal movement to simulate the standardized best case scenario. The distance between the subject’s face and the camera was typically ~0.5 meter (± 20%), and had little effect on the accuracy of the final result.
The 25 subjects (aged 20–30) were chosen to represent different skin types (Fitzpatrick scale 1–6) and genders. Five subgroups were established based on skin type and gender: Group A is Caucasian male (Fitzpatrick scale 1–3), Group B is Caucasian female (Fitzpatrick scale 1–3), Group C is African male (Fitzpatrick scale 6), Group D is African female (Fitzpatrick scale 6) and Group E is Indian male (Fitzpatrick scale 4–5). The detailed results for some of these subjects are shown in Figs.4, 5 and 6. The authors have used human subjects to acquire readings and data in the form of videos of their faces under various lighting conditions. The subjects were informed in detail about the nature and scope of the work, and provided their informed consent to be included in the investigation. The subjects’ facial images have been anonymized for privacy reasons (a single exception provided permission for image use, and was de-anonymized for representative purposes). The subject data was acquired and handled based on the general principles of the Declaration of Helsinki 2013 [45], specifically in regard to informed consent, scientific requirements and research protocols, privacy and confidentiality.
Figure showing the heart and respiratory rate obtained from a video of a human face captured using phone-flash (Fitzpatrick scale 1–2). (1) Image of a face corresponding to the first frame, superimposed with detected face (red box), detected eyes (blue box) and the detected forehead (green box). (2) Average preprocessed Hue as a function of time for the forehead region and its corresponding frequency spectrum. (3–6) Average post-processed values as a function of time for the forehead region, using HR and RR IIR bandpass filters, and their corresponding frequency spectra: (3) Hue, (4) Hue from 0 – 0.1, (5) Green channel from RGB, and (6) Saturation using Hue from 0 – 0.1.
Figure showing the heart and respiratory rate obtained from a video of a human face captured using phone-flash (Fitzpatrick scale 5–6). (1) Image of a face corresponding to the first frame, superimposed with detected face (red box), detected eyes (blue box) and the detected forehead (green box). (2) Average preprocessed Hue as a function of time for the forehead region and its corresponding frequency spectrum. (3–6) Average post-processed values as a function of time for the forehead region, using HR and RR IIR bandpass filters, and their corresponding frequency spectra: (3) Hue, (4) Hue from 0 – 0.1, (5) Green channel from RGB, and (6) Saturation using Hue from 0 – 0.1.
Figure showing the heart and respiratory rate obtained from a video of a human face captured using phone-flash (Fitzpatrick scale 5–6). (1) Image of a face corresponding to the first frame, superimposed with detected face (red box), detected eyes (blue box) and the detected forehead (green box). (2) Average preprocessed Hue as a function of time for the forehead region and its corresponding frequency spectrum. (3–6) Average post-processed values as a function of time for the forehead region, using HR and RR IIR bandpass filters, and their corresponding frequency spectra: (3) Hue, (4) Hue from 0 – 0.1, (5) Green channel from RGB, and (6) Saturation using Hue from 0 – 0.1.
Once the 50 videos were obtained, they were post-processed using a MATLAB R2016a script, and the resulting plots are shown in Figs.4, 5 and 6. To measure the HR and the RR from an iPPG signal obtained from the video of a subject’s face, the MATLAB script starts from a 20-second video captured at 30 fps. These videos were then post-processed at 10 fps, by using one in every three consecutive frames and rejecting the rest. We have further checked that the down-sampling does not affect the final results by also processing each of these videos at 15 and 30 fps and computing the HR and RR values. The video needs to be at least 20 seconds long to gather statistically significant data, since RR can be as low as 6 breaths per minute, or 0.1 Hz: to ensure that at least 2 complete breaths are acquired within the sample, the sampling period needs to be ≥ 20 seconds.
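The arithmetic behind the 20-second minimum and the 30 fps to 10 fps down-sampling can be checked with a short sketch. The function names are hypothetical, and a list of frame indices stands in for the actual video frames.

```python
def min_duration_s(min_rate_per_min, cycles=2):
    """Shortest recording (in seconds) that contains `cycles` full periods
    of a signal whose lowest expected rate is `min_rate_per_min`."""
    freq_hz = min_rate_per_min / 60.0
    return cycles / freq_hz


def downsample(frames, step=3):
    """Keep one frame out of every `step` consecutive frames."""
    return frames[::step]


frames_30fps = list(range(20 * 30))      # stand-in for 600 frames (20 s at 30 fps)
frames_10fps = downsample(frames_30fps)  # one in three frames is kept

print(min_duration_s(6))   # RR of 6 per minute is 0.1 Hz, so 2 breaths need 20 s
print(len(frames_10fps))   # 200 frames, i.e. an effective rate of 10 fps
```

At 10 fps the Nyquist limit is 5 Hz (300 cycles per minute), which is still above physiological heart rates, so the down-sampling is not expected to remove the pulsatile component.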
The MATLAB script then detects faces and eyes using the approach described in Sec.II-C. For each processed frame, the script computes the average Hue for the forehead region; over the 200 frames, this gives us the raw iPPG signal.
In addition, the MATLAB script computes HR and RR using iPPG signals obtained from the average value of the pixels in the forehead region for three further observables: Hue (without any range restriction), the Green channel (similar to the approach of Poh et al. [23]), and Saturation (HSV colorspace) for pixels with Hue within the range 0–0.1.
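As a rough illustration of how a rate can be read off a per-frame iPPG time series, the sketch below picks the strongest spectral peak inside a frequency band. It deliberately replaces the IIR bandpass filtering mentioned in the figure captions with simple band-limited peak picking on a discrete Fourier transform, and the band edges (0.8–3.0 Hz for HR, 0.1–0.5 Hz for RR) are assumptions that do not come from the text. The input is a synthetic signal, not measured data.

```python
import cmath
import math


def dominant_rate_per_min(signal, fs_hz, band_hz):
    """Return the strongest spectral peak inside `band_hz`, in cycles per minute.

    Uses a direct DFT, which is adequate for a few hundred samples.
    Returns None if no frequency bin falls inside the band.
    """
    n = len(signal)
    mean = sum(signal) / n
    centred = [s - mean for s in signal]  # remove the DC component
    lo, hi = band_hz
    best_freq, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        freq = k * fs_hz / n
        if not (lo <= freq <= hi):
            continue
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centred))
        power = abs(coeff) ** 2
        if power > best_power:
            best_freq, best_power = freq, power
    return None if best_freq is None else best_freq * 60.0


# Synthetic 20 s signal at 10 fps: 1.2 Hz pulse plus 0.25 Hz breathing.
FS_HZ = 10.0
samples = [0.05
           + 0.002 * math.sin(2 * math.pi * 1.2 * i / FS_HZ)
           + 0.005 * math.sin(2 * math.pi * 0.25 * i / FS_HZ)
           for i in range(200)]

HR_BAND_HZ = (0.8, 3.0)  # assumed band, not taken from the text
RR_BAND_HZ = (0.1, 0.5)  # assumed band, not taken from the text
hr = dominant_rate_per_min(samples, FS_HZ, HR_BAND_HZ)
rr = dominant_rate_per_min(samples, FS_HZ, RR_BAND_HZ)
print(hr, rr)
```

With 200 samples at 10 fps the frequency resolution is 0.05 Hz, i.e. 3 cycles per minute, which bounds how finely either rate can be resolved by peak picking alone.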
Results and Discussion
A. Qualitative Comparison of HR and RR Obtained From the iPPG Signal Using the Hue Channel, With Other Algorithmic Approaches
To compare the accuracy of the different iPPG approaches, we have simultaneously measured the HR and RR of 25 subjects using standard approaches, namely pulse oximetry (HR) and self-reporting (RR). These results are tabulated in detail in Table.2 and Table.3 (flash on), and Table.4 and Table.5 (flash off).
Face video based approaches in the literature show that observables designed to measure HR and RR using the Green channel outperform those using the Red or Blue channels [23]. Table.2 and Table.3 show that the Hue channel, and particularly the Hue channel restricted to the range 0–0.1, corresponds more closely to the experimentally measured data than the other observables, including the Green channel. The average Saturation as a function of time (for pixels with Hue within the range 0–0.1) shows the least correlation with the experimental data.
B. Quantitative Comparison of Accuracy of Measurement of HR and RR Obtained From the iPPG Signal Using the Hue Channel Versus With Green Channel
We can further use inferential statistics such as linear fitting, to plot different sets of computed HR/RR with their corresponding measured values. The closer the slope of the fitted line (
Scatter plots comparing accuracy of HR and RR obtained from an iPPG using a face video, with standard approaches for measuring HR (Panel A and C) and RR (Panel B and D). Panel A shows results for HR computed using two iPPGs obtained from a single video with flash on, (a) Hue (0–0.1) (Red Full) and (b) Green channel (Green Dashed), compared with HR measured using pulse oximetry. Panel B shows results for RR computed using two iPPGs obtained from a single video with flash on, (a) Hue (0–0.1) (Blue Full) and (b) Green channel (Green Dashed), compared with RR measured using self reporting. Panel C shows results for HR computed using two iPPGs obtained from two separate videos, (a) Hue (0–0.1) with flash on (Red Full) and (b) Hue (0–0.1) with flash off (Grey Dashed), compared with HR measured using pulse oximetry. Panel D shows results for RR computed using (a) Hue (0–0.1) with flash on (Blue Full) and (b) Hue (0–0.1) with flash off (Grey Dashed), compared with RR measured using self reporting. In each set of data there are 5 subgroups based on skin types: Caucasian male (Triangle Up Fill), Caucasian female (Triangle Up), African male (Triangle Down Fill), African female (Triangle Down) and Indian male (Empty Box).
This could be further illustrated using the Pearson Correlation test, where
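For reference, the least-squares line and the Pearson correlation coefficient used in this kind of comparison can be computed as in the following sketch. The function names are hypothetical and the numbers are made-up illustrative values, not data from this study.

```python
import math


def fit_line(x, y):
    """Least-squares slope and intercept of y against x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative values (beats per minute), not data from this study.
measured = [62.0, 70.0, 75.0, 81.0, 90.0]   # e.g. reference instrument
computed = [63.0, 69.0, 77.0, 80.0, 92.0]   # e.g. video based estimate

slope, intercept = fit_line(measured, computed)
r = pearson_r(measured, computed)
print(slope, intercept, r)
```

A slope close to 1, an intercept close to 0 and an r close to 1 together indicate that the computed values track the measured ones.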
Having established that the face videos illuminated with flash work better, we further analyse those results using the Bland-Altman plots shown in Fig.8. The corresponding mean of difference and standard deviation of difference (drawn as lines in Fig.8) are tabulated in Table.1. In the case of HR, the standard deviation of the difference using Hue (0–0.1) is 4.16 (as shown in Fig.8.A), and using the Green channel is 0.28 (as shown in Fig.8.B). In the case of RR, the standard deviation of the difference using Hue (0–0.1) is 5.64 (as shown in Fig.8.C), and using the Green channel is 0.28 (as shown in Fig.8.D). This further illustrates that the Hue (0–0.1) approach works better than Green for both HR and RR.
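The two Bland-Altman quantities referred to here, the mean and the standard deviation of the paired differences, can be obtained as in the sketch below. The 1.96 standard deviation limits of agreement are the conventional choice for such plots and are an assumption here, since the text does not state which limits were drawn; the sample values are made up.

```python
import statistics


def bland_altman(estimated, reference):
    """Return (mean difference, SD of differences, limits of agreement)."""
    diffs = [e - r for e, r in zip(estimated, reference)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    limits = (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff)
    return mean_diff, sd_diff, limits


# Made-up illustrative values (beats per minute), not data from this study.
estimated = [72.0, 80.0, 65.0, 90.0]
reference = [70.0, 82.0, 66.0, 88.0]

mean_diff, sd_diff, limits = bland_altman(estimated, reference)
print(mean_diff, sd_diff, limits)
```

A mean difference near zero indicates little systematic bias, while a smaller standard deviation indicates tighter agreement with the reference.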
Bland-Altman plots comparing accuracy of HR and RR obtained from an iPPG using a face video, with standard approaches for measuring HR (Panel A and B) and RR (Panel C and D). Panel A shows results for HR computed using Hue (0–0.1), compared with HR measured using pulse oximetry. Panel B shows results for HR computed using Green channel, compared with HR measured using pulse oximetry. Panel C shows results for RR computed using Hue (0–0.1), compared with RR measured using self reporting. Panel D shows results for RR computed using Green channel, compared with RR measured using self reporting. In each set of data there are 5 subgroups based on skin types: Caucasian male (Triangle Up Fill), Caucasian female (Triangle Up), African male (Triangle Down Fill), African female (Triangle Down) and Indian male (Empty Box).
The efficacy of the Hue (0–0.1) approach over the Green channel is further illustrated using the paired Student’s t-test (as tabulated in Table.1). In the case of HR, the p-value using Hue (0–0.1) is 0.8887, and using Green is 0.9068. Similarly, in the case of RR, the p-value using Hue (0–0.1) is 0.2885, and using Green is 0.5608.
The standard for HR monitors set by the Association for the Advancement of Medical Instrumentation (AAMI), EC-13, states that the accuracy requirement is a root mean square error (RMSE) ≤ 5 BPM or ≤ 10%, whichever is greater. The RMSE values for HR using Hue (0–0.1) is 0.8887 BPM, and using Green is 0.9068 BPM. The RMSE values for RR using Hue (0–0.1) is 3.8884 BPM, and using Green is 5.6885 BPM. This illustrates that the Hue (0–0.1) approach gives better results than the traditional Green channel.
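The RMSE criterion quoted above can be expressed as a small check. This is a sketch under a stated assumption: the 10% is taken relative to the mean reference rate, which is one possible reading of the requirement and is not spelled out in the text. The function names and sample values are hypothetical.

```python
import math


def rmse(estimated, reference):
    """Root mean square error between paired estimates and references."""
    squared = [(e - r) ** 2 for e, r in zip(estimated, reference)]
    return math.sqrt(sum(squared) / len(squared))


def meets_rmse_criterion(estimated, reference, abs_bpm=5.0, rel=0.10):
    """True if RMSE <= max(abs_bpm, rel * mean reference rate).

    Assumption: the percentage is interpreted relative to the mean of the
    reference values.
    """
    mean_ref = sum(reference) / len(reference)
    tolerance = max(abs_bpm, rel * mean_ref)
    return rmse(estimated, reference) <= tolerance


# Made-up illustrative values (beats per minute), not data from this study.
estimated = [72.0, 80.0, 65.0, 90.0]
reference = [70.0, 82.0, 66.0, 88.0]

print(rmse(estimated, reference))
print(meets_rmse_criterion(estimated, reference))
```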
Conclusion
In this study, we have introduced a novel noninvasive approach to measure pulse and respiratory rate from a short video of the subject’s face. Unlike traditional iPPG approaches that measure the fluctuation of a particular channel of the RGB color space, we have measured the fluctuation in the Hue channel of the HSV color space. Since this observable primarily depends on the AC component of the pulsatile blood, it offers a more accurate and robust way to measure vital signs using a video. In this study, we have further shown that (1) the HR and RR derived from iPPG obtained using the Hue channel (range 0–0.1) are the most strongly correlated with standard instruments, and (2) the HR and RR derived from iPPG obtained from videos shot with additional flash based illumination are qualitatively better than those obtained without the flash. This is further supported by the Pearson’s r and RMSE values obtained using Hue (0–0.1) at rest in our current work, 0.9201 and 4.1617 respectively, compared to 0.89 and 6 obtained using the green channel (before post-processing) as reported by Poh et al. [23].
We have summarized our approach in the form of a flowchart, as shown in Fig.9.
Schematic representation of the process to compute iPPG using average Hue from 0 – 0.1.
However, our proposed algorithms will not work in a number of real world scenarios, for example if the forehead is partially or fully covered with hair (hairstyles such as devilock, bob cut, bettie page and beehive) or headgear (hat, cap, turban), if scar tissue is present on the forehead, or if the facial detection algorithm does not detect a face due to non-traditional facial features such as a heavy beard. Further studies are required to understand the effects of external lighting, skin color and movement on the accuracy of the final results. In addition, more accurate facial mapping technology to find the forehead region could be implemented to improve the accuracy of the face based pulse and respiratory rate detection method.
These findings could be readily translated into a smartphone camera application that measures HR using the camera flash as an illumination source, more accurately than the current market alternatives. Smartphone applications (or APIs) coupled with such technology will have further applications as Software as a Medical Device (SaMD) in the video based telemedicine market, allowing an average user to monitor their HR and RR without buying additional equipment. The telemedicine market includes tele-hospital care (where the consultant doctor can dial in to monitor patients) and tele-home care (where a remote healthcare connection, initiated by the patient, with a network of clinicians is usually available 24/7 for non-emergency care). The clinical relevance of telemedicine has been accelerated by the advent of tele-home care platforms such as Babylon Health, MDLive, Doctor On Demand, Teladoc, and LiveHealth Online.
In addition, HR measured in clinical settings using an electrocardiogram (ECG) requires patients to wear chest straps with adhesive gel patches that can be both uncomfortable and abrasive for the user. HR monitored using pulse oximetry at the fingertip or the earlobe can also be inconvenient for long-term wear. A video based software solution would help avoid such inconveniences. This is of particular interest in neonatal and elderly care, where contact based approaches can cause additional irritation to the subjects’ fragile skin.
In summary, we hope this will lead to the development of easy to access smartphone camera based technology for continuous monitoring of vital signs, both for fitness applications and for predicting the overall health of the user.
ACKNOWLEDGMENT
The authors report salaries, personal fees and non-financial support from their employer Think Biosolution Limited, which is active in the field of camera-based diagnostics and fitness tracking, outside the submitted work.