Algorithms for Monitoring Heart Rate and Respiratory Rate From the Video of a User’s Face

This paper presents the development and validation of a noninvasive point-of-care method for monitoring heart rate and respiratory rate from video of a user's face.

Abstract:

Smartphone cameras can measure heart rate (HR) by detecting pulsatile photoplethysmographic (iPPG) signals from post-processing the video of a subject's face. The iPPG signal is often derived from variations in the intensity of the green channel, as shown by Poh et al. and Verkruysse et al. In this pilot study, we introduce a novel iPPG method that measures variations in the color of reflected light, i.e., Hue, and can therefore measure both HR and respiratory rate (RR) from the video of a subject's face. This study was performed on 25 healthy individuals (ages 20-30, 15 males and 10 females, skin color Fitzpatrick scale 1-6). For each subject we took two 20-second videos of the subject's face with minimal movement, one with flash ON and one with flash OFF. While recording the videos we simultaneously measured HR using a Biosync B-50DL Finger Heart Rate Monitor, and RR using self-reporting. This paper shows that our proposed approach of measuring iPPG using Hue (range 0-0.1) gives more accurate readings than the Green channel. HR/Hue (range 0-0.1) (r = 0.9201, p-value = 4.1617, and RMSE = 0.8887) is more accurate compared with HR/Green (r = 0.4916, p-value = 11.60172, and RMSE = 0.9068). RR/Hue (range 0-0.1) (r = 0.6575, p-value = 0.2885, and RMSE = 3.8884) is more accurate compared with RR/Green (r = 0.3352, p-value = 0.5608, and RMSE = 5.6885). We hope that this hardware agnostic approach for detection of vital signals will have a large impact in telemedicine, and can be used to tackle challenges such as continuous non-contact monitoring of neonatal and elderly patients. An implementation of the algorithm can be found at https://pulser.thinkbiosolution.com

Published in: IEEE Journal of Translational Engineering in Health and Medicine (Volume: 6)

Article Sequence Number: 2700111

Date of Publication: 12 April 2018

Electronic ISSN: 2168-2372

PubMed ID: 29805920

DOI: 10.1109/JTEHM.2018.2818687

SECTION I.

Introduction

In recent years, we have seen smartphones and their accessories move from an extremely niche market, to occupying a central role in the lives of a significant share of the global population. What used to be an obscure toy for a handful of tinkerers and executives is now our alarm clock, notebook, camera, dictionary, encyclopedia, fitness and wellness assistant, and window to the greater world. In this present work we utilize the extensive gamut of imaging technologies present in our smartphone camera, to measure and monitor bio-signals, towards better management of physical wellness, as well as towards taking precautionary and preventive action for alleviating medical issues.

An upcoming and fast growing field in smartphone based accessorization [1] is that of health and wellness. We now have thermometers [2], pulse monitors [3]–[6], pedometers [7], sleep trackers [8], calorie trackers [9], vein detectors [10], blood sugar monitors [11] and a plethora of other devices, either connected to or as part of smartphones. Some of these devices [1], [9], at least in part, use the sensors built inside the smartphones themselves to acquire and process the data thus needed.

Heart rate/pulse (HR) is often measured using contact based optical sensors that use PPG, i.e. the variation of transmissivity and/or reflectivity of light through the finger tip as a function of arterial pulsation [12]–[14], followed by different signal post-processing approaches [12]–[16]. This approach works due to the differential absorption of certain frequencies by hemoglobin in the blood, compared to the surrounding tissue such as flesh and bone. The wavelength under consideration varies from near-infrared (NIR) [12], to red [13] and even high intensity white flashlight [17] for fingertip based sensor systems. In other systems, sound reflectivity, i.e. the Doppler effect [18], is utilized to obtain similar parameters. Non-contact based optical sensors have been used to measure HR from a video of a human face [14], by looking at the variation of the average pixel value of the green channel in the subject’s forehead.

Several techniques exist [19]–[21] to enhance accuracy and reduce error rates for signals associated with HR in the contact based approaches. For example, to reduce movement artefacts, one can look for aperiodic components at the lower end of the spectrum [20], or consider correlation between several signals across different channels [22]. In the case of the non-contact based approach, continuous facial detection (facial tracking) is used [23] to mitigate error introduced by natural movement. Other common error sources in the face based approach include the effects of ambient light, skin color and real-time constraints. Camera-based methods (particularly in the case of cameraphones) [5], [18], [23], [24] have, in addition to the above problems, their own set of challenges, such as the spectral response range of the camera modules and ambient noise.

Contact based measurements of respiratory rate (RR) typically consist of electrophysiological measurements [25] analogous to electrocardiography (ECG) and/or pressure sensors [26], [27]. Non-contact measurement techniques usually utilize ultrasound [28] or microwave [29] readings. While there has been some work [12]–[14], [30]–[33] in the optical and near optical frequency ranges, non-contact optical respiratory rate measurement still has potential for improvement.

In this manuscript we present a novel Hue (HSV colorspace) based observable for reflection based iPPG. By tracking time dependent changes of the average Hue, we can measure arterial pulsations from the forehead region. In what follows, in Section 2 we first discuss the Hue channel based iPPG in detail, and compare it with the traditional Green channel based iPPG approach. This is followed by Sections 3 and 4, where we discuss the experimental setup and validate the performance of Hue based iPPG against standard approaches to measure heart rate and respiratory rate, using videos of the user’s face procured using a commercial smartphone (LG G2, LG Electronics Inc., Korea). In Section 5 we summarize the new Hue based iPPG approach, and address possible applications and limitations.

SECTION II.

Method

A. iPPG Obtained Using the Green Versus the Hue Channel

iPPG is based on the principle that arterial pulsation is the major differential component of blood flow. In an iPPG based approach, we measure arterial pulsation using a photodiode as a sensor and an LED as an illuminant with appropriate illumination frequency [15], in either transmission or reflectance mode. In the case of using a camera as a sensor, a particular channel like the Green channel ${\mathcal{ I}}_{G}$, at which oxygenated hemoglobin absorbs light differentially compared to the surrounding tissue, is used. The optical sensor, when taking a video of the face, measures the signal ${\mathcal{ I}}_{G}^{iPPG}$ from the forehead (which is the average fluctuation of the green channel of the video obtained using the smartphone camera in our case). ${\mathcal{ I}}_{G}^{iPPG}$ is obtained over frames 0 to t, where each frame has $\overrightarrow {x} \times \overrightarrow {y}$ pixels (similar to “Raw Signal” in Fig.3),

$\begin{equation} {\mathcal{I}}_{G}^{iPPG} = \sum_{t} \sum_{\lambda} \sum_{\overrightarrow{x},\overrightarrow{y}} P(\lambda,\overrightarrow{x},\overrightarrow{y},t)\, R(\lambda,\overrightarrow{x},\overrightarrow{y},t)\, h(\lambda,\overrightarrow{x},\overrightarrow{y},t) \end{equation}$

where $P(\lambda)$ is the power of a given light source at the given wavelength $\lambda$, $R(\lambda)$ is the reflectance of the surface at a wavelength $\lambda$, and $h(\lambda)$ are the CIE (International Commission on Illumination) color-matching functions accounting for the response of the optical sensor (eyes, camera, etc.) [34]. Since the incident light can be assumed to be time-invariant, $P(\lambda,\overrightarrow{x},\overrightarrow{y},t)$ can be further decoupled as the intensity of the incident light $\hat{P}(\lambda)$ and the frequency distribution of the light normalized with respect to the total energy $I(\overrightarrow{x},\overrightarrow{y},t)$,

$\begin{equation} P(\lambda,\overrightarrow{x},\overrightarrow{y},t) = \hat{P}(\lambda)\, I(\overrightarrow{x},\overrightarrow{y},t) \end{equation}$

Also, the change in the reflectance of the pulsating tissue $R(\lambda,\overrightarrow{x},\overrightarrow{y},t)$ can be further modeled as a sum of a static non-pulsatile (DC) component and a pulsatile (AC) component. Both of these depend on the volume $v(\overrightarrow{x},\overrightarrow{y},t)$ and reflectivity $b(\lambda,t)$ of the individual components,

$\begin{equation} R(\lambda,\overrightarrow{x},\overrightarrow{y},t) = v_{DC}(\overrightarrow{x},\overrightarrow{y},t)\, b_{DC}(\lambda,t) + v_{AC}(\overrightarrow{x},\overrightarrow{y},t)\, b_{AC}(\lambda,t) \end{equation}$

As a result, the observable for iPPG can be written as a function of the pulsatile (AC) and non-pulsatile (DC) parts,

$\begin{equation} {\mathcal{I}}_{G}^{iPPG} = \sum_{t} \sum_{\lambda} \sum_{\overrightarrow{x},\overrightarrow{y}} \hat{P}(\lambda)\, I(\overrightarrow{x},\overrightarrow{y},t)\, h(\lambda,\overrightarrow{x},\overrightarrow{y},t) \left[ v_{DC}(\overrightarrow{x},\overrightarrow{y},t)\, b_{DC}(\lambda,t) + v_{AC}(\overrightarrow{x},\overrightarrow{y},t)\, b_{AC}(\lambda,t) \right] \end{equation}$

The time dependent variance of the pulsatile (AC) component is strongly correlated with the ECG signal corresponding to HR and RR. In the literature where an RGB color space was used directly to measure the pulsatile (AC) component, the best results are found for the green rather than the red channel [30], as an artefact of the parameterization of the RGB color space.

Unlike the standard iPPG, which measures average fluctuations of the Green channel, our proposed iPPG approach measures average fluctuations of Hue values. To do this we first convert each RGB pixel in the image to the corresponding HSV pixel, and then compute the average Hue for each frame. The resulting iPPG signal ${\mathcal{ I}}_{0<H<0.1}^{iPPG}$ (“Raw Signal” in Fig.3) can hence be written as,

$\begin{equation} {\mathcal{I}}_{0<H<0.1}^{iPPG} = \sum_{t} \sum_{\lambda} \sum_{\overrightarrow{x},\overrightarrow{y}} \hat{P}(\lambda)\, \check{h}(\lambda,\overrightarrow{x},\overrightarrow{y},t) \left[ v_{DC}(\overrightarrow{x},\overrightarrow{y},t)\, b_{DC}(\lambda,t) + v_{AC}(\overrightarrow{x},\overrightarrow{y},t)\, b_{AC}(\lambda,t) \right] \end{equation}$

In addition, by choosing the Hue range from 0 to 0.1, we can measure fluctuations corresponding to the skin color and avoid external noise in the measurements. The choice of Hue range corresponds to a choice of a particular $\lambda$ range ($600\,\mathrm{nm} \leq \lambda \leq 700\,\mathrm{nm}$), which can then be modelled as $\check{h}(\lambda,\overrightarrow{x},\overrightarrow{y},t)$ (shown as a square well in Fig.1). Since this primarily depends on the AC component, we can further approximate it as,

$\begin{equation} {\mathcal{I}}_{0<H<0.1}^{iPPG} \approx \sum_{t} \sum_{0<H<0.1} \sum_{\overrightarrow{x},\overrightarrow{y}} \hat{P}(\lambda) \left[ v_{AC}(\overrightarrow{x},\overrightarrow{y},t)\, b_{AC}(t) \right] \end{equation}$

FIGURE 1.

Schematic representation of the extinction coefficient of hemoglobin $Hb$ (red) and oxygenated hemoglobin $HbO_{2}$ (blue) as a function of absorption wavelength. Overlaid are ${h}(\lambda,\vec {x},\vec {y},t)$ (green), i.e. the CIE color matching function for the green channel, and $\check {h}(\lambda,\vec {x},\vec {y},t)$ (black).

B. Getting Hue From RGB Pixels

As defined in the previous sub-section, we measure fluctuations in pixels that fall within a given Hue range. To do this we first transform each RGB pixel ${Pixel}_{R,G,B}$ to an HSV pixel ${Pixel}_{H,S,V}$ (which is equivalent to ${h}(\lambda,\overrightarrow {x},\overrightarrow {y},t) \rightarrow \check {h}(\lambda,\overrightarrow {x},\overrightarrow {y},t)$ ) using [35]. Each Hue value corresponds to a different color; for example, red is 0, green is 0.33, and blue is 0.66 (as shown in the top panel of Fig.2). Hence by choosing a range of Hue values, one can effectively choose the corresponding absorption frequency.
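The mapping from RGB to Hue described above can be illustrated with a minimal sketch. It assumes Python's standard `colorsys` module as a stand-in for the conversion cited as [35], and the function name `mean_hue_in_range` and the pixel values are hypothetical, chosen only for illustration.

```python
import colorsys

def mean_hue_in_range(pixels, lo=0.0, hi=0.1):
    """Average Hue of the RGB pixels whose Hue lies in [lo, hi].

    `pixels` is an iterable of (R, G, B) tuples with 8-bit values (0-255).
    Returns None when no pixel falls inside the range.
    """
    hues = []
    for r, g, b in pixels:
        # colorsys expects floats in [0, 1] and returns Hue in [0, 1).
        h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        if lo <= h <= hi:
            hues.append(h)
    return sum(hues) / len(hues) if hues else None

# Primary colors map to the Hue values quoted in the text (0, 0.33, 0.66).
red_hue = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)[0]
green_hue = colorsys.rgb_to_hsv(0.0, 1.0, 0.0)[0]
blue_hue = colorsys.rgb_to_hsv(0.0, 0.0, 1.0)[0]

# Hypothetical forehead patch: two skin-like pixels plus one blue outlier.
patch = [(224, 172, 105), (198, 134, 66), (20, 40, 200)]
patch_hue = mean_hue_in_range(patch)
```

In this toy patch the blue pixel falls outside the 0 to 0.1 window and is ignored, so only the skin-like pixels contribute to the average.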

FIGURE 2.

Schematic representation of the mapping from an RGB to a Hue color space. The top image is a toy model with 9 different colors ( ${Pixel}_{Hue} = 0$ , 0.11, $0.22, \ldots$ ) and the corresponding points ${Pixel}_{R,G,B}$ (red dots) in the xyY plot. The same transformation, when applied to the pixels in the forehead region, shows that they have a Hue in the range of 0 to 0.1.

FIGURE 3.

Figure showing the heart and respiratory rate obtained from a video of a face captured using phone-flash. (1) Image of a face corresponding to the first frame, superimposed with detected face (red box), detected eyes (blue box) and the detected forehead (green box). (2) Average Hue as a function of time for the forehead region and its corresponding frequency spectrum. (3) Average post-processed Hue (from 0 – 0.1) as a function of time for the forehead region using HR and RR IIR bandpass filters $({\mathcal{ I}}_{0<H<0.1}^{iPPG})_{HR}$ and $({\mathcal{ I}}_{0<H<0.1}^{iPPG})_{RR}$ , and its corresponding frequency spectrum $({\mathcal{ I}}_{0<H<0.1}^{iPPG})_{f(HR)}$ and $({\mathcal{ I}}_{0<H<0.1}^{iPPG})_{f(RR)}$ .

We have further demonstrated that under normal ambient light the Hue of the vast majority of forehead pixels is in the range of 0 to 0.1, which corresponds to the color of human skin, as shown in the bottom panel of Fig.2. Detailed HSV models for color based segmentation of human skin have been implemented in [36]–[39].

C. Facial and Forehead Detection

In this work we have used a Haar cascade based detection function to detect the face and eyes in each frame using OpenCV [40]. The function effectively returns a box representing the $face = [x_{face},y_{face}, height_{face},width_{face}]$ and $eye = [x_{eye},y_{eye},height_{eye},width_{eye}]$ for each frame, where $x_{face}$ and $x_{eye}$ are the x-pixel (column) locations of the top-left corner of each box, and $y_{face}$ and $y_{eye}$ are the y-pixel (row) locations of the top-left corner of each box. $height_{face}$ and $height_{eye}$ are the heights of the boxes (i.e. lengths of the columns), and $width_{face}$ and $width_{eye}$ are the widths of the boxes (i.e. lengths of the rows). In MATLAB the top left corner of a frame $t$ is (x,y) = (0,0), and the bottom right corner is (x,y) = ( $\overrightarrow {x},\overrightarrow {y}$ ). Object detection such as that of faces and eyes using a Haar-like feature-based cascade is a machine learning approach in which a cascade function is trained on a large number of positive and negative images, and is subsequently used to detect objects in other images [41], [42]. Using these face and eye boxes, we then compute the forehead parameters for each frame using $forehead = [x_{eye}+(width_{eye}*0.25), y_{face}, width_{eye}*0.5, (y_{eye}-y_{face})*0.6]$ (see Fig.3). The parameters for computing the forehead are optimized using multiple videos, and agree with available literature such as Poh et al. [23], who chose the center 60% of the bounding box width and the full height.
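The forehead formula above can be written out as a short sketch. The function name `forehead_box` and the example boxes are hypothetical, and the sketch assumes that the eye box spans both eyes and that the returned forehead box is ordered as (x, y, width, height), which is how the formula in the text reads.

```python
def forehead_box(face, eye):
    """Forehead region from face and eye boxes, following the formula in the text.

    face = (x_face, y_face, height_face, width_face)
    eye  = (x_eye,  y_eye,  height_eye,  width_eye)
    Returns (x, y, width, height) of the forehead box in pixels.
    """
    _x_face, y_face, _h_face, _w_face = face
    x_eye, y_eye, _h_eye, w_eye = eye
    x = x_eye + 0.25 * w_eye          # skip the outer quarter of the eye box
    y = y_face                        # start at the top of the face box
    width = 0.5 * w_eye               # central half of the eye box
    height = 0.6 * (y_eye - y_face)   # 60% of the gap between face top and eyes
    return (x, y, width, height)

# Hypothetical detections for one frame (pixel units), not measured data.
face = (100, 50, 200, 200)
eye = (120, 110, 40, 160)
forehead = forehead_box(face, eye)
```

With these illustrative boxes the forehead region starts a quarter of the eye-box width in from its left edge and covers the central half of that width.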

SECTION III.

Experimental Setup - Video Acquisition of the Face, and Post-Processing of the iPPG Signal Obtained Using the Hue Channel (Range 0–0.1)

Two videos of each subject’s face were acquired for 20 seconds each using the rear camera and the standard video capturing application provided with a commercial smartphone (LG G2, LG Electronics Inc., Korea), one with and one without the flash. While the videos were shot, an external pulse oximeter was attached to the subject’s finger to measure the HR (Biosync B-50DL Finger Pulse Oximeter and Heart Rate Monitor, Contec Medical Systems Co. Ltd, China). The Biosync B-50DL Finger Pulse Oximeter has a measurement accuracy of ±2 beats per minute (BPM) [43]. In addition, the subjects were asked to count their respiration rate (for the duration of a minute). In order to ensure that the RR values were correct, the subjects were asked to practise estimating their RR 5–10 times, and accuracy was corroborated by visual inspection of the subjects’ chest rising. This is consistent with the method recommended by Johns Hopkins University and Johns Hopkins Hospital [44]. The videos of the face were taken with minimal movement to simulate the standardized best case scenario. The distance between the subject’s face and the camera was typically ~0.5 meter (± 20%) and had little effect on the accuracy of the final result.

The 25 subjects (ages ranging between 20–30) were chosen to represent different skin types (Fitzpatrick scale 1–6) and genders. Five subgroups were established based on skin type and gender: Group A is Caucasian male (Fitzpatrick scale 1–3), Group B is Caucasian female (Fitzpatrick scale 1–3), Group C is African male (Fitzpatrick scale 6), Group D is African female (Fitzpatrick scale 6) and Group E is Indian male (Fitzpatrick scale 4–5). The detailed results for some of these subjects are shown in Figs.4, 5 and 6. The authors have used human subjects to acquire readings and data in the form of videos of their faces under various lighting conditions. The subjects were informed in detail about the nature and scope of the work, and provided their informed consent to be included in the investigation. The subjects’ facial images have been anonymized for privacy reasons (a single exception provided permission for image use, and was de-anonymized for representative purposes). The subject data was acquired and handled based on the general principles of the Declaration of Helsinki 2013 [45], specifically in regard to informed consent, scientific requirements and research protocols, privacy and confidentiality.

FIGURE 4.

Figure showing the heart and respiratory rate obtained from a video of a human face captured using phone-flash (Fitzpatrick scale 1–2). (1) Image of a face corresponding to the first frame, superimposed with detected face (red box), detected eyes (blue box) and the detected forehead (green box). (2) Average preprocessed Hue as a function of time for the forehead region and its corresponding frequency spectrum. (3–6) Average post-processed values as a function of time for the forehead region, using HR and RR IIR bandpass filters, and their corresponding frequency spectra: (3) Hue, (4) Hue from 0 – 0.1, (5) Green channel from RGB, and (6) Saturation using Hue from 0 – 0.1.

FIGURE 5.

Figure showing the heart and respiratory rate obtained from a video of a human face captured using phone-flash (Fitzpatrick scale 5–6). (1) Image of a face corresponding to the first frame, superimposed with detected face (red box), detected eyes (blue box) and the detected forehead (green box). (2) Average preprocessed Hue as a function of time for the forehead region and its corresponding frequency spectrum. (3–6) Average post-processed values as a function of time for the forehead region, using HR and RR IIR bandpass filters, and their corresponding frequency spectra: (3) Hue, (4) Hue from 0 – 0.1, (5) Green channel from RGB, and (6) Saturation using Hue from 0 – 0.1.

FIGURE 6.

Once the 50 videos were obtained they were post processed using a MATLAB R2016a script, and the resulting plots are shown in Figs.4, 5 and 6. In order to measure the HR and the RR from an iPPG signal, from the video of a subject’s face, the MATLAB script first captures a 20 second video at 30fps. These videos were then post processed at 10fps, by using one in every three consecutive frames and rejecting the rest. We have further checked that the down-sampling does not affect the final results by processing each of these videos at 15 and 30fps and computing HR and RR values. The video needs to be at least 20 seconds long to gather statistically significant data, since RR can be as low as 6 per minute, or 0.1 Hz. To ensure that at least 2 complete breaths are acquired within the sample, the sampling period needs to be ≥ 20 seconds.
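The frame-rate reduction and the 20 second minimum can be checked with a small sketch of the arithmetic. The helper names `downsample` and `min_duration_s` are hypothetical, and integer frame indices stand in for the actual video frames.

```python
def downsample(frames, keep_every=3):
    """Keep one frame out of every `keep_every` consecutive frames."""
    return frames[::keep_every]

def min_duration_s(lowest_rate_per_min, n_cycles=2):
    """Shortest recording (seconds) that holds `n_cycles` full cycles."""
    return n_cycles * 60.0 / lowest_rate_per_min

capture_fps, duration_s = 30, 20
frame_ids = list(range(capture_fps * duration_s))   # 600 frame indices
kept = downsample(frame_ids)                        # 200 frames, i.e. 10 fps
required_s = min_duration_s(6)                      # RR of 6 breaths per minute
```

Keeping every third frame of a 600-frame recording leaves the 200 frames used in the later processing steps, and two breaths at 6 breaths per minute take 20 seconds.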

The MATLAB script then detects faces and eyes using the approach described in Sec.II-C. The script then computes the average Hue of the forehead region for each of the 200 processed frames, which gives the raw iPPG signal ${\mathcal{ I}}_{0<H<0.1}^{iPPG}$ . This is followed by conversion of the time series data to its frequency spectrum, $({\mathcal{ I}}_{0<H<0.1}^{iPPG})_{f}$ , and then by application of IIR bandpass filters corresponding to the frequency ranges of interest, typically associated with HR (3dB cutoffs: 0.8 to 2.2 Hz) and RR (3dB cutoffs: 0.18 to 0.5 Hz). The order of the HR filter was 20, and the order of the RR filter was 8. The peaks of the filtered frequency spectra $({\mathcal{ I}}_{0<H<0.1}^{iPPG})_{f(HR)}$ and $({\mathcal{ I}}_{0<H<0.1}^{iPPG})_{f(RR)}$ correspond to the HR and RR respectively, as shown in panel 3 of Fig.3. To visualize the effect of the filter on the raw iPPG signal we replot it as $({\mathcal{ I}}_{0<H<0.1}^{iPPG})_{(HR)}$ and $({\mathcal{ I}}_{0<H<0.1}^{iPPG})_{(RR)}$ .
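The idea of reading HR and RR off the peak of a band-limited spectrum can be sketched as follows. This is not the MATLAB implementation: as a simplifying assumption, an ideal band mask on a plain discrete Fourier transform stands in for the IIR bandpass filters, the function name `dominant_rate_per_min` is hypothetical, and the input is a synthetic signal rather than recorded Hue data.

```python
import cmath
import math

def dominant_rate_per_min(signal, fps, f_lo, f_hi):
    """Rate (cycles per minute) of the strongest DFT bin inside [f_lo, f_hi] Hz."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]        # remove the DC offset
    best_freq, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        freq = k * fps / n                       # frequency of bin k in Hz
        if not (f_lo <= freq <= f_hi):
            continue                             # ideal band mask (assumption)
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_freq, best_power = freq, power
    return None if best_freq is None else best_freq * 60.0

# Synthetic stand-in for the raw Hue signal: 20 s at 10 fps (200 frames).
fps, n_frames = 10, 200
raw = [0.05
       + 0.002 * math.sin(2 * math.pi * 1.2 * i / fps)    # 72 beats per minute
       + 0.004 * math.sin(2 * math.pi * 0.25 * i / fps)   # 15 breaths per minute
       for i in range(n_frames)]

hr_bpm = dominant_rate_per_min(raw, fps, 0.8, 2.2)
rr_bpm = dominant_rate_per_min(raw, fps, 0.18, 0.5)
```

With 200 samples at 10 fps the frequency resolution is 0.05 Hz, i.e. 3 cycles per minute, which bounds how finely a peak can be located in a 20 second recording without interpolation.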

In addition the MATLAB script computes HR and RR, using iPPG obtained from the average value of the pixels in the forehead region, using Hue (without any range specifications), Green channel (Similar to the approach of Poh et al. [23]), and Saturation (HSV colorspace), for pixels with Hue within a range of 0–0.1.

SECTION IV.

Results and Discussion

A. Qualitative Comparison of HR and RR Obtained From the iPPG Signal Using the Hue Channel, With Other Algorithmic Approaches

To compare the accuracy of the different iPPG approaches, we have simultaneously measured the HR and RR of 25 subjects using standard methods, namely pulse oximetry (HR) and self-reporting (RR). These results are tabulated in detail in Table.2 and Table.3 (Flash on), and Table.4 and Table.5 (Flash off).

The face video based approaches in literature show that observables designed to measure HR and RR using the Green channel outperform Red or Blue channel [23]. Table.2 and Table.3, show that the Hue channel and particularly the Hue channel within a range of 0–0.1 show excellent correspondence with the experimentally measured data as compared to the other observables including the Green channel. Also, the average Saturation as a function of time (for pixels with Hue within a range of 0–0.1), shows the least correlation with the experimental data.

B. Quantitative Comparison of Accuracy of Measurement of HR and RR Obtained From the iPPG Signal Using the Hue Channel Versus With Green Channel

We can further use inferential statistics such as linear fitting to plot the different sets of computed HR/RR against their corresponding measured values. The closer the slope of the fitted line ( $r^{2}~Linear$ ) is to 1, the higher the correlation. In the case of HR, as shown in Fig.7.A., the $r^{2}~Linear$ using Hue (0–0.1) (Red line) is 0.9885, and using the Green channel (Green line) is 0.5576. In the case of RR, as shown in Fig.7.B., the $r^{2}~Linear$ using Hue (0–0.1) (Blue line) is 1.0386, and using the Green channel (Green line) is 0.8545. This shows that HR and RR measured using iPPG obtained from Hue (0–0.1) are quantitatively better than those obtained from the Green channel. In Fig.7.C. and D. we have used the $r^{2}~Linear$ to compare the accuracy of HR and RR measured in the presence and absence of a flash illuminating the subject’s face. In the case of HR using Hue (0–0.1), as shown in Fig.7.C., the $r^{2}~Linear$ obtained using flash (Red line) is 0.9885, and in the absence of flash (Grey line) is 0.4118. This is even more distinct in the case of RR using Hue (0–0.1), as shown in Fig.7.D., where the $r^{2}~Linear$ obtained using flash (Blue line) is 1.0386, and in the absence of flash (Grey line) is −0.0388. This shows that additional illumination can substantially increase the accuracy of measuring the HR and RR.
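The slope comparison above rests on an ordinary least-squares line through the (measured, computed) pairs. A minimal sketch is given here; the function name `fit_line` is hypothetical and the readings are invented for illustration, not taken from the study tables.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical readings in beats per minute, NOT the study data.
measured_hr = [60.0, 70.0, 80.0, 90.0]
computed_hr = [61.0, 69.0, 82.0, 90.0]
slope, intercept = fit_line(measured_hr, computed_hr)
```

A slope near 1 together with a small intercept indicates that the computed values track the reference across the whole range, which is the property the scatter plots in Fig.7 are meant to show.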

FIGURE 7.

Scatter plots comparing accuracy of HR and RR obtained from an iPPG using a face video, with standard approaches for measuring HR (Panel A and C) and RR (Panel B and D). Panel A shows results for HR computed using two iPPGs obtained from a single video with flash on (a) Hue (0–0.1) (Red Full) and (b) Green channel (Green Dashed), compared with HR measured using pulse oximetry. Panel B shows results for RR computed using two iPPGs obtained from a single video with flash on (a) Hue (0–0.1) (Blue Full) and (b) Green channel (Green Dashed), compared with RR measured using self reporting. Panel C shows results for HR computed using two iPPGs obtained from two separate videos (a) Hue (0–0.1) with flash on (Red Full) and (b) Hue (0–0.1) with flash off (Grey Dashed), compared with HR measured using pulse oximetry. Panel D shows results for RR computed using (a) Hue (0–0.1) with flash on (Blue Full) and (b) Hue (0–0.1) with flash off (Grey Dashed), compared with RR measured using self reporting. In each set of data there are 5 subgroups based on skin types, Caucasian male (Triangle Up Fill), Caucasian female (Triangle Up), African male (Triangle Down Fill), African female (Triangle Down) and Indian Male (Empty Box).

This can be further illustrated using the Pearson correlation test, where $r$ equal to 1 (or −1) corresponds to a perfect linear correlation and $r$ equal to 0 corresponds to no linear correlation. In the case of HR the Pearson’s $r$ using Hue (0–0.1) is 0.9201, and using the Green channel is 0.4916. In the case of RR the Pearson’s $r$ using Hue (0–0.1) is 0.6575, and using the Green channel is 0.3352. Like the scatter plots, the Pearson correlation tests show that HR and RR measured using iPPG obtained from Hue (0–0.1) are quantitatively better than those obtained from the Green channel. In the case of HR using Hue (0–0.1) the Pearson’s $r$ obtained using flash is 0.9201, and in the absence of flash is 0.3373. In the case of RR using Hue (0–0.1) the Pearson’s $r$ obtained using flash is 0.6575, and in the absence of flash is −0.07707. The Pearson correlation tests also show that additional illumination can substantially increase the accuracy of measuring the HR and RR.
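For reference, Pearson's $r$ can be computed directly from its definition. The sketch below uses a hypothetical helper `pearson_r` and small made-up series chosen so that the three limiting cases mentioned above (1, −1 and 0) are easy to see; none of the numbers come from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Illustrative series only, NOT the study data.
reference = [1.0, 2.0, 3.0, 4.0]
r_perfect = pearson_r(reference, [2.0, 4.0, 6.0, 8.0])     # same trend
r_inverse = pearson_r(reference, [8.0, 6.0, 4.0, 2.0])     # opposite trend
r_none = pearson_r(reference, [1.0, -1.0, -1.0, 1.0])      # no linear trend
```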

Having established that the face videos illuminated with flash work better, we further analyse those results using the Bland-Altman plots shown in Fig.8. The corresponding mean of difference and standard deviation of difference (drawn as lines in Fig.8) are tabulated in Table.1. In the case of HR the standard deviation of the difference using Hue (0–0.1) is 4.16 (as shown in Fig.8.A.), and using the Green channel is 0.28 (as shown in Fig.8.B.). In the case of RR the standard deviation of the difference using Hue (0–0.1) is 5.64 (as shown in Fig.8.C.), and using the Green channel is 0.28 (as shown in Fig.8.D.). This further illustrates that the Hue (0–0.1) approach works better than Green for both HR and RR.
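The quantities drawn in a Bland-Altman plot can be sketched as below. The helper name `bland_altman` and the readings are hypothetical, and the ±1.96 standard deviation limits of agreement are the conventional choice for such plots, assumed here rather than taken from the paper.

```python
import statistics

def bland_altman(method_a, method_b):
    """Mean difference (bias), SD of differences, and 95% limits of agreement."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)          # sample standard deviation (n - 1)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return bias, sd, limits

# Hypothetical readings in beats per minute, NOT the study data.
computed = [72.0, 65.0, 80.0, 58.0, 91.0]
reference = [70.0, 66.0, 79.0, 60.0, 90.0]
bias, sd, limits = bland_altman(computed, reference)
```

A bias close to zero with narrow limits of agreement means the two methods can be used interchangeably within that tolerance.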

TABLE 1 Accuracy of Measuring HR and RR Using iPPG Obtained From Video With Flash on, (A) Hue Within a Range of 0–0.1 (B) Green

TABLE 2 Table Showing the HR of a Subject Obtained From Pulse Oximeter, Compared With HR Computed From an iPPG Obtained From the Forehead of the Subject (With Phone-Flash). The iPPG was Computed Using Average Values of (A) Hue Within a Range of 0–0.1 (B) Hue (C) Green and (D) Saturation (For Pixels With Hue Within a Range of 0–0.1)

TABLE 3 Table Showing the RR of a Subject Obtained From Self-Reporting, Compared With RR Computed From an iPPG Obtained From the Forehead of the Subject (With Phone-Flash). The iPPG was Computed Using Average Values of (A) Hue Within a Range of 0–0.1 (B) Hue (C) Green and (D) Saturation (For Pixels With Hue Within a Range of 0–0.1)

TABLE 4 Table Showing the HR of a Subject Obtained From the Pulse Oximeter, Compared With HR Computed From an iPPG Obtained From the Forehead of the Subject (With Phone-Flash Off). The iPPG was Computed Using Average Values of (A) Hue Within a Range of 0–0.1 (B) Hue (C) Green and (D) Saturation (For Pixels With Hue Within a Range of 0–0.1)

TABLE 5 Table Showing the RR of a Subject Obtained From Self-Reporting, Compared With RR Computed From an iPPG Obtained From the Forehead of the Subject (With Phone-Flash Off). The iPPG was Computed Using Average Values of (A) Hue Within a Range of 0–0.1 (B) Hue (C) Green and (D) Saturation (For Pixels With Hue Within a Range of 0–0.1)

FIGURE 8.

Bland-Altman plots comparing accuracy of HR and RR obtained from an iPPG using a face video, with standard approaches for measuring HR (Panel A and B) and RR (Panel C and D). Panel A shows results for HR computed using Hue (0–0.1), compared with HR measured using pulse oximetry. Panel B shows results for HR computed using Green channel, compared with HR measured using pulse oximetry. Panel C shows results for RR computed using Hue (0–0.1), compared with RR measured using self reporting. Panel D shows results for RR computed using Green Channel, compared with RR measured using self reporting. In each set of data there are 5 subgroups based on skin types, Caucasian male (Triangle Up Fill), Caucasian female (Triangle Up), African male (Triangle Down Fill), African female (Triangle Down) and Indian Male (Empty Box).

The efficacy of the Hue (0–0.1) approach over the Green channel is further illustrated using the paired Student’s t-test (as tabulated in Table.1). In the case of HR the p-value using Hue (0–0.1) is 0.8887, and using Green is 0.9068. Similarly, in the case of RR the p-value using Hue (0–0.1) is 0.2885, and using Green is 0.5608.
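The test statistic behind a paired Student's t-test can be sketched as follows. The function name `paired_t_statistic` and the readings are hypothetical, and the sketch stops at the t statistic and degrees of freedom: converting them to a p-value needs the Student t distribution, which Python's standard library does not provide.

```python
import math
import statistics

def paired_t_statistic(sample_a, sample_b):
    """t statistic and degrees of freedom for a paired Student's t-test."""
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)        # sample standard deviation (n - 1)
    t_stat = mean_d / (sd_d / math.sqrt(n))
    return t_stat, n - 1

# Hypothetical readings in beats per minute, NOT the study data.
computed = [71.0, 72.0, 83.0, 62.0]
reference = [70.0, 70.0, 80.0, 60.0]
t_stat, dof = paired_t_statistic(computed, reference)
```

In a paired comparison of this kind, a t statistic near zero (and hence a large p-value) indicates that the mean difference between the two methods is not statistically distinguishable from zero.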

The standard for HR monitors set by the Association for the Advancement of Medical Instrumentation (AAMI), EC-13, states that the accuracy requirement is a root mean square error (RMSE) ≤ 5 BPM or ≤ 10%, whichever is greater. The RMSE for HR is 4.1617 BPM using Hue (0–0.1) and 11.60172 BPM using Green. The RMSE for RR is 3.8884 breaths per minute using Hue (0–0.1) and 5.6885 breaths per minute using Green. This illustrates that the Hue (0–0.1) approach gives better results than the traditional Green channel.
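The accuracy check described above can be sketched as follows. This is a minimal example, not the authors' implementation: the helper names and readings are hypothetical, and the "whichever is greater" rule is read here as an allowed RMSE of max(5 BPM, 10% of the mean reference HR), which is an assumption about how the criterion is applied to a set of subjects.

```python
from math import sqrt


def rmse(estimated, reference):
    """Root mean square error between paired estimates and references."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("need two non-empty series of equal length")
    squared = [(e - r) ** 2 for e, r in zip(estimated, reference)]
    return sqrt(sum(squared) / len(squared))


def meets_hr_tolerance(estimated, reference, abs_limit=5.0, rel_limit=0.10):
    """Assumed reading: RMSE <= max(5 BPM, 10% of the mean reference HR)."""
    mean_reference = sum(reference) / len(reference)
    allowed = max(abs_limit, rel_limit * mean_reference)
    return rmse(estimated, reference) <= allowed


# Hypothetical readings in beats per minute (not data from the study).
hr_video = [72.0, 65.0, 80.0, 90.0, 58.0]
hr_oximeter = [70.0, 66.0, 79.0, 93.0, 58.0]

error = rmse(hr_video, hr_oximeter)
print(f"RMSE={error:.3f} BPM, within tolerance: "
      f"{meets_hr_tolerance(hr_video, hr_oximeter)}")
```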

SECTION V.

Conclusion

In this study, we have introduced a novel noninvasive approach to measure pulse and respiratory rate from a short video of the subject’s face. Unlike traditional iPPG approaches that measure the fluctuation of a particular channel in the RGB color space, we have measured the fluctuation in the Hue channel in the HSV color space. Since this observable depends primarily on the AC component of the pulsatile blood, it offers a more accurate and robust way to measure vital signs from a video. In this study, we have further shown that (1) HR and RR derived from iPPG obtained using the Hue channel (range 0–0.1) give the results that correlate most closely with standard instruments, and (2) the HR and RR derived from iPPG obtained from videos shot with additional flash-based illumination are qualitatively better than those obtained without the flash. This is further supported by the Pearson’s r and RMSE values obtained using Hue (0–0.1) at rest in our current work, 0.9201 and 4.1617, compared with 0.89 and 6 obtained using the green channel (before post-processing), as reported by Poh et al. [23].

We have summarized our approach in the form of a flowchart, as shown in Fig. 9.

FIGURE 9.

Schematic representation of the process to compute iPPG using average Hue from 0–0.1.
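The per-frame averaging step named in this caption can be sketched as follows. This is a hedged illustration, not the authors' implementation: `mean_hue_in_range` is a hypothetical helper, pixels are assumed to be 8-bit RGB tuples from the forehead region, and hue is assumed to be scaled to [0, 1) as returned by Python's `colorsys` module. Skipping achromatic pixels is a choice made for this sketch, because `colorsys` reports their hue as 0.

```python
import colorsys


def mean_hue_in_range(pixels, lo=0.0, hi=0.1):
    """Average hue of the RGB pixels whose hue lies in [lo, hi].

    pixels: iterable of (r, g, b) tuples with 8-bit components.
    Returns None when no pixel falls inside the range.
    """
    hues = []
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Grey pixels have s == 0 and a placeholder hue of 0; skip them
        # (an assumption of this sketch, not a step stated in the paper).
        if s > 0 and lo <= h <= hi:
            hues.append(h)
    return sum(hues) / len(hues) if hues else None


# Hypothetical frame region: red, orange, green and grey pixels.
region = [(255, 0, 0), (255, 128, 0), (0, 255, 0), (128, 128, 128)]
frame_value = mean_hue_in_range(region)
print(f"mean hue within 0-0.1: {frame_value:.4f}")
```

Applying this to every frame of the video gives one value per frame, and that sequence is the iPPG trace from which a rate is then estimated.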


However, our proposed algorithms will not work in a number of real-world scenarios: for example, if the forehead is partially or fully covered with hair (hairstyles such as the devilock, bob cut, Bettie Page and beehive) or headgear (hat, cap, turban), if scar tissue is present on the forehead, or if the facial detection algorithm fails to detect a face because of non-traditional facial features such as a heavy beard. Further studies are required to understand the effects of external lighting, skin color and movement on the accuracy of the final results. In addition, more accurate facial mapping technology for locating the forehead region could be implemented to improve the accuracy of the face-based pulse and respiratory rate detection method.

These findings could be readily translated into a smartphone camera application that measures HR using the camera flash as an illumination source, more accurately than current market alternatives. Smartphone applications (or APIs) coupled with such technology will have further applications as Software as a Medical Device (SaMD) in the video-based telemedicine market, allowing an average user to monitor their HR and RR without buying additional equipment. The telemedicine market includes tele-hospital care (where the consultant doctor can dial in to monitor patients) and tele-home care (where a remote healthcare connection with a network of clinicians, initiated by the patient, is usually available 24/7 for non-emergency care). The clinical relevance of telemedicine has been accelerated by the advent of tele-home care platforms such as Babylon Health, MDLive, Doctor On Demand, Teladoc, and LiveHealth Online.

In addition, measuring HR in clinical settings using an electrocardiogram (ECG) requires patients to wear chest straps with adhesive gel patches, which can be both uncomfortable and abrasive for the user. HR monitored using pulse oximetry at the fingertip or the earlobe can also be inconvenient for long-term wear. A video-based software solution will be critical to avoiding such inconveniences. This is of particular interest in neonatal and elderly care, where contact-based approaches can cause additional irritation to the subjects’ fragile skin.

In summary, we hope this work will lead to the development of easy-to-access, smartphone-camera-based technology for continuous monitoring of vital signs, both for fitness applications and for predicting the overall health of the user.

ACKNOWLEDGMENT

The authors report salaries, personal fees and non-financial support from their employer Think Biosolution Limited, which is active in the field of camera-based diagnostics and fitness tracking, outside the submitted work.
