Applicability of deep learning for blood pressure estimation during hemodialysis based on facial images

Background In hemodialysis, hypotension occurs due to water removal and solute removal. Conventional blood pressure monitoring during dialysis is intermittent, so staff must rely on experience and intuition to predict a patient's blood pressure trend from the amount of water removed that day and from previous trends, and to conduct hemodialysis without causing hypotension. Our research group has attempted to estimate blood pressure based on the spatial features of facial visible images, which contain information on facial color, and facial infrared images, which contain information on skin temperature. If the blood pressure of dialysis patients can be estimated from facial visible and infrared images measured continuously and remotely, early detection of a blood pressure decrease during treatment can be expected. In this study, we verified the applicability of deep learning algorithms to blood pressure estimation based on facial visible and infrared images of hemodialysis patients.
Methods Facial visible and infrared images and mean blood pressure (MBP) measured from hemodialysis patients were used to train a convolutional neural network, yielding an MBP estimation model based on the spatial features of the facial images.
Results MBP could be estimated with an error of less than 20 mmHg based on the spatial features of the facial images, and the estimation accuracy based on the spatial features of the facial infrared images was higher than that based on the facial visible images.
Conclusion The results indicate that deep learning algorithms may be applicable to blood pressure estimation based on the spatial features of facial images.
Trial registration This study is not subject to clinical trial registration because it involves neither intervention nor invasion. The Ethics Review Committee of Jichi Medical University has approved this interpretation.


Introduction
Currently, there are about 350,000 patients treated with hemodialysis [1], and hypotension occurs in about 40% of them [2]. Dialysis-related hypotension can be divided into orthostatic hypotension, chronic sustained hypotension, and intradialytic hypotension [3], and intradialytic hypotension is a risk factor for cardiovascular complications [4].
Hypotension during hemodialysis results from dialysis maneuvers such as water removal, vasodilation due to components of the dialysate solution, and osmotic hypotension due to solute removal [5]. In hemodialysis, the need to remove water and solutes in a limited amount of time causes rapid fluctuations in circulating blood volume, and the refilling of fluid from the interstitium to the blood vessels does not occur in time, resulting in hypotension.
Blood pressure is the most important monitoring item for hemodialysis patients. In most cases, it is measured at regular intervals using a noninvasive method, with a cuff (manchette) wrapped around the upper arm on the side opposite the vascular access. In the past, staff measured it by auscultation and other methods. In recent years, automatic blood pressure monitors built into the console have become the norm; with the manchette left wrapped around the patient's arm, measurements are taken at set time intervals, which saves staff labor. Some facilities also use dialysis support systems for centralized management. In both cases, however, blood pressure is monitored only intermittently, and staff must rely on their experience and intuition to predict the patient's blood pressure trend from the amount of water removed that day and from past trends, and to conduct hemodialysis without causing hypotension.
In a previous study, we attempted to construct a model capable of predicting blood pressure drops during treatment by applying machine learning to intermittently measured maximum and minimum blood pressure and pulse pressure information [6]. Ideally, blood pressure should be monitored continuously: if signs of change can be detected as early as possible, dialysis conditions such as the rate of water removal can be adjusted sooner. Remote measurement is also desirable in terms of infection control.
Research on remote sensing of vital signs has also been carried out. One previous study on remote blood pressure measurement estimated blood pressure from the temporal information of remote photoplethysmography (PPG) signals measured from facial video [7]. A drawback of this method is that estimation takes some time because it requires temporal information from the PPG signals. In contrast, our research group has attempted to estimate blood pressure based on spatial features of facial visible and infrared images [8][9][10]. Facial visible images contain information on facial color, which fluctuates with facial skin blood flow. Because facial color changes rapidly with cutaneous blood flow, facial visible images can be regarded as a remotely measurable short-term indicator of hemodynamics. Facial infrared images contain information on skin temperature, which also fluctuates with facial cutaneous blood flow. Because skin temperature changes slowly with cutaneous blood flow, facial infrared images can be regarded as a remotely measurable long-term indicator of hemodynamics.
We hypothesized that changes in blood pressure would cause changes in the dynamics of facial skin blood flow, which in turn would cause changes in the spatial patterns of facial color and facial skin temperature, and we attempted to estimate blood pressure based on these spatial patterns, or spatial features, of the facial visible and infrared images. We have been able to estimate blood pressure with an error of approximately 10 mmHg or less based on facial visible and infrared images measured under controlled conditions, such as controlled room temperature and experimental settings. The advantage of blood pressure estimation based on the spatial features of facial visible and infrared images is not only that blood pressure can be estimated remotely, but also that it can be estimated from a single facial image.
To date, blood pressure estimation based on facial images has been studied in healthy subjects but not yet in patients. If the facial visible and infrared images of a dialysis patient can be measured continuously and remotely, and the patient's blood pressure can be estimated from these facial images, early detection of hypotension during hemodialysis can be expected.
In this study, we verified the applicability of deep learning algorithms in blood pressure estimation based on facial visible and infrared images of hemodialysis patients.

Materials and methods
The approach of this study is shown in Fig. 1. A blood pressure estimation model based on the spatial features of facial images was constructed by training a convolutional neural network, one of the deep learning algorithms, on facial visible images, facial infrared images, and MBP of hemodialysis patients. We evaluated the accuracy of the model using the root-mean-square error (RMSE) and correlation coefficient between estimated and measured MBP. The present study was conducted in accordance with the Declaration of Helsinki and was approved by the Ethics Review Committee of Jichi Medical University (Approval No.: 23-107). Consent for the experiment was obtained in advance from the hemodialysis patients participating in the experiment.
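The two evaluation metrics named above can be illustrated with a minimal Python sketch. The study itself used MATLAB; the function names `rmse` and `pearson_r` are hypothetical, and the correlation coefficient is assumed here to be Pearson's r, which the text does not state explicitly.

```python
import math


def rmse(estimated, measured):
    """Root-mean-square error between paired estimated and measured values."""
    if len(estimated) != len(measured) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(e - m) ** 2 for e, m in zip(estimated, measured)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(estimated, measured):
    """Pearson correlation coefficient (assumed form of r in this sketch)."""
    if len(estimated) != len(measured) or len(estimated) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(estimated)
    mean_e = sum(estimated) / n
    mean_m = sum(measured) / n
    cov = sum((e - mean_e) * (m - mean_m) for e, m in zip(estimated, measured))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_e == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_e * var_m)
```

With estimated MBP values of 90 and 100 mmHg against measured values of 93 and 96 mmHg, the squared errors are 9 and 16, so the RMSE is the square root of 12.5, roughly 3.5 mmHg.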

Experimental systems
The experimental system is shown in Fig. 2. It consists of a visible camera (C920, Logitech, Co.) and an infrared thermography camera (A35, FLIR Systems, Inc.) for measuring facial visible and infrared images, a dialysis machine (DBB-100NX, Nikkiso, Co.), and a manchette for measuring the blood pressure of the subjects (hemodialysis patients). The visible and infrared thermography cameras were placed 50 cm above the subject's head. The resolutions of the visible and infrared thermography cameras were 1920 × 1080 pixels and 320 × 256 pixels, respectively. The emissivity of the skin was set to 0.98.

Experimental protocol
The experiment was conducted in a clinic where the room temperature was relatively stable. During the experiment, subjects were asked to lie on their backs on the bed; they were free to move and were not required to hold a fixed posture. Visible and infrared images containing the subject's face were measured at 30 fps and 1 fps, respectively, and MBP was manually measured approximately once every 15 min using a dialysis machine. Three hemodialysis patients (hereinafter referred to as Subjects A-C) participated in the experiment, and each subject participated multiple times. All three subjects were male. At the beginning of the experiment, Subject A was 56 years old, Subject B was 83 years old, and Subject C was 62 years old. The duration of hemodialysis was 9 years for Subject A, 5 years for Subject B, and 1 year for Subject C. All subjects had hypertension, and none had diabetes mellitus. The number of experiments was 3 for Subject A, 5 for Subject B, and 2 for Subject C.
Dialysis conditions varied among subjects. Even in the same subject, dialysis conditions may vary from one dialysis day to another. The parameters that affect blood pressure during dialysis are ultrafiltration rate, dialysate flow rate, and blood flow rate (QB). Dialysate flow rate in each subject and experiment was 500 ml/min. The range of ultrafiltration rate and maximum QB in each subject and experiment are shown in Table 1.
Fig. 1 Approach of this study
Fig. 2 Experimental system

Creation of facial visible and infrared images
The facial visible and infrared images used to construct the blood pressure estimation model were any 30 images selected from the facial visible and infrared images measured in the 5 min before and after each blood pressure measurement. Because subjects sometimes did not face the camera, or their faces were hidden by their hands or other objects during blood pressure measurement, the data measured at those times were excluded from the analysis.
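The frame selection described above might be sketched as follows. This is only an illustration under several assumptions: the window is read as 5 min on either side of the measurement time, "any 30" is read as uniform random sampling, and the `usable` flags marking frames where the face was visible are a hypothetical input. The function name `select_frames` is also hypothetical.

```python
import random


def select_frames(frame_times, bp_time, usable=None,
                  window_s=300.0, n_frames=30, seed=0):
    """Pick n_frames frame indices within +/- window_s seconds of bp_time.

    frame_times: capture time of each frame in seconds.
    usable: optional per-frame flags (False when the face is hidden).
    Returns a sorted list of indices, or an empty list when too few
    usable frames exist (the measurement would then be excluded).
    """
    if usable is None:
        usable = [True] * len(frame_times)
    candidates = [
        i for i, (t, ok) in enumerate(zip(frame_times, usable))
        if ok and abs(t - bp_time) <= window_s
    ]
    if len(candidates) < n_frames:
        return []
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return sorted(rng.sample(candidates, n_frames))
```

For the 1 fps infrared stream, a window of 5 min on either side contains about 600 candidate frames, so 30 frames can be drawn without replacement whenever the face is visible for most of that window.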
The method of generating the facial visible and infrared images was as follows (see Fig. 3). The face area was detected by applying the Single Shot Multibox Detector, which is one of the object detection algorithms, to the visible and infrared images containing the subject's face. A total of 68 facial feature points were extracted by applying a point distribution model to the detected facial regions [11]. Spatially standardized facial images were generated by applying an affine transformation based on the extracted facial feature point coordinates and template coordinates [12]. The aim of spatially standardizing facial images is to reduce the effects of individual differences in face shape and orientation. The spatially standardized images are hereinafter referred to as the facial visible and infrared images. The size of the facial visible and infrared images was 201 × 201 pixels.
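One common way to obtain an affine transformation from landmark and template coordinates is a least-squares fit, sketched below in plain Python. This is an assumption for illustration and not necessarily the procedure of [12]; the function names `solve3`, `fit_affine`, and `apply_affine` are hypothetical.

```python
def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    a = [row[:] + [val] for row, val in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("landmarks are degenerate (collinear or too few)")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        s = a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = s / a[r][r]
    return x


def fit_affine(src, dst):
    """Least-squares affine map taking src landmarks to dst (template) points.

    Returns ([a, b, tx], [c, d, ty]) such that
    u = a*x + b*y + tx and v = c*x + d*y + ty.
    """
    normal = [[0.0] * 3 for _ in range(3)]
    rhs_u = [0.0] * 3
    rhs_v = [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        p = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                normal[i][j] += p[i] * p[j]
            rhs_u[i] += p[i] * u
            rhs_v[i] += p[i] * v
    return solve3(normal, rhs_u), solve3(normal, rhs_v)


def apply_affine(params, point):
    """Map one (x, y) point with the fitted affine parameters."""
    (a, b, tx), (c, d, ty) = params
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)
```

In this sketch the 68 detected feature points would be passed as `src` and the template coordinates as `dst`; the fitted parameters would then be used to resample the image onto the 201 × 201 pixel template grid, a step that is omitted here.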
In particular, the facial visible images contain features such as the eyes and nasal bridge, and their luminance values may fluctuate due to facial expressions and blinking. To reduce these effects, a median filter with a kernel size of 20 × 20 pixels was applied to smooth the facial visible images.
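The effect of the median filter can be illustrated with a small pure-Python sketch operating on a single-channel image stored as a list of rows. Two details are assumptions, since the text does not specify them: the window is clipped at the image border, and for an even kernel such as 20 × 20 the window extends from 10 pixels before to 9 pixels after the current pixel. The function name `median_filter` is hypothetical.

```python
from statistics import median


def median_filter(image, kernel=20):
    """Apply a kernel x kernel median filter to a 2D list of pixel values.

    Windows are clipped at the image border (an assumption of this sketch).
    """
    height, width = len(image), len(image[0])
    before = kernel // 2       # pixels included before the current one
    after = kernel - before    # exclusive end offset after the current one
    filtered = []
    for r in range(height):
        r0, r1 = max(0, r - before), min(height, r + after)
        row = []
        for c in range(width):
            c0, c1 = max(0, c - before), min(width, c + after)
            window = [image[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            row.append(median(window))
        filtered.append(row)
    return filtered
```

Because the median ignores isolated outliers, a brief local luminance change such as a closed eyelid is suppressed, while broader facial color patterns are largely preserved. This loop-based version is meant only for illustration and would be slow on full-size images.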

Creation of a model for estimating blood pressure
A model for estimating blood pressure based on the spatial features of the facial images was created by training a convolutional neural network (CNN) on the facial visible and infrared images. The configuration of the CNN is shown in Table 2. In the table, "Input" is the input layer, "Conv n" is the nth convolutional layer, "Pool n" is the nth average pooling layer, "Batch Norm n" is the nth batch normalization layer, "ReLU" is the layer that applies the Rectified Linear Unit activation function, "Dropout" is the dropout layer used to prevent overfitting, "FC" is the fully connected layer, "Reg" is the regression output layer, "Size" is the size of the input layer, convolutional filters, and average pooling windows, and "Number" is the number of convolutional filters. The average pooling stride and dropout rate were set to 2 and 0.2, respectively. For training, the batch size was set to 16, the number of epochs to 100, and the initial learning rate to 0.001.
Table 2 Configuration of the CNN
Each model was constructed from the data measured in one experiment. Accordingly, the number of models was 3 for Subject A, 5 for Subject B, and 2 for Subject C. In addition, for each subject, a model trained only on facial visible images and a model trained only on facial infrared images were constructed. To evaluate the generalization performance of these models, K-fold cross-validation was performed. The number of blood pressure measurements and cross-validations are shown in Table 3, where "Sub.A_1st" indicates the first experiment of Subject A. The number of cross-validations was smaller than the number of blood pressure measurements because some data were excluded from the analysis when the subjects' faces were hidden during blood pressure measurement. In each cross-validation, the facial images and MBP acquired at one blood pressure measurement were used as test data, and all facial images and MBP acquired at the other blood pressure measurements were used as training data; this was repeated so that each measurement served as test data in turn. RMSE and the correlation coefficient (r) between estimated and measured MBP obtained from all cross-validations were calculated to assess the generalization performance of the model.
Table 3 The number of blood pressure measurements and cross-validation
In addition, Grad-CAM was applied to analyze the spatial features of the facial images that contributed to the estimation of MBP. Grad-CAM is a method for visualizing which parts of an image form the basis for an inference in an image recognition task. High feature values in Grad-CAM indicate a high likelihood of being the basis for an inference.
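The cross-validation scheme described in this section, in which all images from one blood pressure measurement are held out at a time, can be sketched as a grouping of sample indices. This is a minimal illustration in Python rather than the MATLAB code used in the study; the function name `leave_one_measurement_out` and the `measurement_ids` input are hypothetical.

```python
def leave_one_measurement_out(measurement_ids):
    """Build (train_indices, test_indices) folds, one per measurement.

    measurement_ids: for each image, the identifier of the blood pressure
    measurement it belongs to. All images sharing an identifier are held
    out together, so images from the test measurement never appear in
    the training set of that fold.
    """
    folds = []
    for held_out in sorted(set(measurement_ids)):
        test_idx = [i for i, m in enumerate(measurement_ids) if m == held_out]
        train_idx = [i for i, m in enumerate(measurement_ids) if m != held_out]
        folds.append((train_idx, test_idx))
    return folds
```

Grouping by measurement matters because the 30 images tied to one blood pressure value are taken minutes apart and are therefore highly similar; splitting them at random across training and test data would be likely to overstate the accuracy of the model.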

MATLAB 2021a (MathWorks, Inc.) was used to train the CNN and apply Grad-CAM.

Results and discussion
Figure 4 shows the time-series variation of measured MBP and ultrafiltration rate in the fifth experiment of Subject B (Sub.B_5th), in which a marked decrease in blood pressure was observed. In most experiments, the ultrafiltration rate remained constant except in the early stages, and a decrease in blood pressure was observed in all subjects.
Table 4 shows the RMSE and correlation coefficient (r) between measured MBP and MBP estimated based on the spatial features of facial visible and infrared images. The blood pressure estimation accuracy based on the spatial features of facial infrared images was higher than that based on facial visible images. However, the accuracy of blood pressure estimation varied among subjects. Figure 5 shows scatter plots of measured MBP and MBP estimated based on the spatial features of the facial infrared images in the third experiment of Subject B (Sub.B_3rd), a condition in which blood pressure estimation accuracy was high. Figure 6 shows the corresponding scatter plots for the first experiment of Subject A (Sub.A_1st), a condition in which blood pressure estimation accuracy was low.
In previous studies, blood pressure estimation based on facial images of healthy subjects was performed [8][9][10], and it was confirmed that blood pressure could be estimated with an error of approximately 10 mmHg. The accuracy of blood pressure estimation based on facial images of hemodialysis patients in this study was comparable to the accuracy of blood pressure estimation based on facial images of healthy subjects.
Next, we analyzed the spatial features that contributed to blood pressure estimation. The facial infrared image measured in the third experiment of Subject B (Sub.B_3rd) and the Grad-CAM color map obtained from the facial infrared image are shown in Fig. 7. The facial visible image measured in the first experiment of Subject A (Sub.A_1st) and the Grad-CAM color map obtained from the facial visible image are shown in Fig. 8. The results in Fig. 7 show that the feature values in the nose region are higher, indicating that the skin temperature variation in the nose region particularly contributes to the estimation of MBP. It is known that skin temperature fluctuates with skin blood flow. In particular, the nasal region has many arteriovenous anastomoses (AVAs), which are larger in diameter than normal capillaries [13]. Skin temperature changes significantly with skin blood flow fluctuations [14]. It is considered that the skin temperature in the nasal region fluctuated due to blood pressure changes associated with rapid changes in circulating blood volume during hemodialysis. In contrast, the results in Fig. 8 show higher feature values around the eyes. From the visible images, we confirmed that the subject was blinking during the experiment. It is considered that these feature values were caused by blinking, not by fluctuations in facial color associated with blood pressure fluctuations, and that the accuracy of blood pressure estimation based on those feature values was therefore low.

Limitation
There are three major limitations to this study.
The first limitation is the small number of subjects. Since the purpose of this study was to verify the applicability of the deep learning algorithm to blood pressure estimation based on visible and infrared facial images of dialysis patients, the number of subjects was limited to three. For practical application, it is essential to examine the approach with a larger number of subjects.
The second limitation is that this study did not investigate the preprocessing of facial images. The results of this study indicate that deep learning algorithms may be applicable to blood pressure estimation based on spatial features of facial images. However, blood pressure estimation accuracy was low, especially for estimation based on facial visible images, because spatial features unrelated to blood pressure fluctuations, such as eye blinking, were extracted. This result indicates the need for preprocessing of facial images. Therefore, the preprocessing of facial images will be investigated in a future study.
The third limitation is that the structure and learning conditions of the CNN were fixed in this study. Optimization of the structure and learning conditions of the CNN is expected to improve the extraction of spatial features related to blood pressure variation and the accuracy of blood pressure estimation. Therefore, optimization of the CNN structure and learning conditions should be considered in a future study.

Conclusion
In this study, we verified the applicability of deep learning algorithms to blood pressure estimation based on facial visible and infrared images of dialysis patients. The results showed that MBP could be estimated with an error of less than 20 mmHg based on the spatial features obtained by applying a CNN, which is one of the deep learning algorithms, to the facial images. Furthermore, the blood pressure estimation accuracy based on the spatial features of the facial infrared images was higher than that based on the spatial features of the facial visible images. These findings indicate that deep learning algorithms may be applicable to blood pressure estimation based on the spatial features of facial images. However, the blood pressure estimation accuracy was low, especially for estimation based on facial visible images, because spatial features unrelated to blood pressure fluctuation, such as blinking, were extracted. In the future, it will be necessary to examine a larger number of subjects, image preprocessing, and optimization of the deep learning algorithm.

Fig. 4 Time-series variation of measured MBP and ultrafiltration rate in the fifth experiment of Subject B (Sub.B_5th)

Fig. 5 Scatter plots of estimated and measured MBP in the third experiment of Subject B (Sub.B_3rd)

Fig. 6 Scatter plots of estimated and measured MBP in the first experiment of Subject A (Sub.A_1st)

Fig. 7 Facial infrared image and Grad-CAM color map in the third experiment of Subject B (Sub.B_3rd)

Fig. 8 Facial visible image and Grad-CAM color map in the first experiment of Subject A (Sub.A_1st)

Table 1 The range of ultrafiltration rate and maximum QB in each subject and experiment. "Sub.A_1st" in the table indicates the first experiment of Subject A

Fig. 3 Method of generating visible and infrared images

Table 4 RMSE and correlation coefficient (r) between estimated and measured MBP. "Sub.A_1st" in the table indicates the first experiment of Subject A