
Sensor Network Oriented Human Motion Segmentation With Motion Change Measurement


Framework for Human Motion Sequence Segmentation based on Hashing in Sensor Network Environment.

Topic: Intelligent Systems for the Internet of Things

Abstract:

The smart Internet of Things has greatly improved the quality of human life through increasingly intelligent sensor networks. Efficient and accurate segmentation of human motion time series is the key issue in human motion analysis and understanding. Realizing human motion sequence segmentation requires both a comprehensive description of human motion and an intelligent segmentation algorithm. Hence, this paper proposes a sensor network-based human motion sequence segmentation framework. A sensor network, together with a sensor network-based feature fusion method, provides a comprehensive description of human motions. Based on this description, a new segmentation method driven by motion change variation is proposed. Moreover, to meet the time efficiency demands of large-scale sensor networks, a hashing algorithm is introduced to compress the captured sensor data; it represents human motions with short binary codes and facilitates the motion change measurement. Experiments on real-world human motion data sets validate the effectiveness of the proposed framework compared with other state-of-the-art human motion segmentation methods.
Published in: IEEE Access ( Volume: 6)
Page(s): 9281 - 9291
Date of Publication: 27 December 2017
Electronic ISSN: 2169-3536


SECTION I.

Introduction

With the rapid development of the Internet of Things (IoT), smarter IoT systems have greatly promoted social development and raised living standards [1]–[4], [8]. IoT systems have been applied in many walks of life [5]–[7]. In fields such as athletic training, medical diagnostics, and security monitoring, smart IoT systems are used to analyze human motion time series, and the motion information obtained from this analysis is of great significance for improving the quality of human life [8]–[10]. For instance, in diagnosing heart disease, monitoring heartbeat, breathing, and motion through sensor networks can effectively support the evaluation of the illness. Smart IoT systems, especially sensor networks, allow human activities to be analyzed and mined in depth [8], [12], [13].

Analysis of human motions with sensor networks focuses not only on the relations among different sensors but also on the relations among the states of the network at different moments [10], [11], [14]. Human motion analysis and the technologies derived from it have made IoT systems more intelligent [15], [16]. The fundamental issue in human motion research is Human Motion Sequence Segmentation (HMSS), because the segmentation results significantly influence any subsequent analysis of the motion sequence. Hence, analyzing the time series generated by sensor networks and segmenting human motion sequences are key steps in human motion analysis and understanding, and HMSS has attracted increasing research attention [17], [19], [22].

Various pattern recognition methods have been proposed for HMSS. For instance, clustering methods, which rely on the correlations among motions of the same class, have been employed to realize HMSS [18], [19]. Aligned Cluster Analysis (ACA) combines kernel k-means with generalized dynamic time alignment to segment the time series by clustering [18]. Moreover, an unsupervised hierarchical bottom-up framework has been adopted to obtain low-dimensional embeddings of human motion sequences, which improves the efficiency of clustering-based HMSS [19]. However, clustering-based segmentation methods require low intra-cluster variance of the time series, which is difficult to achieve in real-world applications. Furthermore, if the initial number of clusters is poorly chosen, the segmentation performance degrades. To address these problems, neighborhood similarity information of the time series has been used to construct a similarity graph that enhances clustering performance [20]. Besides clustering, classification methods, which distinguish motion classes from each other, have also been applied to HMSS [21]. Data Point Classification (DPC) uses an online classifier to label each data point as either a segment point or a non-segment point; the points labeled as segment points are taken as the transition frames of the human motion time series. Inspired by DPC, methods that detect transition frames have been adopted in HMSS. Kernelized Temporal Cut (KTC) sequentially cuts structured sequential data into different regimes [22] and can detect transition frames and repetitive frames in a human motion time series simultaneously.

Apart from the aforementioned methods, dimensionality reduction methods are also adopted in HMSS [17], [23]. For example, the Principal Component Analysis (PCA) [24] approach to HMSS exploits the intrinsic dimension of human motion: it places a cut when the intrinsic dimension of a local neighborhood in the motion time series suddenly increases [23]. The Locality Preserving Projections (LPP) approach to HMSS uses the local neighborhood structure to detect segment points [17], [25]. The Probabilistic Principal Component Analysis (PPCA) [26] approach analyzes the distribution of motions in the time series to detect segment points; that is, it assumes that frames from the same class obey the same probability distribution and places a cut when the distribution of human motions changes [23]. Unlike the PCA and PPCA approaches, which focus only on the data, a physically driven method has also been proposed for human motion segmentation [17]. Time Series-Warp Metric Curvature Segmentation (TS-WMCS) constructs a curvature-like descriptor to evaluate changes in human motions. It exploits the physical property that motions from the same class are alike within a given time series, and detects transition points where the local curvature of a frame changes greatly [17].

In general, the key issue in HMSS is detecting the segment points in a sequence. Note that human motions change significantly near the segment points [17], [23]. Based on this fact, this paper proposes a sensor network based segmentation (SNBS) method for HMSS. Sensor networks allow human motions to be recorded and captured effectively from different perspectives. Moreover, a motion variation description method is proposed to depict the changes of human motions in a given time series, in which the motion of each frame is regarded as a bucket in a hash table. According to the physical properties of human motions [17], [23], movements from the same class do not change dramatically over the time series; however, when motions from a new class appear, the variation in the sequence increases significantly near the segment points. Finally, to improve the efficiency of SNBS, a hashing method [33] is employed to describe the bucket of each frame in Hamming space, where binary codes represent data similarities effectively.

The main contributions of this paper can be summarized as follows.

  • Firstly, we propose a data fusion method that organically combines the data collected from different kinds of sensors.

  • Secondly, we employ a hashing method to represent the original sensor network data, projecting it into Hamming space to facilitate the description of motion change.

  • Thirdly, we realize HMSS from the perspective of the degree of motion change.

The remainder of this paper is organized as follows. Section II introduces mainstream HMSS methods. Section III presents the SNBS method for HMSS in detail. Experimental results are provided in Section IV, followed by the conclusion and future work in Section V.

SECTION II.

Related Works

This section introduces the mainstream HMSS methods. Generally, HMSS methods can be divided into data-driven methods and physically driven methods [17]. Section II-A introduces the PCA and locality preserving projections (LPP) approaches to HMSS, which use a low-dimensional embedding of the original data. Section II-B introduces the PPCA approach, which evaluates motion changes through the distribution of motions. Unlike these data-driven methods (the PCA, LPP and PPCA approaches), physically driven methods focus on the physical meaning of human motions and realize HMSS by examining how motions change. As a typical physically driven method, the TS-WMCS algorithm is introduced in Section II-C.

A. PCA and LPP Approaches to HMSS

Both the PCA and LPP approaches to HMSS focus on the motion data in each frame; this section briefly introduces them. Consider a human motion sequence $X = [x_{1}, \cdots, x_{n}]^{T} \in \mathbb{R}^{n \times D}$ with $n$ frames and $D$ dimensions. For each frame $x_{i}$, the $k$-NN method is used to construct its neighborhood $X_{i} = \{x_{j} \mid j \in N_{i}\} \in \mathbb{R}^{k \times D}$, where $k$ is the size of the neighborhood and $N_{i}$ is the set of indices of the frames in $X_{i}$.

The PCA approach to HMSS assumes that the error between the neighborhood $X_{i}$ and its projection $X'_{i}$ does not vary much for motions from the same class, but increases substantially for frames near transition clips [27], [28]. To evaluate the error $e_{i}$ at each frame, the original data $X_{i}$ is projected onto a subspace of its intrinsic dimension [23], and $e_{i}$ is computed as Eq. (1):\begin{equation} e_{i} = \left\| X_{i} - X'_{i} \right\|_{F}^{2} = \sum\limits_{j \in N_{i}} \left\| x_{j} - x'_{j} \right\|^{2} \end{equation}
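The per-frame error of Eq. (1) can be sketched in NumPy as follows. This is a minimal illustration, not the implementation of [23]: the helper names are hypothetical, the neighborhood is a Euclidean $k$-NN that includes the frame itself, and a fixed number of retained components `r` is an assumed stand-in for the intrinsic dimension to which the text reduces each neighborhood.

```python
import numpy as np


def knn_indices(X, i, k):
    """Indices of the k frames closest to frame i (Euclidean), frame i included."""
    dists = np.linalg.norm(X - X[i], axis=1)
    return np.argsort(dists)[:k]


def pca_neighborhood_error(X, k=10, r=2):
    """Per-frame error e_i = ||X_i - X'_i||_F^2 (Eq. (1)).

    X'_i is the rank-r PCA reconstruction of the k-NN neighborhood X_i;
    the fixed r is an assumed stand-in for the intrinsic dimension.
    """
    n = X.shape[0]
    errors = np.empty(n)
    for i in range(n):
        Xi = X[knn_indices(X, i, k)]
        mean = Xi.mean(axis=0)
        Xc = Xi - mean
        # Rows of Vt are the principal directions of the centered neighborhood.
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        P = Vt[:r].T                      # D x r orthonormal basis
        Xi_proj = Xc @ P @ P.T + mean     # X'_i expressed in the original space
        errors[i] = np.sum((Xi - Xi_proj) ** 2)
    return errors
```

On data lying along a straight line, every neighborhood is exactly rank 1 after centering, so `pca_neighborhood_error(X, r=1)` returns errors that are zero up to rounding; a frame near a transition would show a larger value.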

The PCA approach to HMSS works well when motion subsequences of different classes are separated by clear transition clips and motions within a class do not change significantly. However, because real-world human motions are complex and continuous, the error between $X_{i}$ and $X'_{i}$ often does not change significantly along the time series. In that case, the PCA approach can hardly detect all transition clips in the sequence [17].

Inspired by the PCA approach, LPP can also be applied to HMSS [25]. Like PCA, LPP uses a linear projection to map the original data to a low-dimensional embedding; unlike PCA, however, it aims to preserve local similarities in the projection. The similarity between samples $x_{i}$ and $x_{j}$ is given by the heat kernel $w_{ij} = \exp\left(-\left\|x_{i} - x_{j}\right\|_{2}^{2} / \sigma\right)$ ($\sigma = 1$ in our experiments), where $\exp(\cdot)$ is the exponential function.

The LPP approach to HMSS also uses a per-frame error $e_{i}$ to evaluate the degree of motion change [17], calculated as Eq. (2):\begin{equation} e_{i} = \sum\limits_{j \in N_{i}} w_{ij}\left(y_{i} - y_{j}\right)^{2} \end{equation} where $y_{i}$ and $y_{j}$ are the low-dimensional embeddings of $x_{i}$ and $x_{j}$, and $N_{i}$ denotes the frames adjacent to the $i$th frame.
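A sketch of the LPP error of Eq. (2) is given below. It assumes that $N_{i}$ comes from a symmetrized Euclidean $k$-NN graph, and it solves the standard LPP generalized eigenproblem through a Cholesky reduction in NumPy. The function names, the centering step, and the small ridge term are illustrative assumptions, not details from [17] or [25].

```python
import numpy as np


def heat_kernel_weights(X, k=5, sigma=1.0):
    """w_ij = exp(-||x_i - x_j||^2 / sigma) on a k-NN graph, symmetrized."""
    n = X.shape[0]
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(sq[i])[1:k + 1]    # skip the frame itself
        W[i, nbrs] = np.exp(-sq[i, nbrs] / sigma)
    return np.maximum(W, W.T)                # undirected graph


def lpp_embedding(X, W, d=1, ridge=1e-8):
    """Solve X^T L X a = lam X^T Dg X a for the d smallest eigenvalues."""
    Dg = np.diag(W.sum(axis=1))
    L = Dg - W                               # graph Laplacian
    Xc = X - X.mean(axis=0)
    A = Xc.T @ L @ Xc
    B = Xc.T @ Dg @ Xc + ridge * np.eye(X.shape[1])
    # With B = C C^T, the problem becomes C^-1 A C^-T v = lam v and a = C^-T v.
    C = np.linalg.cholesky(B)
    Cinv = np.linalg.inv(C)
    _, vecs = np.linalg.eigh(Cinv @ A @ Cinv.T)  # eigenvalues in ascending order
    proj = Cinv.T @ vecs[:, :d]
    return Xc @ proj


def lpp_error(Y, W):
    """Eq. (2): e_i = sum_j w_ij * ||y_i - y_j||^2 over the graph neighbors of i."""
    diff = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return np.sum(W * diff, axis=1)
```

Because `W` is zero outside the graph, summing over all `j` in `lpp_error` is the same as summing over $N_{i}$.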

LPP uses the local structure of the human motion sequence to detect transition clips. However, because real-world motions are complex, the LPP approach cannot effectively detect motion changes in human motion time series [17].

B. PPCA Approach to HMSS

The PPCA approach to HMSS uses probabilistic principal component analysis to model the distribution of motions in the time series. It extends the traditional PCA approach with a Gaussian distribution and represents the relations among different motions with a covariance matrix $C$. Using $C$, the average Mahalanobis distance $H$ over the neighborhood of the $i$th frame is computed, which evaluates how likely the motions from the $(i+1)$th frame to the $(i+k)$th frame are to belong to the fitted Gaussian distribution [23]. $H$ is calculated as Eq. (3):\begin{equation} H = \frac{1}{k}\sum\limits_{j = i + 1}^{i + k} (x_{j} - \overline{x})^{T} C^{-1} (x_{j} - \overline{x}) \end{equation} where $\overline{x}$ is the mean of the fitted Gaussian.

Here, $i$ is the index of the start frame of the neighborhood $X_{i}$. Unlike the PCA approach, the PPCA approach simply takes the next $k$ frames after $x_{i}$ as the neighborhood instead of using the $k$-NN method. The PPCA approach can effectively capture both the correlations and the variances of different joint angles [23]. However, the Gaussian assumption is too strong for real-world human motions, especially complex ones such as Tai Chi, which lowers the segmentation accuracy of the PPCA approach. Besides, both the PCA and PPCA segmentation methods are based on joint angles, which reduces segmentation accuracy when they are extended to real-world applications [17].
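The sketch below evaluates the average Mahalanobis distance of Eq. (3) (with $1/k$ averaging) under a PPCA covariance. Which frames the Gaussian is fitted to is not specified above; as a hypothetical choice, it is fitted to frames $0, \dots, i$ and evaluated on the next $k$ frames. The covariance uses the standard maximum-likelihood PPCA form of Tipping and Bishop with `r` retained components (an assumed parameter), and the fit needs more frames than dimensions so that the noise variance is positive.

```python
import numpy as np


def ppca_covariance(Xfit, r):
    """ML PPCA fit: C = V_r (Lam_r - s2 I) V_r^T + s2 I, where s2 is the
    mean of the discarded eigenvalues of the sample covariance."""
    mean = Xfit.mean(axis=0)
    S = np.cov(Xfit, rowvar=False, bias=True)
    vals, vecs = np.linalg.eigh(S)
    vals, vecs = vals[::-1], vecs[:, ::-1]   # sort eigenpairs in descending order
    s2 = vals[r:].mean()                     # isotropic noise variance
    Vr = vecs[:, :r]
    C = Vr @ np.diag(vals[:r] - s2) @ Vr.T + s2 * np.eye(S.shape[0])
    return mean, C


def average_mahalanobis(X, i, k, r=2):
    """Eq. (3): mean Mahalanobis distance of frames i+1..i+k (0-based)
    under a Gaussian fitted to frames 0..i (a hypothetical choice)."""
    mean, C = ppca_covariance(X[: i + 1], r)
    Z = X[i + 1 : i + k + 1] - mean
    Cinv = np.linalg.inv(C)
    # z_j^T C^-1 z_j for every row z_j of Z.
    return float(np.mean(np.einsum("jd,de,je->j", Z, Cinv, Z)))
```

A cut would be placed where `H` jumps; for example, frames drawn from a shifted distribution give a much larger `H` than frames from the fitted one.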

C. Time Series-Warp Metric Curvature Segmentation (TS-WMCS) Algorithm

This section introduces the Time Series-Warp Metric Curvature Segmentation (TS-WMCS) algorithm for HMSS [17]. TS-WMCS uses a curvature-like descriptor to depict the changes of human motions in a time series. For each frame $x_{i} \in X$, the neighborhood $X_{i}$ is constructed with the $k$-NN method, as in the PCA approach. Then the angle between each sample and the tangent space of its neighborhood is computed: the warp degree of the data is described by the angle $\alpha_{ij} = \alpha_{j}(Q_{i}) \in [0, \pi/2]$ between $x_{j}$, a neighbor of $x_{i}$, and its orthogonal projection onto the local low-dimensional space. This local space is spanned by the columns of $Q_{i}$, which are obtained from the optimization problem Eq. (4):\begin{equation} \mathop{\arg\max}\limits_{Q_{i}} \mathrm{Tr}\left(Q_{i}^{T} X_{i} Z X_{i}^{T} Q_{i}\right) \quad \mathrm{s.t.}~Q_{i}^{T} Q_{i} = I \end{equation} where $Z$ is the normalization matrix of $X_{i}X_{i}^{T}$. The curvature-like descriptor is then expressed as Eq. (5):\begin{equation} c_{i} = \frac{\sum\nolimits_{j \in N_{i}} \cos\alpha_{ij}}{\sum\nolimits_{j \in N_{i}} \left\| x_{j} \right\|} \end{equation}
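A sketch of the curvature-like descriptor of Eq. (5), written under stated assumptions: $Z$ is taken to be the centering matrix, so $Q_{i}$ spans the top-$d$ principal directions of the centered neighborhood; $\cos\alpha_{ij}$ is computed as the ratio of the norm of a centered neighbor's projection to its own norm; and the denominator of Eq. (5) uses the same centered vectors. These choices and the function name are illustrative and are not taken from [17].

```python
import numpy as np


def curvature_like_descriptor(X, k=8, d=2):
    """c_i = sum_j cos(alpha_ij) / sum_j ||x_j|| over the k-NN of frame i (Eq. (5))."""
    n = X.shape[0]
    c = np.empty(n)
    for i in range(n):
        idx = np.argsort(np.linalg.norm(X - X[i], axis=1))[:k]
        Xc = X[idx] - X[idx].mean(axis=0)    # assumed: Z centers the neighborhood
        # Eq. (4) with a centering Z: Q_i holds the top-d principal directions.
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        Q = Vt[:d].T
        norms = np.linalg.norm(Xc, axis=1)
        proj_norms = np.linalg.norm(Xc @ Q, axis=1)
        # cos of the angle between x_j and its orthogonal projection onto span(Q).
        valid = norms > 1e-12
        cos_alpha = proj_norms[valid] / norms[valid]
        c[i] = cos_alpha.sum() / norms.sum()
    return c
```

For points that lie exactly on a line with `d=1`, every $\cos\alpha_{ij}$ equals 1, so the descriptor reduces to $k$ divided by the summed neighbor norms.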

Transition points in human motion sequences are the frames whose curvature-like descriptors are high. Besides the curvature-like descriptor, TS-WMCS also reduces the dimension of the original motion sequence and uses the resulting low-dimensional temporal feature curves to assist segmentation. For each neighborhood, the low-dimensional embedding $\Theta_{k_{i}}$ of $X_{i}$ is computed, and an affine projection $L_{i}$ is then applied to each $\Theta_{k_{i}}$ to map it to a global embedding $T = [\tau_{1}, \tau_{2}, \cdots, \tau_{n}]^{T} \in \mathbb{R}^{n \times d}$. $T$ is obtained from Eq. (6):\begin{equation} \min\limits_{\tau_{i}, L_{i}} \sum\limits_{i = 1}^{n} c_{i}\left\| \left(T_{k_{i}} - \tau_{i} e_{k_{i}}^{T}\right) - L_{i}\Theta_{k_{i}} \right\|^{2} \quad \mathrm{s.t.}~T^{T}T = I_{d} \end{equation} where $T_{k_{i}}$ denotes the global embedding of the frames in the neighborhood of $x_{i}$, $I_{d}$ is the $d \times d$ identity matrix, and $e_{k_{i}}$ is the all-ones vector.

The curvature-like descriptor allows TS-WMCS to depict the changes of human motion sequences effectively. However, a single move in a complex motion sequence may contain motions that differ greatly from each other (e.g., “white crane spreads its wings” in Tai Chi), and TS-WMCS cannot yield promising results on such sequences [17]. Moreover, the low-dimensional temporal feature curves that TS-WMCS uses to assist segmentation lack a clear physical meaning.

SECTION III.

Sensor Network Based Segmentation for HMSS

In this section, we introduce the proposed sensor network based segmentation (SNBS) approach for HMSS. SNBS uses Hamming distance to describe the degree of motion change over time, based on binary representations of the motion data collected by the sensor network. The sketch of SNBS is shown in Fig. 2. In real-world applications, a sensor network may use different kinds of sensors to collect motion data [29], [30]; hence, a feature fusion method is proposed to combine the data collected by different sensors (Section III-A). Since motions of the same class change slightly while motions of different classes change significantly [17], [23], we construct a local change degree to reveal the changes of motions (Section III-C). To describe motion changes accurately, the binary representations must capture similarities and differences between motions simultaneously. Because hashing methods preserve data similarities with binary codes (hash buckets), we use the widely adopted Iterative Quantization (ITQ) hashing method [33] to represent the motions (Section III-B).

FIGURE 2. The sketch of SNBS in constructing the motion change measurement.

A. Feature Fusion for Multi-Source Data Collected by Multi-Sensors

Segmenting human motion sequences requires not only an effective segmentation method but also a comprehensive description of human motions. In real-world applications, sensor networks are commonly used to record human motions [29], [30]. However, a sensor network may employ different kinds of sensors to fully describe human poses [23], [35], so human motions end up represented by features of different scales. To solve this problem, this paper proposes a Sensor Network Based Feature Fusion (SNBFF) method that aligns the multi-source data collected by different kinds of sensors.

Different kinds of sensors reflect human motions from different perspectives. To use the data obtained from them effectively, we employ a graph-based feature fusion method, Feature Graph Fusion (FGF) [34], to combine the data collected by different sensors. For clarity, we assume that the sensor network uses three kinds of sensors: motion sensors, depth sensors, and visual sensors. SNBFF combines the collected data so that the motion of each frame can be described with a single descriptor.

We use the graph fusion method in [34] to combine the data collected by these three kinds of sensors. For each kind of sensor, the Jaccard coefficient [37] is used to measure the similarities of the collected data, and a similarity graph is constructed from these coefficients. The similarity graph of the motion (movement) data is denoted $G^{m} = (V^{m}, E^{m}, w^{m})$, that of the depth data $G^{d} = (V^{d}, E^{d}, w^{d})$, and that of the data collected by visual sensors $G^{c} = (V^{c}, E^{c}, w^{c})$. All similarity graphs are undirected. Here, $V^{m}$, $V^{d}$ and $V^{c}$ denote the frames of the motion sequence, $E$ denotes the edges constructed from the Jaccard similarity coefficients over $V$, and $w$ denotes the edge weights. The fused similarity graph $G = (V, E, w)$ is obtained by fusing $G^{m}$, $G^{d}$ and $G^{c}$ under three constraints: $V = V^{m} \cup V^{d} \cup V^{c}$, $E = E^{m} \cup E^{d} \cup E^{c}$, and $w = w^{m} \cup w^{d} \cup w^{c}$. The similarity weights of the fused graph form the matrix $W = [W_{ij}]$, where $i, j \in V$ are two motions (frames) of the sequence. Each $W_{ij}$ is then normalized with a Gaussian kernel as in Eq. (7):\begin{equation} W_{ij} = \begin{cases} \dfrac{\exp\left(-\frac{W_{ij}}{2\sigma_{i}^{2}}\right)}{\sum\nolimits_{j' \in N_{k}(i)} \exp\left(-\frac{W_{ij'}}{2\sigma_{i}^{2}}\right)}, & j \in N_{k}(i) \\ 0, & \text{otherwise} \end{cases} \end{equation}
where $\sigma_{i}$ is the bandwidth parameter of the Gaussian kernel [38], $N_{k}(i)$ is the set of indices of the $k$ neighboring frames of the $i$th frame, and $j$ indexes these frames. In this paper, $\sigma_{i}$ is set to the variance of the similarities of the $i$th frame.
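A sketch of the per-frame weights in Eq. (7) follows. Several details are assumptions made for illustration: the Jaccard coefficient is generalized to nonnegative real-valued features as $\sum\min / \sum\max$; the quantity inside the kernel is treated as a dissimilarity $1 - J$ so that more similar frames receive larger weights; $\sigma_{i}^{2}$ is set to the variance of row $i$; and the per-sensor graphs are fused by averaging their Jaccard matrices, a hypothetical stand-in for the union-based fusion of [34].

```python
import numpy as np


def weighted_jaccard(A):
    """Generalized Jaccard similarity sum(min)/sum(max) between the rows
    (frames) of a nonnegative feature matrix A (assumed extension)."""
    mins = np.minimum(A[:, None, :], A[None, :, :]).sum(axis=-1)
    maxs = np.maximum(A[:, None, :], A[None, :, :]).sum(axis=-1)
    safe = np.where(maxs > 0, maxs, 1.0)
    return np.where(maxs > 0, mins / safe, 1.0)


def fused_dissimilarity(feature_mats):
    """Hypothetical fusion: average the per-sensor Jaccard matrices, then 1 - J."""
    J = np.mean([weighted_jaccard(A) for A in feature_mats], axis=0)
    return 1.0 - J


def normalized_weights(Dis, k=5):
    """Eq. (7): softmax of -Dis_ij / (2 sigma_i^2) over the k nearest frames
    N_k(i); other entries are 0. sigma_i^2 = variance of row i (assumed)."""
    n = Dis.shape[0]
    Wn = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)
        nbrs = others[np.argsort(Dis[i, others])[:k]]
        sigma_sq = Dis[i, others].var() + 1e-12
        logits = -Dis[i, nbrs] / (2.0 * sigma_sq)
        logits -= logits.max()               # shift for numerical stability
        ex = np.exp(logits)
        Wn[i, nbrs] = ex / ex.sum()
    return Wn
```

Each row of the result sums to one, which is what allows Eq. (9) to treat $W_{i\cdot}$ as a sampling distribution over neighbors.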

After the normalized similarity weights $W = [W_{ij}]$ of the fused graph are obtained, the fused motion features are computed from Eq. (8):\begin{equation} \max\limits_{F} \prod\nolimits_{i,j \in V} \left(\frac{\exp\left(f_{i}^{T} f_{j}\right)}{\sum\nolimits_{j' \in V} \exp\left(f_{i}^{T} f_{j'}\right)}\right)^{W_{ij}} \end{equation} where $f_{i} \in \mathbb{R}^{D \times 1}$ is the fused feature of the $i$th motion in the sequence. Taking the logarithm, Eq. (8) becomes Eq. (9):\begin{align} \ell =& \sum\nolimits_{i,j \in V} W_{ij} \cdot \log\left(\frac{\exp\left(f_{i}^{T} f_{j}\right)}{\sum\nolimits_{j' \in V} \exp\left(f_{i}^{T} f_{j'}\right)}\right) \notag \\ \approx& \frac{1}{M}\sum\nolimits_{i \in V} \sum\nolimits_{j \in s(i)} \log\left(\frac{\exp\left(f_{i}^{T} f_{j}\right)}{\sum\nolimits_{j' \in V} \exp\left(f_{i}^{T} f_{j'}\right)}\right) \end{align} where $s(i)$ denotes $M$ frames sampled according to the distribution of the neighborhood similarities of the $i$th frame. The fused motion features $F = [f_{1}, f_{2}, \cdots, f_{n}] \in \mathbb{R}^{D \times n}$ can be obtained with a word embedding model [31], [32], [36]. The whole SNBFF process is summarized in Fig. 1.
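The objective in Eq. (9) can be illustrated with a small NumPy sketch. Instead of the sampled word-embedding solver referenced above, it evaluates the exact (full-softmax) log-likelihood and maximizes it by plain gradient ascent, a deliberate simplification. Rows of `F` here are the $f_{i}$ (the transpose of the $D \times n$ layout in the text); the learning rate, step count, and function names are arbitrary assumptions.

```python
import numpy as np


def row_log_softmax(F):
    """log P with P_ij = exp(f_i^T f_j) / sum_j' exp(f_i^T f_j'); rows of F are f_i."""
    S = F @ F.T
    S = S - S.max(axis=1, keepdims=True)      # stabilize the log-sum-exp
    return S - np.log(np.exp(S).sum(axis=1, keepdims=True))


def log_likelihood(F, W):
    """Eq. (9) evaluated exactly: sum_ij W_ij log P_ij."""
    return float(np.sum(W * row_log_softmax(F)))


def fit_fused_features(W, dim=2, lr=0.05, steps=100, seed=0):
    """Maximize Eq. (9) by full-batch gradient ascent (a simplification)."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    F = 0.1 * rng.normal(size=(n, dim))
    r = W.sum(axis=1, keepdims=True)          # row masses of W
    for _ in range(steps):
        P = np.exp(row_log_softmax(F))
        G = W - r * P                         # d(ell)/dS_ij with S = F F^T
        F = F + lr * (G + G.T) @ F            # chain rule through S = F F^T
    return F
```

The update uses $\partial\ell/\partial S_{ij} = W_{ij} - \left(\sum_{j'} W_{ij'}\right) P_{ij}$; since $S = FF^{T}$, the gradient with respect to $F$ is $(G + G^{T})F$.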

FIGURE 1. Summary of human motion feature capturing with various kinds of sensors.

B. Iterative Quantization Hashing (ITQ)

To retain the information of movements in a human motion time series, the binary representations of motions should preserve action similarities. Because hashing methods preserve data similarities in Hamming space [33], [39]–[41] and compact data storage, SNBS adopts a typical hashing method, Iterative Quantization (ITQ), to encode the data. ITQ represents human motions effectively with short binary codes. This section describes the ITQ process in detail.

As in many hashing methods [33], [40], PCA is first used to reduce the dimension of the motion sequence data $X$. The projection matrix $P$ is obtained from Eq. (10):\begin{equation} \mathop{\arg\max}\limits_{P} E(P) = \mathop{\arg\max}\limits_{P} \left\| (X - \overline{X})P \right\|_{F}^{2} \quad \mathrm{s.t.}~P^{T}P = I \end{equation} where each row of $\overline{X}$ is the mean of the rows of $X$, so that $X - \overline{X}$ is the centered data. The low-dimensional embedding is then $Y = (X - \overline{X})P$. Based on $Y$, ITQ computes a rotation matrix $R$ to further adjust the embedding [33], which significantly improves its performance. The rotation matrix is obtained from Eq. (11):\begin{equation} \min\limits_{B,R} Q(B,R) = \left\| B - YR \right\|_{F}^{2} \quad \mathrm{s.t.}~R^{T}R = I \end{equation} where $B \in \{-1, 1\}^{n \times c}$ is the binary representation of the original data $X$, and $c$ is the length of the binary codes. Since $\|B\|_{F}^{2} = nc$ and $\|YR\|_{F}^{2} = \|Y\|_{F}^{2}$ for an orthogonal $R$, Eq. (11) can be rewritten as Eq. (12):\begin{align} Q(B,R) =& \left\| B \right\|_{F}^{2} + \left\| Y \right\|_{F}^{2} - 2\,\mathrm{tr}(BR^{T}Y^{T}) \notag \\ =& nc + \left\| Y \right\|_{F}^{2} - 2\,\mathrm{tr}(BR^{T}Y^{T}) \end{align} Eq. (12) is minimized by alternating between $B$ and $R$ in an EM-like fashion [33].

1) Fix $R$ , Calculate $B$

Because $R$ must be orthogonal, it is initialized with an (e.g., random) orthogonal matrix. With $R$ fixed, minimizing Eq. (12) is equivalent to Eq. (13), whose optimal $B$ is the binary representation of the original data:\begin{equation} \mathop{\arg\max}\limits_{B} \mathrm{tr}(BR^{T}Y^{T}) = \mathop{\arg\max}\limits_{B} \sum\limits_{i = 1}^{n} \sum\limits_{j = 1}^{c} B_{ij}\tilde{Y}_{ij} \end{equation} where $\tilde{Y} = YR$. The maximum is attained elementwise by $B = \mathrm{sgn}(\tilde{Y})$.

2) Fix $B$ , Calculate $R$

When $B$ is fixed, the rotation matrix $R$ is optimized by Eq. (14):\begin{equation} \mathop{\arg\max}\limits_{R} \mathrm{tr}(BR^{T}Y^{T}) \quad \mathrm{s.t.}~R^{T}R = I \end{equation} This is an orthogonal Procrustes problem, solved by the SVD of the $c \times c$ matrix $B^{T}Y$ [48]: writing $B^{T}Y = S\Omega\hat{S}^{T}$, the optimal rotation is $R = \hat{S}S^{T}$.
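The two alternating steps can be sketched in NumPy as below, following the ITQ procedure of [33]: PCA to $c$ dimensions, a random orthogonal initialization, then repeated $B = \mathrm{sgn}(YR)$ and Procrustes updates of $R$. The function name, iteration count, and seed are illustrative assumptions, and $c$ must not exceed the data dimension. The final codes are mapped from $\{-1, 1\}$ to $\{0, 1\}$ for use with Hamming distance.

```python
import numpy as np


def itq(X, c=8, n_iter=50, seed=0):
    """Iterative Quantization: PCA to c dimensions, then alternate the
    B-step (Eq. (13)) and the R-step (Eq. (14)). Returns codes in {0, 1}."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                       # X - X_bar
    # Eq. (10): the top-c right singular vectors maximize projected variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:c].T                                  # D x c projection
    Y = Xc @ P                                    # n x c embedding
    R, _ = np.linalg.qr(rng.normal(size=(c, c)))  # random orthogonal start
    for _ in range(n_iter):
        B = np.where(Y @ R >= 0, 1.0, -1.0)       # B-step: B = sgn(YR)
        S, _, Shat_T = np.linalg.svd(B.T @ Y)     # B^T Y = S Omega Shat^T
        R = Shat_T.T @ S.T                        # R-step: R = Shat S^T
    codes = (Y @ R >= 0).astype(np.uint8)         # map {-1, 1} to {0, 1}
    return codes, P, R
```

Each step cannot increase the quantization loss $\|B - YR\|_{F}^{2}$, so a few dozen iterations are usually enough for the codes to settle.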

With ITQ, the original motion sequence is represented by a series of binary codes, which preserve the relations in the original data and compact data storage. In the next section, these binary representations are used to measure the degree of motion change along the time series.

C. SNBS with Hashing Based Motion Change Measurement

This section describes how SNBS uses the ITQ binary representations of a human motion sequence to evaluate the degree of motion change. Human motion sequences exhibit temporal locality: adjacent frames in the time series are similar [17], [18]. Moreover, motions from the same class are more similar to each other than motions from different classes. Hence, a human motion sequence can be segmented by detecting how the similarity of each frame changes over time.

Hashing learning aims to represent the original data with binary codes while preserving the original data similarities in Hamming space. Owing to its excellent performance in compacting data storage and preserving data similarities, hashing learning has been widely applied [43]–[45]. A hashing method maps the original data to hash buckets such that similar data fall into nearby buckets. Therefore, motions of the same class are mapped to buckets that are close in Hamming distance, and when motions of a new class appear, new buckets appear as well. Hence, projecting the original data into Hamming space provides an effective way to measure motion changes: the sequence can be segmented into motion clips, each of which belongs to a hash bucket and represents a motion class.

For a human motion sequence $X$, each frame $x_{i}$ is represented by a binary code $b_{i} \in \{0, 1\}^{c}$ (obtained by mapping $-1$ to $0$ in the ITQ codes $B$), where $c$ is the code length. For each frame $x_{i}$, the average Hamming distance over its neighborhood is computed as Eq. (15):\begin{equation} D(x_{i}) = \frac{1}{k}\sum\limits_{x_{j} \in N(x_{i})} \mathrm{Dist}(b_{i}, b_{j}) \end{equation} where $k$ is the number of neighbors of $x_{i}$, $N(x_{i})$ denotes the neighborhood of $x_{i}$, and $\mathrm{Dist}(\cdot)$ is the distance metric (here, the Hamming distance between two binary codes). ITQ preserves the original data structure in its binary representations, so $\mathrm{Dist}(b_{i}, b_{j})$ is approximately proportional to $\mathrm{Dist}(x_{i}, x_{j})$ [33], [40]. The Hamming distance between $b_{i}$ and $b_{j}$ is at most the code length $c$. Since the distance between two binary codes is discrete and the number of distinct codes is limited, similar motion frames are projected to nearby hash buckets with small Hamming distances, whereas motions of different classes are projected to buckets with large Hamming distances. Consequently, the average Hamming distance changes little among motions of the same class but changes dramatically when motions of a new class appear. Therefore, the motion change can be described by the change in the average Hamming distance of each frame.
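Eq. (15) can be sketched as follows. The neighborhood $N(x_{i})$ is not pinned down above; here it is assumed, hypothetically, to be the frames within `h` steps of frame $i$ on either side, so $k = 2h$ away from the sequence ends, and the average is taken over the available neighbors near the ends.

```python
import numpy as np


def average_hamming(codes, h=2):
    """Eq. (15): mean Hamming distance from b_i to its neighbors' codes,
    where the neighbors are assumed to be the frames within h steps of i."""
    codes = np.asarray(codes)
    n = codes.shape[0]
    D = np.empty(n)
    for i in range(n):
        nbrs = [j for j in range(max(0, i - h), min(n, i + h + 1)) if j != i]
        # Hamming distance = number of differing bits.
        D[i] = np.mean([np.count_nonzero(codes[i] != codes[j]) for j in nbrs])
    return D
```

For a toy 4-bit sequence whose codes switch from `0000` to `1111` at the sixth frame (index 5), `average_hamming(codes, h=2)` yields 0, 0, 0, 1, 2, 2, 1, 0, 0, 0: zero inside each class and nonzero only around the switch.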

To reveal the changes of the average Hamming distance explicitly and detect the transition points in a human motion sequence, the change degree of each frame is defined as Eq. (16):\begin{equation} M_{i} = \frac{e^{\left| D(x_{i}) - D(x_{i-1}) \right|}}{e^{D(x_{i})}}, \quad i \ge 2 \end{equation} where $e^{(\cdot)}$ is the exponential function. The change degree at each moment quantifies the changes of human motions. Since motions from the same class are likely to share the same binary representation, the average Hamming distance of frames outside transition clips is zero with high probability; as Fig. 3 shows, most average Hamming distances on the CMU motion capture database are 0 (with each frame represented by 8 bits as an example). The exponential function is therefore used so that the denominator of Eq. (16) is never zero. Change points can then be detected from the variation of $M_{i}$. Taking sequence No. 15_01 of the CMU motion capture database as an example, the change degree is large enough at the segment points for them to be detected (Fig. 4).
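Eq. (16) simplifies to $M_{i} = \exp\left(\left|D(x_{i}) - D(x_{i-1})\right| - D(x_{i})\right)$, which the sketch below uses. It takes the per-frame average Hamming distances as a plain array; the returned array starts at $i = 2$, so index 0 corresponds to the second frame. The function name is illustrative.

```python
import numpy as np


def change_degree(D):
    """Eq. (16): M_i = exp(|D_i - D_{i-1}|) / exp(D_i) = exp(|D_i - D_{i-1}| - D_i).
    Index 0 of the result corresponds to i = 2 (the second frame)."""
    D = np.asarray(D, dtype=float)
    return np.exp(np.abs(np.diff(D)) - D[1:])
```

Within a run of frames whose average Hamming distance is 0, $M_{i} = e^{0} = 1$, which is the baseline against which larger change degrees stand out.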

FIGURE 3. An example of the average Hamming distance distribution on the CMU motion capture database (8 bits).

FIGURE 4. An example of the change degree distribution on the CMU motion capture database (8 bits).

Segment points in a human motion sequence are the frames located in motion transition clips, where motions of a new class appear and movements change greatly. Based on this physical property, the change degree of human motions in the time series can be used to realize HMSS. In most cases, motion changes are smooth and steady; however, the change degree increases dramatically when motions of a new class appear. Hence, we set a change degree threshold $\theta $ to select the segment points (in our experiments, $\theta $ is set to twice the average change degree of the sequence). When the change degree ${M_{i}}$ is larger than $\theta $, the $i$th frame of the motion sequence is added to the segment point collection $C = \left [{ {c_{1}, \cdots,{c_{m}}} }\right] \in {\mathbb {N}^{1 \times m}}$, where $m$ is the number of detected cuts in the human motion time series:

\begin{equation} C = \left \{{ {i \mid {M_{i}} > \theta } }\right \} \in {\mathbb {N}^{1 \times m}} \end{equation}

The process of SNBS is summarized in Algorithm 1.
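The threshold rule, with $\theta$ set to twice the average change degree and the strict inequality used in Algorithm 1, can be sketched as below. The function name and the `factor` parameter are hypothetical; the input is assumed to be a mapping from frame index $i$ to $M_i$, as produced by any implementation of Eq.(16).

```python
from statistics import mean
from typing import Dict, List


def select_segment_points(M: Dict[int, float], factor: float = 2.0) -> List[int]:
    """Frame indices i whose change degree exceeds theta = factor * mean(M).

    factor = 2.0 mirrors the choice of twice the average change degree.
    """
    if not M:
        return []
    theta = factor * mean(list(M.values()))
    return sorted(i for i, m in M.items() if m > theta)


M = {2: 1.0, 3: 1.0, 4: 1.0, 5: 10.0, 6: 1.0}
print(select_segment_points(M))  # theta = 2 * 2.8 = 5.6, so [5]
```

Tying $\theta$ to the sequence mean makes the rule scale-free: multiplying every $M_i$ by a constant does not change which frames are selected.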

Algorithm 1 SNBS for HMSS

Input: Human motion data ${X_{1}}, {X_{2}}, \ldots, {X_{L}}$ captured by different sensor networks

Output: Segment cuts $C$

1. Construct the Jaccard similarity graph ${G^{i}}$ for the data ${X_{i}}$ collected from each sensor network.

2. Fuse the Jaccard similarity graphs according to Section III-A, and construct the fused graph weight by Eq.(7).

3. Compute the fused human motion sequence representation ${F}$ according to Section III-A.

4. Compute the low-dimensional embedding $Y$ of the fused human motion sequence $F$ with PCA.

5. Obtain the binary representation of $Y$ with the ITQ method.

6. Compute the average Hamming distance series by Eq.(15).

7. Generate the change degree series $M$ of the human motion sequence by Eq.(16).

8. Scan the change degree series: for each $i \ge 2$, if ${M_{i}} > \theta $, add frame index $i$ to $C$.

9. Output the segment cuts $C$.
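The PCA and ITQ steps of Algorithm 1 can be illustrated with NumPy. This is a sketch under stated assumptions, not the authors' implementation: it takes an already fused representation `F` (frames × features) as input, skips the Jaccard-graph fusion of Section III-A, and follows the standard ITQ alternation (fix the rotation $R$ and set $B = \mathrm{sgn}(VR)$; fix $B$ and update $R$ by an orthogonal Procrustes step). The function name `pca_itq`, the iteration count and the random initialization are hypothetical choices.

```python
import numpy as np


def pca_itq(F: np.ndarray, c: int = 8, n_iter: int = 50, seed: int = 0) -> np.ndarray:
    """Return a (frames x c) array of 0/1 codes for the rows of F.

    Standard ITQ: PCA to c dimensions, then alternate between
    B = sgn(V R) and an orthogonal Procrustes update of the rotation R.
    """
    n, d = F.shape
    if c > min(n, d):
        raise ValueError("code length c cannot exceed min(frames, features)")
    Fc = F - F.mean(axis=0)                           # centre each feature
    _, _, Vt = np.linalg.svd(Fc, full_matrices=False)
    V = Fc @ Vt[:c].T                                 # PCA embedding Y (n x c)
    rng = np.random.default_rng(seed)
    R, _ = np.linalg.qr(rng.standard_normal((c, c)))  # random orthogonal start
    for _ in range(n_iter):
        B = np.where(V @ R >= 0, 1.0, -1.0)           # fix R: codes in {-1, +1}
        U, _, Wt = np.linalg.svd(B.T @ V)             # fix B: maximise tr(B^T V R)
        R = Wt.T @ U.T
    return (V @ R >= 0).astype(np.uint8)              # map {-1, +1} to {0, 1}


rng = np.random.default_rng(1)
F = np.vstack([rng.normal(5.0, 0.01, (50, 10)),       # toy "class A" frames
               rng.normal(-5.0, 0.01, (50, 10))])     # toy "class B" frames
codes = pca_itq(F, c=4)
print(codes[0], codes[-1])                            # the two classes get different codes
```

Each row can be passed to a Hamming-distance routine via `codes[i].tolist()`. The Procrustes update $R = W U^{\top}$ (from the SVD $B^{\top}V = U S W^{\top}$) is the closed-form maximizer of $\mathrm{tr}(B^{\top} V R)$ over orthogonal $R$, which is why ITQ only needs a small $c \times c$ SVD per iteration.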

SECTION IV.

Experiment Results

To validate the effectiveness of SNBS in segmenting human motion sequences, we conduct experiments on the CMU motion capture database, which has been used to evaluate the segmentation performance of many HMSS methods [17]–[19], [22], [23]. Human motion segmentation experiments are also conducted on the DUT human motion dataset [34], which was collected by our human motion capture system containing posture sensors, depth sensors and visual sensors. The PCA, LPP, PPCA and TS-WMCS segmentation algorithms introduced earlier are adopted as comparison methods to evaluate the performance of SNBS in detail. In our experiments, PCA and LPP reduce the original data to the intrinsic dimension that preserves 90% of the information of the original data [17], [23].

To evaluate the performance of SNBS and the comparison methods more precisely, three widely used metrics are employed in our experiments [17], [22], [23]: precision ${Pre}$, recall ${Rec}$ and F-measure ${F}$, which measure the performance of HMSS methods from different perspectives. They are defined as ${Pre} = {N_{right}}/{N_{seg}}$, ${Rec}={N_{right}}/{N_{total}}$ and $F = 2 \cdot {Pre} \cdot {Rec}/({Pre}+{Rec})$, where ${N_{right}}$ is the number of correct segment points produced by a segmentation method, ${N_{seg}}$ is the total number of segment points it generates, and ${N_{total}}$ is the number of transition clips in the human motion sequence. The F-measure evaluates segmentation methods from the perspectives of precision and recall simultaneously and thus provides a more comprehensive evaluation [49].
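The three metrics can be computed directly from the counts defined above. The sketch below assumes $N_{right}$, $N_{seg}$ and $N_{total}$ are already known; how a detected cut is matched to a transition clip is not specified in this section, so no matching rule is implied. The function name is hypothetical.

```python
from typing import Tuple


def segmentation_scores(n_right: int, n_seg: int, n_total: int) -> Tuple[float, float, float]:
    """Return (Pre, Rec, F) with Pre = N_right / N_seg, Rec = N_right / N_total
    and F = 2 * Pre * Rec / (Pre + Rec)."""
    if n_seg <= 0 or n_total <= 0:
        raise ValueError("n_seg and n_total must be positive")
    if not 0 <= n_right <= n_seg:
        raise ValueError("n_right must lie between 0 and n_seg")
    pre = n_right / n_seg
    rec = n_right / n_total
    f = 0.0 if pre + rec == 0 else 2 * pre * rec / (pre + rec)
    return pre, rec, f


# Hypothetical counts: 8 correct cuts out of 10 detected, 9 transition clips.
pre, rec, f = segmentation_scores(8, 10, 9)
# Pre = 0.8, Rec = 8/9 (about 0.889), F = 16/19 (about 0.842)
```

With the same $N_{right}$, doubling $N_{seg}$ halves the precision while recall stays unchanged (for example, `segmentation_scores(8, 20, 9)` gives $Pre = 0.4$), so a method that produces many invalid cuts is penalized in both precision and F-measure.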

A. Experiments on CMU Motion Capture Database

The CMU motion capture database contains common human motions of 144 subjects, which are represented by skeletons of 31 joints; 31 motion sensors located at the joints capture the motion process (details in Fig. 5). Note that the CMU motion capture database uses only motion sensors to collect the motion data [23], [50], [51]. In our experiment, we randomly select the motion sequences of trials 1 to 5 of subject 15 in the CMU motion capture database to evaluate the segmentation performance of the HMSS methods. The selected sequences range from 5524 to 22948 frames and mainly consist of everyday behaviors (e.g., wandering, waving and cleaning windows).

FIGURE 5. The sensors on the human body in the CMU motion capture database. (a) Sensor locations from front view. (b) Sensor locations from back view.

SNBS adopts the motion change measurement to detect the transition clips in human motion time series, which provides a precise description of the human motion change process. From Table 1, SNBS achieves better performance in both precision and recall than the data-driven HMSS methods (the PCA, LPP and PPCA approaches). From Fig. 6, SNBS provides more precise segment points than the PCA, LPP, PPCA and TS-WMCS HMSS methods.

TABLE 1. Comparisons of Precisions and Recalls of 4 Segmentation Methods on CMU Motion Database
FIGURE 6. An example of segmentation results on the CMU database.

When detecting segment points, SNBS takes the current motion change and the historical motion changes into consideration simultaneously, which effectively avoids invalid segmentations within the same motion class. As shown in Table 1, the segmentation recall of TS-WMCS is the same as that of SNBS, but its segmentation precision is about 40% lower. The main reason is that TS-WMCS generates too many invalid segment points: it produces nearly twice as many segment cuts as SNBS. When precision and recall are considered simultaneously (i.e., the F-measure), SNBS clearly outperforms the other three segmentation methods (by over 35%).

B. Experiments on DUT Human Motion Dataset

Compared with the CMU motion capture database, the DUT human motion dataset is composed of more complex human motions [34]. The motion data is captured by our multi-sensor human motion capture system, whose structure is shown in Fig. 7. The system uses three kinds of sensors to collect motion data: posture sensors, an RGB camera and a depth sensor, where a Kinect provides the RGB camera and the depth sensor. The system can capture human posture information, the motion scene and motion depth information simultaneously. After the motion data are captured by the different sensors, the SNBFF method is used to fuse the multi-source data and generate a uniform representation of the human motion sequence.

FIGURE 7. The structure of the motion capture system used for collecting human motion data for the DUT human motion dataset.

The DUT human motion dataset contains not only single-human motion sequences but also human-human interaction motion sequences. In this experiment, we select 2 human-human interaction motion sequences and 2 single-human motion sequences. The 2 interaction sequences contain different kinds of interactions, including greeting and martial arts, while the 2 single-human sequences contain exercise motions and motions under different moods. Examples from the DUT human motion dataset are shown in Fig. 8.

FIGURE 8. Samples of the DUT human motion dataset. (a) Greeting. (b) Wrestling. (c) Daily exercise. (d) Motions under different moods.

The DUT human motion dataset contains more complex and shorter human motions than the CMU motion capture database, which requires segmentation methods to be more sensitive to motion changes. The PCA, LPP and PPCA approaches focus only on data changes in the time series [17], [18] and cannot achieve satisfactory segmentation results when the human motions are complex. As shown in Table 2, the PCA, LPP and PPCA approaches cannot accurately detect the motion changes in these experiments. In contrast, the two physically driven segmentation methods, SNBS and TS-WMCS, effectively capture the motion changes in the time series, mainly because physically driven methods are more sensitive to changes in human motion sequences.

TABLE 2. Comparisons of Precisions and Recalls of 4 Segmentation Methods on DUT Human Motion Dataset

SNBS employs the motion change degree to evaluate changes in human motion sequences, focusing not only on the real-time motion change but also on the historical change process of human motions in the time series. Hence, SNBS can effectively detect the motion changes in the time series. According to Table 2, SNBS detects segment points better than the other HMSS methods in both segmentation precision and recall. TS-WMCS adopts a curvature-like descriptor to describe the changes of a human motion sequence, which focuses only on the real-time changes. Therefore, TS-WMCS cannot accurately detect the transition clips in human motion sequences, especially when the human motions of a class are complex and change greatly. As shown in Table 2, the segmentation precision of TS-WMCS is lower than that of SNBS. The main reason is that SNBS evaluates the human motion process more comprehensively by using hashing-based data similarity to evaluate the change degree of human motions.

SECTION V.

Conclusion

This paper proposes a new sensor network oriented method for HMSS named sensor network based segmentation (SNBS). SNBS analyzes human motion sequences from the perspective of motion similarity and employs a hashing method to construct a motion change degree measurement that reveals motion changes in the time series. Based on this measurement, the change process of a human motion sequence is described by a change degree time series, which considers the real-time change of human motions and the historical change process of the sequence simultaneously. To precisely evaluate the change degree of human motions, a motion change evaluation criterion is constructed, which can effectively reveal the change points in human motion sequences and avoid meaningless cuts in HMSS tasks. Experimental results validate the effectiveness of the proposed method in detecting change points compared with several state-of-the-art methods.

Our future work will mainly focus on the following issues.

  • Firstly, how to employ supervised information to improve the segmentation accuracy of HMSS.

  • Secondly, how to extend the HMSS methods to real-time analysis of time-series.

Cites in Papers - |

Cites in Papers - IEEE (4)

Select All
1.
Sirisha Satish, Sanjay M Belgaonkar, Aravind M, Neha Santhosh Thekkethala, "Secure Automated Smart Home", 2024 11th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI), pp.615-620, 2024.
2.
Xinghua Liu, Yunan Zhao, Jianwei Guan, Hui Cao, "Event Camera-based Motion Segmentation via Depth Estimation and 3D Motion Compensation", 2022 41st Chinese Control Conference (CCC), pp.6742-6747, 2022.
3.
Wei Liu, Hongtu Di, Yang Zhang, Yongkang Lu, Xikang Cheng, Jiacheng Cui, Zhenyuan Jia, "Automatic Detection and Segmentation of Laser Stripes for Industrial Measurement", IEEE Transactions on Instrumentation and Measurement, vol.69, no.7, pp.4507-4515, 2020.
4.
Dongsheng Zhou, Xinzhu Feng, Pengfei Yi, Xin Yang, Qiang Zhang, Xiaopeng Wei, Deyun Yang, "3D Human Motion Synthesis Based on Convolutional Neural Network", IEEE Access, vol.7, pp.66325-66335, 2019.

Cites in Papers - Other Publishers (4)

1.
Xinghua Liu, Yunan Zhao, Lei Yang, Shuzhi Sam Ge, "A Spatial-Motion-Segmentation Algorithm by Fusing EDPA and Motion Compensation", Sensors, vol.22, no.18, pp.6732, 2022.
2.
Jiacheng Cui, Wei Liu, Yang Zhang, Changyong Gao, Zhe Lu, Ming Li, Fuji Wang, "A novel method for predicting delamination of carbon fiber reinforced plastic (CFRP) based on multi-sensor data", Mechanical Systems and Signal Processing, vol.157, pp.107708, 2021.
3.
Tingting Zhang, Zhen Wen, Yina Liu, Zhiyuan Zhang, Yongling Xie, Xuhui Sun, "Hybridized Nanogenerators for Multifunctional Self-Powered Sensing: Principles, Prototypes, and Perspectives", iScience, vol.23, no.12, pp.101813, 2020.
4.
Christopher Reining, Michelle Schlangen, Leon Hissmann, Michael ten Hompel, Fernando Moya, Gernot A. Fink, "Attribute Representation for Human Activity Recognition of Manual Order Picking Activities", Proceedings of the 5th international Workshop on Sensor-based Activity Recognition and Interaction, pp.1, 2018.

References

References is not available for this document.