Introduction
Due to the ubiquity and penetration of Wi-Fi in our homes, workplaces and cities, Wi-Fi traffic can be repurposed as a sensing modality for many potential applications beyond the original intended data-carrier functionality. Indeed, recent compelling research has reimagined a commodity Wi-Fi device as a multi-purpose sensor capable of turning Wi-Fi traffic (that is, packets transmitted over a wireless communication channel for data transfer, for the judicious probing of the channel, or both) into a rich source of computational information explaining space dynamics, assessing the social environment, and even tracking people’s posture and gestures [1]–[5].
However, human-perturbed Wi-Fi channels remain ill-understood. Despite prior art showcasing compelling use cases, ad hoc inference pipelines and careful parameter tuning are commonplace for arriving at sensing recipes that yield good performance. Essentially, conventional approaches seek to associate patterns in Wi-Fi channel state information (CSI) with human activity by training classifiers on top of often bespoke featurization, e.g., statistical distributions in [3] and Doppler variations in [5]. Although these sensing approaches demonstrated the potential of CSI sensing in a brand new class of applications, they are often sensitive to environmental conditions and thus require a controlled setup and the development of pre-processing and inference pipelines that do not generalize across tasks (i.e., applications) and deployment environments. As such, CSI has not been widely adopted as a general-purpose sensing modality.
We argue that, to unleash the true potential of CSI as a general-purpose human sensing modality, we need to turn our attention to developing sound theories explaining the relationship between spatiotemporal wireless channel modulations and human movement. Such characterization will assist in designing future Wi-Fi networks whose stack layers are augmented with annotations derived from the wireless propagation medium. These annotations would describe the physicality induced by the dynamic human movement that accompanied the delivery of data, thereby providing added context.
To this end, in this paper we present a first formalization for the quantification of the changing part of the wireless signal modulated by human motion. Based on established channel models, we devise new channel statistics that succinctly characterize the signal modulated by dynamic human movement. We then demonstrate that these channel statistics carry enough information to describe spatiotemporal human movement when observed continuously. This leads us to develop a novel subspace tracking algorithm that continuously analyses the signal subspace as a function of dynamic human movement. The application of such a metric enables us to precisely describe a set of human movement primitives including presence, motion activities, etc. As a step towards realizing CSI as a general-purpose sensing modality, we showcase how features extracted from the evolution of these subspaces can robustly reproduce state-of-the-art application-specific feature engineering baselines, and do so across multiple usage scenarios and environmental conditions. Our research contributions are three-fold:
Statistical analysis of the signals to formally devise new statistics characterizing human-perturbed Wi-Fi channels.
Formalization of CSI sensing as a subspace tracking problem, demonstrating that the analysis of the dynamics of a signal subspace is the equivalent of sensing human movements.
Quantification of the benefits of using features derived from the proposed statistics and corresponding tracking technique concerning bleeding-edge CSI sensing applications.
We start by reviewing the required mathematical background of channel modeling in Section II-C. How the channel model can be used for sensing is explained in Section II-D. We use subspace based statistics to analyze human modulation of wireless channels in Section III. We show that the analysis of the dynamics of a signal subspace is equivalent to sensing human movements. We show by way of example how features extracted from subspace evolution can be used to solve sensing tasks in Section IV. We evaluate our subspace tracking featurization for two applications in Section V, provide a discussion in Section VI, and conclude with Section VII.
Measurement Model
A. Notation
Vectors and matrices are denoted in bold lowercase
B. Problem Statement
Our goal is to take steps towards a systematic study of the human-modulated subspace of CSI measurements. To this end, suppose we have a collection of CSI measurements \begin{equation*} \mathcal {H} = \mathcal {S} + \mathcal {N},\tag{1}\end{equation*}
Sufficiency of covariance statistics.
It is sufficient to consider the covariance statistics of
Dominance of the signal subspace.
The human modulation is characterised by magnitudes of variation of the covariance statistics at the appropriate time scales.
Considering the measurement axes independently leads to an interpretable and effective dimensionality reduction on
We introduce the structured channel model and the observation model used in the rest of the paper in Sections II-C and II-D, respectively. The channel model provides a mathematical description of the measurement data. The observation model will be used to derive pre-processing techniques that have a sound physical justification for sensing tasks in Sections III and IV.
C. Wideband MIMO Channel Model
The structured channel model we use belongs to a class of correlative wideband MIMO channel models [6]. Our starting point is the eigendecomposition of the channel model. This approach was first developed by Weichselberger [7], although it was also developed independently in other works, e.g., [8].
In the general case, we assume that CSI data forms a four-dimensional dataset, with the four axes being the receive antenna, the transmit antenna, the delayspread tap, and the transmission time step. We denote the first three measurement dimensions by the subscripts Rx, Tx, and Dy, respectively. We arrange CSI measurements into a tensor
We treat the measurement of the tensor \begin{equation*} \mathcal {H}_{(m)}(t+\tau) = \mathbf {W} \mathcal {H}_{(m)}(t) + \bar { \mathbf {W}} \boldsymbol {\Xi }\tag{2}\end{equation*}
We now suppress the time step \begin{align*} \mathbf {R}_{\text {Rx}}=&\boldsymbol {\mathsf {E}}\!\left \{{ \mathcal {H}_{(1) } \mathcal {H}^{H}_{(1) } }\right \}\!, \\ \mathbf {R}_{\text {Tx}}=&\boldsymbol {\mathsf {E}}\!\left \{{ \mathcal {H}_{(2) } \mathcal {H}^{H}_{(2) } }\right \}\!, \\ \mathbf {R}_{\text {Dy}}=&\boldsymbol {\mathsf {E}}\!\left \{{ \mathcal {H}_{(3) } \mathcal {H}^{H}_{(3) } }\right \}\!,\tag{3}\end{align*}
Eigendecomposition is then applied to Equation (3) in order to extract the channel eigenbases in space (receive and transmit dimensions) and in delay spread according to \begin{align*} \mathbf {R}_{\text {Rx}} &= \mathbf {U}_{\text {Rx}} \boldsymbol {\Delta }_{\text {Rx}} \mathbf {U}^{H}_{\text {Rx}}, \\ \mathbf {R}_{\text {Tx}} &= \mathbf {U}_{\text {Tx}} \boldsymbol {\Delta }_{\text {Tx}} \mathbf {U}^{H}_{\text {Tx}}, \\ \mathbf {R}_{\text {Dy}} &= \mathbf {U}_{\text {Dy}} \boldsymbol {\Delta }_{\text {Dy}} \mathbf {U}^{H}_{\text {Dy}}. \tag{4}\end{align*}
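The one-sided correlations of Equation (3) and their eigendecompositions in Equation (4) can be sketched numerically as follows. This is a minimal illustration rather than the authors' implementation: the expectation is replaced by a sample average over snapshots, the tensor dimensions are hypothetical, and the helper names `unfold`, `one_sided_correlations`, and `eigenbasis` are introduced here purely for illustration.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-m unfolding: the chosen axis indexes the rows, all other axes are flattened."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def one_sided_correlations(snapshots):
    """Sample estimates of R_Rx, R_Tx, R_Dy from snapshots of shape (T, N_rx, N_tx, N_dy)."""
    num_snapshots = snapshots.shape[0]
    correlations = []
    for mode in range(3):  # 0: Rx, 1: Tx, 2: Dy
        acc = 0
        for h in snapshots:
            h_m = unfold(h, mode)
            acc = acc + h_m @ h_m.conj().T
        correlations.append(acc / num_snapshots)  # sample average stands in for E{.}
    return correlations

def eigenbasis(r):
    """Eigendecomposition R = U diag(delta) U^H, eigenvalues sorted in descending order."""
    delta, u = np.linalg.eigh(r)
    order = np.argsort(delta)[::-1]
    return u[:, order], delta[order]

# Hypothetical data: 200 snapshots, 3 Rx antennas, 2 Tx antennas, 8 delay taps.
rng = np.random.default_rng(0)
shape = (200, 3, 2, 8)
H = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

R_rx, R_tx, R_dy = one_sided_correlations(H)
U_dy, delta_dy = eigenbasis(R_dy)
```

All three correlations share the same trace, namely the average energy of the channel tensor, which gives a quick sanity check on the unfoldings.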
D. Observation Model
We observe a sequence of channel tensors \begin{equation*} \mathcal {H}[k] = \mathcal {S}[k] + \mathcal {N}[k] \tag{5}\end{equation*}
We write the observation model in terms of unfolding matrices as \begin{equation*} \mathcal {H}_{(i)}[k] = \mathcal {S}_{(i)}[k] + \mathcal {N}_{(i)}[k]\tag{6}\end{equation*}
In what follows, we look at the third unfolding, which in our setup corresponds to the Dy dimension. Similar treatment applies to the Rx and Tx dimensions. Since the human-induced modulation and noise are uncorrelated, we can rewrite the one-sided correlations of equation (3) as \begin{align*} \mathbf {R}_{\text {Dy}}[k]:=&\boldsymbol {\mathsf {E}}\!\left \{{ \mathcal {H}_{(3) }[k] \mathcal {H}^{H}_{(3) }[k] }\right \} \tag{7}\\=&\mathbf {C}_{\text {Dy}}[k] + \sigma ^{2}_{\text {Dy}}[k] \mathbf {I}_{M_{h}} \tag{8}\end{align*}
Dropping \begin{align*} \mathbf {R}_{\text {Dy}}=&\mathbf {U}_{\text {Dy}} \boldsymbol {\Delta }_{\text {Dy}} \mathbf {U}^{H}_{\text {Dy}} \\ \mathbf {R}_{\text {Dy}}=&\begin{bmatrix} \mathbf {U}^{s}_{\text {Dy}} ~\mathbf {U}^{n}_{\text {Dy}} \end{bmatrix} \begin{bmatrix} \hat {\boldsymbol {\Delta }}^{s}_{\text {Dy}}&\quad \mathbf {0} \\ \mathbf {0}&\quad \boldsymbol {\Delta }^{n}_{\text {Dy}} \end{bmatrix} \begin{bmatrix} { \mathbf {U}^{s}_{\text {Dy}}}^{H} \\ { \mathbf {U}^{n}_{\text {Dy}}}^{H} \end{bmatrix} \tag{9}\end{align*}
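As a hedged numerical illustration of Equations (8) and (9), the sketch below builds a synthetic low-rank signal covariance, adds a white noise floor, and partitions the eigenbasis into signal and noise parts. The dimension, rank, and noise power are hypothetical, and the rank is assumed known here, which is not the case in practice.

```python
import numpy as np

rng = np.random.default_rng(1)
M, rank, sigma2 = 8, 2, 0.1  # hypothetical dimension, signal rank, and noise power

# Synthetic low-rank signal covariance C = A A^H.
A = rng.standard_normal((M, rank)) + 1j * rng.standard_normal((M, rank))
C = A @ A.conj().T
R = C + sigma2 * np.eye(M)  # structure of Equation (8)

delta, U = np.linalg.eigh(R)
order = np.argsort(delta)[::-1]
delta, U = delta[order], U[:, order]

# Partition of Equation (9): leading eigenvectors span the signal subspace.
U_s, U_n = U[:, :rank], U[:, rank:]
noise_eigs = delta[rank:]

# The signal-subspace projector coincides with the projector onto the columns of A.
P_s = U_s @ U_s.conj().T
Q, _ = np.linalg.qr(A)
P_a = Q @ Q.conj().T
```

In this idealised setting every noise eigenvalue equals the noise power, so the boundary between the two blocks is unambiguous. With measured covariances the boundary has to be estimated.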
Each of the eigendecompositions in Equation (4) defines a natural filtration, that is, a succession of growing subspaces
Subspace Characterisation
In this section we hope to justify the claim that the projected signal subspaces introduced in the previous section are useful statistics which preserve human channel-modulating effects, while simultaneously being minimally diluted by noise. This claim is clearly non-trivial: human movements in the signal locale exert unconventional effects on the wireless channel which have not seen similar formal treatment in literature compared to more established channel models adopted widely by industry, say typical urban cellular fading channels [11]. The closest kin to human-modulated Wi-Fi channels in prior literature are perhaps body area network (BAN) channel models; consult [12]–[14] and literature therein for further detail. Specific characteristics of the wireless standard 802.11g/n/ac such as bandwidth, carrier frequencies, and air interface, impart modulating effects well beyond those studied for BANs.
A. CSI Sensing Model
As illustrated in Figure 1, the Wi-Fi-based sensing model consists of placing a transmitter-receiver pair in the environment. There are many paths by which electromagnetic energy travels between the transmitter and receiver. When people move, they disturb the multipath profile in the environment. The multipath profile is the linear superposition of a number of paths. For instance, Figure 2 shows two static paths: a direct line-of-sight (LOS) path and a reflected non-line-of-sight (NLOS) path. When a human subject walks from left to right in the figure, a dynamic path is modulated by this movement. By analyzing the temporal pattern of these dynamic paths at the receiver, we are able to build sensing applications.
Good wireless SNR does not necessarily translate into good sensing sensitivity. For sensing, there is more to designating signal and noise subspaces than meets the eye. (a) Strong reflection. (b) Weak reflection.
For each transmitter-receiver pair, the superposition of multipaths in the time domain is described by a
As such, the transmitted signal
We ask some basic questions:
How can we characterize the human modulated subspace of the channel?
How do the dimensionality and direction of the subspace vary in time as a result of human movement?
We discuss the theoretical underpinnings of our approach in Section III-B, particularly with a view towards contrasting it with seminal prior work in Wi-Fi sensing. We then study data on uncontrolled human movement in Section III-D.
B. Background on Subspace Tracking for Wireless Signals
In classic signal processing, estimating the relevant subspace of variation in data is a basic building block of a data processing pipeline [15], [16]. In the context of an indoor wireless channel, the human modulated portion of the correlation data (cf. Equation (3)) is unknown with complex temporal dynamics.
Wang et al. [5] obtain good sensing results using an ad hoc pipeline starting with the full wideband covariance matrix (see [6]). We believe that this choice necessitates the use of excessive time-averaging of the CSI data. Furthermore, the resulting signal subspace is not easy to interpret. In contrast, the Rx, Tx and Dy correlations defined in Equation (3) are interpretable low dimensional representations. Despite the pioneering sensing approach, two drawbacks come to mind:
the spatial and temporal behavior of the channel are not easily exposed, and
the temporally highly averaged subspaces are less reactive to human activities.
The good sensing results aside, the approach of [5] does not conform to wireless theory, according to which human modulation should be quantifiable using subspace tracking. Correlative MIMO subspace-based channel models have been shown to estimate capacity [6]–[8], and therefore the physicality of the medium. Intuitively, a model able to conform with a universal information-theoretic measure such as capacity is bound to convey fundamental information about the state of the channel irrespective of what modulates the channel. Further, recent theoretical results suggest that the rate of change of a MIMO OFDM channel can be inferred from the statistical analysis of its first and last eigenvectors [9], which can be viewed as canonical representatives of the signal and noise subspaces, respectively.
To elaborate on the dynamic nature of the signal subspace, consider a multipath component whose phase adds destructively to a main cluster of multipath, as depicted in Figure 2a. If this single multipath component were shadowed as a result of a transient movement, as in Figure 2b, the SNR would increase momentarily, commensurate with the gain in the total energy of the multipath arrivals. However, the sensing scene could have further nuances that are not captured by this simple SNR enhancement. As a further thought experiment, let the single multipath component probe a spatial sector of the environment in which a physical activity is unfolding (denoted by a spiral in Figure 2). That is, the single multipath component delivers disproportionately more movement sensitivity than the main cluster of multipath. Despite the transient shadowing effect resulting in a boost in SNR, the instantaneous combined channel response is rendered less sensitive to activities occurring in the aforementioned spatial sector. The reduced motion modulation manifests as reduced correlation structure in regions of the covariance matrix. Consequently, and perhaps counter-intuitively given the SNR gain, the signal subspace would necessarily “shrink” and the noise subspace would “expand” momentarily. Therefore, robust sensing requires that the signal and noise subspaces be tracked explicitly in order to account for nuanced instantaneous channel effects.
The above contrived discussion suggests that a sensing system is required to adapt to dynamic channel effects in order to sustain optimal performance. Until provision for such adaptation is made in CSI-based sensing systems, we argue that models will fall short of being generalizable with guaranteed performance bounds irrespective of the nuances encountered in real-world deployment environments.
Stationarity Period
The evolution of the signal subspace can be monitored at different granularities depending on the end-user application. An example of this may be seen in activity recognition applications. Activity recognition requires deriving channel signatures of sufficient discriminatory power to allow for the unambiguous separation of activities that are potentially similar in their broad nature, e.g., walking versus running. Besides the application, the stationarity period is also affected by the sensor configuration, such as the sampling rate.
For example, while 25 ms may be necessary for responsive activity recognition applications, 100 ms or more may suffice for the much coarser task of presence detection. Note that it may also be possible to realize sensing models even with “aliased” channel statistics, akin to compressive sensing. However, we do not discuss this further here.
C. Sensing Complexity
The trade-off between sensing sensitivity and generality is a key question when it comes to designing any data processing pipeline. Generality implies flexibility for applying techniques from one sensing application to another. Sensitivity refers to optimality for a fixed sensing task. These are affected mainly by
sensing pipeline configuration alongside its parameters, and
the dimensionality of the signal subspace of the data, as it travels through the pipeline.
We next shed light on the complexity of the human-modulated Wi-Fi signal subspace by way of an empirical study. The aim is to establish that there is more to designating signal and noise subspaces than meets the eye. Future research ought to take this complexity into consideration if Wi-Fi sensing is to be transitioned from controlled setups into the wild.
D. Empirical Study of Uncontrolled Human Movement
We proceed to study the statistical effects of human activities on the channel covariances. Specifically, we study the projected signal subspaces, our putative proxies for the signal subspace for human modulation. To this end, we first quantify the information about the physical environment contained in the covariance matrix. This information is dynamic in nature and needs to be quantified instantaneously. One approach to gauging the information content in a series of covariances is to monitor the distortion contributed by the constituent eigenvectors. That is, by successively nulling the respective eigenvectors and measuring the fidelity of the covariance matrix reconstruction, we can quantify, in the mean-squared-error (MSE) sense, the signal and noise boundaries at a given target distortion level (e.g., −12 dB).
Concretely, let \begin{equation*} \boldsymbol {\Delta }^{\prime }_{i} = \begin{bmatrix} \ddots & \quad 0 & \quad 0 & \quad \ldots & \quad \ldots & \quad 0 \\ 0 & \quad \delta _{i-1} & \quad 0 & \quad 0 & \quad \ldots & \quad 0 \\ 0 & \quad 0 & \quad \delta _{i} & \quad \ddots & \quad & \quad \vdots \\ \vdots & \quad & \quad \ddots & \quad 0 & \quad \ddots & \quad \vdots \\ \vdots & \quad & \quad & \quad \ddots & \quad \ddots & \quad \vdots \\ 0 & \quad 0 & \quad \ldots & \quad \ldots & \quad \ldots & \quad 0 \end{bmatrix}\tag{10}\end{equation*}
\begin{equation*} \mathrm {MSE} = \frac {1}{N^{2}} {\sum _{k,l} (r^{\epsilon }_{kl})^{2}}\tag{11}\end{equation*}
The above MSE search allows us to build a time-series picture of the dynamic partitioning of the covariance into signal and noise subspaces. This evolution of signal and noise subspaces is indicative of the evolution in the corresponding propagation conditions and also necessarily human movement. Intuitively, the harsher the dynamics of wireless propagation conditions, the more fluctuating the boundary between signal and noise subspaces is.
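A minimal sketch of such an MSE-guided boundary search is given below. It keeps the strongest eigenvalues, nulls the rest, and reports the smallest number of retained eigenvectors whose reconstruction error meets a target distortion. The reference level used to express the MSE of Equation (11) in dB (the mean squared magnitude of the covariance entries) and the function names are assumptions made for illustration.

```python
import numpy as np

def reconstruction_mse_db(r):
    """Relative MSE (dB) of rebuilding R from its i strongest eigenvectors, i = 1..N."""
    delta, u = np.linalg.eigh(r)
    order = np.argsort(delta)[::-1]
    delta, u = delta[order], u[:, order]
    n = r.shape[0]
    reference = np.sum(np.abs(r) ** 2) / n**2  # assumed 0 dB reference level
    mse_db = np.empty(n)
    for i in range(1, n + 1):
        kept = np.where(np.arange(n) < i, delta, 0.0)  # null eigenvalues beyond the i-th
        r_hat = (u * kept) @ u.conj().T                # U diag(kept) U^H
        err = r - r_hat
        mse = np.sum(np.abs(err) ** 2) / n**2          # form of Equation (11)
        mse_db[i - 1] = 10 * np.log10(max(mse / reference, 1e-30))
    return mse_db

def subspace_boundary(r, target_db=-12.0):
    """Smallest number of leading eigenvectors meeting the target distortion."""
    mse_db = reconstruction_mse_db(r)
    return int(np.argmax(mse_db <= target_db)) + 1

# Toy covariance with two strong components above a weak floor.
R_demo = np.diag([10.0, 5.0, 0.1, 0.1])
curve = reconstruction_mse_db(R_demo)
boundary = subspace_boundary(R_demo, target_db=-12.0)
```

Applying `subspace_boundary` to each covariance in a sequence yields the time series of signal and noise boundaries discussed above.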
Having arrived at a statistical picture of subspaces boundary, we can utilize this knowledge to examine how the fractional signal subspace energy changes throughout human movement. We define the fractional signal subspace energy as the ratio between energy in the signal subspace to total energy contained in the channel. Thus, the fractional energy can be written as
The following discourse considers uncontrolled indoor human movement. This is perhaps the most generic form of activities likely to occur indoors. Naturally, uncoordinated motion components superimpose to modulate the signal subspace in random ways. Stronger motion components could also mask much weaker ones.
We begin by examining what effect increased human movement has on the signal and noise subspaces. We conducted an experiment in which participants were asked to walk randomly in a room. The number of moving people present was varied from 0 (i.e., empty) to 8. The duration of movement per session was 5 to 10 minutes. An 802.11n
We investigate the effect of increased human movement on the signal and noise subspaces by searching for the subspace boundary that yields a target MSE distortion, as outlined earlier. Eigenvectors contributing less to the fidelity of covariance reconstruction fall within the noise subspace. Conversely, eigenvectors that impact the fidelity of reconstruction more pronouncedly belong to the signal subspace. The MSE-guided search finds the subspace boundary that satisfies a desired distortion level in the MSE sense. Owing to the finite subspace resolution of a practical system, we interpolate between the two MSE distortion levels produced at adjacent eigenvectors in order to simulate the effect of a smoothly varying MSE distortion and its respective “fractional” subspace index.
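The interpolation step can be sketched as follows. For a Hermitian covariance with orthonormal eigenvectors, the squared Frobenius error after keeping the i strongest eigenvectors equals the sum of the squared discarded eigenvalues, so the distortion curve can be computed from the eigenvalues alone. The choices of linear interpolation in dB, of the total squared eigenvalue sum as the 0 dB reference, and of the eigenvalue sum as the measure of "energy" are assumptions of this sketch, as are the function names.

```python
import numpy as np

def mse_curve_db(delta):
    """Relative reconstruction MSE (dB) after keeping i eigenvalues, i = 0..N.

    `delta` holds the eigenvalues sorted in descending order."""
    suffix = np.cumsum((delta**2)[::-1])[::-1]  # sum of squared eigenvalues from index i on
    tail = np.concatenate([suffix, [0.0]])      # error energy when i eigenvalues are kept
    rel = np.maximum(tail / tail[0], 1e-30)
    return 10 * np.log10(rel)

def fractional_boundary(delta, target_db=-12.0):
    """Fractional subspace index at which the interpolated MSE meets the target."""
    curve = mse_curve_db(delta)                    # non-increasing in i
    counts = np.arange(len(curve), dtype=float)    # 0, 1, ..., N
    # np.interp expects increasing x values, so both arrays are reversed.
    return float(np.interp(target_db, curve[::-1], counts[::-1]))

def fractional_energy(delta, boundary):
    """Share of the total eigenvalue sum lying below a (possibly fractional) boundary."""
    cum = np.concatenate([[0.0], np.cumsum(delta)])
    counts = np.arange(len(cum), dtype=float)
    return float(np.interp(boundary, counts, cum) / cum[-1])

delta_demo = np.array([10.0, 5.0, 0.1, 0.1])
idx = fractional_boundary(delta_demo, target_db=-12.0)
energy = fractional_energy(delta_demo, idx)
```

For the toy spectrum above, the −12 dB target falls between one and two retained eigenvectors, so the fractional index lies strictly between 1 and 2.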
Figure 3a illustrates the variability in the fractional subspace energy
Characterizing signal and noise subspaces through an MSE search procedure for 5 uncontrolled human movement scenarios. The numbers in legend 0, 2, 4, 6, and 8 denote how many moving people are present. (a) Fractional energy
We conclude this section by qualifying our MSE-search methodology using mutual information (MI). The instantaneous subspaces boundary is used to agglomerate series of reconstructions of the covariance matrix as to compare against the groundtruth covariance distribution. We sweep the objective MSE distortion between −24 dB and −3 dB in 3 dB increments. We then measure the normalized mutual information between
Normalized mutual information between covariances and their imperfect reconstructions over time and across 5 uncontrolled human presence scenarios, highlighting that the signal subspace is dynamic in nature. Legend denotes how many moving people are present. (a) Instantaneous subspace boundary. (b) Normalized MI versus MSE reconstruction objective. (c) Normalized MI versus subspace extent.
Subspace Tracking
In Section III, we established and characterised the notions of signal and noise subspaces within the context of human-induced channel perturbations. We now turn to examples of how to derive features for sensing tasks. Our approach is to track the evolution of the projected signal subspaces (cf. Section II-D).
The subspace-based human sensing we advocate for is in line with foundational work on wireless channels [6]–[9], in contrast to prior work on Wi-Fi sensing (see [5]). We show that, with a good enough instantaneous estimate of the covariances described in Section II, this tracking can be used to capture the effects of human modulation. We present our analysis of the Dy-projected signal subspace, but the same can easily be repeated for the Rx dimension.
A. A Geometric View of Subspace Evolution
As an example of subspace tracking, we present the trajectories of the eigenvectors of the covariance matrices (cf. Equation (3)).
Consider the time evolution of subspaces spanned by the first two (unnormalized) eigenvectors
Geometric interpretation of subspace evolution. (a) Subspace component 0. (b) Subspace component 1.
That is, recalling equation (9), each of these subspace components at the
Therefore, referring to figure 5 again, a critical insight emerges: human effects on the wireless channel can be “demodulated” by observing the corresponding angular movements of the signal subspace.
B. Differential Subspace Evolution
The time dependency of the angular movements of the subspace is visualized in figure 5. The (complex) angles, which can be computed as the real part of Hermitian inner product,
These angles signify the differential movement of a certain signal subspace component between the
In general, our proposed differential unitarity feature for tracking human-modulated signal subspaces is applicable to any channel eigendecomposition formulation commonly encountered in literature. Denote by \begin{equation*} \hat {u}_{\text {Dy},i}[k] = \mathbf {u}_{\text {Dy},i}^{H}[k] ~\mathbf {u}_{\text {Dy},i}[k-1] \tag{12}\end{equation*}
Similarly, for the receive-side eigenbasis \begin{equation*} \hat {u}_{\text {Rx},i}[k] = \mathbf {u}_{\text {Rx},i}^{H}[k] ~\mathbf {u}_{\text {Rx},i}[k-1] \tag{13}\end{equation*}
Equations (12) & (13) represent two degrees of freedom through which we can measure the volatility in the wireless channel as a result of human stressors: (1) spatial from multiple antennae and (2) temporal across the delayspread (or equivalently bandwidth). We next build intuition for these complementary differential unitarity metrics by presenting a series of concrete numerical examples.
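As a first such example, the sketch below evaluates the inner product of Equation (12) on a synthetic sequence of covariances whose dominant eigenvector rotates by a fixed angle per step. It assumes unit-norm eigenvectors as returned by NumPy, and the helper names are hypothetical. Because a numerically computed eigenvector is only defined up to a unit-modulus factor, the magnitude of the inner product is the ambiguity-free quantity in this sketch.

```python
import numpy as np

def eigvec(r, i):
    """i-th eigenvector of R, counted from the largest eigenvalue (i = 0)."""
    delta, u = np.linalg.eigh(r)
    return u[:, np.argsort(delta)[::-1][i]]

def differential_unitarity(covariances, i=0):
    """Inner products u_i^H[k] u_i[k-1] for k = 1..K-1, following Equation (12)."""
    vecs = [eigvec(r, i) for r in covariances]
    return np.array([np.vdot(vecs[k], vecs[k - 1]) for k in range(1, len(vecs))])

# Synthetic sequence: the dominant direction rotates by theta radians per step.
theta = 0.1
covs = []
for k in range(6):
    v = np.array([np.cos(k * theta), np.sin(k * theta), 0.0])
    covs.append(np.outer(v, v) + 0.01 * np.eye(3))

d_moving = differential_unitarity(covs, i=0)
d_static = differential_unitarity([covs[0]] * 4, i=0)
```

A static channel gives a magnitude of one at every step, while the rotating subspace gives cos(theta), so a faster rotation lowers the metric. The same function applies unchanged to the receive-side covariances of Equation (13).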
We return to the uncontrolled movement dataset reported in Section III. Further, let us examine the behavior of the differential unitarity for the 1st subspace component of both the receive-side and delayspread subspaces i.e.
Differential unitarity cumulative distribution functions across 9 occupancy scenarios for Rx and Dy. Legend denotes how many moving people are present. (a) and (c) correspond to the successive pairwise correlations, while (b) and (d) measure the rate of change. The rate of change is referred to by the
The diagram in figure 7b shows that the subspace bases relating to
Interrogating rate of change of differential unitarity by correlating farther apart eigenvectors with successively decreasing distance. (a) Pairwise. (b) Slope.
Deriving a measure of the rate of change of differential unitarity has the advantage of increasing the separation of the CDFs of figures 6a & 6c. To this end, we apply the scheme depicted in the lower diagram of figure 7 (termed “slope”) to the same experiment and obtain the CDFs shown in figures 6b & 6d. It is readily evident that the CDFs corresponding to the rate of change in the differential unitarity extracted over a window of time experience increased dispersion as a result of human occupancy. This may allow for learning and/or calibrating better discrimination boundaries in the inference logic.
1) Subspace Sampling
Recalling equation (8), we note that the expectation operator implies an averaging effect. Earlier we have elaborated on the notion of stationarity period and its connection to CSI sampling and application granularity requirements. Yet another pertinent aspect for consideration lies in how to realize the expectation. Broadly, there are two methods often employed in classic signal processing literature for updating the signal subspace: (i) stochastic approximation, and (ii) batch averaging. These two variants have implications on signal subspace tracking, which we discuss next.
An unbiased stochastic expectation estimator is given by \begin{equation*} \mathbf {R}_{\text {x}}[k] = (1-\lambda) \sum _{n=0}^{k} \lambda ^{k-n} \mathcal {H}_{(m)}[n] \mathcal {H}^{H}_{(m)}[n] \tag{14}\end{equation*}
\begin{equation*} \mathbf {R}_{\text {x}}[k] = \lambda \mathbf {R}_{\text {x}}[k-1] + (1-\lambda) \mathcal {H}_{(m)}[k] \mathcal {H}^{H}_{(m)}[k] \tag{15}\end{equation*}
\begin{equation*} \mathbf {R}_{\text {x}}[k] = \frac {1}{L} \sum _{n=k-L+1}^{k} \mathcal {H}_{(m)}[n] \mathcal {H}^{H}_{(m)}[n] \tag{16}\end{equation*}
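The relationship between Equations (14) to (16) can be checked numerically. The sketch below assumes a zero initial covariance for the recursion and uses a hypothetical forgetting factor, window length, and data shape.

```python
import numpy as np

def outer(h):
    """Instantaneous term H H^H for one unfolded CSI matrix."""
    return h @ h.conj().T

def stochastic_update(r_prev, h, lam):
    """One step of the recursive update, Equation (15)."""
    return lam * r_prev + (1 - lam) * outer(h)

def stochastic_closed_form(hs, lam):
    """Exponentially weighted sum of Equation (14), evaluated at k = len(hs) - 1."""
    k = len(hs) - 1
    return (1 - lam) * sum(lam ** (k - n) * outer(h) for n, h in enumerate(hs))

def batch_average(hs, k, L):
    """Sliding-window average of Equation (16) over samples k-L+1..k."""
    return sum(outer(h) for h in hs[k - L + 1 : k + 1]) / L

rng = np.random.default_rng(2)
hs = [rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6)) for _ in range(50)]
lam, L = 0.9, 10  # hypothetical forgetting factor and window length

r_recursive = np.zeros((4, 4), dtype=complex)  # assumed zero initialisation
for h in hs:
    r_recursive = stochastic_update(r_recursive, h, lam)

r_closed = stochastic_closed_form(hs, lam)
r_batch = batch_average(hs, k=len(hs) - 1, L=L)
```

With zero initialisation the recursion reproduces the closed-form sum exactly. Note that the weights in Equation (14) sum to 1 − λ^{k+1}, so the estimate reaches full scale only once k is large relative to 1/(1 − λ). The batch variant weights the last L samples equally and forgets everything older.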
We now compare these two subspace update variants using a publicly available activity recognition dataset [1]. The dataset comprises 6 single-user activities; namely, standing up, sitting down, lying down, falling, walking, and running. SIMO CSI data from three receivers is sampled at a rate of 1 ksps. For added tracking responsiveness and resolution, we choose a stationarity period of 25 ms and update the covariance matrix with 95% CSI overlap from the previous stationarity period. This results in a subspace update rate of around 800 Hz.
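The update rate follows directly from the sampling rate, the stationarity period, and the overlap. The short calculation below reproduces it, with the function name being a hypothetical one chosen for illustration.

```python
def subspace_update_rate(fs_hz, stationarity_s, overlap):
    """Covariance updates per second for a sliding window with fractional overlap."""
    window_samples = fs_hz * stationarity_s        # CSI samples per stationarity period
    hop_samples = window_samples * (1 - overlap)   # new samples between consecutive updates
    return fs_hz / hop_samples, window_samples, hop_samples

rate_hz, window_samples, hop_samples = subspace_update_rate(1000.0, 0.025, 0.95)
# 25 samples per window and a hop of 1.25 samples give roughly 800 updates per second.
```

Since a hop of 1.25 samples is not an integer, the 95% overlap can only be met on average in a sampled system, which is consistent with the rate being quoted as approximate.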
In Figure 8, we perform time-frequency localization on the pairwise differential unitarity subspace tracking metric. The localization uses a window of 1.28 seconds with 95% content overlap between two windows for finer time-frequency resolution. In the interest of space, only four single-user activities are shown corresponding to falling, lying down, walking, and running. The spectrograms of the upper row of Figure 8 were generated using the batch subspace update variant; while those in the lower row utilized the stochastic variant with a forgetting factor
Spectrograms for 4 activities (falling, lying down, walking, and running) using two subspace update variants: batch and stochastic. It is readily apparent that the stochastic variant has a filtering effect on the time-frequency localization of the pairwise differential unitarity subspace tracking metric. (a) Falling – batch. (b) Lying down – batch. (c) Walking – batch. (d) Running – batch. (e) Falling – stochastic. (f) Lying down – stochastic. (g) Walking – stochastic. (h) Running – stochastic.
We have opted to conduct time-frequency localization on the pairwise subspace tracker owing to its more intuitive association with speed, i.e., the 1st-order derivative of subspace evolution. A justification for the correspondence between the rate of change in CSI and speed can be found in [5]. Our 1st-order differentiation of the subspace can be viewed as a generalized fusion method for extracting information embedded in all subcarriers simultaneously. This is data-level fusion, in contrast to feature-level approaches involving ad hoc subcarrier selection strategies [4]. Some reported Wi-Fi sensing systems resort to selecting subcarriers with better SNR, since the frequency selectivity of wideband Wi-Fi channels causes some subcarriers to fall within channel nulls, with obvious consequences for their reliability. Our subspace approach systematically fuses the information contained in all subcarriers without the need for preconditioning. Moreover, unlike the PCA-based approach [5], this fusion is principled, interpretable, and rooted in formal wireless channel concepts [6]–[9].
As illustrated in Figure 7b, we use a robust sampling technique to obtain clean statistics from the differential unitarity measurements. In Figure 9, we illustrate the effect of this choice. Six single-user activities (standing up, sitting down, lying down, falling, walking, and running) are depicted for the slope tracker. Quick inspection of these plots corroborates the earlier findings of the spectrogram analysis; namely, that the stochastic variant filters out high-frequency channel perturbations compared to the batch variant. That is, the stochastic variant tracks the envelope of the activity rather than the high-frequency background fluctuations of the activity and/or the channel. We have alluded to this tunable channel detail in the signal subspace, be it channel background- or activity-related, by the hat accent in Equation (9). The abrupt activity of falling has an impulse-like acceleration content, while running is the richest in such 2nd-order rate-of-change moments.
Waveforms for 6 activities (standing up, sitting down, lying down, falling, walking, and running) using two subspace update variants: batch and stochastic. It is readily noticeable that the stochastic variant has a filtering effect on the slope differential unitarity subspace tracking metric. (a) Lying down. (b) Sitting down. (c) Standing up. (d) Walking.
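The time-frequency localization of a tracker output can be sketched with a short-time Fourier transform. The version below is a plain NumPy illustration that assumes a real-valued metric series (for example, the magnitude of the pairwise tracker) sampled at the subspace update rate, a Hann window, and the 1.28-second window with 95% overlap quoted earlier. The window shape and the function name are assumptions.

```python
import numpy as np

def spectrogram(x, fs, window_s=1.28, overlap=0.95):
    """Power spectrogram of a real-valued series using Hann-windowed frames."""
    nperseg = int(round(window_s * fs))
    hop = max(1, int(round(nperseg * (1 - overlap))))
    win = np.hanning(nperseg)
    starts = np.arange(0, len(x) - nperseg + 1, hop)
    frames = np.stack([x[s : s + nperseg] * win for s in starts])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # shape: (frames, frequency bins)
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    times = (starts + nperseg / 2) / fs                # centre time of each frame
    return freqs, times, power

# Synthetic stand-in for a tracker output: a 40 Hz tone sampled at 800 Hz for 5 s.
fs = 800.0
t = np.arange(int(5 * fs)) / fs
x = np.cos(2 * np.pi * 40.0 * t)

freqs, times, power = spectrogram(x, fs)
peak_hz = freqs[np.argmax(power.mean(axis=0))]
```

At an 800 Hz update rate the window spans 1024 samples, giving a frequency resolution of about 0.78 Hz.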
2) Duality
For completeness, we provide commentary on the pertinent issue of choosing a channel representation: time- versus frequency-domain. The structured model we introduced in Section II-C has been validated with empirical channel impulse response (CIR) measurements i.e. in the time-domain. Identical eigenspace formulation has been applied in the frequency-domain for CSI instead [18], and also validated with empirical capacity measurements. Since our subspace trackers are differential in nature, tracking is insensitive to the representation of the channel be it time- or frequency-domain. That said, a salient point in relation to the phase behavior of the trackers is worth making for completeness of treatment. The numerical perturbations experienced in the time-domain—as a function of human motion—differ to those experienced in the frequency-domain. Classic work on the stability of subspaces provides bounds on their trigonometric (i.e. angular) behavior as a function of technical mathematical issues ranging from eigenvalue spectral gap to numerical residuals [19].
To highlight this point, we revisit the waveform of the slope subspace tracker for the running activity depicted in Figure 9f. We perform channel decomposition through to differential unitarity calculations both for the CIR and the CSI versions of measurements (i.e. time & frequency domains). The results are shown in Figure 10. As illustrated in Figures 10a & 10b, it is intuitive to note that the differential tracker performs identically in time and frequency domains. After all, a linear operator (i.e. [I]DFT) translates between one domain to another. The occasional polarity switch in the phase of the differential tracker (Figures 10c & 10d) can be explained by the effects studied in [19]. However, it is interesting to note the increased phase instabilities when running the differential metric on top of CIR measurements over those obtained from CSI measurements. This phenomenon can be readily seen in Figures 10c & 10d. We conjecture that the sparsity in the CIR measurements (i.e. impulse-like nature) compared to the smoother CSI measurements causes numerical instabilities which give rise to added phase instabilities in the subspace. The scatter plot of Figure 10d supports this hypothesis as can be seen by the tighter clustering in the CSI case. However, further investigations are needed to fully illuminate this issue before solid conclusions can be drawn.
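The magnitude equivalence between the two domains can be illustrated on synthetic data. In the sketch below, the unitary DFT along the delay axis maps each time-domain covariance to a unitarily similar frequency-domain covariance, so the eigenvectors are rotated by the same unitary matrix and the magnitudes of their inner products are preserved. The data model (a drifting dominant component plus noise), the dimensions, and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
taps, cols, steps = 16, 64, 5  # hypothetical delay taps, unfolded columns, time steps

def crandn(*shape):
    """Complex Gaussian samples of the given shape."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def leading_eigvec(h):
    """Dominant eigenvector of the one-sided correlation of an unfolded channel matrix."""
    r = h @ h.conj().T / h.shape[1]
    _, u = np.linalg.eigh(r)   # eigh returns eigenvalues in ascending order
    return u[:, -1]

def tracker_magnitude(hs):
    """Magnitude of the pairwise differential unitarity for the leading component."""
    vecs = [leading_eigvec(h) for h in hs]
    return np.array([abs(np.vdot(vecs[k], vecs[k - 1])) for k in range(1, len(vecs))])

# Synthetic CIR unfoldings: a drifting dominant component plus weak noise.
a0, drift, b = crandn(taps), crandn(taps), crandn(cols)
cir = [np.outer(a0 + 0.2 * k * drift, b.conj()) + 0.1 * crandn(taps, cols)
       for k in range(steps)]

# Frequency-domain counterpart via the unitary DFT along the delay axis.
csi = [np.fft.fft(h, axis=0, norm="ortho") for h in cir]

mag_time = tracker_magnitude(cir)
mag_freq = tracker_magnitude(csi)
```

The two magnitude series agree to numerical precision. The phase of each inner product is not compared here because it depends on the arbitrary phase that the eigensolver assigns to each eigenvector.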
Waveforms for the slope tracker corresponding to the running activity. The differential metric is identical in magnitude, but subspace phase stability varies depending on whether subspace decomposition is performed in the time-domain or frequency-domain. (a) Magnitude. (b) Error power. (c) Scatter.
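The domain-invariance argument above can be illustrated with a minimal numerical sketch. It is not the paper's tracker: the synthetic low-rank channel, the helper names, and the use of principal-angle cosines as a stand-in for the differential metric are all assumptions made for illustration. The sketch relies only on the fact that the orthonormal DFT is unitary, so the overlap between consecutive signal subspaces should agree in both domains.

```python
import numpy as np

def signal_subspace(snapshots, rank):
    """Dominant `rank` eigenvectors of the sample covariance.

    snapshots: complex array of shape (dim, n_snapshots).
    """
    cov = snapshots @ snapshots.conj().T / snapshots.shape[1]
    _, vecs = np.linalg.eigh(cov)      # eigenvalues come back in ascending order
    return vecs[:, -rank:]             # keep the dominant eigenvectors

def subspace_overlap(u_prev, u_next):
    """Singular values of U_prev^H U_next (cosines of the principal angles).

    Hypothetical stand-in for a differential subspace metric.
    """
    return np.linalg.svd(u_prev.conj().T @ u_next, compute_uv=False)

def complex_noise(rng, shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

rng = np.random.default_rng(0)
dim, n_snap, rank = 16, 64, 3          # illustrative sizes, not from the paper

# Two consecutive windows of synthetic time-domain (CIR-like) snapshots:
# a rank-3 signal component that drifts slightly, plus weak noise.
paths = complex_noise(rng, (dim, rank))
drifted = paths + 0.2 * complex_noise(rng, (dim, rank))
cir_prev = paths @ complex_noise(rng, (rank, n_snap)) + 0.1 * complex_noise(rng, (dim, n_snap))
cir_next = drifted @ complex_noise(rng, (rank, n_snap)) + 0.1 * complex_noise(rng, (dim, n_snap))

# The orthonormal DFT along the delay axis gives the frequency-domain (CSI-like) view.
csi_prev = np.fft.fft(cir_prev, axis=0, norm="ortho")
csi_next = np.fft.fft(cir_next, axis=0, norm="ortho")

overlap_time = subspace_overlap(signal_subspace(cir_prev, rank), signal_subspace(cir_next, rank))
overlap_freq = subspace_overlap(signal_subspace(csi_prev, rank), signal_subspace(csi_next, rank))
print(np.allclose(overlap_time, overlap_freq))
```

Only the magnitude-type quantity is compared here. Eigenvectors are defined up to a phase factor, so phase behavior can differ between the two decompositions even when the overlaps agree.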
Evaluation
In what follows, we showcase how specialized occupancy and activity sensing can be built atop our featurization.
A. Occupancy Detection
1) Experimental Setup
We evaluate the performance of subspace tracking in terms of the robustness of occupancy detection. To evaluate the robustness, we investigate the accuracy of the classification model in new environments. More specifically, we trained the classification model using CSI data obtained from a certain placement and tested its accuracy on different placements.
a: Data
We collected CSI data in 8 places and on 41 placements in total. As depicted in Figure 11, the places include six rooms, one lobby, and one lounge, and differ in characteristics such as room layout and furniture position. We collected the CSI data while varying the number of moving people from zero to 2 (P4, P5, P6) and to 3 (the rest). Each session lasted five minutes and participants were asked to move freely during the session. Figure 11 shows room layouts and device placements. The purpose of multiple placements is to investigate a realistic upper bound on the classification performance of a single device under different training and testing conditions. MIMO CSI data were sampled at a nominal 500Hz rate. A stationarity period of 50ms was used and the subspace update was performed in a sliding window fashion with no overlap as in Equation (16).
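As a rough illustration of the windowing just described, a 50ms stationarity period at a nominal 500Hz rate corresponds to 25 CSI samples per subspace update. The sketch below forms one sample covariance per non-overlapping window. The function name, the flattened per-sample CSI layout, and the plain sample-covariance estimator are assumptions for illustration and do not reproduce Equation (16).

```python
import numpy as np

SAMPLE_RATE_HZ = 500      # nominal CSI sampling rate stated in the text
STATIONARITY_S = 0.050    # stationarity period stated in the text

def windowed_covariances(csi, sample_rate_hz=SAMPLE_RATE_HZ, period_s=STATIONARITY_S):
    """Sample covariance per non-overlapping stationarity window.

    csi: complex array of shape (n_samples, dim), where `dim` is a
         hypothetical flattening of the antenna/subcarrier dimensions.
    Returns an array of shape (n_windows, dim, dim).
    """
    win = int(round(sample_rate_hz * period_s))       # 25 samples per window
    n_windows = csi.shape[0] // win                   # drop any trailing partial window
    blocks = csi[: n_windows * win].reshape(n_windows, win, -1)
    # R_k = (1 / win) * sum_t x_t x_t^H for each window k
    return np.einsum("kti,ktj->kij", blocks, blocks.conj()) / win

# One second of synthetic CSI with dim = 4 yields 20 covariance matrices.
rng = np.random.default_rng(1)
csi = rng.standard_normal((500, 4)) + 1j * rng.standard_normal((500, 4))
covs = windowed_covariances(csi)
print(covs.shape)
```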
b: Pipeline
For occupancy detection, we developed an inference pipeline using a long short-term memory (LSTM) classifier. We chose an LSTM classifier to leverage the spatio-temporal variation of our differential unitarity features from subspace tracking. In the current implementation, we adopted two hidden LSTM layers, each of which has 50 nodes. Some prior presence detection work dwells on the signal much longer, using a distribution-based approach and a diversity of frequency channels [20]. In contrast, we define a short 5-second inference window with no channel frequency diversity. In this paper, our objective is to showcase how to specialize various subspace tracking-based applications rather than to demonstrate best-in-class performance.
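To make the input shape of such a classifier concrete: if one feature vector is produced per 50ms subspace update (an assumption consistent with the non-overlapping updates described earlier), a 5-second inference window spans 100 time steps. The sketch below groups a feature stream into such windows. The function name, the feature dimension, and the choice of non-overlapping inference windows are hypothetical.

```python
import numpy as np

def inference_windows(features, update_period_s=0.050, window_s=5.0):
    """Group a feature stream into fixed-length inference windows.

    features: array of shape (n_updates, n_features), one row per subspace update.
    Returns an array of shape (n_windows, steps, n_features), the usual
    (batch, time, feature) layout expected by sequence classifiers.
    """
    steps = int(round(window_s / update_period_s))    # 100 updates per window
    n_windows = features.shape[0] // steps            # drop any trailing partial window
    return features[: n_windows * steps].reshape(n_windows, steps, features.shape[1])

# A five-minute session at one update per 50ms gives 6000 updates, i.e. 60 windows.
session = np.zeros((6000, 2))      # 2 features per update is a placeholder
batches = inference_windows(session)
print(batches.shape)
```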
c: Comparison
For comparison, we implemented the baseline pipeline from [21]. It takes temporal variations of CSI data as feature values and uses linear discriminant analysis as a classifier.
d: Training and Test
For training, we selected a receiver located diagonally across from the transmitter, thereby maximizing the RF coverage. Accordingly, we have 11 different models. For the evaluation, we considered three environment variations: same, minor, and major. Same refers to the case where data from the same receiver, i.e., the same placement, is used for both training and test. Minor and major use CSI data from different receivers placed in the same room and in a different room, respectively. Same represents the upper bound of the performance that the inference logic can achieve in a specific environment. Minor and major show how robust the inference pipeline is in unseen environments.
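The three variations can be expressed as a simple rule over the training and test placements. The sketch below is only a restatement of the definitions above; the `Placement` type and its field names are hypothetical.

```python
from typing import NamedTuple

class Placement(NamedTuple):
    room: str        # hypothetical room identifier
    receiver: str    # hypothetical receiver identifier within the room

def variation(train: Placement, test: Placement) -> str:
    """Label a train/test pair as 'same', 'minor', or 'major'."""
    if train == test:
        return "same"       # same receiver, i.e. same placement
    if train.room == test.room:
        return "minor"      # different receiver in the same room
    return "major"          # receiver in a different room

print(variation(Placement("room_a", "rx1"), Placement("room_a", "rx2")))
```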
2) Experimental Results
We investigate how effectively subspace tracking mitigates the environmental effect of CSI on occupancy detection. Figure 12a shows box plots of the accuracy of the 11 models for the different variations. Although the accuracy of both pipelines is similar in the same variation, subspace tracking retains more competitive accuracy than the baseline as we introduce minor and major environmental changes. The accuracy in the same variation is 89% and 88% for subspace tracking and the baseline, respectively. However, in the minor and major variations, the accuracy of subspace tracking decreases to 82% and 78%, whereas that of the baseline drops to 73% and 62%.
We further investigate the effect of the number of classes on occupancy detection under the major variation. Figure 12b shows box plots of the accuracy while varying the number of classes. 2 classes represent presence detection, i.e., empty or occupied. 3 and 4 classes are for the number of people as [0, 1, 2+] and [0, 1, 2, 3], respectively. The results show that subspace tracking achieves reasonable performance even with a higher number of classes. Our pipeline achieves 85%, 70% and 65% for 2, 3, and 4 classes, respectively, whereas the baseline achieves 62%, 49%, and 43%.
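The three labeling schemes amount to capping the people count at the highest class index. A minimal sketch, with a hypothetical function name:

```python
def occupancy_label(num_people: int, num_classes: int) -> int:
    """Map a people count to a class index for the 2-, 3-, or 4-class task.

    2 classes: empty vs. occupied; 3 classes: [0, 1, 2+]; 4 classes: [0, 1, 2, 3].
    """
    if num_classes not in (2, 3, 4):
        raise ValueError("num_classes must be 2, 3, or 4")
    if num_people < 0:
        raise ValueError("num_people must be non-negative")
    # Counts at or above the last class index collapse into that class.
    return min(num_people, num_classes - 1)

print([occupancy_label(n, 3) for n in range(4)])
```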
B. Physical Activity
We use the activity recognition dataset made publicly available by Yousefi et al. [1] to demonstrate the applicability of our subspace tracking technique to the problem domain of activity classification. The dataset comprises 6 single-user activities, namely standing up, sitting down, lying down, falling, walking, and running. SIMO CSI data from three receive antennae is sampled at a 1 ksps rate. We choose a stationarity period of 25ms and proceed to update the covariance matrix with 95% CSI overlap from the previous stationarity period with
In a preliminary evaluation, we build a simple classifier based around dynamic time warping (DTW) and K-nearest neighbors. This is applied to a single-dimensional Dy slope differential unitarity (see Figure 7b). We evaluate our classifier against the authors' mid-range hidden Markov model (HMM), which uses a combination of PCA and short-time Fourier transform (STFT) time-frequency localization as pre-processing. The results are shown in Figure 13. Capability-wise, there is an asymmetry in that featurization based around 2D STFT + HMM is in principle far stronger than our 1D DTW + K-nearest. Nonetheless, on the whole, the performance of our simple classifier is not far from that reported by Yousefi et al., albeit with different characteristics. For instance, while 2D STFT + HMM outperforms our 1D DTW + K-nearest in nearly all activities, our fall activity performance is substantially better. We attribute this to the high acceleration content of a fall, which our slope metric is able to capture easily, as shown in Figure 9a, due to native acceleration sensing. Perhaps our pairwise metric with 2D time-frequency localization would perform much better. Since our focus in this paper is only to showcase a generic formal featurization suited to many applications, we leave improved classification for future work.
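For readers unfamiliar with the combination, a DTW plus K-nearest-neighbors classifier over 1D waveforms can be sketched as follows. This is a generic textbook formulation, not the implementation evaluated here: the absolute-difference local cost, the unconstrained warping path, the majority vote, and the toy labels are all assumptions.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor paths.
            cost[i, j] = local + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def knn_dtw_predict(query, train_series, train_labels, k=1):
    """Majority vote among the k training series closest to `query` under DTW."""
    dists = np.array([dtw_distance(query, s) for s in train_series])
    nearest = np.argsort(dists)[:k]
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

# Toy example: a time-stretched ramp should match the ramp template, not the flat one.
templates = [np.linspace(0.0, 1.0, 20), np.zeros(20)]
labels = ["ramp", "flat"]
query = np.linspace(0.0, 1.0, 30)
print(knn_dtw_predict(query, templates, labels))
```

DTW tolerates differences in the speed at which an activity is performed, which is one reason it pairs naturally with a single-dimensional waveform. The quadratic cost of the table fill is acceptable for short sequences but would need a warping-window constraint for long ones.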
Activity recognition performance. (a) Ours:
Discussion
In this section, we provide commentary on the limitations of our work and discuss relevance to other wireless systems, thereby exposing items of future research.
A. Applicability to Other 802.11 Standards
Physical propagation behavior will differ depending on the frequency band. Such behavior will be mirrored when viewed through the lens of the signal and noise subspaces. Our proposed featurization provides sensing primitives to track the variations in propagation dynamics that are induced by human motion. However, it is the role of the machine learning (ML) component to capture such behavior in a robust sensing model. Thus, when operating within different frequency bands, it is important to ensure that the back-end ML component is trained for the respective human-modulated propagation behavior corresponding to that specific band. Our experimental results in this paper are for the 5GHz Wi-Fi band with 40MHz bandwidth. Nonetheless, other wireless standards, such as 802.11ah operating in the sub-1GHz band and 802.11ad/ay operating in the 60GHz band, could benefit from identical featurization, albeit after specializing the back-end ML component to capture their individual propagation characteristics as a function of human motion. Moreover, we have shown in Section IV that the magnitude of our differential subspace tracking behaves identically irrespective of the representation of the channel response, be it in time or frequency. This means that both the single-carrier and OFDM variants of WiGig would benefit from our subspace-based featurization. It is also worth pointing out that, in relation to WiGig, 60GHz frequencies are quasi-optical and are less able to diffract around objects. The subspace will mirror this behavior; however, increased coverage of the environment may be possible by considering the beam training procedure that 802.11ad/ay implements. Specifically, recent work has shown that such a beam training procedure from infrastructure access points can be used to localize a mobile user [22].
It would be interesting in this particular example to see if tracking the subspace would allow for inferring finer-grained details on the nature of the mobile node’s movement. OFDMA systems such as 802.11ax can also benefit from the proposed subspace tracking; however, care should be taken to handle instances of transition in user-assigned subcarriers and their implications on the subspace.
B. ML Model Coverage and Vectors of Variation
There are many variables that impact the robustness of the back-end ML model. We call these the vectors of variation of the ML model. Exhaustive training across these vectors of variation is needed for sufficient coverage of the sampling space in order to ensure the ML model generalizes in the real world. One such vector of variation is that arising from the individualized way in which different users perform activities. Broadly, there are two methods in prior art for dealing with such variations: design-based and learning-based. In design-based methods, features hand-crafted by an expert designer (such as careful frequency binning in [23] and coarser wavelet spectral bins in [5]) are engineered to absorb the expected variations in the real world. In contrast, learning-based approaches rely on automatic coverage of these natural variations by the inference component through the sheer amount of empirical data used for training. In this paper, we focused on a formal and interpretable low-dimensional featurization of the wireless channel, with our evaluation (cf. Figure 12) falling under the latter learning-based approach.
C. Axes of Resolution
The performance of sensing applications built atop channel tracking is fundamentally limited by the spatio-temporal resolutions of channel measurements. Specifically, the utilized bandwidth and number of antennae have a large bearing on what can be perceived unambiguously in the environment i.e. without over-fitting inference. To see this, consider the environmental imaging capability of the covariance
Conclusion
In this paper, we formalize the problem of Wi-Fi-based human sensing and cast it as a channel signal subspace tracking task. We demonstrate the equivalence of the two problems. We posit the optimality of such a formulation, citing prior established work from the wireless literature. We conclude by providing evidence for the applicability of our subspace tracking across two usage scenarios, presence detection and activity recognition, with promising early results. Future work will focus on machine learning classification using our subspace-based featurization.
ACKNOWLEDGMENT
The authors would like to thank Howard Huang for the helpful comments on this manuscript.