DeepSense: A unified deep learning framework for time-series mobile sensing data processing

Compared to the state-of-art, DeepSense provides an estimator with far smaller tracking error on the car tracking problem, and outperforms state-of-the-art algorithms on the HHAR and biometric user identification tasks by a large margin.

Use an RNN to learn patterns across time windows

So now we have T combined sensor feature vectors, each learning intra-window interactions. But of course it’s also important to learn inter-window relationships across time windows. To do this the T feature vectors are fed into an RNN.

At this point I think we’re ready for the big picture.

Instead of using LSTMs, the authors choose to use Gated Recurrent Units (GRUs) for the RNN layer.

… GRUs show similar performance to LSTMs on various tasks, while having a more concise expression, which reduces network complexity for mobile applications.

DeepSense uses a stacked GRU structure with two layers. These can run incrementally when there is a new time window, resulting in faster processing of stream data.


Top it all with an output layer

The output of the recurrent layer is a series of T vectors, \{\mathbf{x}_{t}^{(r)}\}, one for each time window.

For regression-based tasks (e.g., predicting car location), the output layer is a fully connected layer on top of each of those vectors, sharing weights \mathbf{W}_{out} and bias term \mathbf{b}_{out} to learn \mathbf{\hat{y}}_{t} = \mathbf{W}_{out} . \mathbf{x}_{t}^{(r)} + \mathbf{b}_{out}.

For classification tasks, the individual vector are composed into a single fixed-length vector for further processing. You could use something fancy like a weighted average over time learned by an attention network, but in this paper excellent results are obtained simply by averaging over time (adding up the vectors and dividing by T). This final feature vector is fed into a softmax layer to generate the final category prediction.


Customise for the application in hand

To tailor DeepSense for a particular mobile sensing and computing task, the following steps are taken:

  • Identify the number of sensor inputs, K, and pre-process the inputs into a set of d x 2f x T tensors.
  • Identify the type of the task and select the appropriate output layer
  • Optionally customise the cost function. The default cost function for regression oriented tasks is mean squared error, and for classification it is cross-entropy error.

For the activity recognition (HHAR) and user identification tasks in the evaluation the default cost function is used. For car location tracking a negative log likelihood function is used (see section 4.2 for details).


Key results

Here’s the accuracy that DeepSense achieves on the car tracking task, versus the sensor-fusion and eNav algorithms. The map-aided accuracy column shows the accuracy achieved when the location is mapped to the nearest road segment on a map.

On the HHAR task DeepSense outperforms other methods by 10%.

And on the user identification task by 20%:

We evaluated DeepSense via three representative mobile sensing tasks, where DeepSense outperformed state of the art baselines by significant margins while still claiming its mobile-feasibility through moderate energy consumption and low latency on both mobile and embedded platforms.

The evaluation tasks focused mostly on motion sensors, but the approach can be applied to many other sensor types including microphone, Wi-Fi signal, Barometer, and light-sensors.

Original. Reposted with permission.