Creating a methodology for Data Science for IoT (IoT Analytics)
While there is no specific methodology to solve Data Science for IoT (IoT Analytics) problems, perhaps it is time to draft one.
Introduction
I often encounter this problem when teaching Data Science for the Internet of Things:
There is no specific methodology to solve Data Science for IoT (IoT Analytics) problems.
This leads to some initial questions:
- Should there be a distinct methodology to solve Data Science problems for IoT?
- Are IoT problems for Data Science unique enough to warrant a specific approach?
- What existing methodologies should we draw upon?
On one hand, a Data Science for IoT problem is a typical Data Science problem. On the other hand, IoT raises some unique considerations: for example, the use of hardware, high data volumes, the use of CEP (complex event processing), the impact of verticals (like automotive), and the impact of streaming data.
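To make the CEP and streaming considerations concrete, here is a minimal pure-Python sketch of one complex-event rule: flag a device when it reports several consecutive readings above a threshold within a short time window. The event layout `(timestamp, device, value)`, the device names, and every parameter value are assumptions made for illustration; a real deployment would more likely express such a rule in a streaming engine.

```python
from collections import defaultdict, deque

def detect_sustained_high(events, threshold, min_count, window_seconds):
    """Yield (device, first_ts, last_ts) for each run of `min_count`
    consecutive readings above `threshold` within `window_seconds`."""
    runs = defaultdict(deque)  # device -> timestamps of the current high run
    for ts, device, value in events:
        run = runs[device]
        if value <= threshold:
            run.clear()          # a normal reading breaks the run
            continue
        run.append(ts)
        while ts - run[0] > window_seconds:
            run.popleft()        # forget high readings that are too old
        if len(run) >= min_count:
            yield (device, run[0], ts)
            run.clear()          # start fresh after reporting


# Hypothetical events, assumed to arrive in timestamp order.
events = [
    (0, "pump-1", 70), (10, "pump-1", 85), (20, "pump-2", 90),
    (25, "pump-1", 88), (40, "pump-1", 91), (50, "pump-1", 60),
    (200, "pump-2", 95), (210, "pump-2", 96),
]
alerts = list(detect_sustained_high(events, threshold=80,
                                    min_count=3, window_seconds=60))
```

Keeping one small queue per device means the rule can run over an unbounded stream without storing the full history, which is the main practical difference from a batch analysis.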
Background and inspiration
Some initial background:
Data mining has well-known methodologies such as CRISP-DM. Hilary Mason and others have also proposed specific methodologies for Data Science. Kaggle problems have a specific approach to solving them. Techniques like PFA (Portable Format for Analytics) provide a way of formalizing and moving analytics models.
All these strategies also apply to IoT. IoT itself has methodologies like Ignite IoT – but these do not cover IoT analytics in detail.
A methodology for IoT analytics (Data Science for IoT) should cover the unique aspects of each step in Data Science. For example, it is more than the choice of the model family. The choice of the model family (ANN, SVM, trees, etc.) is only one of the many choices to make. Others include:
a) Choice of the model structure optimisation methodology (CV, bootstrap, etc.)
b) Choice of the model parameter optimisation algorithm (joint gradients vs. conjugate gradients)
c) Preprocessing of the data (centring, reduction, functional reduction, log-transform, etc.)
d) How to deal with missing data (case deletion, imputation, etc.)
e) How to detect and deal with suspect data (distance-based outlier detection, density-based, etc.)
f) How to choose relevant features (filters, wrappers, embedded methods)
g) How to measure prediction performances (mean square error, mean absolute error, misclassification rate, lift, precision/recall, etc.)
Source: Methodology and standards for data analysis with machine learning tools, Damien Francois
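Several of the choices listed above can be sketched in a few lines of standard-library Python. This is only an illustration under simple assumptions, not a recommended pipeline: missing readings are encoded as `None`, imputation uses the mean, outliers are judged by distance from the median in MAD units, and all function names are hypothetical.

```python
import statistics

def impute_mean(values):
    """(d) Replace missing readings (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]

def standardise(values):
    """(c) Centre and reduce: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def flag_outliers(values, threshold=3.0):
    """(e) Distance-based check: flag points far from the median,
    measured in robust (MAD-based) standard deviations."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    if scale == 0:
        return [False] * len(values)  # no spread to measure against
    return [abs(v - median) / scale > threshold for v in values]

def k_fold_indices(n, k):
    """(a) Yield (train, test) index lists for k-fold cross-validation."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

def mean_squared_error(y_true, y_pred):
    """(g) Average squared prediction error."""
    return statistics.fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])

def mean_absolute_error(y_true, y_pred):
    """(g) Average absolute prediction error."""
    return statistics.fmean([abs(t - p) for t, p in zip(y_true, y_pred)])
```

The order of the steps matters as much as the steps themselves: for instance, imputing with the mean before removing suspect readings lets a single faulty sensor value leak into every filled gap.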
The methodology could also cover:
- Exploratory analysis of data
- Hypothesis testing (“Given a sample and an apparent effect, what is the probability of seeing such an effect by chance?”)
- and other ideas
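The hypothesis-testing question quoted above can be answered directly by simulation. The sketch below uses a permutation test, which is one possible choice rather than a prescribed method: it shuffles the group labels many times and counts how often chance alone produces a difference in means at least as large as the observed one. The function name, the sensor readings, and the default number of permutations are assumptions for illustration.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Estimate the probability of seeing a difference in means at least
    as large as the observed one if group labels were assigned at random."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


# Hypothetical vibration readings before and after a maintenance visit.
before = [4.1, 3.9, 4.3, 4.0, 4.2, 4.4]
after = [3.6, 3.8, 3.5, 3.7, 3.9, 3.4]
p_value = permutation_p_value(before, after)
```

A small `p_value` suggests the apparent effect is unlikely to be a chance artefact. The test makes no distributional assumption, which is convenient for sensor data that is rarely normally distributed.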
Building on the above, we need an open, end-to-end, step-by-step methodology to solve IoT Analytics/Data Science for IoT problems.
In addition, the methodology would need to consider the unique aspects of IoT. For example:
a) Complex event processing especially using Apache Spark for CEP
b) Deep learning (because we consider Cameras as sensors)
c) Anomaly detection, a typical IoT analytics scenario. There are many considerations: What is the triggering event? How much has the machine deviated from the plan? What is the root cause of the bottleneck? Are there any external factors affecting system performance? How do I know that I should trust the IoT data? Is there a recommended plan of action? How is the data visualized? Does the data have missing elements? How do we detect failure in other processes? (Anomaly detection considerations adapted from Dr Vinay Mehendiratta)
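Two of the anomaly detection questions above, how far a machine has deviated from its expected behaviour and whether the data has missing elements, can be illustrated with a rolling z-score over a stream of readings. This is a minimal sketch under stated assumptions: "the plan" is approximated by the mean of recent normal readings, missing values arrive as `None`, and the class name, window size, and threshold are all hypothetical.

```python
from collections import deque
import statistics

class RollingAnomalyDetector:
    """Flag readings that deviate strongly from the recent baseline."""

    def __init__(self, window=5, threshold=3.0, min_history=5):
        self.history = deque(maxlen=window)  # recent readings judged normal
        self.threshold = threshold
        self.min_history = min_history

    def update(self, value):
        """Return True if `value` is anomalous; missing readings are skipped."""
        if value is None:
            return False  # a gap in the data is not scored
        is_anomaly = False
        if len(self.history) >= self.min_history:
            mean = statistics.fmean(self.history)
            std = statistics.pstdev(self.history)
            if std > 0 and abs(value - mean) / std > self.threshold:
                is_anomaly = True
        if not is_anomaly:
            self.history.append(value)  # keep anomalies out of the baseline
        return is_anomaly


# Hypothetical temperature stream with one gap and one spike.
stream = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, None, 10.1, 25.0, 10.0]
detector = RollingAnomalyDetector()
flags = [detector.update(reading) for reading in stream]
anomaly_positions = [i for i, flagged in enumerate(flags) if flagged]
```

Excluding flagged readings from the baseline is a design choice: it stops one spike from inflating the standard deviation and masking the next one, at the cost of adapting more slowly if the machine genuinely settles at a new operating level.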
In addition, IoT vertical domains have special considerations: Smart Grid, Smart cities, Smart energy, Automotive, Smart factory, Mobile, Wearables, Smart home etc.
For example:
- Modelling energy prices
- Classifying steps using machine learning
- Bus routing using mobile phone data
- Linear and non-linear regression models to predict global temperature and weather
- etc
Currently, this is an evolving thought process being developed as a part of the Data Science for IoT course. We intend to create it as an open methodology – starting with the question: What is common across these IoT analytics problems and how can we adapt existing Data Science techniques to solve IoT analytics problems?
Over the next few weeks, we will be conducting a survey and developing the methodology.
If you are interested in participating and knowing more, please sign up to our mailing list and download our papers or contact me at ajit.jaokar at futuretext.com
Related:
- Data Science of IoT: Sensor fusion and Kalman filters, Part 1
- Data Science of IoT: Sensor fusion and Kalman filters, Part 2
- Data Science for Internet of Things – practitioner course