KDnuggets, October 2017 (17:n39)

Data Science: The need for a Systems Engineering approach


We need a greater emphasis on the Systems Engineering aspects of Data Science. I am exploring these ideas as part of my course "Data Science for Internet of Things" at the University of Oxford.



Background

The discipline of Systems Engineering originated at Bell Labs in the 1940s and was popularized through complex military and NASA programs.

As per the International Council on Systems Engineering (INCOSE), Systems Engineering is an engineering discipline whose responsibility is creating and executing an interdisciplinary process to ensure that customer and stakeholder needs are satisfied in a high-quality, trustworthy, cost-efficient and schedule-compliant manner throughout a system's entire life cycle.

Systems engineering covers aspects such as interdisciplinary functions, end-to-end management, complex systems, lifecycle management, formal requirements engineering, reliability and logistics. It also takes a ‘System of Systems’ approach. As per INCOSE, systems engineering problems are solved using the following steps:

  1. State the problem
  2. Investigate alternatives
  3. Model the system
  4. Integrate
  5. Launch the system
  6. Assess performance
  7. Re-evaluate

Systems engineering also often involves working with system patterns, part of the basic ideas of systems thinking.
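The INCOSE steps above are usually applied iteratively rather than as a one-pass checklist. The short Python sketch below illustrates only that control flow, assuming each pass ends with re-evaluation and returns to the first step if the result is not yet acceptable. The step names come from the list above; the `run_lifecycle` function, its `acceptable` callback and the `max_iterations` cap are hypothetical, not part of any INCOSE specification.

```python
from enum import Enum
from typing import Callable, List


class Step(Enum):
    """The seven problem-solving steps listed by INCOSE, in order."""
    STATE_PROBLEM = "State the problem"
    INVESTIGATE_ALTERNATIVES = "Investigate alternatives"
    MODEL_SYSTEM = "Model the system"
    INTEGRATE = "Integrate"
    LAUNCH = "Launch the system"
    ASSESS = "Assess performance"
    RE_EVALUATE = "Re-evaluate"


def run_lifecycle(acceptable: Callable[[int], bool],
                  max_iterations: int = 5) -> List[Step]:
    """Walk all seven steps per pass, looping back to the start after each
    re-evaluation until the hypothetical `acceptable(pass_number)` returns True."""
    trace: List[Step] = []
    for iteration in range(1, max_iterations + 1):
        for step in Step:  # Enum iteration follows definition order
            trace.append(step)
        # Re-evaluate: stop once the assessed system meets stakeholder needs
        if acceptable(iteration):
            break
    return trace


if __name__ == "__main__":
    # Hypothetical scenario: stakeholders are satisfied after the second pass
    trace = run_lifecycle(lambda i: i >= 2)
    print(len(trace))  # 14: two passes through seven steps
```

The cap on iterations stands in for the schedule and budget constraints that, per the INCOSE definition, a real program must respect.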

So, how do these ideas relate to Data Science? As Data Science becomes more complex, its problems require a more holistic, systems-thinking approach. This applies especially to areas like Deep Learning.

The need for systems thinking in Data Science

Here are two examples of systems thinking applied to Data Science:

1) Machine Learning plays only a small part in real-world applications

In the paper "Hidden Technical Debt in Machine Learning Systems", a team from Google observes that only a small fraction of a real-world ML system is composed of ML code, shown in their figure as the small black box in the middle. The surrounding infrastructure it requires is vast and complex.

[Figure: Hidden Technical Debt in Machine Learning Systems]

The paper goes on to say:

  • ML systems are complex because they have all the problems of maintaining traditional code, in addition to a set of ML-specific issues.
  • Complex models erode boundaries. Unfortunately, it is difficult to enforce strict abstraction boundaries for machine learning systems by prescribing specific intended behavior. Indeed, ML is required in exactly those cases when the desired behavior cannot be effectively expressed in software logic without dependency on external data.
  • Map-Reduce is a poor abstraction for iterative ML algorithms.
  • Data testing, reproducibility and process management all add to the complexity of ML systems.
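Data testing is one concrete piece of the infrastructure surrounding the model: input data should be checked before it reaches training or inference, much as code is unit-tested. Below is a minimal plain-Python sketch of such a check. The schema format, the sensor-style field names (`device_id`, `temperature`), the value ranges and the `validate_records` helper are all hypothetical illustrations, not something the paper prescribes.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical schema: field name -> (expected type, min value, max value).
# None means that bound is not checked.
SCHEMA: Dict[str, Tuple[type, Optional[float], Optional[float]]] = {
    "device_id": (str, None, None),
    "temperature": (float, -40.0, 85.0),
}


def validate_records(records: List[dict]) -> List[str]:
    """Return human-readable errors; an empty list means the batch passed."""
    errors: List[str] = []
    for i, record in enumerate(records):
        for field, (expected_type, low, high) in SCHEMA.items():
            if field not in record:
                errors.append(f"record {i}: missing field '{field}'")
                continue
            value = record[field]
            if not isinstance(value, expected_type):
                errors.append(f"record {i}: '{field}' should be {expected_type.__name__}")
                continue
            # Range checks only apply when a bound is defined
            if low is not None and value < low:
                errors.append(f"record {i}: '{field}'={value} below {low}")
            if high is not None and value > high:
                errors.append(f"record {i}: '{field}'={value} above {high}")
    return errors


if __name__ == "__main__":
    batch = [
        {"device_id": "s1", "temperature": 21.5},
        {"device_id": "s2", "temperature": 120.0},  # out of range
        {"temperature": 19.0},                      # missing device_id
    ]
    for problem in validate_records(batch):
        print(problem)
```

Running a check like this as a gate in the pipeline means a faulty sensor or a broken upstream feed is reported explicitly instead of silently degrading the model.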

2) TensorLayer

TensorLayer is a new, open-source initiative that tackles the growing complexity of deploying real-world Deep Learning models end to end. The paper "TensorLayer: A Versatile Library for Efficient Deep Learning Development" addresses a number of Systems Engineering concerns.

The TensorLayer architecture is shown below.

[Figure: TensorLayer architecture]

As the figure shows, machine learning is only one component. Taking an end-to-end (systems engineering) approach, TensorLayer provides four modules:

  1. A layer module that provides reference implementations of neuron layers, which can be flexibly interconnected to architect neural networks,
  2. A model module that can help manage the intermediate states incurred throughout a model's life cycle,
  3. A dataset module that manages training data, which can be used by both offline and online learning systems, and
  4. A workflow module that supports asynchronous scheduling and failure recovery for concurrent training jobs.

This helps address the growing interactivity of Deep Learning development, where developers have to spend many cycles integrating components to experiment with neural networks, managing intermediate training states, organizing training-related data, and enabling hyper-parameter tuning in response to varied events. It also makes it easier to deploy integrated applications such as deep reinforcement learning (DRL), generative adversarial networks (GANs), model cross-validation and hyper-parameter optimization.
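To make the roles of the workflow and model modules concrete, here is a minimal plain-Python sketch of two of the concerns above working together: concurrent training jobs are scheduled asynchronously, each job saves its intermediate state after every epoch, and a failed job is resubmitted and resumes from its last checkpoint rather than starting over. This is not TensorLayer's API. `CheckpointStore`, `train`, `run_all`, the toy one-weight "model" and the simulated crashes are all hypothetical, and a real system would persist checkpoints to durable storage rather than memory.

```python
import concurrent.futures
import threading
from typing import Dict, List


class CheckpointStore:
    """Thread-safe, in-memory stand-in for a store of intermediate training states."""

    def __init__(self) -> None:
        self._states: Dict[str, dict] = {}
        self._lock = threading.Lock()  # jobs save from several worker threads

    def save(self, job: str, state: dict) -> None:
        with self._lock:
            self._states[job] = dict(state)  # store a copy, not a live reference

    def load(self, job: str) -> dict:
        with self._lock:
            # A job with no checkpoint starts from a fresh (hypothetical) state
            return dict(self._states.get(job, {"epoch": 0, "weight": 0.0}))


def train(job: str, lr: float, epochs: int, store: CheckpointStore,
          crashes: Dict[str, List[int]]) -> float:
    """Toy training loop that resumes from the job's last checkpoint.

    `crashes[job]` lists epochs at which to simulate a failure; each fires once.
    """
    state = store.load(job)
    for epoch in range(state["epoch"], epochs):
        if epoch in crashes.get(job, []):
            crashes[job].remove(epoch)  # the simulated fault happens only once
            raise RuntimeError(f"{job}: worker lost at epoch {epoch}")
        state["weight"] += lr  # stand-in for one epoch of real training
        state["epoch"] = epoch + 1
        store.save(job, state)  # checkpoint after every completed epoch
    return state["weight"]


def run_all(lrs: Dict[str, float], epochs: int, crashes: Dict[str, List[int]],
            max_retries: int = 2) -> Dict[str, float]:
    """Run one job per learning rate concurrently; resubmit failed jobs."""
    store = CheckpointStore()
    results: Dict[str, float] = {}
    attempts = {job: 0 for job in lrs}
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
        pending: Dict[concurrent.futures.Future, str] = {
            pool.submit(train, job, lr, epochs, store, crashes): job
            for job, lr in lrs.items()
        }
        while pending:
            done, _ = concurrent.futures.wait(
                pending, return_when=concurrent.futures.FIRST_COMPLETED)
            for future in done:
                job = pending.pop(future)
                try:
                    results[job] = future.result()
                except RuntimeError:
                    attempts[job] += 1
                    if attempts[job] > max_retries:
                        raise  # give up: surface the last failure
                    # Failure recovery: resubmit; train() resumes from its checkpoint
                    retry = pool.submit(train, job, lrs[job], epochs, store, crashes)
                    pending[retry] = job
    return results


if __name__ == "__main__":
    print(run_all({"lr=0.5": 0.5, "lr=0.25": 0.25}, epochs=4,
                  crashes={"lr=0.25": [2]}))
```

In the usage example the job `lr=0.25` crashes at epoch 2 and, on resubmission, re-runs only epochs 2 and 3. Keeping scheduling, state management and the training loop in separate pieces mirrors the modular split described above.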

Conclusion

As Data Science matures as a discipline, I see the need to address more complex and holistic problems. To achieve these goals, we can learn from existing disciplines like Systems Engineering. I discuss these issues in my course "Data Science for Internet of Things" at the University of Oxford.
