Data Science – The need for a Systems Engineering approach
We need a greater emphasis on the Systems Engineering aspects of Data Science. I am exploring these ideas as part of my course "Data Science for Internet of Things" at the University of Oxford.
The discipline of Systems Engineering originated at Bell Labs in the 1940s and was popularized by the deployment of complex programs in the military and at NASA.
As per the International Council on Systems Engineering (INCOSE), “Systems Engineering is an engineering discipline whose responsibility is creating and executing an interdisciplinary process to ensure that the customer and stakeholder's needs are satisfied in a high quality, trustworthy, cost efficient and schedule compliant manner throughout a system's entire life cycle.”
Systems Engineering covers aspects such as interdisciplinary functions, end-to-end management, complex systems, lifecycle management, formal requirements engineering, reliability, and logistics. It also takes a ‘System of Systems’ approach. According to INCOSE, Systems Engineering problems are solved using the following steps: state the problem, investigate alternatives, model the system, integrate, launch the system, assess performance, and re-evaluate. These steps often also involve working with systems patterns, which are among the basic ideas of systems thinking.
So, how do these ideas relate to Data Science? As Data Science becomes more complex, the problems it tackles require a more holistic, systems thinking approach. This applies especially to areas like Deep Learning.
The need for systems thinking in Data Science
Here are two examples of the use of systems thinking in Data Science.
1 Machine Learning plays only a small part in real world applications
In the paper Hidden Technical Debt in Machine Learning Systems, a team from Google says that only a small fraction of real-world ML systems is composed of the ML code, as shown by the small black box in the middle. The required surrounding infrastructure is vast and complex.
The paper goes on to say:
- ML systems are complex because they have all the problems of maintaining traditional code in addition to a set of ML specific issues.
- Complex models erode boundaries: it is difficult to enforce strict abstraction boundaries for machine learning systems by prescribing specific intended behavior. Indeed, ML is required in exactly those cases when the desired behavior cannot be effectively expressed in software logic without dependency on external data.
- Map-Reduce is a poor abstraction for iterative ML algorithms.
- Data testing, reproducibility and process management all add to the complexity of ML systems.
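To make the last point concrete, here is a minimal Python sketch of what data testing and reproducibility can look like in practice. It is not taken from the paper: the schema, the column names (`temperature_c`, `device_id`) and the helper functions are all hypothetical, chosen only to illustrate a range check, a dataset fingerprint and a seeded split.

```python
import hashlib
import json
import random

# Hypothetical schema (an assumption for illustration, not from the paper):
# column name -> (expected type, minimum, maximum); None means "no range check".
SCHEMA = {
    "temperature_c": (float, -50.0, 60.0),
    "device_id": (str, None, None),
}


def validate_rows(rows, schema=SCHEMA):
    """Data testing: return a list of human-readable problems found in the rows."""
    problems = []
    for i, row in enumerate(rows):
        for name, (expected_type, low, high) in schema.items():
            if name not in row:
                problems.append(f"row {i}: missing {name}")
                continue
            value = row[name]
            if not isinstance(value, expected_type):
                problems.append(f"row {i}: {name} has type {type(value).__name__}")
                continue
            if low is not None and not (low <= value <= high):
                problems.append(f"row {i}: {name}={value} out of range")
    return problems


def fingerprint(rows):
    """Reproducibility: a stable hash so a training run can record the data it saw."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def split(rows, seed, test_fraction=0.25):
    """Reproducibility: the same seed and data always give the same train/test split."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]
```

In a real system, checks like these would typically run before every training job, and the fingerprint and seed would be stored alongside the trained model so that a result can be traced back to its inputs.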
TensorLayer is a new, open source initiative that addresses the growing complexity of deploying real-world Deep Learning models end to end. The paper TensorLayer: A Versatile Library for Efficient Deep Learning Development addresses a number of these Systems Engineering concerns.
The TensorLayer architecture is shown below.
As you can see, machine learning is only one component. Taking an end-to-end (systems engineering) approach, TensorLayer provides four modules:
- A layer module that provides reference implementation of neuron layers which can be flexibly interconnected to architect neural networks,
- A model module that can help manage the intermediate states incurred throughout a model life-cycle,
- A dataset module that manages training data which can be used by both offline and online learning systems, and
- A workflow module that supports asynchronous scheduling and failure recovery for concurrent training jobs.
This addresses the problem of growing interactivity in Deep Learning, where developers have to spend many cycles on integrating components for experimenting with neural networks, managing intermediate training states, organizing training-related data, and enabling hyper-parameter tuning in response to varied events. It makes integrated applications such as DRLs, GANs, model cross-validation and hyper-parameter optimization easier to deploy.
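The four-module separation described above can be sketched in plain Python. This is only an illustration of the idea, not the TensorLayer API: every class and function name here (`Model`, `Dataset`, `run_with_retry`, and so on) is hypothetical, the "layers" are simple functions on floats, and the workflow module is reduced to a synchronous retry loop rather than true asynchronous scheduling.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Tuple

# Layer module (hypothetical): composable building blocks, here functions on floats.
Layer = Callable[[float], float]


def scale(factor: float) -> Layer:
    return lambda x: x * factor


def shift(offset: float) -> Layer:
    return lambda x: x + offset


# Model module (hypothetical): owns the layer stack and its intermediate states.
@dataclass
class Model:
    layers: List[Layer]
    checkpoints: Dict[str, int] = field(default_factory=dict)

    def predict(self, x: float) -> float:
        for layer in self.layers:
            x = layer(x)
        return x

    def checkpoint(self, name: str, step: int) -> None:
        # Record which training step a named state belongs to.
        self.checkpoints[name] = step


# Dataset module (hypothetical): serves (input, target) pairs in batches.
@dataclass
class Dataset:
    samples: List[Tuple[float, float]]

    def batches(self, size: int) -> Iterator[List[Tuple[float, float]]]:
        for start in range(0, len(self.samples), size):
            yield self.samples[start:start + size]


def evaluate(model: Model, dataset: Dataset, batch_size: int = 2) -> float:
    """Mean absolute error of the model over the whole dataset."""
    total, count = 0.0, 0
    for batch in dataset.batches(batch_size):
        for x, y in batch:
            total += abs(model.predict(x) - y)
            count += 1
    return total / count


# Workflow module (hypothetical): failure recovery reduced to a retry loop.
def run_with_retry(job: Callable[[], float], max_attempts: int = 3) -> float:
    last_error = None
    for _ in range(max_attempts):
        try:
            return job()
        except RuntimeError as err:
            last_error = err
    raise RuntimeError(f"job failed after {max_attempts} attempts") from last_error
```

The point of the sketch is the boundaries: the model knows nothing about where the data comes from, the dataset knows nothing about retries, and the workflow can wrap any job, for example `run_with_retry(lambda: evaluate(model, dataset))`.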
As Data Science matures as a discipline, I see the need to address more complex and holistic problems. To achieve these goals, we can learn from existing disciplines like Systems Engineering. I discuss these issues in the Data Science for Internet of Things course at the University of Oxford.