3 Challenges for Companies Tackling Data Science
From new technology to workflows, we outline three of the more common problems and how businesses can overcome them.
By Seth Deland, Product Marketing Manager of Data Analytics, MathWorks
As data science continues to grow and evolve, new challenges are cropping up that companies must address as they move projects forward.
Learning Curve for New Technology
Challenge: The pace of innovation in data science is fast, and each new piece of technology has its own learning curve. In many cases, the original technology is developed by computer scientists for an audience that also has strong programming skills. These software packages are implemented in many different programming languages, so the learning curve is especially steep for those who do not write code full-time.
Solution: Engineers and scientists who do not program full-time should look for tools that enable them to get up and running quickly, preferably within computational platforms that they’re already familiar with. Point-and-click apps like those found in MATLAB can serve as an easy starting point for learning the technology. Beyond that, a programmatic interface is typically required to fine-tune analytics to improve robustness and accuracy. Mature programming tools will have consistent APIs that make it easy to swap in different data science techniques. If businesses are serious about data science, they should also look for training courses, which help employees ramp up much faster than trial and error does.
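To illustrate what a consistent API buys you, here is a minimal Python sketch. It is not taken from MATLAB or any specific library: the class names, the fit/predict convention, and the toy data are all assumptions made up for illustration. The point is only that when every technique exposes the same two methods, the evaluation code does not change when the technique does.

```python
from collections import Counter


class MajorityClass:
    """Hypothetical baseline: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class NearestNeighbor:
    """Hypothetical 1-nearest-neighbor classifier (squared Euclidean distance)."""

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        def closest_label(row):
            dists = [sum((a - b) ** 2 for a, b in zip(row, ref)) for ref in self.X_]
            return self.y_[dists.index(min(dists))]

        return [closest_label(row) for row in X]


def accuracy(model, X_train, y_train, X_test, y_test):
    """Train any model that follows the shared fit/predict convention and score it."""
    predictions = model.fit(X_train, y_train).predict(X_test)
    return sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)


# Toy data, invented for this example: one feature, two classes.
X_train = [[1.0], [1.2], [0.9], [3.0], [3.2]]
y_train = ["low", "low", "low", "high", "high"]
X_test = [[1.1], [3.1]]
y_test = ["low", "high"]

# Swapping techniques means changing one object; the evaluation code is untouched.
for model in (MajorityClass(), NearestNeighbor()):
    print(type(model).__name__, accuracy(model, X_train, y_train, X_test, y_test))
```

Because both classes honor the same convention, trying a third technique would mean adding one more object to the loop and nothing else.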
The large amount of cutting-edge research in data science produces a wave of new, potentially disruptive technologies. In the wake of that wave, however, generalized tools arise that engineers and scientists can use in novel applications.
Engineer or Data Scientist: Who Does What?
Challenge: Organizations are trying to determine which team is the right one to do this work. Data scientists often have strong backgrounds in machine learning but tend to be unfamiliar with the ins and outs of the business and its products. Engineering and science groups know the business and its products but may not be experienced with machine learning.
Solution: A common compromise is to pair engineers who have domain knowledge with data scientists to leverage the strengths of each. This is often not possible, however, because there are far more domain experts than data scientists. Another solution is to adopt tools that simultaneously lower the bar for machine learning (for the domain experts) and provide flexibility and extensibility (for the data scientists). In practice, this means adopting a tool that has both a graphical interface (i.e., apps) and a programming language, as well as the capability to integrate with a variety of other tools.
Even as data science groups grow within organizations, the data science work will continue to be done by both engineers with domain knowledge and data scientists. Both will play an important role in the successful adoption of data analytics by the business, so creating an environment where they can collaborate is key.
Where Does an Analytic End Up?
Challenge: A successfully developed analytic or machine learning model has limited value to the business if it cannot be integrated with the business’s systems, products, and services. This could mean integrating the analytic with servers maintained by the IT organization or deploying the analytic to embedded devices (such as sensors or edge nodes in an Internet of Things system).
Traditionally, the analytic is developed in a tool that’s suitable for research and development, but not for running the analytic in production, so the analytic must be recoded into a different programming language before it can be deployed. This process typically takes several weeks to months and can introduce bugs.
Solution: Platforms for developing analytics offer ways to package the algorithm to run in different production environments. Look for a tool that provides integration paths and application servers for use with common IT systems, as well as the ability to target embedded devices. For example, MATLAB provides deployment paths for integrating analytics with programming languages commonly used in IT systems (e.g., Java and .NET), as well as for converting analytics to standalone C code that can be run on embedded devices. Both of these deployment options are accessed through point-and-click interfaces, further reducing the time spent on conversion. By automating the conversion of the analytic to run in production systems, these tools significantly reduce the time it takes to get a new analytic into production.
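To make the idea of automated conversion concrete, the following toy Python sketch generates standalone C source from a model's parameters instead of having someone recode the model by hand. It is only an illustration of the principle, not how MATLAB or any commercial code generator works: the linear model, the function names, and the coefficient values are all hypothetical, and the sketch assumes finite coefficients.

```python
def predict(weights, bias, features):
    """Reference implementation of the analytic, as used during development."""
    return bias + sum(w * x for w, x in zip(weights, features))


def generate_c_source(weights, bias, func_name="predict"):
    """Emit standalone C source that computes the same linear model.

    Assumes at least one weight and finite coefficient values.
    """
    if not weights:
        raise ValueError("the model needs at least one weight")
    n = len(weights)
    # repr() of a Python float round-trips exactly and is a valid C literal.
    terms = " + ".join(f"{float(w)!r} * x[{i}]" for i, w in enumerate(weights))
    return (
        f"/* Auto-generated: linear model with {n} inputs. */\n"
        f"double {func_name}(const double x[{n}]) {{\n"
        f"    return {float(bias)!r} + {terms};\n"
        f"}}\n"
    )


# Hypothetical coefficients standing in for a trained model.
weights = [0.5, -1.25]
bias = 2.0

c_source = generate_c_source(weights, bias)
print(c_source)
```

Since the C function is produced directly from the same parameters the reference implementation uses, the two cannot drift apart through transcription mistakes, which is the class of bug that manual recoding tends to introduce.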
Technologies that enable domain experts to apply machine learning and other data science techniques to their work are here to stay. They provide exciting opportunities for teams to innovate, both in their design workflows and in the products they create. The shortage of data scientists does not appear likely to be resolved anytime soon, so domain experts will play a crucial role in filling this gap. Their knowledge of the business and the products it produces positions them well to find innovative ways to apply data analytics technologies.
Learn more about the topics covered in this blog post:
- Classification Learner App (try in browser): Train and compare classification models using an interactive MATLAB app. No license needed, no download, no waiting.
- Using Analytics and Machine Learning to Build Intelligent Products and Services (white paper): Learn how smart, real-time systems can be powered by predictive algorithms and controls in real-time applications.
- Enterprise IT Integration of MATLAB Programs (webinar): Find out how you can integrate MATLAB programs into enterprise applications quickly and easily.