What’s Going to Happen this Year in the Data World

"If we wish to foresee the future of mathematics, our proper course is to study the history and present condition of the science." Henri Poncairé.

By Favio Vazquez, Founder at Ciencia y Datos on May 14, 2019 in Advice, AI, Big Data, Data Science, Deep Learning

comments

Introduction

If you are immerse in the data world it’s probable that you’ve seen a bunch of articles, blog posts and news about what’s going to happen this and the upcoming years, the trends and expectations.

I read lots of them, and if you want to do it to go to the end of the article and you’ll find it there. But in here I want to give a quick overview of what’s going on right now, and analyze the different trends people are talking about to see what’s more likely to happen.

The most common trends

If you take a search the term “Data Science” you’ll find around 54 million results, that’s a lot. And the interest in the field has been growing over the years:

But data science is huge right now, and it means different things for different people. If you take a look of the articles and news, the biggest trends are:

There are more but these are the biggest ones. And there’s also a lot to say about each one of them, but I want to focus on two (2) right now. Automation and graphs.

Automation and Graphs (The Data Fabric)

https://towardsdatascience.com/the-data-fabric-for-machine-learning-part-1-2c558b7035d7

If you’ve been following my research, one of the things I’m most interested right now is the data fabric. Remember that my definition of the data fabric is:

The Data Fabric is the platform that supports all the data in the company. How it’s managed, described, combined and universally accessed. This platform is formed from an Enterprise Knowledge Graph to create an uniform and unified data environment.

And there are two important points I want to make here, the data fabric is formed by the enterprise knowledge graph and it should be as automated as possible.

One of the articles that mention this very clearly is this one from Gartner:

Gartner Identifies Top 10 Data and Analytics Technology Trends for 2019
Augmented analytics, continuous intelligence and explainable artificial intelligence (AI) are among the top trends in…www.gartner.com

They say:

The application of graph processing and graph DBMSs will grow at 100 percent annually through 2022 to continuously accelerate data preparation and enable more complex and adaptive data science.

and also:

Through 2022, bespoke data fabric designs will be deployed primarily as a static infrastructure, forcing organizations into a new wave of cost to completely re-design for more dynamic data mesh approaches.

It’s very clear that graph concepts and the data fabric will be more common every year for data companies. But what about automation?

There are lots of avances in automation regarding machine learning, deep learning and deployment. But as I said before data is an important asset (maybe the most important one) for companies right now. So before you can apply machine learning or deep learning, at all, you need to have it, know what you have, understand it, govern it, clean it, analyze it, standardize it (maybe more) and then you can think of using it.

We need automation for data storage, data munging, data exploration, data cleansing and all the things we actually spend a lot time doing. You can say that tools like DataRobot can offer you those things, but in my experience there’s much more to do in this field.

That’s why my bet it’s that semantic technologies are the way to go here. With them (like Anzo) you have automatic query generation (yep that’s a thing) and using them against the complex graph makes extracting features easy and eventually fully automated.

Also in one of my first articles on the subject I proposed this architecture:

https://towardsdatascience.com/deep-learning-for-the-masses-and-the-semantic-layer-f1db5e3ab94b

Where you have automation everywhere. And it’s easy to add more features on the go like explainable AI, continuous intelligence and more. In the same article people at Gartner mentioned:

Continuous intelligence is a design pattern in which real-time analytics are integrated within a business operation, processing current and historical data to prescribe actions in response to events. It provides decision automation or decision support.

Which is not a new term, if we think about streaming analytics, but I like the name. Actually what’s happening can be divided into two things: semantic technologies and automation.

Here’s the interest of semantic technologies over the last 5 years:

As we can see the data fabric is not that high, but I think that’s going to change very soon.

Conclusion

So if you want to be up to date you better start thinking on how to improve your data lakes into smart data lakes, also you want to automate your process, from data ingestion to model deployment and monitoring, and make use of graph technologies.

Again, as I mentioned before you should be using a graph database instead of something else if:

You have highly related data.
You need a flexible schema.
You want to have a structure and build queries that are more similar to way people think.

Instead if you have a highly structured data, you want to do a lot of grouping calculations and you don’t have that many relationships between your tables, then you may be better with a relational database.

The people at Cambridge Semantics created this great info-graph that let’s you know more about the data fabric universe:

Thanks for reading this, and hopefully the article gave you an idea about what to do in your personal business and life as data scientist.

References:

Data Science Trends for 2019
This year can be considered the booming of Artificial Intelligence (AI). Just look at the number of startups with the…towardsdatascience.com

Data Science Trends in 2019 - DATAVERSITY
When it comes to major Data Science trends to watch in 2019, last year's major trends continued from 2017 as the growth…www.dataversity.net

AI, Data Science, Analytics Main Developments in 2018 and Key Trends for 2019
As in the past, we bring you a roundup of predictions and analysis from experts. We have asked What were the main…www.kdnuggets.com

10 Big Data Trends to Watch in 2019
We seek ever more data for a good reason: it's the commodity that fuels digital innovation. However, turning those huge…www.datanami.com

5 Data and AI Trends for 2019 - InformationWeek
New year, fresh calendar page. If 2019 looks anything like 2018, you can bet that data, analytics, machine learning…www.informationweek.com

Graph Databases. What’s the Big Deal?
Continuing the analysis on semantics and data science, it’s time to talk about graph databases and what they have to…towardsdatascience.com

Anzo®
Until now, no technology has been able to deliver a Semantic Layer at enterprise scale - with security, governance and…www.cambridgesemantics.com

Bio: Favio Vazquez is a physicist and computer engineer working on Data Science and Computational Cosmology. He has a passion for science, philosophy, programming, and music. He is the creator of Ciencia y Datos, a Data Science publication in Spanish. He loves new challenges, working with a good team and having interesting problems to solve. He is part of Apache Spark collaboration, helping in MLlib, Core and the Documentation. He loves applying his knowledge and expertise in science, data analysis, visualization, and automatic learning to help the world become a better place.

Original. Reposted with permission.

Related:

What’s Going to Happen this Year in the Data World

Introduction

The most common trends

Automation and Graphs (The Data Fabric)

Conclusion

References:

More On This Topic

Latest Posts

Top Posts