KDnuggets Home » News » 2018 » Feb » Opinions, Interviews » Graph Databases Burst into the Mainstream ( 18:n08 )

Graph Databases Burst into the Mainstream

What do Amazon, Facebook, Google, IBM, Microsoft and Twitter have in common? They're all adopters of graph databases - a hot technology that continues to evolve.

By Yu Xu, CEO and founder of TigerGraph.

Whether for Customer Analytics, Fraud Detection, Risk Assessment or another real-world challenge, the ability to quickly and efficiently explore, discover and predict complex relationships is a huge competitive differentiator for businesses today. Getting it done involves more than merely connected data – it’s about real-time and up-to-date correlation, detection and discovery. Organizations need to be able to transform structured, semi-structured and unstructured data and massive enterprise data silos into an intelligent, interconnected data network that can reveal critical patterns and insights to support business goals.

This elemental pain point – the need for real-time analytics for enterprises with enormous volumes of data – is fueling graph databases’ emergence as a mainstream technology being embraced by companies across a broad range of industries and sectors.

Since seeing early adoption by companies including Twitter, Facebook and Google, the graph database market has been heating up. Giant cloud service providers Amazon, IBM and Microsoft have added graph databases in the last two years, validating the industry’s growing interest in graph technology for easy and natural data modeling, easy-to-write queries to solve complex problems, and fast insights from interconnected data.

[Figure: Graph Database Landscape]

In fact, graph databases are the fastest-growing category in all of data management, according to consultancy DB-Engines.com. A recent Forrester survey showed that 51 percent of global data and analytics technology decision-makers are employing graph databases.

All other database types (RDBMS, data warehousing, document DB, and key-value DB) started primarily on-premises and were welcomed before database-as-a-service was established. Now that the large cloud service providers are going all in on graph technology, graph database adoption is likely to keep accelerating.

Organizations are embracing the power of the graph for a simple reason: by storing data in a graph format (nodes, edges and properties), graph databases offer distinct advantages over other databases, including better and faster queries and analytics, simpler and more natural data modeling, simultaneous support for real-time updates and queries, and flexibility for evolving data structures.
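To make the "nodes, edges and properties" model concrete, here is a minimal in-memory sketch in Python. The class names (Node, Edge, PropertyGraph) and the sample customer and account data are hypothetical, chosen only for illustration; this is not how any particular graph database stores data.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    label: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph (hypothetical, for illustration only)."""

    def __init__(self):
        self.nodes = {}      # node_id -> Node
        self.out_edges = {}  # node_id -> list of outgoing Edge objects

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(node_id, label, properties)
        self.out_edges.setdefault(node_id, [])

    def add_edge(self, source, target, label, **properties):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.out_edges[source].append(Edge(source, target, label, properties))

    def neighbors(self, node_id, label=None):
        """Return ids of nodes one outgoing edge away, optionally by edge label."""
        return [e.target for e in self.out_edges[node_id]
                if label is None or e.label == label]


graph = PropertyGraph()
graph.add_node("alice", "Customer", country="US")
graph.add_node("acct1", "Account", balance=1200)
graph.add_node("acct2", "Account", balance=50)
graph.add_edge("alice", "acct1", "OWNS", since=2015)
graph.add_edge("acct1", "acct2", "TRANSFERRED_TO", amount=300)

# Evolving structure: a new property is attached without a schema migration.
graph.nodes["alice"].properties["risk_score"] = 0.2

print(graph.neighbors("alice", label="OWNS"))
```

Relationships are stored directly on the data, so following one is a lookup rather than a join, and new properties can be added to individual nodes or edges as requirements change.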

Many different kinds of graph database offerings are available today and each offers unique features and capabilities, so it’s important to understand the differences.

Operational Graph Databases are products suitable for a broad range of enterprise-level transactional applications. They’re typically native graph stores or built on a NoSQL platform, and are focused on ACID transactions and operational analytics, with no absolute requirement for indexes. These databases include: Titan, JanusGraph, OrientDB and Neo4j.

The Resource Description Framework (RDF) is a family of World Wide Web Consortium specifications originally designed as a metadata model; databases that store RDF data are often called triple stores. RDF has come to be used as a general method for conceptual description or modeling of information that is implemented in web resources, using a variety of syntax notations and data serialization formats.

A number of graph database vendors have based their knowledge graph technology on RDF, including: AllegroGraph, Virtuoso, Blazegraph, Stardog, and GraphDB.
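RDF expresses data as subject-predicate-object statements, or triples, which is where the name "triple store" comes from. The sketch below mimics that idea with plain Python tuples. The `ex:` identifiers and the `match` helper are hypothetical, and a real RDF store would be queried with SPARQL rather than a function like this.

```python
# Each fact is one (subject, predicate, object) triple.
triples = {
    ("ex:alice", "rdf:type", "ex:Customer"),
    ("ex:alice", "ex:owns", "ex:acct1"),
    ("ex:acct1", "rdf:type", "ex:Account"),
    ("ex:acct1", "ex:transferredTo", "ex:acct2"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# "What does alice own?" expressed as a triple pattern.
owned = match(triples, subject="ex:alice", predicate="ex:owns")
print(owned)

# "Which resources are accounts?"
accounts = match(triples, predicate="rdf:type", obj="ex:Account")
print(accounts)
```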

Multi-Modal Graphs encompass databases designed to support different model types. For example, a common possibility is a three-way option of document store, key value store or RDF/graph store. Examples include Microsoft Azure Cosmos DB and ArangoDB.

Analytic Graphs, according to Bloor Research, focus on solving “known known” problems where both entities and relationships are known, or on “known unknowns” and even “unknown unknowns.” Examples include: Apache Giraph and Turi (formerly GraphLab, now owned by Apple).

As the technology continues to evolve, the performance gap is closing between what early graph database systems have been able to offer and what’s needed to support today’s big data challenges. A new category of graph databases, which I call the real-time big graph, is designed to better deal with massive data volumes and data creation rates and to provide real-time analytics.

Real-time big graphs enable real-time analysis of large graphs, sustaining both 100M+ vertex or edge traversals per second per server and 100K+ updates per second per server, regardless of the number of hops traversed in the graph. They also provide real-time graph analytic capabilities to explore, discover and predict very complex relationships. This is Real-Time Deep Link Analytics: traversing three to 10+ hops across a big graph while keeping graph traversal and data updates fast. My company, TigerGraph, is hard at work in this category.
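A "hop" is one step along an edge, so a three-hop query visits everything reachable within three steps of a starting vertex. The sketch below shows the idea with a single-machine breadth-first search in Python. The function name `k_hop_neighbors` and the sample data are hypothetical, and this is not a description of how TigerGraph or any other product implements traversal.

```python
from collections import deque


def k_hop_neighbors(adjacency, start, max_hops):
    """Return {vertex: hop_count} for vertices within max_hops of start."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if hops[vertex] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(vertex, ()):
            if neighbor not in hops:  # first visit is the shortest hop count
                hops[neighbor] = hops[vertex] + 1
                queue.append(neighbor)
    del hops[start]  # report only the vertices that were discovered
    return hops


# Hypothetical fraud-style chain: phone -> user -> card -> user -> device.
adjacency = {
    "phone_A": ["user_1"],
    "user_1": ["card_X"],
    "card_X": ["user_2"],
    "user_2": ["device_Q"],
    "device_Q": [],
}

within_three = k_hop_neighbors(adjacency, "phone_A", 3)
print(within_three)
```

In a densely connected graph the number of vertices reached tends to grow quickly with each added hop, which is why sustained traversal speed matters for deep-link queries.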

As organizations embrace the power of the graph, knowing the available offerings and their advantages is important to determining the best option for a particular use case. Graph technology is moving into its next generation, and its capabilities for building interconnected data networks will only become more compelling.

Bio: Dr. Yu Xu is the founder and CEO of TigerGraph, the world’s first native parallel graph database. Dr. Xu is an expert in big data, parallel database systems and graph databases, and has 26 patents in parallel data management and optimization. Prior to founding TigerGraph, Dr. Xu worked on Twitter’s data infrastructure for massive data analytics, and as Teradata Hadoop architect in charge of big data initiatives.