Exclusive Interview: Michael Brodie, Leading Database Researcher, Industry Leader, Thinker

We discuss the most important database research advances, industry developments, role of relational, NoSQL, Graph databases, Computing Reality, and more.

By Gregory Piatetsky, Apr 21, 2014.

I had the pleasure of working with Michael Brodie when we were both at GTE Laboratories in the 1990s, where he was already a world-famous researcher and a department manager. I recently met him at another conference, and he graciously agreed to answer my questions for KDnuggets readers. Michael is still very sharp, very active, and busy - he answered these questions while flying from Boston to Doha, Qatar, where he is advising Qatar Computing Research Institute. See also his recent report on Big Data and Privacy from MIT/White House workshop.

Dr. Michael L. Brodie has served as Chief Scientist of a Fortune 20 company, an Advisory Board member of leading national and international research organizations, and an invited speaker and lecturer. In his role as Chief Scientist Dr. Brodie has researched and analyzed challenges and opportunities in advanced technology, architecture, and methodologies for Information Technology strategies. He has guided advanced deployments of emergent technologies at industrial scale, most recently Cloud Computing and Big Data. In his Advisory Board roles Dr. Brodie addresses current and emergent strategic challenges and opportunities that are central to the charter and success of the organizations. As an invited speaker Dr. Brodie has presented compelling visions, challenges, and strategies for our emerging Digital Universe in over 100 keynote speeches in over 30 countries and in over 100 books and articles.

Throughout his career Dr. Brodie has been active in both advanced, academic research and large-scale industrial practice attempting to obtain mutual benefits from the industrial deployment of innovative technologies while helping research to understand industrial requirements and constraints. He has contributed to multi-disciplinary problem solving at scale in contexts such as Terrorism and Individual Privacy, and Information Technology Challenges in Healthcare Reform.

Below is part 1 of our extensive interview. See also Part 2 and Part 3.

Gregory Piatetsky - Q1: You started as a researcher in Databases (PhD from Toronto) and have had a very distinguished and varied career spanning academia, industry, and government, in the US, Europe, Australia, and Latin America over the last 25+ years. From your unique vantage point, what were the 3 most important database research advances?

Michael Brodie: Three most important database research advances:

  1. Ted Codd’s Relational model of data (1970) is the most important database research advance as it launched what is now a $28 BN/year market still growing at 11% CAGR with over 215 RDBMSs on the market. More important to me, it launched four decades of amazing research advances, starting with query optimization (Selinger) and transactions (Gray), and innovation that has probably grown at 20% CAGR.
  2. The next most important research advance or stage was a change in perspective that specific domains require their own DBMS such as graph databases, array stores, document stores, key-value stores, NoSQL, NewSQL, and many more to come. DB-Engines lists twelve DBMS categories thus bumping the database world from managing 8% of the world’s data to about 12% but due to the growth of non-database databases back to 10%. Soon, due to the role of data in our digitized world there will be data management systems for many more domains. While this is amazingly cool, how do we solve multi-disciplinary (multi-data domain) problems in a consistent rather than disjoint way?
  3. The next most important research advance is just emerging and is mind blowing. I call it Computing Reality, acknowledging that every datum (every real world observation) is not definitive but probabilistic. Unlike conventional databases and more like reality, Computing Reality has no single version of truth. How do we model such worlds, more realistic worlds and compute over them? The simple answer is that it is already in Big Data sources. There are many related attempts to address Computing Reality including social computing, probabilistic computing, probabilistic databases, Open Worlds in AI, Web Science, Approximate Computing, Crowd Computing, and more. Perhaps this will be the next generation of computing.

GP - Q2: What about the most important database industry developments?

MB: Alas, the database industry, like all industries, has a legacy problem that stifles innovation. It has taken over 30 years to emerge from the relational era. The most important recent database industry development came from outside the database industry: it is Big Data and its marketing arm called MapReduce and its data sidekicks, Hadoop and NoSQL. Frankly, the database industry has been insular and protected its relational turf for FAR too long. Smart folks at Yahoo!, Google and other places saw value in data, non-database data, and thus emerged MapReduce, Hadoop, and NoSQL, generally crappy database ideas, but it woke up the database industry. Hadoop and NoSQL are growing in demand. In time it will be seen that they are amazing for a very specific problem domain, embarrassingly parallel problems, but a money pit for everything else. The importance of MapReduce is that it forced the database industry to get out of their hammocks.

GP - Q3: What is the role of Relational Databases, NoSQL databases, Graph databases, and other databases today?

MB: Relational Databases have two extremely well established roles. Conventional row stores serve the OLTP community as the backbone of enterprise operations. These blindingly fast transaction processors are moving in-memory. OLTP stores are modest in number and size (< 1 TB), growing and declining in lock step with business growth and decline. Column stores, OLAP, are the backbone of data warehouses and, until recently, business intelligence. In general there are huge numbers of these, often of very large size in the Petabyte and Exabyte range. This is where Big Data battle lines are being drawn. What fun!!

This is also where we turn from polishing the relational round ball (Are We Polishing a Round Ball? Panel Abstract. Michael Stonebraker. ICDE, page 606. IEEE Computer Society, 1993) and focus on the dozen or so other DBMS categories. Taking over is relative; none of the 12 other categories has more than 3% of the database market. Graph databases serve graph applications like networking in communications, telecom, social networks, and of course NSA applications! But what is wonderful about these emerging classes of data-domain specific DBMSs is that we are only now discovering the rich use cases that they serve.

The use cases define the DBMSs and the DBMSs help formulate the use cases. SciDB is a superb example of managing scientific data and computation at scale. It is awkward for both communities – database folks who don't speak linear algebra or matrices, and scientists who only speak R. Exciting times. For a little fun look at the DB-Engines list.

MB Notes: DB-Engines lists 216 different database management systems, which are classified according to their database model (e.g. relational DBMS, key-value stores etc.). This pie-chart shows the number of systems in each category. Some of the systems belong to more than one category.
Popularity changes per category, April 2014, over 1 year
  • Graph DBMS – growing dramatically 3.5X
  • Wide column stores – 2X
  • Document stores – 2X
  • Native XML DBMS – 1.5X
  • Key-value stores – 1.5X
  • Search engines – 1.5X
  • RDF stores – 1.5X
  • Object oriented DBMS – flat
  • Multivalue DBMS – flat
  • Relational DBMS – flat

GP - Q4: You have held an amazing variety of positions in academia, industry, government organizations, VC firms, and start-ups, in the US, Brazil, Canada, Australia, and Europe. Which 3 positions were most satisfying to you and why?

MB: What a great question. Thank you for asking because it caused me to think about what I have really enjoyed over 40 years. Somehow CSAIL at MIT and the Faculty of Computing and Communications at EPFL jump to mind.

There are scary smart people at those places. Like climbing mountains, it both scares and exhilarates me. To be frank, my jobs at big enterprises are confusing in hindsight. I guess I was window dressing because my role did not feel like it had impact. So getting motivated and scared at MIT and EPFL are probably top, so there’s number one. Why? Just look down 5,000 feet and ask why am I here?

Second is a combination of Advisory roles at US Academy of Science, DERI, STI, ERCIM, Web Science Trust, and others because they gave me a sense of collaborating, challenging, and contributing. How cool is that?

Third would be working at startups like Data Tamer and Jisto. Imagine waking up in the morning and thinking you might change the world. That requires that I conceive the world not just differently, but so that it solves someone’s REAL problem. Even more cool.

Here is Part 2 of our extensive interview.