ODBMS.org, by Roberto V. Zicari on May 2, 2012
I have interviewed Mike Stonebraker, serial entrepreneur and professor at MIT. In particular, I wanted to know more about his latest endeavor, VoltDB. RVZ
Q1. In your career you have developed several data management systems, namely: the Ingres relational DBMS, the object-relational DBMS PostgreSQL, the Aurora Borealis stream processing engine (commercialized as StreamBase), the C-Store column-oriented DBMS (commercialized as Vertica), and the H-Store transaction processing engine (commercialized as VoltDB). In retrospect, what are, in a nutshell, the main differences and similarities between all these systems? What are their respective strengths and weaknesses?
Stonebraker: In addition, I am building SciDB, a DBMS oriented toward complex analytics. I believe that "one size does not fit all". I.e. in every vertical market I can think of, there is a way to beat legacy relational DBMSs by 1-2 orders of magnitude. The techniques used vary from market to market. Hence, StreamBase, Vertica, VoltDB and SciDB are all specialized to different markets. At this point Postgres and Ingres are legacy code bases.
Q2. In 2009 you co-founded VoltDB, a commercial startup based on ideas from the H-Store project. H-Store is a distributed in-memory OLTP system. What is special about VoltDB? How does it compare with other in-memory databases, for example SAP HANA or Oracle TimesTen?
Stonebraker: A bunch of us wrote a paper, "OLTP Through the Looking Glass, and What We Found There" (SIGMOD 2008). In it, we identified four sources of significant OLTP overhead (concurrency control, write-ahead logging, latching and buffer pool management).
Unless you make a big dent in ALL FOUR of these sources, you will not run dramatically faster than current disk-based RDBMSs. To the best of my knowledge, VoltDB is the only system that eliminates or drastically reduces all four of these overhead components. For example, TimesTen uses conventional record-level locking, an ARIES-style write-ahead log and conventional multi-threading, leading to a substantial need for latching. Hence, it eliminates only one of the four sources.
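The "all four" argument above follows the same logic as Amdahl's law: removing one slice of the overhead leaves the other slices in place, so the overall gain stays small. The following is a minimal sketch of that arithmetic, assuming purely hypothetical shares of execution time. The figures in `OVERHEAD_SHARES` and the helper `speedup` are illustrative inventions, not measurements from the paper or from any real system.

```python
# Hypothetical shares of total OLTP execution time (illustrative only;
# these are NOT measurements from the paper or from any real DBMS).
OVERHEAD_SHARES = {
    "concurrency_control": 0.23,
    "write_ahead_logging": 0.23,
    "latching": 0.23,
    "buffer_pool_management": 0.23,
}

# Whatever is left after the four overheads is treated as useful work.
USEFUL_WORK = 1.0 - sum(OVERHEAD_SHARES.values())


def speedup(removed):
    """Return the speedup obtained by fully eliminating the named overheads."""
    remaining = USEFUL_WORK + sum(
        share for name, share in OVERHEAD_SHARES.items() if name not in removed
    )
    return 1.0 / remaining


if __name__ == "__main__":
    one = speedup({"buffer_pool_management"})
    all_four = speedup(set(OVERHEAD_SHARES))
    print(f"remove one source: {one:.1f}x")
    print(f"remove all four:   {all_four:.1f}x")
```

Under these assumed shares, eliminating a single source yields only about a 1.3x improvement, while eliminating all four yields about 12.5x. The exact numbers depend entirely on the assumed shares; the point is only the shape of the curve.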
... Q14. You are currently working on science-oriented DBMSs and search engines for accessing the deep web. Could you please give us some details? What kind of results have you obtained so far?
Stonebraker: We are building SciDB, which is oriented toward complex analytics (regression, clustering, machine learning, ...). It is my belief that such analytics will become much more important in the future. Such analytics are invariably defined on arrays, not tables. Hence, SciDB is an array DBMS, supporting a dialect of SQL for array data. We expect it to be wildly faster than legacy RDBMSs on this kind of application. See SciDB.org for more information.