Interview: James Taylor, Salesforce on Phoenix + HBase – The Future of Big Data

We discuss the advantages of Phoenix, upcoming features, forthcoming support for transactions, trends, advice, and more.

Twitter Handle: @hey_anmol

James Taylor is an architect at Salesforce in the Big Data Group. He founded the Apache Phoenix project and leads its on-going development efforts. Prior to Salesforce, James worked at BEA Systems on projects such as a federated query processing system and a SQL-based complex event programming platform, and has worked in the computer industry for the past 20+ years at various start-ups. He lives with his wife and two daughters in San Francisco.

First part of interview

Here is the second part of my interview with him:

Anmol Rajpurohit: Q7. What are the major advantages of Apache Phoenix over other alternatives?

James Taylor: Depends on what you consider "other alternatives", but one big advantage that Phoenix has is superior performance due to the push-down techniques we use. Phoenix is also very easy to install and get started with. If you compare Phoenix against developing an application directly on top of the HBase APIs, the advantages come from:
  1. Ease of application implementation due to the use of a higher level language in SQL
  2. Better abstraction between "what" data is accessed versus "how" it's accessed allowing the application to evolve more easily while at the same time allowing performance to improve from version to version
  3. As good or likely better performance than if you hand coded your application
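
To make the push-down idea concrete, here is a minimal Python sketch. It is a toy simulation, not Phoenix or HBase code: the `ToyRegionServer` class, the `metrics`-style row keys, and the `cpu` values are all hypothetical and exist only for illustration. It compares a client that pulls every row and filters locally with a server that seeks to the matching key range, filters and counts in place, and returns a single number.

```python
from bisect import bisect_left


class ToyRegionServer:
    """Hypothetical stand-in for a storage server that keeps rows sorted by key."""

    def __init__(self, rows):
        # rows: iterable of (row_key, value) pairs, stored in key order
        self.rows = sorted(rows)
        self.keys = [key for key, _ in self.rows]
        self.rows_sent = 0  # how many results were shipped back to the client

    def full_scan(self):
        """No push-down: return every row and let the client do the work."""
        self.rows_sent += len(self.rows)
        return list(self.rows)

    def range_count(self, start_key, stop_key, predicate):
        """Push-down: seek to [start_key, stop_key), filter and count server-side."""
        lo = bisect_left(self.keys, start_key)
        hi = bisect_left(self.keys, stop_key)
        count = sum(1 for _, value in self.rows[lo:hi] if predicate(value))
        self.rows_sent += 1  # only the aggregate crosses the wire
        return count


# Made-up data: 10 hosts x 1000 timestamps, row key "hostNN|TTTT", value = cpu percent.
rows = [(f"host{h:02d}|{t:04d}", (h * 7 + t) % 100)
        for h in range(10) for t in range(1000)]

# Roughly: SELECT COUNT(*) FROM metrics WHERE host = 'host03' AND cpu > 90

# Client-side filtering: all rows are transferred first.
naive = ToyRegionServer(rows)
naive_count = sum(1 for key, cpu in naive.full_scan()
                  if key.startswith("host03|") and cpu > 90)

# Push-down: '}' is the character after '|', so the stop key bounds the prefix.
pushed = ToyRegionServer(rows)
pushed_count = pushed.range_count("host03|", "host03}", lambda cpu: cpu > 90)

print(naive_count, naive.rows_sent)    # 90 10000
print(pushed_count, pushed.rows_sent)  # 90 1
```

Both approaches return the same answer, but the pushed-down version ships one value instead of ten thousand rows. In this sketch the saving comes from two things the answer above alludes to: turning the `WHERE` clause into a key-range seek and evaluating the filter and aggregate next to the data.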

AR: Q8. Today, a lot of organizations are using Phoenix. What is the most common feedback that you get? What do they appreciate the most about Phoenix?

JT: The biggest thing they appreciate about Phoenix is its performance. For more on who is using Phoenix and what they appreciate, see link.

AR: Q9. What are the in-flight Phoenix features that you are most excited about?

JT: Our upcoming 4.4 release introduces a number of new features: User Defined Functions, UNION ALL support, Spark integration, Query Server to support thin (and eventually non-Java) clients, Pherf tool for testing at scale, MR-based index population, and support for HBase 1.0.

After that, we'll introduce transaction support followed by Apache Calcite integration to improve interop with the greater Hadoop ecosystem through plugging into a rich cost-based optimizer framework.

AR: Q10. What is the current status on having Phoenix capable of supporting Joins and Transactions?

JT: Phoenix has had join support for over a year (see link). We're actively working on transaction support by integrating with Tephra. If all goes according to plan, we'll release this after our 4.4 release (in 4.5 or 5.0).

AR: Q11. Which of the current trends in Distributed Computing and Big Data seem the most interesting to you? Why?

JT: There's so much innovation going on right now in the Distributed Computing and Big Data areas, especially in the open source world. I'm excited to see how these technologies will converge and work together more seamlessly.

Here are a few trends, just off the top of my head:
  • General adoption of Hadoop by all companies
  • Standardization of using SQL to access big data (Phoenix, Drill, Impala, Hive, etc.)
  • Requirements around being able to access data in a low latency manner (Phoenix, Spark, Storm, HBase)
  • Adoption of Apache Calcite as a pluggable query planner framework (Hive, Drill, Kylin, and soon Phoenix)
  • Explosion in the move toward open sourcing everything (at least all platform type stuff)
  • Innovation happening in the open now due to move toward open source
  • Uptake in Phoenix+HBase as a big data store

AR: Q12. What is the best advice you have got in your career?

JT: "Good enough" is usually not good enough (from my dad when I was about 14 years old).

AR: Q13. What key qualities do you look for when interviewing for Data Engineering related positions on your team?

JT: Someone who can get stuff done, is passionate about doing things the "right" way, works independently figuring things out on his/her own, and still knows when to ask questions. Also, someone who is easy to work with and enjoys the collaboration that takes place in the open source community.

AR: Q14. On a personal note, are there any good books that you've been reading lately and would like to recommend?

JT: I really enjoyed both The Glass Castle by Jeannette Walls and Stones from the River by Ursula Hegi.