Interview: Stefan Groschupf, Datameer on Why Domain Expertise is More Important than Algorithms
We discuss large-scale data architectures in 2020, career path, open source involvement, advice, and more.

Stefan is currently CEO and Chairman of Datameer, the company he co-founded in 2009 after several years of architecting and implementing distributed big data analytic systems for companies like Apple, EMI Music, Hoffmann La Roche, AT&T, the European Union, and others. Stefan is a frequent conference speaker and a contributor to industry publications and books, holds patents, and advises a number of startups on product, scale and operations.
First part of interview
Second part of interview
Here is the third and last part of my interview with him:
Anmol Rajpurohit: Q12. Based on your strong technical background and thought leadership in the field, how do you foresee data architectures in 2020 for large-scale data processing? Would RDBMSs co-exist with Hadoop? If so, what role would they play, and how would they be integrated with the larger Hadoop-based architecture?

If you really take apart Oracle or Teradata, there's so much optimization technology in there. They're using a different technology to do a full table scan versus a B-tree scan; they do a bunch of caching and have hot and cold data, etc. The same thing will happen with Hadoop, where you will have decision engines, like Smart Execution, that will decide if you need a graph engine for this query or you will run this in-memory or on your hard drive. You will have cross-space optimizers. It will get much more complex and much faster.
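As a rough illustration of the kind of decision engine described above, the following Python sketch picks an execution engine per query. It is only a toy under stated assumptions: the `Query` fields, the `choose_engine` function, and the selection rules are hypothetical and do not represent Smart Execution or any Hadoop API.

```python
from dataclasses import dataclass


@dataclass
class Query:
    # Hypothetical query description, invented for this sketch.
    input_bytes: int                  # estimated size of the data to scan
    is_graph_traversal: bool = False  # True if the query walks relationships


def choose_engine(query: Query, free_memory_bytes: int) -> str:
    """Pick an execution engine name using simple, assumed rules."""
    # Relationship-heavy queries go to a graph engine.
    if query.is_graph_traversal:
        return "graph"
    # Data that fits in available memory runs in-memory.
    if query.input_bytes <= free_memory_bytes:
        return "in-memory"
    # Everything else falls back to disk-based processing.
    return "disk"


if __name__ == "__main__":
    one_gb = 1024 ** 3
    print(choose_engine(Query(input_bytes=200 * 1024 ** 2), free_memory_bytes=one_gb))
    print(choose_engine(Query(input_bytes=50 * one_gb), free_memory_bytes=one_gb))
```

A real optimizer would weigh many more signals, such as caching and hot versus cold data, but the shape is the same: inspect the query, then route it to the engine expected to be cheapest.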
AR: Q13. You started your career as a software developer and data architect, and are currently the CEO of a technology company. When you look back on your career path so far, what do you see as the major milestones and what were the key inspirations behind achieving those milestones?
SG: That’s not entirely correct. I actually started my career as a designer and then became a user-interface designer before really coding as a software developer.

There have been six stages of creativity that have helped bring me to where I am now:
- My first stage was learning Photoshop to create still photos when I worked at a music magazine.
- I wanted to build on top of that, so my second stage of creativity was adding a timeline to become a video editor. I cut a film for the Berlinale and cut ads for BMW.
- My third dimension was 3D animation. I was a very early user of Autodesk Softimage and then later Maya, again, mostly for video advertisements.
- Then, I discovered interactivity. I was one of the power users of Macromedia Director. This is where I really found my passion for object-oriented programming, which became my fourth stage.
- Then I realized that I could create functionality – my fifth stage of creativity. That's how I really got into hard-core programming. I always loved data-visualization.
- The sixth, and current, stage of creativity is working with the most difficult of all materials you can design on the planet, and that's humans. Putting them together on functional teams.
AR: Q14. When and how did you get started with coding for Open Source? What were the key learnings from your contributions to Nutch (search engine) and Katta (distributed Lucene index)? How has your involvement with the Open Source community impacted your career progress and decisions?
SG: I was fascinated early on by the creative process of creating functionality around data, specifically, text data. I worked on network word graphs with early thesaurus datasets and Weka, which is one of the first open-source data-mining frameworks.

AR: Q15. What advice would you give to people aspiring to a long career in Data Science?

For a Java programmer, it's cool to know assembly, but no one is working in assembly anymore. We have entire school programs dedicated to data science. This is great for the next few years, but it will go away and there will be completely different technology in the future. Become a domain expert.
AR: Q16. What was the last book that you read and liked? What do you like to do when you are not working?

Outside of work, I like to train for Ironman competitions. I also really enjoy projects like building my own Internet of Things devices and researching how to convert data into soundscapes.
