Interview: Thanigai Vellore, on Why Big Data vs RDBMS is the Wrong Question

We discuss success factors with polyglot architectures, Big Data challenges, recommendations for using Big Data technologies, trends, advice, and more.

Twitter Handle: @hey_anmol

Thanigai Vellore is an enterprise architect, technologist and innovator with over 15 years of progressive experience specializing in building large, highly scalable software systems. At, Thanigai is the lead architect responsible for defining and driving the technology roadmap initiatives for building the next generation technology vision and platform for the company. Thanigai’s interests and specialties include Hadoop/Big Data, NoSQL, Distributed Systems, Enterprise Architecture, Scalability, etc. Prior to joining, Thanigai worked in engineering roles at Sanmina and Flextronics.

First part of interview

Here is the second part of my interview with him:

Anmol Rajpurohit: Q6. What are the critical success factors for establishing and leveraging polyglot architectures?  How does one design for flexibility and scalability?

Thanigai Vellore: When working with polyglot architectures, we are constantly evolving and optimizing the stack, not only to ensure that we use the right tool (framework, platform, language, etc.) for the right application but also to make sure that the tools integrate well with each other. It is very important that every component in the stack is integrated into your software development lifecycle processes: everything from staffing and interoperability to DevOps and continuous integration.

In addition, choosing the right data format and API interfaces can make the integration easier. Another key factor is to always make a balanced trade-off between using the best tool versus making sure that it aligns with your long-term technology roadmap.

AR: Q7. What are the most underrated challenges of working with Big Data?

TV: I think one of the most underrated challenges of working with Big Data is “Governance for Data Quality”. With Big Data technologies, we are increasingly processing different types of data (unstructured, structured, media, etc.) using ELT patterns based on co-located data. In most cases, there is no fixed schema on the data, as the schema is applied dynamically during reads and writes. Thus, it becomes very important to apply proper governance around the processing algorithms that expose these views of the data, considering that both the data and the processing algorithms are constantly evolving.
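
As a rough illustration of the schema-on-read point above (not code from the interview), the Python sketch below applies a schema only when raw records are read and tracks which records fail it, so the read logic can be versioned and reviewed as the data evolves. The record fields, the `Purchase` type, and the `SCHEMA_VERSION` label are all hypothetical.

```python
import json
from dataclasses import dataclass

# Hypothetical raw events as they might land in storage; nothing is enforced on write.
RAW_EVENTS = [
    '{"user": "a1", "amount": "19.99", "ts": 1400000000}',
    '{"user": "b2", "amount": "n/a", "ts": 1400000060}',
    '{"user": "c3", "ts": 1400000120}',
]

# Hypothetical label identifying which version of the read logic produced a view.
SCHEMA_VERSION = "purchase-v1"


@dataclass(frozen=True)
class Purchase:
    user: str
    amount: float
    ts: int


def read_purchases(lines):
    """Apply the schema at read time; return (valid records, rejected records)."""
    valid, rejects = [], []
    for line in lines:
        try:
            rec = json.loads(line)
            valid.append(
                Purchase(str(rec["user"]), float(rec["amount"]), int(rec["ts"]))
            )
        except (KeyError, ValueError, TypeError) as exc:
            # Keep the raw line and the failure reason so data quality can be audited.
            rejects.append((line, type(exc).__name__))
    return valid, rejects


valid, rejects = read_purchases(RAW_EVENTS)
print(SCHEMA_VERSION, len(valid), "valid,", len(rejects), "rejected")
```

Counting rejects per schema version is one simple way to notice when either the data or the processing logic has drifted.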

AR: Q8. What are your top recommendations to derive the most value from using Hadoop and other related open source technologies for Big Data?

TV: Data is one of the key assets for any business. Big Data technologies allow organizations to tap, collect and process this data at scale, which helps in unearthing powerful insights that can transform the business. I would offer the following recommendations to make “Big Data” a success within an organization:
  • “It is more than a tech buzzword” – Big Data must be internalized within the organization as part of its overall business strategy. Big Data might be a tech project, but it requires strong alignment and collaboration with business stakeholders. This is critical to the successful adoption of Big Data throughout the company.
  • “Start small with Big Data” – It is very important to start the Big Data adoption journey with small (but measurable) use cases where you can prove its value to the stakeholders and learn from the implementation.
  • “Please don’t compare with RDBMS” – It is an “apples to oranges” comparison. Having the right mindset and understanding of what Big Data technologies offer helps in setting the right expectations (for both engineers and users).

AR: Q9. Data architectures are rapidly evolving to meet ever-increasing business needs as well as to leverage better technology options. What trends do you expect to dominate in the data architecture arena over the next 2-3 years?

TV: We are seeing a gradual trend towards convergence of real-time and batch-oriented systems. Patterns like the lambda architecture are being increasingly adopted to build systems that are fault tolerant and serve a wide range of use cases and workloads. In addition, I think we will see a big push towards more memory-centric distributed computing technologies in the coming years. Recent developments in Tachyon and Apache Spark seem really promising.
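
For readers unfamiliar with the lambda architecture pattern mentioned above, here is a minimal in-memory Python sketch of its three layers: a batch layer that recomputes a view from all events up to a cutoff, a speed layer that incrementally counts newer events, and a serving step that merges both at query time. The event data, the cutoff, and every function and class name are assumptions made for illustration; a real deployment would put distributed systems behind each layer.

```python
from collections import Counter


def batch_view(master_events, cutoff_ts):
    """Batch layer: recompute page counts from all events up to the cutoff."""
    return Counter(page for ts, page in master_events if ts <= cutoff_ts)


class SpeedLayer:
    """Speed layer: incrementally count only events newer than the batch cutoff."""

    def __init__(self, cutoff_ts):
        self.cutoff_ts = cutoff_ts
        self.counts = Counter()

    def ingest(self, ts, page):
        if ts > self.cutoff_ts:
            self.counts[page] += 1


def query(page, batch, speed):
    """Serving layer: merge the batch view and the real-time view at query time."""
    return batch[page] + speed.counts[page]


# Hypothetical (timestamp, page) events; the batch job has processed up to ts=3.
events = [(1, "home"), (2, "home"), (3, "pricing"), (4, "home"), (5, "pricing")]
cutoff = 3

batch = batch_view(events, cutoff)
speed = SpeedLayer(cutoff)
for ts, page in events:
    speed.ingest(ts, page)

print(query("home", batch, speed))
```

Because the batch view can always be rebuilt from the full event log, an error in the speed layer is corrected the next time the batch job runs, which is where the pattern's fault tolerance comes from.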

AR: Q10. What is the best advice you have got in your career?

TV: Actually, the best career advice that I got was from my mom! She said “Hard work and persistence pays off – eventually” and I think that is very true and was certainly applicable in my case.

AR: Q11. What skills and experience do you look for when interviewing for Data Engineering related positions on your team?

TV: The main thing that I look for when interviewing candidates (for data engineering) is their understanding of the fundamentals of data collection, processing, analysis and transformation. I think learning a particular technology/stack is much easier when your fundamentals are strong. In addition, I also look for analytical and problem-solving skills that can be applied to any domain.

AR: Q12. On a personal note, are there any good books that you’re reading lately, and would like to recommend?

TV: I recently read “Thinking, Fast and Slow” by Daniel Kahneman – a must read for anyone interested in understanding human behaviors and how certain biases affect the decisions we make in our daily lives.

Anmol Rajpurohit is a software development intern at Salesforce. He is a former MDP Fellow and a graduate mentor for IoT-SURF at UCI-Calit2. He has presented his research work at various conferences including IEEE Big Data 2013. He is currently a graduate student (MS, Computer Science) at UC, Irvine.