Interview: Michael Lurye, Time Warner Cable on Key Lessons from Shifting to Hadoop

We discuss the key lessons from shifting to Hadoop, data management in today’s world, the future of Data Science, career advice, and more.

Michael Lurye is Senior Director, Enterprise Data Management for Time Warner Cable. He and his team are responsible for shared data warehousing assets and functions that benefit multiple Business Intelligence (BI) teams and their customers. This includes creation of enterprise data assets, BI architecture, quality assurance, and data quality management. In addition, Mike and his team are responsible for evaluation and adoption of Big Data technologies.

Prior to joining TWC, Mike held Product Management and Product Marketing positions with Amdocs, focused on decision automation, mobile content and personalization solutions. Mike’s prior experience includes senior roles at major analytical CRM & marketing services companies.

First part of interview

Here is the second and final part of my interview with him:

Anmol Rajpurohit: Q7. What have been the key lessons from your experience of shifting ELT workloads from Enterprise Data Warehouse (EDW) to Hadoop?

Michael Lurye: One lesson I’d like to share is around the skill set and the organizational implications of adopting Hadoop. Companies that invented Hadoop, as well as early adopters, are full of developers accustomed to writing lots of code in low-level programming languages such as Java or Scala. But we are a BI shop and our developer skills are SQL and ETL tools, not Java.

While Hadoop comes with higher-level tools such as Pig and Hive, we do not believe that converting several million lines of SQL code to a similar amount of Pig and Hive code would make sense for us. We decided to supplement built-in Hadoop tools with an ETL tool that presents a familiar GUI paradigm to developers while still leveraging the Hadoop / MapReduce framework for job execution.

AR: Q8. Amid the rapid growth in data and technology advancement, how has data quality management changed in the last few years? Compared to the past, would you say today it is easier or harder to manage data quality?

ML: It’s easier and harder. Easier because data quality tools are much better than they used to be, and harder because user expectations for data quality continue to rise.

AR: Q9. What is the best advice you have got in your career?

ML: Don’t be afraid of change, think of it as an opportunity, not a threat.

AR: Q10. How do you think the expectations from Data Science have evolved over time? Where do you see them headed in the future?

ML: Data scientists today are expected to be proficient not only in statistics and other quantitative disciplines but also in data processing and other IT disciplines. I expect this trend to continue and become even stronger. It’s no longer enough to build good models and score them offline in some data mining tool. To achieve real business benefit you need to operationalize analytics, drive it directly into the business process. This means you need to understand the entire data value chain, from source systems to core applications that are used to run the business.

AR: Q11. What key qualities do you look for when interviewing for Data Science related positions on your team?

ML: At TWC, data scientists work in business departments, not IT. I’m not directly involved in hiring them in my current role. I used to hire data scientists in previous roles (we called them statisticians or “quants”). A good data scientist combines three competencies: quantitative methods and algorithms, the ability to understand business problems, and the IT skills that I mentioned in the answer to the previous question.

AR: Q12. What was the last book that you read and liked? What do you like to do when you are not working?

ML: “Killing Patton” by Bill O’Reilly and Martin Dugard. I like traveling and try to find time for at least one good trip to some new and interesting location every year.