Interview: Michael Lurye, Time Warner Cable on Big Data and the Insatiable Demand for BI

We discuss EDM at Time Warner Cable, data sources, complementing legacy data warehouses with Big Data solutions, vendor selection, and the build vs. buy decision.

Michael Lurye is Senior Director, Enterprise Data Management for Time Warner Cable. He and his team are responsible for shared data warehousing assets and functions that benefit multiple Business Intelligence (BI) teams and their customers. This includes creation of enterprise data assets, BI architecture, quality assurance, and data quality management. In addition, Mike and his team are responsible for evaluation and adoption of Big Data technologies.

Prior to joining TWC, Mike held Product Management and Product Marketing positions with Amdocs, focused on decision automation, mobile content and personalization solutions. Mike’s prior experience includes senior roles at major analytical CRM & marketing services companies.

Here is my interview with him:

Anmol Rajpurohit: Q1. What are the typical responsibilities of the Enterprise Data Management (EDM) team at Time Warner Cable?

Michael Lurye: The Enterprise Data Management team is responsible for shared data warehousing assets, resources and functions that serve multiple constituencies and benefit multiple customer-facing Business Intelligence (BI) teams and their customers. This includes data integration, testing, data quality, reference data management, and architecture.

AR: Q2. What did the legacy data architecture at Time Warner Cable look like, prior to moving to Hadoop? What were its major components?

ML: We use multiple data warehouse appliances, which serve both user-facing BI workloads and data integration workloads. We adopted the Extract-Load-Transform (ELT) data integration methodology, which leverages the power of the database to transform data at scale. ELT scripts are written in SQL and are orchestrated using Unix shell scripts and an enterprise job scheduling tool. We also use commercial ETL tools, mainly to bring data from the source systems into the data warehousing environment.
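As a rough illustration of the ELT pattern described in this answer (not TWC's actual code), the sketch below lands raw rows in a staging table and then lets the database engine do the transformation in SQL. It uses Python's built-in `sqlite3` as a stand-in for a data warehouse appliance; the function `run_elt`, the table names `stg_billing` and `fact_account_revenue`, and the sample rows are all hypothetical.

```python
import sqlite3


def run_elt(conn, raw_rows):
    """Load raw rows untransformed, then transform them inside the database."""
    cur = conn.cursor()

    # Extract + Load: land the source data as-is in a staging table.
    # (Hypothetical schema: one row per billing line item, amount in cents.)
    cur.execute(
        "CREATE TABLE stg_billing (account_id TEXT, amount_cents INTEGER)"
    )
    cur.executemany("INSERT INTO stg_billing VALUES (?, ?)", raw_rows)

    # Transform: the aggregation runs in the database engine, not the client.
    cur.execute(
        """
        CREATE TABLE fact_account_revenue AS
        SELECT account_id,
               SUM(amount_cents) / 100.0 AS revenue_dollars
        FROM stg_billing
        GROUP BY account_id
        """
    )
    conn.commit()

    return cur.execute(
        "SELECT account_id, revenue_dollars "
        "FROM fact_account_revenue ORDER BY account_id"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    sample = [("A1", 1250), ("A1", 750), ("B2", 3000)]
    print(run_elt(connection, sample))
```

In a setup like the one described, each SQL step would typically live in its own script, with a shell wrapper and a job scheduler deciding the order in which the scripts run.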

AR: Q3. What are your major data sources? What is the approximate order of magnitude of data that you deal with? How fast has it grown over the past few years?

ML: Billing systems are the most popular data source, but we have many others, including customer care systems, ERP, and usage data, to name a few. The total amount of storage across all data warehouse appliances is over 300TB.

While we have seen growth in data volumes, our biggest challenge is not the amount of data; it’s the complexity of data transformations and the ability to meet the insatiable demand for BI solutions from our business partners across the enterprise.

AR: Q4. How and when did you feel that this traditional data warehousing system was not good enough to meet the company's business needs?

ML: I wouldn't say that our data warehousing system is not good enough. We could continue meeting TWC's business needs by incrementally evolving our existing architecture. But with the emergence of Big Data technologies such as Hadoop and Spark, we found that we could deliver BI solutions more cost-effectively by supplementing our existing architecture with Big Data platforms.

AR: Q5. How did you approach the evaluation and adoption of Big Data technologies? What were the major factors in deciding the technologies and the vendors?

ML: First we looked at the typical Big Data use cases such as data integration, data lake, queryable archive and advanced analytics. We concluded that data integration and ELT offload represent the best opportunity for Big Data adoption at TWC.

Since Hadoop and Spark run on generic hardware, cost was the main consideration for selecting a hardware vendor. For the Hadoop distribution, we considered openness, market momentum and support by the third-party software ecosystem. We also brought in a specialty tool to increase developer productivity when building data integration jobs to run on Hadoop.

AR: Q6. How did you make the decision on whether to Build or to Buy the Big Data capabilities?

ML: I’m a big believer in the “buy when you can” approach. We have been using an off-the-shelf solution for web analytics for years and didn’t see the need to build a similar capability in house. On the other hand, we have accumulated several million lines of SQL ELT code that encapsulate TWC's unique IP, and migrating that to the cost-effective Big Data platforms is where our “build” efforts are focused.

Second part of the interview