Interview: George Corugedo, CTO, RedPoint on Big Data Trends and Important Skills

We discuss the key trends in the Big Data industry, important skills for Data Science practitioners and more.

Here is part 1 of the interview: George Corugedo on YARN and Customer Analytics

A mathematician and seasoned technology executive, George Corugedo has over 20 years of business and technical expertise. As co-founder and CTO of RedPoint Global, George is responsible for leading the development of the RedPoint Convergent Marketing Platform™. A former math professor, George left academia to co-found Accenture’s Customer Insight Practice, which specialized in strategic data utilization, analytics and customer strategy. Previous positions included director of client delivery at ClarityBlue, Inc., a provider of hosted customer intelligence solutions to enterprise commercial entities, and COO/CIO of Riscuity, a receivables management company specializing in the utilization of analytics to drive collections. George holds a BS in geology and a BA in mathematics from the University of Miami, and an MS in applied mathematics from the University of Arizona.

George recently delivered a talk at the Big Data Innovation Summit 2014, held in Santa Clara, on “Gain Traction for Your Big Data Initiatives: Harness the Power of Hadoop Quickly and Cost-Effectively”.

Here is part two of my interview with him:

Anmol Rajpurohit: Q4. From a technology perspective, what key trends will drive the growth of the Big Data industry in the next few years?

George Corugedo: The first growth driver for the Big Data industry is simply the economics of the technologies.
The fact that an organization can capture and analyze massive amounts of data at a fraction of what it used to cost will truly drive the transformation.
We are already seeing organizations that run large projects move little-used data out of very expensive databases and into less expensive Hadoop clusters. The economics are also reflected in the efficient scalability of the Hadoop platform and the raw, cost-effective computational power that a cluster can provide. The economics also have a powerful impact on analytics, as more and more data can be collected and linked together in order to resolve sharper pictures of customers and behaviors. We will continue to see the economics reflect themselves in new and interesting ways as what used to be “impossible” (i.e., too expensive, too hard, too ambitious) now becomes possible.

The second driver is the flexibility of the platforms.
The ability to capture data quickly without worrying how it should be keyed, structured and fed into a data model gives organizations and business units the flexibility, agility and autonomy they need to respond to changing conditions without being limited by what IT can approve or implement.
This is a really big deal in areas like marketing, where data is the lifeblood of deep insights but IT has bigger fish to fry than worrying about capturing Twitter feeds. Additionally, many manufacturers that build devices are anxious to collect telemetry in an easy and flexible way to monitor their device performance and utilization. The flexible structure of the data enables varied analysis and applications of the data; for example, traffic pattern data can be analyzed by hedge fund managers to predict retail sales.

The third driver is the increasing frustration business units feel about the organizational latency which often constrains them from doing the right thing. Too often, technology and competing IT priorities create latency within organizations. Instead of making things easier, technical hurdles often delay or stymie good ideas and timely actions. To be fair, IT has their priorities for good reason but the business units also have good reason to want to operate at the “speed of business”. In order to achieve this, the business unit has to take greater ownership of their technical destiny. You see this reflected in the rise of new “C” level roles such as the Chief Digital Officer who, as Gartner defines it, is responsible for creatively leveraging technology directly to drive increasing revenue.

Imagine a CIO, CRO and CMO all wrapped into a single position. While this is an extreme example, the desire is there to move in a direction where the business, not IT, controls the data and the technology needed to move “at the speed of business”.
Increasingly, Big Data technologies are being seen as an avenue to technical autonomy because of their flexibility and speed.
The ready access to data is addicting to the business users who want to understand and quickly adapt to changing business conditions. To meet the needs of the business units, however, technology providers will have to make the Big Data technologies easier to use. It turns out that YARN is exactly what will enable this.

Such fundamental changes to an industry, however, often reveal unexpected challenges. In the case of Big Data technologies, there are some headwinds that need to be overcome. The skills gap is perhaps the biggest issue. In order to take advantage of many of these Big Data technologies, the users have to be versed in a variety of coding languages and need to be full-fledged engineers. That is not unusual for a nascent technology ecosystem like the Big Data ecosystem, but overcoming this is critical to realizing some of the benefits listed above. That is one of the reasons YARN is so important. YARN will enable later-generation applications to work directly within HDFS and bypass the coding requirements that exist today. The other challenge to Big Data technology adoption is the Wild West mentality. While flexibility and autonomy are great, traditional data quality and governance are still essential to developing useful and actionable information.

AR: Q5. What key qualities do you look for when interviewing for Data Science related positions in your team?

GC: There are many technical skills relevant to being a data scientist or working in the data sciences arena. And as new technologies emerge, the list just grows. However, skill in technologies is really just the price of entry into the data sciences realm. The real stars in data science have a set of skills or attributes that transcend the technical. These include:
  1. The fundamental understanding that the world of data is constantly in flux. This is critical when building data and analytic solutions that are resilient in the midst of constant change.
  2. A keen ability to formulate a question; the right question. This sounds almost trivial but any experienced data scientist or statistician will tell you that the success or failure of statistical analysis is often defined by how well you can ask the right question. Understanding common dynamic patterns in business is critical to being able to do this quickly and effectively.
  3. Good fundamentals are needed. Data scientists have to be grounded in the fundamentals of statistics. Too often, fundamentals are replaced with pretty graphs and “easy to use” tools. Often this leads to erroneous (and costly) decisions.
  4. A commitment to testing. Gathering new data and running experiments is the only way to detect changes in the world around you. If one stops testing, then one is implicitly claiming to have all of the answers. There is no better way to get blindsided than believing you have all the answers.
  5. Creativity is essential, particularly in a world where what is possible changes daily. Data scientists have to be bold and not be constrained by convention. Even a failed experiment delivers insight.

AR: Q6. What was the last book that you read and liked? What do you like to do when you are not working?

GC: Given my schedule lately, I have few brain cells left over for recreational reading, but I can say that what has both entertained and inspired me the most for many years is reading The Gruffalo to my daughters. This is something I have done for all of my daughters as they were growing up and most recently with my two-year-old. It’s become a ritual that I look forward to every evening.

For fun, I enjoy spending time with my kids and going rock climbing. The name RedPoint is actually inspired by the rock climbing term, RedPoint. It represents the perfect execution of a route. That is why we named the company RedPoint; we always strive for perfect execution.

In case you missed it, here is Part 1 of the interview.